v7 Edition
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107001451

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://civitai.com/models/1790792?modelVersionId=2298660
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
what's the opinion on pony v7, krea video, and the new lightx2v lora
Blessed thread of frenship
>>107006468>not a single anime girl:(
>>107006514anime died with pony
>>107006514
>but there's a clown girl
a massive upgrade, then.
>>107006326
It is more associated with "pointy ears" than elf, as I can get it with "demon girl" or "fairy" as in pic related.
>>107006523
Wow those do look a lot like the earrings I keep getting, especially when not trying to prompt around it. But does Chroma know "Frieren"? I thought it had characters and styles removed.
>It isn't that half bad desu
it makes sense that the guy who shills netadogshit would think that ponyv7 isn't half bad. horrendous shit taste.
my prompts are too strong for you, anon
>>107006544
>But does Chroma know "Frieren"? I thought it had characters and styles removed.
Yeah, chroma knows many characters and styles, you just need longer prompting than just the name
anon, i tell you i am going to 1girl, and i want only your strongest prompts
(you) can't handle my strongest 1girl
>>107006585masterpiece, loli, photorealistic
>>107005962
Same reason Princess Peach always has an Iron Man core embedded in her chest no matter what clothes she's wearing: if almost 100% of the examples of a given object have a particular feature, then as far as the model is concerned it is an inherent aspect of that object. It's not like these models make any inherent distinction between clothing and body parts.
so did astralite comment about why v7 shat the bed so hard or is he pivoting straight to another grift
>>107006647
Pony v6 was simply a fluke. When he announced v7 and what was going on with it, that was already a clear sign that the model was going to be a failure.
>>107006545>singular
Anybody who defends NetaLumina or Ponyv7 is deranged and cannot be trusted.
>>107006647The latter
>>107006629
Well, I know there are plenty of elf images that don't have earrings like that. And other models like the SDXL-based ones can make earring-less elves easily enough. It might be that I am pass in a cartoon style image in the workflow, that does not itself have earrings, but this nudges the model as well. Perhaps it can realistic pointy ears or other styles just fine without sticking earrings on. Or perhaps if I fed it an empty latent image it wouldn't have the problem. Haven't tested it, just noticed it was strange. At any rate, I have added frieren to negatives (along with earrings), but I don't start losing the earrings until the cfg gets all the way up to around 5.0, at which point it is looking rather fried, so I guess that is not going to fix the problem.
>>107006704
>It might be that I am passing in a cartoon style image*
and
>Perhaps it can do realistic pointy ears or other styles just fine*
>>107006704bretty kino gen
>>107006647
>pivoting straight to another grift
this. as for why anyone would support him after v7: the number of retards far outweighs the people with common sense.
>>107006544The annoying thing is I can get the earrings to go away by turning down cfg below the correct value (1), but then the image fails to denoise correctly.
>>107006544I finally succeeded. I was uhh only trying to accomplish no earrings, nothing else in particular
>>107006793prompt?
>>107006793
Nice job and nice tits. Did you do anything in particular to make them go away or was it just a lucky gen?
>>107006731
Thanks, you can get some weird results when you use wd14 to interrogate an image, then use its prompt to make a new picture.
>>107006829
>boobas, bazoongas, over the shoulder boulder holders
one can only imagine
No finetune is going to fix that trash pony v7 model. Pic unrelated
>>107006840
Turning cfg down to ~0.85 had a lot to do with it, but largely just luck
>>107006856
In this case it was "tall and voluptuous" plus "[her] big flabby sagging breasts are tightly bound in her fraying smock and squeezed together for a ton of cleavage"
I found this was um necessary to make the earrings go away
Local models should unironically be banned
>nigbobumping this thread
is the info on style clustering ANY FUCKING WHERE for ponyv7? I searched the HF and civitai pages, NOT a single fucking link to check these fucking clusters.
Yes, I know that pony is shit, but I still wanted to experiment a bit with this new toy.
FML
>>107006947and what about the comic style name?
>>107006840
Actually I don't know why I said it was only luck, I did a lot otherwise to try to make it happen (luck was still important of course)
I added no-makeup hashtags, removed anything signifying richness or ornateness, used a lot of words like "plain" "natural" "rustic" "barefaced" etc., tried to force a feral/pauper/tattered appearance, tried to get a retro pulp fantasy aesthetic to avoid modern "character design" slop, described the character as boyish and avoided things suggesting a stately older elf, etc.
But all those things failed until I turned cfg down a little bit. Now some of the gens have earrings and some don't.
>>107007052
"A blurry grainy scan of an old pulp fantasy illustration from 1957." I'm sure there's a lot of room for improvement there.
res2s, beta57, 20 steps, 4 cfg.
man these fucking hands
my maximum permitted gen time is around 100s, this gen took 110s.
>>107007058
>I'm sure there's a lot of room for improvement there.
E.g., I am now going to try "pulp adventure" instead of fantasy because the word fantasy is too closely associated with modern slop
how do i gen girlfailures?
>>107006975
>t. /de3/ vramlet jelly of /ldg/ chads' epic booba gens
lmao
>>107007076
adding this negative:
deformed hands, bad anatomy, extra limbs, poorly drawn hands, poorly drawn face, mutation, deformed, extra eyes, extra arms, extra legs, malformed limbs, fused fingers, too many fingers, long neck, cross‑eyed, bad proportions, missing arms, missing legs, extra digit, fewer digits
seems to have fixed some of the issues actually. I preferred the older image's overall composition and tone tho.
Can't make Minthy/Rouwei-T5Gemma-adapter_v0.2 work. Provided workflow requires full Gemma so I add a gguf loader node, but then I get a picrel error. LLM SDXL nodepack updooted to 3.0.1.
Pony v7 q8, fp32 vae and clip, official comfyui workflow, 30 steps
stress test:
style_cluster_1610, score_9, Detailed photograph RAW of seven smiling friends of different races that are at a nightclub concert with dim lighting that is shining on their faces, behind them is a crowd of people dancing while fighting with large swords, everyone is holding a sword in their left hand and an intricate beer glass with differently colored beer in the right hand. Far behind them above the DJ there is a sign which has "Minimum drinKing age 021!" written on it in stylized cursive letters.
>>107007058
I tried some of those, but in my case I still got the earrings, and even lowering the denoising to 0.5 didn't get rid of them. Interestingly, I don't have the problem with the non-flash chroma models, such as Chroma-DC-2K-T2-SL4-bf16
>>107007022
that v6 tagmine spreadsheet wasn't created by the author, so
>>107007243Different seed
>>107007256
since I made that post I've read on the colab that the style groups go from 1 to 2048
>>107007257Different seed and without "style_cluster_1610" in the prompt
>>107007243>>107007257>>107007269takes me back to good ol' SD1.4 days
the style_cluster thing for sure is cumbersome, but at least it has artists in it in some form. I don't mind if I have to look up a table. wish we had that in chroma instead of fucking NOTHING AT ALL.
>>107007269Different seed and without "style_cluster_1610, score_9" in the prompt
>>107007297i assume the clusters do not allow for prompting individual artists which is a huge fucking kick in the nuts for no reason other than muh morals
>>107007243>>107007257Goddamn, the sd1.5 and chroma merge lookin fire
Stop it with the Pony posts that's like seeing gore
>>107007307what the fuck? I assumed that was the whole point of clusters. Maybe I should actually read the docs.
>>107007338You must have knowledge of the turd to appreciate the beauty of better models like Pixart Sigma.
>>107007297
Chroma is a base model. Why should a base model have ridiculous style tags?
Chroma can be tuned by anyone for any purpose. That's what makes it special.
>>107007299score_9, medieval magical intricate and detailed world, princess taking a selfie in a pink ball dress, long ginger hair, pale skin, huge breasts, smile
>>107007361
30 steps is too low, try 40
>>107007361Same seed without "score_9"
randomizing cluster styles now
ponyv7 is atrociously bad what the actual fuck
>>107007361>>107007369horrible
>>107007341the author has a strange heretic perversion to releasing models to the public which can be prompted with artist names. his own secret versions however do not have this problem what a faggot
>>107007375same seed
>>107007385>perversion *aversion
>>107007388
>>107007405
this model is so fucking bad.
gradually losing ALL hope
>>107007243
Netalumina v3.5, without style and score
lol
>>107007385if he released the artist ids, I bet people would even overlook the massive flaws
>>107007353Wasn't it supposed to have them but they fucked up the captioning or something? I don't know, just something I read. You think we're gonna get a chroma finetune? As a dumb user, I don't really care what type the model is. All I know is chroma with artist styles would be sweet.
>>107007443it's not something you can mess up by mistake
>>107007369
>>107007365
40 steps
score_9, Attractive medieval princess taking a selfie in a pink ball dress, long ginger hair, pale skin, large breasts, smile. She is at the top of a tall stone tower, with a large window behind her that overlooks a huge and crowded medieval city at sunrise.
https://xcancel.com/JustinLin610/status/1982052327180918888#m
>Alibaba's CEO is asking himself why Open Source doesn't have udio at home
be the change you want to see, make Qwen Audio or something lol
is this stupid faggot going to post every single gen he makes? fuck off already
https://github.com/fal-ai/flashpack
Then do it, Comfy. I'd like to load my models faster, especially with Wan 2.2, since that workflow is all about unloading/reloading between the HIGH and LOW models.
>>107007499
>we r working on it and it won't be far. i am just curious about the status
Why talk? Talk is cheap. Give me something that is Udio tier and Apache 2 licensed or I sleep. We don't want another Songbloom or ACE Step.
>>107007244
Yeah flash is harder, which is partly what makes it fun. As frustrating as models like that can be, fighting against them feels more like a game. Whereas with something broader like Chroma base it's hard to know what you can do other than wait and get lucky
>>107007538this, they can definitely do it, do it chinks!
>>107007427
>>107007467
Same except 1536x1536, which takes ~7s per step on a 3090, making this take 4-5 minutes per image. That's almost the same time it takes to generate a full coherent 5s 32fps video today with Wan 2.2 lightx2v.
Unless the model somehow gets saved by "proper" prompting that brings out the style knowledge, which also somehow fixes the detail gore and almost makes it into a completely new and better model, it's sadly DOA.
butiful
>>107007572Different seed.
so what's the best wan 2.2 lora combo with the new loras?
>>107007590
New HIGH: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22_Lightx2v/Wan_2_2_I2V_A14B_HIGH_lightx2v_MoE_distill_lora_rank_64_bf16.safetensors
Old LOW: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22-Lightning/old/Wan2.2-Lightning_I2V-A14B-4steps-lora_LOW_fp16.safetensors
4 steps, cfg 1, unipc
>>107005507Nice, did anyone try these SVI loras with wan2.2? What weight did you use? How did you make it work for longer videos?
>>107007598
1 strength for both?
>>107007076
>res2s
res3m should be superior and faster too
>>107007603Yes
>>107007499good if he makes something
My friend's cousin works for OpenAI and he says they have a secret internal model not ready for public release yet, it's so powerful that you can type in your street address and it will show you pictures of your house, you can even prompt inside and you'll see yourself
>>107007619i will finally know what my oneitis' vagina looks like
>sky, up in the clouds, heaven, pearly gates, the kingdom of heaven
>>107007536Does that also apply to gguf? Or files in general?
>>107007619My uncle works at Nintendo and he said the next Zelda is gonna be fully dynamically generated by a next-gen GPT model that runs on a VR brain implant
>>107007590
>lora combo
the one with the stuff you want in your video
>>107007536
Model load is the most frustrating thing about comfyui...
>wan2.1
>takes minutes at the sampler before it starts genning, or clip keeps offloading then loads forever, or memory leaks after 5 gens where I have to force close comfy
>all-in-slop
>constantly offloads the entire fucking model and I have to wait another 10 minutes for it to all load again
>wan 2.2
>while the fastest and least pain in the ass, constant and increasing pausing in between high and low generation
>>107007647it would still be better than nu-Open World slop zelda
>>107007666how could you do this to me?>>107007665
>>107007526Discussion of free and open source models, faggot
>>107007643
>gguf
No.
>>107007643
>files in general
No, it's for safetensors files and you need to convert them to the flashpack format.
>>107007565
Local has a lot of work to do. Audio inpainting, audio upscaling, etc... The bar is literally just a decent model that can do it all.
>>107007715
>you need to convert them to a flashpack format.
Comfy said you don't need to convert them, just use their methods on safetensors (look at picrel) >>107007536
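Not flashpack's actual API (haven't read it), just a quick baseline for how fast the plain safetensors path already is on your disk, so you can tell whether a faster loader even matters for you; the filename is a placeholder (gguf won't work here, it's safetensors only):

import time
from safetensors.torch import load_file  # pip install safetensors

path = "model.safetensors"  # placeholder, point at any local checkpoint

t0 = time.perf_counter()
state_dict = load_file(path, device="cpu")  # loads every tensor into RAM
t1 = time.perf_counter()

total_bytes = sum(t.numel() * t.element_size() for t in state_dict.values())
print(f"{total_bytes / 1e9:.1f} GB in {t1 - t0:.1f}s "
      f"({total_bytes / 1e9 / (t1 - t0):.1f} GB/s)")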
>>107007715Can't wait to load my sdxl models super fast!
>>107007718
udio is the sota on this, but it has the same limitations as any other closed model:
- you can't ask it to make music "in the style of" (ask for music like michael jackson's "man in the mirror" -> moderation backend and blocked), though now you can send music to it, but it's not the same.
- you can't train it or make specialized "loras".
- anything sexual is moderated (think a sensual song).
>>107007724
Would be nice but I'm not sure anyone would work on that. It can shave off quite a lot of time with complex multistage sampler setups.
>>107007754
>- anything sexual is moderated (think a sensual song).
that's why it's the most hated audio software of female rappers
https://www.youtube.com/watch?v=1Gt9TTjAMvw
>>107007777Sure, but that's not sensual, that's just crude and vulgar, never been into these songs. Zero eroticism.
>>107007777yeah I guess kek
Are they ever gonna make shorter GPUs? I can't fit anything longer than 300mm in my midtower so I'm stuck with 10GB of VRAM
So does anyone have replacement recommendations for this?
https://github.com/1038lab/ComfyUI-JoyCaption
It refuses to use CUDA and does inference on the CPU, taking a whole minute.
>>107007536kek
>>107007849any VLM model will use cuda, if you're gonna figure out how to make it work for one then it may as well be joycap
>>107007687he's talking to himself and spamming the same 1girl, kill yourself, no discussion is being had
>>107007843
With the way things are going, I doubt it.
Well they will, but you'll get the less powerful stuff.
if I gen a 10s (161 frames) video on wan, is there a way to prompt it to do one thing then another without the second taking over immediately?
"she types on a computer for 3 seconds, then she gets up and walks away"
>>107007890what if the second thing is waiting, then the actual second thing becomes the third thing
>>107007849the joycaption repo has a gradio interface right?
>>107007843no, we reached the limits on the size of a transistor, so the only way for them to get more powerful gpus is to make them bigger, the gold rush is over
Does anyone here know about super-resolution models?
I want to train a model with my own dataset, because my dataset shares the same colours, patterns and style, but it has low resolution images, so I want to upscale them as faithfully as possible.
Please somebody help me
>>107007882the only options for me are the ada 48GB which is not worth it and theres one 5070 ti thats like 285mm but also not worth it. I guess I'll have to wait because I happen to think the 5090 is also a bad investment
>>107007879
I have no idea what you are trying to say. It doesn't load anything to VRAM, has no GPU usage, the CPU is at 100%, and it is slow. Maybe it's some other bug, but it's not working as intended. I asked for alternatives for JoyCaption inference.
>>107007914
If you are referring to the hugging face one, that has usage limits. I am trying to mass tag images for lora training. That's why I am trying to set it up locally.
>>107007920just play with seedvr2 to upscale them
>>107007927
>If you are referring to hugging face one that has usage limits.
no, I mean the github repo
>>107007754
I do recall people making stuff in the same style just by inputting lyrics back when that was allowed. Look at this:
https://www.404media.co/listen-to-the-ai-generated-ripoff-songs-that-got-udio-and-suno-sued/
Obviously there's more, including a popular one:
https://www.udio.com/songs/nDKNwPUB6GrMhEfvM6v2u1
Though it's more like a cover
>>107007904ok worth a try, thanks anon
>>107007934Yeah, this is where local would shine.
>>107007933
Gradio interfaces are typically hosted at hf, and the github repo links to hf for the online demo as well. Unless you are referring to something else.
>>107007967a1111 and all it's forks are using gradio locally. you are being retarded
For anons here using torch nightly wheels: when I updated from the early-October wheel to the 22nd one (2.10.0.dev20251022+cu130), sage attention suddenly broke completely. It looks like an issue where everything defaulted to CPU instead of CUDA, making the sampler throw an error. Going back to the 1002 version made it work fine again.
>>107007979
Oh you mean this?
https://github.com/fpgaminer/joycaption/tree/main/gradio-app
I guess I can try that. When you said it like that I expected some sort of link to somewhere.
>>107008007I am too lazy to look for you but you figured it out. gold star for you
>monthly pytorch mismatch between custom nodes that requires a reinstall
here we go
>50s/it WAN with random crashes on ROCM 7
>100s/it WAN with guaranteed stability on ROCM 6
suffering
>>107008079
>he broughtered'ed AMD
why??
>>107008096Because fuck nvidia. Also gaming under Linux is less of a hassle with AMD.It's fine, I don't have a fried attention span. I can cope.
>>107008111
>It's fine, I don't have a fried attention span. I can cope.
I'm not sure you're coping that well, you literally complained about the lack of speed here lool
>>107007598thank you anon the pajeet doesnt deserve your grace
>>107007994
I had the same issue. If it's this: https://github.com/pytorch/pytorch/issues/166104
then it's "working as expected" apparently, which means we need the sage attention team to update or we're stuck with early October torch
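Quick sanity check for whether your wheel combo is hit by this, assuming sageattention is installed; the sageattn call follows the sageattention README, so adjust if your version's signature differs:

import torch
from sageattention import sageattn

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# small random attention inputs: (batch, heads, seq_len, head_dim), fp16 on CUDA
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = sageattn(q, k, v, is_causal=False)
print(out.shape, out.device)  # expect torch.Size([1, 8, 1024, 64]) on cuda:0, not an error / cpu fallback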
>>107008111Why did you make a post kvetching about speed and stability if you were going to immediately get defensive and coping lol.
>>107008242
That's not what I need, I want to train a super-resolution model with my own dataset. The idea is to have pairings of images and teach the model what pairings are a correct upscaling.
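The usual recipe for that, assuming you have at least some higher-res examples of the same style to build pairs from: synthesize the LR half of each pair by downscaling, then train a small SR net on the pairs. A bare-bones sketch; the folder name and the toy 2x net are made up for illustration, not any particular repo:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image, ImageReadMode
from pathlib import Path

class PairedSRDataset(Dataset):
    """LR/HR pairs made by downscaling your own HR images.
    Assumes images share one fixed size; otherwise random-crop here or use batch_size=1."""
    def __init__(self, hr_dir, scale=2):
        self.paths = sorted(Path(hr_dir).glob("*.png"))
        self.scale = scale
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, i):
        hr = read_image(str(self.paths[i]), ImageReadMode.RGB).float() / 255.0   # (3, H, W)
        lr = F.interpolate(hr[None], scale_factor=1 / self.scale,
                           mode="bicubic", antialias=True).clamp(0, 1)[0]
        return lr, hr

class TinySR(nn.Module):
    """Toy 2x upscaler (conv stack + pixel shuffle); swap in a real arch like ESRGAN for actual use."""
    def __init__(self, ch=64, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1), nn.PixelShuffle(scale))
    def forward(self, x):
        return self.body(x)

model = TinySR().cuda()
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
loader = DataLoader(PairedSRDataset("my_hr_images"), batch_size=4, shuffle=True)  # placeholder folder
for epoch in range(10):
    for lr_img, hr_img in loader:
        loss = F.l1_loss(model(lr_img.cuda()), hr_img.cuda())   # reconstruct HR from its downscaled pair
        opt.zero_grad(); loss.backward(); opt.step()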
>>107007499
>make Qwen Audio or something lol
they will do it, but for api, kek
Does ComfyUI patch lora weights into the model by default? It doesn't seem to, so why isn't that the case? Wouldn't it help a lot with VRAM when using multiple loras? Can it be enabled somehow?
>>107007849
skill issue, literally. install a llama-cpp-python version that has CUDA compiled in, or manually build the wheel using the correct compile flags (literally contained in this node repo through a script):
https://github.com/1038lab/ComfyUI-JoyCaption/blob/main/llama_cpp_install/llama_cpp_install.py
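For reference, a quick way to confirm the wheel you ended up with can actually offload; the model path is a placeholder, the install command is the one from the current llama-cpp-python README, and the llama_supports_gpu_offload binding is in recent versions (if yours lacks it, the verbose load log tells you the same thing):

# Rebuild with CUDA if the check below fails:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
import llama_cpp
from llama_cpp import Llama

print(llama_cpp.llama_supports_gpu_offload())   # should print True on a CUDA build

llm = Llama(model_path="some-caption-model.gguf",  # placeholder path
            n_gpu_layers=-1,                       # -1 = offload every layer to the GPU
            n_ctx=4096,
            verbose=True)                          # watch the log for CUDA buffer allocations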
>you still have to wait a few minutes to OOM on the first comfy video gen before the second allocates properly and works from then on
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>107008334stop being poor and buy a proper video card faggot
>>107008242
>The idea is to have pairings of images and teach the model what pairings are a correct upscaling.
Elaborate
>>107008364Write proper memory alloc code faggot
>>107008376>24gb>not vramlet cope tierstop being poor jamal, youre embarassing yourself
>>107008393Write proper memory alloc code faggot
i..I'M GOING TO OOM, AHHHHHHHH
>>107008393
>0% utilization
embarrassing indeed
>>107008393>all that compute for 1girl, standing, ai-generated
>>107007599
didn't work at 5 strength for me so i gave up. wan 2.1 is shit and wan 2.2 5b will be even worse because wan 5b has no speed-up loras as far as i know. second, the wan2.2 vae is fucking slow and it OOMs sometimes unless --reserve-vram 1.0 is used, and 5b produces shit at resolutions below 1280x704 or whatever. total waste of time unless they train it on wan 2.2 14b 720p or heck even 480p, what's more is we don't even know if we are using it correctly. Yeah i think it will be forgotten about and never heard of again.
>>107007599
well, reading that post, maybe it will be fixed and working, but i won't hold my breath for it because it's already limited in what it can actually do with wan 2.1: it's limited to 832x480 or it is slow as fuck.
>>107008393
>600 watts idle
waow
>>107008425
im actually running REAL LLMs in here, sadly at q8 quant (GLM 4.6, ~400gb), fully in VRAM, unlike you poors who make do with q2ks poverty tier quants and offload to CPU anyway lmao. I use the spare ~200gb to load FULL precision video/audio/image models to deliver a superior and immersive chat experience, with SOTA imagen/textgen/audiogen/voicegen all happening automatically as I rp with my waifus.
>>107008448
these are rented in a datacenter, no way I can run this shit at home. Also memed my company into doing it, bunch of clueless retards, im feeding them shit from bedrock itself while using the real cluster for myself.
>>107008261you can minimize the vram use by merging, but I don't think it's possible to do on the fly so we are stuck with model + lora sizes
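The merge itself is just baking the low-rank delta into the base weight once, so you only pay for it at load time instead of holding separate LoRA tensors around. A generic sketch of what "merging" means for linear layers; the file paths and key naming are hypothetical, real checkpoints differ per trainer:

import torch
from safetensors.torch import load_file, save_file

base = load_file("base_model.safetensors")   # placeholder paths
lora = load_file("some_lora.safetensors")
strength = 1.0

for key, w in list(base.items()):
    if not key.endswith(".weight"):
        continue
    prefix = key[: -len(".weight")]                                  # hypothetical key scheme
    down, up = f"{prefix}.lora_down.weight", f"{prefix}.lora_up.weight"
    if down in lora and up in lora:
        rank = lora[down].shape[0]
        alpha = lora.get(f"{prefix}.alpha", torch.tensor(float(rank)))
        scale = strength * float(alpha) / rank
        # W' = W + scale * (up @ down): fold the low-rank delta into the base weight
        base[key] = (w.float() + scale * (lora[up].float() @ lora[down].float())).to(w.dtype)

save_file(base, "merged_model.safetensors")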
>>107008475enjoy it while it lasts bruh
>>107008475maybe if you put in as much effort into real life you could have a real girl friend?
Every now and again I dream about picking up a data center GPU for fire sale prices after the crash happens and then I remember the insane power draw and the fact that they apparently use total loss water cooling.
>>107008334I don't have the issue, what video gen resolution and how many frames? Can you share your wf?
>>107008475
>Also memed my company into doing it, bunch of clueless retards, im feeding them shit from bedrock itself while using the real cluster for myself.
They don't even notice they're paying twice?
>>107008508>after the crash happensKeep dreaming
>>107008509
Maximum resolution and frames, Q8
https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper
>>107008430
>total waste of time unless they train it on wan 2.2 14b 720p or heck even 480p, what's more is we don't even know if we are using it correctly. Yeah i think it will be forgotten about and never heard of again.
They released the training code so maybe some rich anon will do it.
>>107008430>>107008443>>107008529OK I guess we'll wait then.
>>107008519>believing larpers
>>107008524
>Maximum resolution and frames
720p and 81 frames? Do you blockswap? Try block swapping 5 blocks for example, see if it works. As long as you have enough ram, the difference in speed is minimal.
>>107008545wtf bfl, this was almost unsafe
>>107008541
sorry for the blackpill bro, but i've done some testing on the base wan 2.2 5b model today. it doesn't work with lightx speed loras, so it's actually slower than just using wan 2.2 high and low on my machine. The vae is a pain in the ass as well, it's really slow to decode on my rtx 3060. it might be alright for some people, but the quality is shit at resolutions below 1280x720, so yeah, it's slow and i really don't know why people with lower vram use it when they could just use the Q4 gguf high and low models and get much better quality, and faster, thanks to the speed loras. So I'm not gonna bother when they release the wan2.2 5B version. tl;dr it was DOA
>>107008564we definitely keep it safe around here>https://files.catbox.moe/aasfd1.png
>>107008597wish it was good at making filled used condoms
wan gen: [Subject Description] + [Scene Description] + [Motion Description] + [Aesthetic Control] + [Stylization]
Example: "A young woman in a red dress (subject), standing in a bustling neon-lit city street at night (scene), walks forward then stops to look up at the rain, slow motion tracking shot (motion), cinematic lighting, moody atmosphere (aesthetic), film noir style (stylization)"
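If you batch gens from a script, a trivial helper keeps prompts consistent with that template (purely illustrative, not from any workflow):

def wan_prompt(subject, scene, motion, aesthetic, style):
    """Assemble a wan t2v prompt following the subject/scene/motion/aesthetic/style template above."""
    return ", ".join([subject, scene, motion, aesthetic, style])

print(wan_prompt(
    "A young woman in a red dress",
    "standing in a bustling neon-lit city street at night",
    "walks forward then stops to look up at the rain, slow motion tracking shot",
    "cinematic lighting, moody atmosphere",
    "film noir style"))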
>>107007598
forgot to set the size for the vid but it still worked pretty well.
the anime girl runs to the left out the door and closes it.
ty anon
>>107008475
How many wan frames can you load and how fast are your wan gens?
>>107008614I just wish wan could handle longer prompts for more actions. It hardly ever works for me even when using context windows and 161+ frames at 81 frame chunks with overlap.
>>107008079
My ROCM 7 has been rock solid after upgrading to the official stuff. The ComfyUI amd memory management update also swagged my shit out.
50s/it seems pretty good for wan with amd, which card you got?
>>107007599
https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1519#issuecomment-3440759925
I tried telling anons this when it first dropped: it can't be as easy as just 1 last frame, because base wan has no concept of the previously genned video. It treats each gen as a new video, so to me it looks more like how wananimate works, taking the 5 previous frames to continue the motion, since 1 frame isn't enough to continue motion with.
People were making videos with it but those would have been placebo gens.
>>107008680
9070 xt. Not sure what you mean by "official stuff", I'm using the nightly URL from the official Pytorch page. I haven't upgraded Comfy in a while either.
>>107008508>and the fact that they apparently use total loss water coolingis this one of those cases of turbo-niggering the environment to save $0.01?
>>107008684
How do you even feed 5 frames into the next video? I don't think it's possible right now; last time I asked, it was only 1 frame starting the next video.
So on top of their lora, we need some node to "inject" 5 frames instead of 1 as latent into a sampler.
Basically picrel but 5 frames.
>>107008710Gigawatt in equals gigawatt out. All that energy has to eventually turn into heat, and there ain't an air conditioning system on this earth that can dissipate 1 GW. That being said, I don't know the specifics, only the basic laws of physics.
>>107008719feed first 5 frames 0 denoise then the rest on the next sampler
>>107008710it's water in water out, they don't inject wastes from the Gange in it anon
>>107008726
you should upgrade your comfy to at least 0.3.65, very good amd improvements.
https://github.com/ROCm/TheRock
I think it's these, but I see you're on linux so nevermind. I meant official windows support.
>>107008726
Images in wan aren't processed in series; all the frames are processed at the same time, it's just that the first one is "fixed" in latent and the rest is sent as noise. What we need is to "fix" the first 5 instead.
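Roughly what that would look like as plumbing, not a working node; everything here is hypothetical names, it only sketches the "fix a prefix, denoise the rest" idea the same way I2V fixes frame 0, and note the wan VAE compresses time ~4x so 5 pixel frames is not 5 latent frames:

import torch

def build_prefix_latent(prev_latent: torch.Tensor, total_latent_frames: int, n_fixed: int):
    """prev_latent: (B, C, T_prev, H, W) latents of the previous clip."""
    b, c, _, h, w = prev_latent.shape
    latent = torch.zeros(b, c, total_latent_frames, h, w,
                         dtype=prev_latent.dtype, device=prev_latent.device)
    latent[:, :, :n_fixed] = prev_latent[:, :, -n_fixed:]   # last frames of the old clip become the new clip's start
    mask = torch.ones(b, 1, total_latent_frames, h, w, device=prev_latent.device)
    mask[:, :, :n_fixed] = 0.0                              # 0 = keep fixed / don't denoise, 1 = denoise normally
    return latent, mask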
>>107007598willing to give this a try, i'm assuming scheduler simple?
>>107008752Yes
>>107008604train a lora man, it takes like two hours
>>107008726
won't work, this is shit we tried months ago I'm sure, it just ignores them. and how are you even gonna do this? de-noise the first 5 in an advanced sampler? it won't matter, because wan does all frames at once as a new video. it won't magically know there are 5 frames already done. KJ was wrong to assume it needed no code changes, we're gonna need a new node.
>>107008748>>107008788vibe code it
>>107008619the new 2.2 MoE high lora works so much better for motion/fluidity. ty kijai for fixing it
>>107008769
thanks, but why KJ's loras? Is there something different about them? Or are they just extracted from their model?
>>107008792I tested different loras and this combination had the best result
>>107008788
>we gonna need a new node.
Yep, and even better if we import the latents corresponding to the last 5 frames instead of images degraded through vae decode, but I'm not even sure that's possible.
>>107008846it started so well, but we didn't get glorious cleavage bouncing
>>107008792
NTA, but the LoRAs released by Lightx2v were extracted wrong initially, so you had to use KJ's extracted LoRAs. The Lightx2v ones were then re-uploaded with correctly working versions at some point. KJ didn't extract the newest I2V Lightx2v LoRAs because they actually did it right on the first try this time. https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
>>107008797
well it's nowhere near as good as my settings and setup, it's fucking blurry at 720x720
>>107008853lol its still shit i will prove it...
>>107008791
>>107008878im using q8 2.2 with https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper
Ok, I needed to figure out how to integrate the CUDA Toolkit into my docker setup, but JoyCaption is working now with GPU acceleration, 4 times faster.
You are the thread schizo who regularly shits it up, so I won't give a (You), but credit where it's due, thank you, bastard.
>>107008900
>>107008890
>>107008878
wait a minute, i forgot to change the goddamn steps start and end... That has probably been why it's so blurry lol. I'll check it again.
>>107008913I tried those LoRAs as well and I got some weird hyperspace zoom effect, but I'm just using whatever quants of WAN22 come with Comfy's default workflow.
I was able to run the LongCat Video demo. This is the stock prompt:
>prompt = "In a realistic photography style, a white boy around seven or eight years old sits on a park bench, wearing a light blue T-shirt, denim shorts, and white sneakers. He holds an ice cream cone with vanilla and chocolate flavors, and beside him is a medium-sized golden Labrador. Smiling, the boy offers the ice cream to the dog, who eagerly licks it with its tongue. The sun is shining brightly, and the background features a green lawn and several tall trees, creating a warm and loving scene."
>negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
Generation is in 3 stages (initial, distilled, refined) that each output a video. It took 24 minutes to generate this and 74gb of vram (FP32). Going to try the long video (1min) generation next and expecting it to take hours.
>>107008927
I'm doing a quality-improvement study tonight, so we will see which is best.
>>107008959That is good quality, how long did it take?
>>107008966>It took 24 minutes to generate this
>>107008966nvm i didn't read full post 24 mins 74GB vram I'm gonna cry :(
>>107008959
>Generation is in 3 stages (initial, distilled, refined)
ok, that might mean it could run on smaller cards?
>>107006468can you guys make me some realistic apustajas?
>>107008984
https://civitai.com/models/175781/apu-apustaja-model-sd-xl
https://civitai.com/models/679189/apu-apustaja
>>107008959thanks for doing this anon, i ran oom on my 3090 multiple times before giving up
>>107008992scully?
>>107008983
It already uses 55gb on the first pass, but keep in mind it's at FP32. At Q8 the 74gb peak should be down to 18.5, so it would work on a 24gb card.
The first two passes aren't really meant to be used as-is, anyway. First stage here.
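That 18.5 is just the bits-per-weight ratio applied to the quoted peak, ignoring activations and other overhead:

fp32_peak_gb = 74
q8_peak_gb = fp32_peak_gb * 8 / 32   # weight memory scales roughly with bits per parameter
print(q8_peak_gb)                    # 18.5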
Kind of annoying how the lightx2v ruins videos with end frames. It distorts right at the ending but without the lora it works fine
>>107009003allison brie you fucking cretin
>>107009009>>107008983Second stage (distill)
>>107009009>letting your dog lick chocolate syrupfucking retarded kid
>>107009014
All the complaints about lightx2v color distortion, flickering, blurriness, or anything else come down to workflow issues; i never had any of those with >>107008900 >>107007598
>>107009016she looked better younger
>>107009030I don't have them either with latest version, pretty nice.
>>107009009
the first stage looks alright desu.
>At Q8 the 74gb peak should be down to 18.5, so it would work on a 24gb card.
Yeah I think we will be eating good again soon.
>>107009038they all do
Running the LongCat 1min demo. It generates 11 segments and chains them together. I'm guessing it'll take about 4.5 hours if it doesn't fail. Here's the initial step of the first of the 11 segments.
>prompt = "realistic filming style, a person wearing a dark helmet, a deep-colored jacket, blue jeans, and bright yellow shoes rides a skateboard along a winding mountain road. The skateboarder starts in a standing position, then gradually lowers into a crouch, extending one hand to touch the road surface while maintaining a low center of gravity to navigate a sharp curve. After completing the turn, the skateboarder rises back to a standing position and continues gliding forward. The background features lush green hills flanking both sides of the road, with distant snow-capped mountain peaks rising against a clear, bright blue sky. The camera follows closely from behind, smoothly tracking the skateboarder's movements and capturing the dynamic scenery along the route. The scene is shot in natural daylight, highlighting the vivid outdoor environment and the skateboarder's fluid actions."
>negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
captchas are failing, 4chan is going down
>>107009152[audience] wooooOOO
>>107008892
>>107008959
6 second gen took 24 mins to do? that's rough
>>107009180
Yeah, but so is wan without speed loras. now imagine what this thing could do in the future.
>>107009243what model?
the man in the blue shirt turns and fires a blue energy beam at the plane, which explodes into fire and smoke.
live action dragonball. used unipc instead of euler this time.
>>107007598
I'm guessing you mean 8 total steps, 4/4? because with only 4 total steps it's blurry. I'm now trying 8 total steps with your settings.
>>107009301chroma
>>107009316No it's 4 steps total, unipc, 720x1280, 81 frames, q8 wan, umt5 bf16
>>107009330catbox?
>>107009330
>Chroma
Ew
>>107009333
it's not enough for 720x720, that's for sure. it's looking a lot better using 8 steps total, but I am using q4 and umt5 fp16
>>107009364Wan was trained primarily for 1280x720 and 720x1280, and Q4 is too low even for full res anyway
>>107009378
>low even for full res anyway
i think not :)