Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107710110

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
is the spam filter down? can we have a thread at last?
What light lora combo are people using with Wan 2.2 SVI? Everything I try gives me the dreaded slow-motion effect, even a combo that works flawlessly without SVI.
Also, haven't posted in a while, and what the FUCK are these new aids-tier captchas?
>>107718348
>What light lora combo are people using with Wan 2.2 SVI?
frame editing in actual software
Only wan2.2 low noise can output this kind of anatomy perfection... and it's a video model FFS u_u
HOLY BLURRY OVER EXPOSED SLOPPY DOPPY
>>107718306
thanks for the cozy bread anon
>>107718390
what loras tho? I can get ok-ish stuff with the wan remix workflow, but nowhere near that detailed
which one of these is the bitcoin miner
>>107718446
comfyui
>>107718451
this
>>107718446
Check your RAM and disk usage when genning, especially during the initial load; it's probably spilling onto your pagefile. Beware: not only is that painfully slow, it also rapes your SSD with constant writes
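If you want to watch for the spill live, here's a minimal sketch (assumes you have psutil installed; the 2-second polling interval is just my pick, this isn't anything ComfyUI ships):
[code]
# poll RAM, swap/pagefile and disk writes while a gen runs,
# to catch the pagefile spill described above
import time
import psutil

prev_writes = psutil.disk_io_counters().write_bytes
while True:
    ram = psutil.virtual_memory()
    swap = psutil.swap_memory()
    io = psutil.disk_io_counters()
    written_mb = (io.write_bytes - prev_writes) / 1e6
    prev_writes = io.write_bytes
    print(f"RAM {ram.percent:5.1f}% | swap {swap.percent:5.1f}% | "
          f"disk writes {written_mb:8.1f} MB / 2s")
    time.sleep(2)
[/code]
If swap usage climbs and disk writes spike every time a model loads, that's the pagefile eating your SSD.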
>>107718461
peak comfyui gen
>>107718461
>it also rapes your SSD with constant writes
great software comfy. what a great feature
>lol just ask on discord
>>107718513
the guy that writes that shit doesn't even know how to use comfy in the first place. the whole org is a shit show
>>107718471
That's just how memory management in Bimbows works; it'll start offloading to pagefile long before you hit the max memory limit. The reason Comfy is a hack is that he didn't bother implementing partial splits between available CUDA devices, especially if you have multiple GPUs available. It'll just try to load the whole thing into RAM and then move everything (or most of it, at least) into VRAM. Or maybe it's a pytorch limitation, idk, I'm not too tech savvy
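For what it's worth, the "partial split" being described would look something like this in bare pytorch — a hypothetical round-robin placement sketch (assumes two CUDA devices), not anything ComfyUI actually does:
[code]
# hypothetical sketch: place alternating blocks on two GPUs instead of
# staging the whole model in system RAM first. not ComfyUI code.
import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])

devices = ["cuda:0", "cuda:1"]  # assumes two CUDA devices are present
for i, block in enumerate(model):
    block.to(devices[i % len(devices)])  # naive round-robin split

def forward(x: torch.Tensor) -> torch.Tensor:
    for block in model:
        x = x.to(next(block.parameters()).device)  # hop between GPUs
        x = block(x)
    return x
[/code]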
>>107718571
what's up with comfy users? why do they generate kids so often?
>>107717857
hey, i was at work, thanks for answering. try looking into wan animate then
>>107718390
>le flux chin
disgusting
this place is dead as fuck, hoooly.
anyway, qwen just curb stomped slam dunked z-img it seems. will find out though, ggoof still downloading.
https://x.com/Alibaba_Qwen/status/2006294325240668255
>>107718988
>leaving to celebrate new year in 10min
At least it's not base.
>>107718988
DFloat11 when?
>>107718988
>curb stomped
They're on the same team lmao
>>107719022
>DFloat11 when?
If you have the memory to run it, you almost certainly have the memory to quantise it to DF11 yourself
Low key just waiting for LTX 2 next month.
>>107719076
>They're on the same team
there are no teams when it's chinese bloodsports!
okay yeah this kinda blows compared to z-img. and that's against a TURBO'd, not even fully trained model. that said, it's nice. better than flux at least.
>>107718988
>woman lying on grass
dont bother downloading
>>107719152
i mean it could just be that the lightning lora isn't fully compatible
but it ain't lookin' good chief.
https://huggingface.co/lightx2v/Qwen-Image-Lightning/tree/main
>>107719022
>At batch size = 1, inference is approximately 2× slower than the original BF16 model, but the performance gap narrows significantly with larger batches.
DFloat11 is snake oil. If you have low vram, you're trading a 30% reduction in weight size for 2x slower inference; offloading would be faster. If you have the vram to run larger batches, you don't need a 30% reduction in weight size. There's no valid use case.
>meanwhile a month ago in 4 steps on z-img turbo
>>107719162
also this. in general, just offload. i've been copemaxxing with low-q quants or fp8 when this entire time i could've just offloaded a few gigabytes of a full q8 model, or fp16 if it's reasonably sized. i'm running the new qwen at q8.
that DoF looks horrific. oil pastel turbocharged
>>107719152
>>107719143 was without the lightning lora
picrel is with it
>>107719183
yikes buddy. well, time to free up those 20 gigs. (and the lightning loras)
>>107719076
>>107719162
I was only pretending to be retarded! Q8/BF16 fits in my 4090 just fine. 2512 froze around 30 steps though and went from ~3.4 it/s to 44+ s/it for the rest of the 40 steps. Took 568s for this image. Way slower than I remember the edit models being.
>>107718390
catbox?
>>107719079
>just waiting for LTX 2 next month.
Same. I tried wan 2.6 on their website wan.video (they're kinda trying to be like sora) and it was disappointing. LTX2 is also looking a little disappointing, but once something is out people will finetune it to be better
>>107719162
>>107719170
The valid use case for DF11 is large models run at large batch sizes by corps, because the inference hit is negated at a higher batch size iirc. So if you're serving an LLM, or you're Google serving a massive autoregressive edit model, it makes sense.
Most people should use q8_0 for most tasks on most models (vae excluded, text encoder preferably excluded). In fact, even if you can fit bf16 in memory, you should still use the q8_0 if you value your time
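The amortization argument in back-of-envelope form — the numbers below are made up to match the quoted 2x-at-batch-1 figure, not measurements:
[code]
# toy model: DF11 decode cost is roughly fixed per forward pass
# (per-weight), while compute scales with batch size
COMPUTE_PER_ITEM_MS = 50.0  # assumed bf16 compute cost per batch item
DECODE_OVERHEAD_MS = 50.0   # assumed fixed DF11 decompression cost

for b in (1, 4, 16, 64):
    bf16 = b * COMPUTE_PER_ITEM_MS
    df11 = bf16 + DECODE_OVERHEAD_MS
    print(f"batch {b:3d}: slowdown x{df11 / bf16:.2f}")
# batch 1 -> x2.00, batch 64 -> x1.02: the hit vanishes at scale
[/code]
Which is also why it does nothing for a local batch-size-1 coomer and everything for a corp serving big batches.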
>>107719311
>because the inference hit is negated at a higher batch size iirc
aaahh i see.
>>107719183
>mfw i lay on the grass and realize the back of my head is now full of shit stains
>>107719079
i have mixed feelings toward ltx-2 now that multiple chinese models have come out this month with combined audio generation and 10-15 second prompt adherence. Even seedance 1.5 has combined video+audio generation, and it's uncensored. LTX-2 is very pricey when it comes to using it online with lightricks' own credit system, and it's censored with the ken/barbie doll bodies. hopefully LTX-2 isn't too vram hungry and will be able to run on 24gb-32gb vram. This is the make-it-or-break-it moment for lightricks to get their shit together.
The chinks are competing for 3rd and 4th place in the ai video generation market with kling2.6, minimax hailuo2.3, wan2.6 and seedance 1.5. There are also runway, luma, vidu and pika in the competition for the ai video generation space. LTX-2 needs to be really good from the start to overthrow the dominance that wan 2.2 has over the open source scene.
here are some ltx 2 videos if anyone's curious:
https://files.catbox.moe/gz8lao.mp4
https://files.catbox.moe/hn9uyw.mp4
https://files.catbox.moe/7vweqv.mp4
https://files.catbox.moe/clx4i3.mp4
https://files.catbox.moe/abb430.mp4
here are some seedance 1.5 videos (closed source):
https://files.catbox.moe/b71h5f.mp4
https://files.catbox.moe/adkp98.mp4
>>107719433
Is wan not coomer friendly / too resource intensive? Is that why I barely see stuff?
this dude gets off to asmr and you know it
>>107719311
Still not convinced. There's a point where taking the batch size higher yields diminishing returns due to compute constraints, and overall response time becomes a factor: what's the point of squeezing a few extra responses into a batch if it makes every response take longer than users will tolerate?
>>107719447
holy esl
you're too late, the peak wan era was this summer
>>107719433
yeah I have nothing to look forward to with the audio in these two models. Sad, but a sora competitor will exist sometime in 2026 for sure
>>107719575
>Still not convinced.
Just taking what I read from the orange sneddit discussion on DFloat11 https://news.ycombinator.com/item?id=43796935
But there are also people in that thread saying shit like
>Blackwell GPUs support dynamic FP4 quantization with group size 16. At that group size it's close to lossless (in terms of accuracy metrics).
which sounds too retarded to believe. I'll check that myself using something like t5-small in a few hours
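If anyone wants to eyeball the claim before then, here's a rough pure-pytorch simulation of group-16 FP4 on a random weight matrix — E2M1 value set and per-group absmax scaling are my assumptions, not NVIDIA's actual kernel:
[code]
# simulate FP4 (E2M1) quantization with group size 16, measure the error
import torch

E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_fp4(w: torch.Tensor, group: int = 16) -> torch.Tensor:
    flat = w.reshape(-1, group)
    scale = flat.abs().amax(dim=1, keepdim=True) / 6.0  # per-group absmax
    scaled = flat / scale.clamp(min=1e-12)
    # snap to the nearest representable E2M1 magnitude, keep the sign
    idx = (scaled.abs().unsqueeze(-1) - E2M1).abs().argmin(dim=-1)
    return (E2M1[idx] * scaled.sign() * scale).reshape(w.shape)

w = torch.randn(1024, 1024)
err = (fake_fp4(w) - w).norm() / w.norm()
print(f"relative error: {err:.4f}")  # "close to lossless" or not?
[/code]
Gaussian noise is the worst case for this, real weight matrices with outlier structure can behave differently, so treat it as a sanity check rather than the t5-small test.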
>>107719433
why dont you post this on reddit, fucking subhuman retard?
>>107718348
>and what the FUCK are these new aids tier captchas?
read what it's asking you to do. I use this btw
https://civitai.com/models/2190659/dasiwa-wan-22-i2v-14b-tastysin-v8-or-lightspeed-or-gguf?modelVersionId=2466604
It's the best I've used; it has the speed-up loras merged in, as well as a bunch of nsfw loras and other stuff.
>>107719962
does it show assholes?
>>107720148
Only if you prompt it to show Piers Morgan
>>107719962
why can't any of these retards document their shit properly? the workflow he provides requires patreon models, and the info on the left side fails to answer the question of how to use this thing. and it's a gguf, which is like a different breed of species. the link sends you to a high-noise version, but apparently you need a low-noise one too? pure retardation. my normal wan 2.2 i2v just works.
>>107718348
>Wan 2.2 SVI
like base wan, it does not do scene cuts very well; it will change a character's appearance too much imo. It handles moments when their face is hidden between clips, but an actual | cut still changes the person's face somewhat. probably something to do with wan itself; i suspect wan goes into T2V mode when doing a scene cut. oh well, it's just best to avoid them. all actions should be performed one at a time, and use simple prompts, but you probably already know that. At least that is the case for I2V.
I don't know the right combo of light loras, but let's just say i went back to using base wan and light loras to see if they would be any better and it completely fucking mangles shit, so i don't know how or what this guy did.
https://civitai.com/models/2190659/dasiwa-wan-22-i2v-14b-tastysin-v8-or-lightspeed-or-gguf?modelVersionId=2466604
so that is why i recommend it: because it just works so well.
>>107720379
hey, i'll help you out later in the day. i'm working on a perfect-looping workflow for SVI as well, as in no long chaining of prompts and samplers. I'm not getting the quality loss between clips that others are reporting, despite using the same seed, so i'm on to something good here. The only issue I'm getting is micro jumps between clips when i merge them using an external ffmpeg script, because that is faster... But I see that in the SVI workflow provided here
https://www.reddit.com/r/NeuralCinema/comments/1pyeoci/svi_20_pro_wan_22_84step_infinite_video_workflow/
it's using some kj node to combine clips with some kind of 5-frame overlap that i don't understand, so maybe it's interpolation of sorts i need to figure out. the jumps are very subtle but still noticeable, like the jump cuts people do on youtube videos.
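My guess (and it is just a guess, I haven't read the kj node's source) at what that 5-frame overlap is doing — a linear crossfade over the overlapping frames instead of a hard cut, sketched in numpy:
[code]
# blend the last `overlap` frames of clip a into the first
# `overlap` frames of clip b, then concatenate
import numpy as np

def merge_with_overlap(a, b, overlap=5):
    # a, b: (frames, H, W, C) float arrays in [0, 1]
    w = np.linspace(0.0, 1.0, overlap).reshape(-1, 1, 1, 1)  # blend weights
    blended = (1.0 - w) * a[-overlap:] + w * b[:overlap]
    return np.concatenate([a[:-overlap], blended, b[overlap:]], axis=0)

a = np.random.rand(81, 64, 64, 3)
b = np.random.rand(81, 64, 64, 3)
print(merge_with_overlap(a, b).shape)  # (157, 64, 64, 3)
[/code]
Note this only works if the two clips were genned with 5 overlapping frames in the first place; a plain ffmpeg concat of non-overlapping clips has nothing to blend, which would explain your micro jumps.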