/g/ - Technology
File: 1.jpg (1.88 MB, 3684x2282)
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107710110

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
>>
is the spam filter down? can we have a thread at last?
>>
What light lora combo are people using with Wan 2.2 SVI? Everything I try gives me the dreaded slow-motion effect, even a combo that works flawlessly without SVI.

Also, haven't posted in a while and what the FUCK are these new aids-tier captchas?
>>
>>107718348
>What light lora combo are people using with Wan 2.2 SVI?
frame editing in actual software
>>
File: ComfyUI_00021_.png (1.91 MB, 1080x1920)
Only wan2.2 low noise can output this kind of anatomy perfection... and it's a video model FFS u_u
>>
HOLY BLURRY OVER EXPOSED SLOPPY DOPPY
>>
>>107718306
thanks for the cozy bread anon
>>
>>107718390
what loras tho?
I can get ok-ish stuff with the wan remix workflow, but nowhere near that detailed
>>
File: 1763688331719309.png (10 KB, 301x230)
which one of these is the bitcoin miner
>>
>>107718446
comfyui
>>
>>107718451
this
>>
File: ComfyUI_00004_.png (1.1 MB, 1152x832)
>>107718446
Check your RAM and disk usage when genning, especially during the initial load, it's probably spilling onto your pagefile. Beware that it's not only painfully slow, it also rapes your SSD with constant writes
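If you want to confirm the spill without staring at Task Manager, here's a minimal Linux-only sketch (it reads /proc/meminfo; on Windows you'd poll the equivalent performance counters instead) that reports free RAM and swap/pagefile usage:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kB}."""
    out = {}
    for line in text.splitlines():
        if ":" in line:
            key, rest = line.split(":", 1)
            out[key.strip()] = int(rest.strip().split()[0])
    return out

def pressure(info):
    """Return (available_gb, swap_used_gb); rising swap_used means you're spilling."""
    avail = info.get("MemAvailable", 0) / 1024 / 1024
    swap_used = (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / 1024 / 1024
    return round(avail, 2), round(swap_used, 2)

if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(pressure(parse_meminfo(f.read())))
```

Run it in a loop while a gen starts: if swap_used climbs during the initial model load, that's the pagefile (and your SSD) taking the hit.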
>>
>>107718461
peak comfyui gen
>>
>>107718461
>it also rapes your SSD with constant writes
great software comfy. what a great feature
>>
File: 1756435294175495.png (54 KB, 1100x523)
>lol just ask on discord
>>
>>107718513
the guy that writes that shit doesn't even know how to use comfy in the first place. the whole org is a shit show
>>
File: ComfyUI_00007_.png (1.11 MB, 1152x832)
>>107718471
That's just how memory management in Bimbows works: it'll start offloading to the pagefile long before you hit the max memory limit. The reason Comfy is a hack is that he didn't bother implementing partial splits between available CUDA devices, especially if you have multiple GPUs. It'll just try to load the whole thing into RAM and then move everything (or most of it) into VRAM. Or maybe it's a PyTorch limitation, idk, I'm not too tech savvy
>>
>>107718571
what's up with comfy users? why do they generate kids so often?
>>
>>107717857
Hey, I was at work, thanks for answering. Try looking into Wan Animate then
>>
>>107718390
>le flux chin
disgusting
>>
this place is dead as fuck, hoooly.
anyway, qwen just curb stomped slam dunked z-img it seems. will find out though, gguf still downloading.
https://x.com/Alibaba_Qwen/status/2006294325240668255
>>
>>107718988
>leaving to celebrate new year in 10min

At least it's not base.
>>
>>107718988
DFloat11 when?
>>
>>107718988
>curb stomped
They're on the same team lmao
>>107719022
>DFloat11 when?
If you have the memory to run it, you almost certainly have the memory to quantise it to DF11 yourself
>>
Low key just waiting for LTX 2 next month.
>>
>>107719076
>They're on the same team
there are no teams when it's chinese bloodsports!
>>
okay yeah this kinda blows compared to z-img. and that's a TURBO tarded not even fully trained model. that said, it's nice. better than flux at least.
>>
File: file.png (1.67 MB, 720x1280)
>>107718988
> woman lying on grass
dont bother downloading
>>
>>107719143
i mean it could just be the lightning lora isn't fully compatible
but it ain't lookin' good chief.
https://huggingface.co/lightx2v/Qwen-Image-Lightning/tree/main
>>
>>107719022
>At batch size = 1, inference is approximately 2× slower than the original BF16 model, but the performance gap narrows significantly with larger batches.
DFloat11 is snake oil. If you have low vram, you're trading a 30% reduction in weight size for 2x slower inference; offloading would be faster. If you have the vram to run larger batches, you don't need a 30% reduction in weight size. There's no valid use case.
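The quoted batch-size claim is easy to sanity-check with a toy model: treat DF11 decompression as a fixed per-forward cost that amortizes over the batch. The numbers below are made up (calibrated so batch 1 comes out exactly 2x the BF16 baseline), not measurements:

```python
# Toy model: DF11 adds a fixed decode cost per forward pass; per-item
# compute is the BF16 baseline. Both set to 1.0 s purely for illustration.

def step_time(batch, decode_s=1.0, compute_s_per_item=1.0):
    """Seconds for one forward pass at a given batch size."""
    return decode_s + batch * compute_s_per_item

for batch in (1, 4, 16):
    t = step_time(batch)
    print(f"batch={batch:>2}: {t / batch:.2f} s/item, "
          f"decode overhead {100 * 1.0 / t:.0f}%")
```

At batch 1 you pay the full 2x; by batch 16 the overhead share drops to single digits, which is the "gap narrows" effect — and which does nothing for the latency-bound, single-image local use case.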
>>
>meanwhile a month ago in 4 steps on z-img turbo

>>107719162
also this, in general, just offload. i've been copemaxxing with low q quants or fp8 when this entire time i could've just offloaded a few niggerbytes on a full q8 model or fp16 if it's reasonably sized. i'm running the new qwen at q8.
>>
that DoF looks horrific. oil pastel turbocharged
>>
File: ComfyUI_00062_.png (1.3 MB, 720x1280)
>>107719152
>>107719143 was without lightning lora
picrel with
>>
>>107719183
yikes buddy. well time to free up that 20 gigs. (and the lightning loras)
>>
File: ComfyUI_05184.png (2.42 MB, 2160x1168)
>>107719076
>>107719162
I was only pretending to be retarded! Q8/BF16 fits in my 4090 just fine. 2512 froze around 30 steps though and went from ~3.4 it/s to 44+ s/it for the rest of the 40 steps. Took 568s for this image. Way slower than I remember the edit models being.
>>
>>107718390
catbox?
>>
>>107719079
>just waiting for LTX 2 next month.
Same. I tried wan 2.6 on their website wan.video (they're kinda trying to be like sora) and it was disappointing. LTX2 is also looking a little disappointing but once something is out people will finetune it to be better

>>107719162
>>107719170
The valid use case for DF11 is large models run at a large batch size by corps because the inference hit is negated at a higher batch size iirc
So if you're serving an LLM, or you're Google serving a massive autoregressive edit model it makes sense

Most people should use q8_0 for most tasks on most models (vae excluded, text encoder preferably excluded). In fact, even if you can fit bf16 in memory you should still use the q8_0 if you value your time
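For anyone wondering why q8_0 is close to free in quality: it stores one scale per small block of weights plus int8 codes, so the worst-case error per weight is half the block scale. A pure-Python sketch (block size 32 as in llama.cpp's q8_0; the on-disk layout details are simplified away):

```python
def q8_0_roundtrip(weights, block=32):
    """Quantize to per-block absmax int8 (q8_0-style) and dequantize."""
    out = []
    for i in range(0, len(weights), block):
        chunk = weights[i:i + block]
        amax = max(abs(w) for w in chunk) or 1.0
        scale = amax / 127.0            # one scale per block of 32 weights
        out.extend(round(w / scale) * scale for w in chunk)
    return out

weights = [0.013 * ((-1) ** k) * (k % 7) for k in range(64)]
deq = q8_0_roundtrip(weights)
max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(f"max abs error: {max_err:.6f}")
```

Worst case the error is about scale/2, i.e. under 0.4% of the largest weight in the block, which is why bf16 vs q8_0 is rarely distinguishable in gens.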
>>
File: ComfyUI_00212_.png (1.09 MB, 752x1392)
>>107719311
>because the inference hit is negated at a higher batch size iirc

aaahh i see.
>>
>>107719183
>mfw i lay on the grass and realize the back of my head is now full of shit stains
>>
File: 00008-2054810216.png (2.4 MB, 1824x1248)
>>107719079
i have mixed feelings toward LTX-2 now that multiple Chinese models have come out this month with combined audio generation and 10-15 second prompt adherence. Even Seedance 1.5 has combined video+audio generation, and it's uncensored. LTX-2 is very pricey when it comes to using it online with Lightricks' own credit system, and it's censored, with the Ken/Barbie doll bodies. Hopefully LTX-2 isn't too vram hungry and will be able to run on 24-32gb of vram. This is make-or-break for Lightricks to get their shit together. The chinks are competing for 3rd and 4th place in the ai video generation market with Kling 2.6, MiniMax Hailuo 2.3, Wan 2.6 and Seedance 1.5. There are also Runway, Luma, Vidu and Pika competing in the ai video space. LTX-2 needs to be really good from the start to overthrow the dominance wan 2.2 has over the open source scene.
here are some LTX-2 videos if anyone's curious
https://files.catbox.moe/gz8lao.mp4
https://files.catbox.moe/hn9uyw.mp4
https://files.catbox.moe/7vweqv.mp4
https://files.catbox.moe/clx4i3.mp4
https://files.catbox.moe/abb430.mp4

here are some Seedance 1.5 videos (closed source)
https://files.catbox.moe/b71h5f.mp4
https://files.catbox.moe/adkp98.mp4
>>
>>107719433
Is wan not coomer friendly / too resource intensive? Is that why I barely see stuff?
>>
this dude gets off to asmr and you know it
>>
>>107719311
Still not convinced. There's a point where pushing the batch size higher yields diminishing returns due to compute constraints, and overall response time becomes a factor: what's the point of squeezing an extra few responses into each batch if it makes all the responses take longer than users will tolerate?
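That tension is easy to put numbers on: with a fixed per-step cost plus per-item compute, throughput saturates while every request's latency keeps growing linearly, so a latency budget caps the useful batch size. All figures below are illustrative, not benchmarks:

```python
# Toy serving model: latency grows linearly with batch, throughput saturates.
# fixed_s / per_item_s / the 2 s budget are made-up illustrative numbers.

def serve(batch, fixed_s=0.5, per_item_s=0.125, latency_budget_s=2.0):
    latency = fixed_s + batch * per_item_s   # every request in the batch waits this long
    throughput = batch / latency             # requests completed per second
    return latency, throughput, latency <= latency_budget_s

for b in (1, 8, 12, 32):
    lat, thr, ok = serve(b)
    print(f"batch={b:>2}: latency={lat:.2f}s throughput={thr:.2f}/s "
          f"{'within budget' if ok else 'over budget'}")
```

Going from batch 12 to 32 here more than doubles latency for under a 20% throughput gain, which is exactly the diminishing-returns point.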
>>
>>107719447
holy esl
you're too late, the peak wan era was this summer

>>107719433
yeah I have nothing to look forward to with the audio in these two models. Sad, but a sora competitor will exist sometime in 2026 for sure

>>107719575
>Still not convinced.
Just taking what I read from the orange sneddit discussion on DFloat11 https://news.ycombinator.com/item?id=43796935

But there's also people in that thread that say shit like
>Blackwell GPUs support dynamic FP4 quantization with group size 16. At that group size it's close to lossless (in terms of accuracy metrics).
which sounds too retarded to believe. I'll check that myself using something like t5-small in a few hours
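If anyone wants to pre-empt that check without a GPU, here's a rough pure-Python round-trip using the standard E2M1 FP4 magnitude grid with per-group (size 16) absmax scaling. This is my guess at what "dynamic FP4 quantization with group size 16" means, not the actual Blackwell implementation, and it measures raw weight error rather than accuracy metrics:

```python
import random

# E2M1 FP4 representable magnitudes (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fp4_roundtrip(weights, group=16):
    """Per-group absmax scaling onto the FP4 grid, then dequantize."""
    out = []
    for i in range(0, len(weights), group):
        chunk = weights[i:i + group]
        amax = max(abs(w) for w in chunk) or 1.0
        scale = amax / 6.0  # map the group's absmax onto the largest code
        for w in chunk:
            mag = min(FP4_GRID, key=lambda g: abs(abs(w) / scale - g))
            out.append(mag * scale if w >= 0 else -mag * scale)
    return out

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
deq = fp4_roundtrip(weights)
rel_mse = (sum((a - b) ** 2 for a, b in zip(weights, deq))
           / sum(a * a for a in weights))
print(f"relative MSE: {rel_mse:.4f}")
```

On Gaussian-ish weights this gives a clearly nonzero raw error, so "close to lossless" can only be a claim about end-to-end accuracy metrics, which is the part that actually needs testing.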
>>
>>107719433
why dont you post this on reddit, fucking subhuman retard?
>>
>>107718348
>and what the FUCK are these new aids tier captchas?
read what it's asking you to do.

I use this btw

https://civitai.com/models/2190659/dasiwa-wan-22-i2v-14b-tastysin-v8-or-lightspeed-or-gguf?modelVersionId=2466604

It's the best I've used; it has the speed-up LoRAs merged in, as well as a bunch of NSFW LoRAs and other stuff.
>>
>>107719962
does it show assholes?
>>
>>107720148
Only if you prompt it to show Piers Morgan
>>
File: img_00048_.jpg (595 KB, 1288x1656)
>>
File: img_00051_.jpg (548 KB, 1288x1656)
>>
>>107719962
why can't any of these retards document their shit properly? the workflow he provides requires patreon models, and the info on the left side fails to answer how to actually use this thing. and it's a gguf, which is like a different breed of species. the link sends you to a high-noise version, but apparently you need the low-noise one too? pure retardation. my normal wan 2.2 i2v just works.
>>
>>107718348
>Wan 2.2 SVI
It, like base wan, does not handle scene cuts very well; it will change the character's appearance too much imo. It handles moments when their face is hidden between clips, but an actual "|" cut still changes the person's face somewhat. Probably something to do with wan itself; I suspect wan goes into T2V mode when doing a cut scene, so it's best to avoid them.

all actions should be performed one at a time and use simple prompts but you probably already know that. At least that is the case for I2V.

I don't know the right combo of light LoRAs, but let's just say I went back to using base wan with light LoRAs to see if they would be any better, and it completely fucking mangles shit, so I don't know how or what this guy did.

https://civitai.com/models/2190659/dasiwa-wan-22-i2v-14b-tastysin-v8-or-lightspeed-or-gguf?modelVersionId=2466604

so that is why I recommend it: it just works so well.
>>
>>107720379
hey, I'll help you out later in the day. I'm also working on a perfectly looping workflow for SVI, as in no long chaining of prompts and samplers. I'm not getting the quality loss others are reporting, despite using the same seed between clips, so I'm on to something good here. The only issue I'm getting is micro jumps between clips when I merge them using an external ffmpeg script (because that is faster)... But I see in the SVI workflow provided here

https://www.reddit.com/r/NeuralCinema/comments/1pyeoci/svi_20_pro_wan_22_84step_infinite_video_workflow/

it's using some KJ node to combine clips with some kind of 5-frame overlap that I don't understand, so maybe it's doing some sort of interpolation I need to figure out. The jumps are very subtle but still noticeable, like the jump cuts people do in youtube videos.
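I don't know what that KJ node actually does, but a 5-frame overlap merge is usually some flavor of cross-fade rather than interpolation: the last N frames of clip A get alpha-blended into the first N frames of clip B instead of hard-cutting. A toy sketch (frames reduced to single scalars; `crossfade_join` is a hypothetical helper, not the node's real logic):

```python
def crossfade_join(clip_a, clip_b, overlap=5):
    """Join two frame lists, linearly blending the overlapping region."""
    assert overlap <= len(clip_a) and overlap <= len(clip_b)
    out = list(clip_a[:-overlap])                 # A's frames before the overlap
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)           # ramps from mostly-A to mostly-B
        a = clip_a[len(clip_a) - overlap + i]
        b = clip_b[i]
        out.append((1 - alpha) * a + alpha * b)   # blended overlap frame
    out.extend(clip_b[overlap:])                  # B's frames after the overlap
    return out

clip_a = [0.0] * 20   # clip A ends on "black"
clip_b = [1.0] * 20   # clip B starts on "white"
joined = crossfade_join(clip_a, clip_b)
print(len(joined), [round(f, 2) for f in joined[14:21]])
# → 35 [0.0, 0.17, 0.33, 0.5, 0.67, 0.83, 1.0]
```

ffmpeg can also do this directly with the `xfade` filter, which might dodge the micro-jump without going through a node at all.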



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.