/g/ - Technology


File: 1.jpg (1.88 MB, 3684x2282)
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107710110

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
>>
is the spam filter down? can we have a thread at last?
>>
What light lora combo are people using with Wan 2.2 SVI? Everything I try gives me the dreaded slow motion effect, even a combo that works flawlessly without SVI.

Also, haven't posted in a while and what the FUCK are these new aids tier captchas?
>>
>>107718348
>What light lora combo are people using with Wan 2.2 SVI?
frame editing in actual software
>>
File: ComfyUI_00021_.png (1.91 MB, 1080x1920)
Only wan2.2 low noise can output this kind of anatomy perfection... and it's a video model FFS u_u
>>
HOLY BLURRY OVER EXPOSED SLOPPY DOPPY
>>
>>107718306
thanks for the cozy bread anon
>>
>>107718390
what loras tho?
I can get ok-ish stuff with the wan remix workflow, but nowhere near that detailed
>>
File: 1763688331719309.png (10 KB, 301x230)
which one of these is the bitcoin miner
>>
>>107718446
comfyui
>>
>>107718451
this
>>
File: ComfyUI_00004_.png (1.1 MB, 1152x832)
>>107718446
Check your RAM and disk usage when genning, especially during the initial load, it's probably spilling onto your pagefile. Beware that it's not only painfully slow, it also rapes your SSD with constant writes
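If you'd rather check this programmatically than eyeball Task Manager, here's a minimal heuristic sketch. The readings themselves would come from Task Manager, `psutil`, or `/proc/meminfo`; the function name and both thresholds are made up for illustration:

```python
def likely_pagefile_spill(ram_used_gib, ram_total_gib, swap_growth_gib,
                          ram_threshold=0.90, swap_threshold=0.5):
    """Heuristic: a gen run is probably spilling to the pagefile if RAM is
    near capacity AND swap usage grew noticeably during the model load.
    Thresholds are illustrative guesses, not measured values."""
    near_full = ram_used_gib / ram_total_gib >= ram_threshold
    swap_grew = swap_growth_gib >= swap_threshold
    return near_full and swap_grew

# 30.5/32 GiB used plus 4 GiB of new swap during the initial load -> spilling
print(likely_pagefile_spill(30.5, 32.0, 4.0))  # True
# plenty of headroom, no swap growth -> fine
print(likely_pagefile_spill(12.0, 32.0, 0.0))  # False
```

Take the swap reading before and after the model load; steady growth there is the SSD-write symptom described above.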
>>
>>107718461
peak comfyui gen
>>
>>107718461
>it also rapes your SSD with constant writes
great software comfy. what a great feature
>>
File: 1756435294175495.png (54 KB, 1100x523)
>lol just ask on discord
>>
>>107718513
the guy that writes that shit doesn't even know how to use comfy in the first place. the whole org is a shit show
>>
File: ComfyUI_00007_.png (1.11 MB, 1152x832)
>>107718471
That's just how memory management in Bimbows works: it'll start offloading to the pagefile long before you hit the max memory limit. The reason Comfy is a hack is that he never bothered implementing partial splits across available CUDA devices, even when you have multiple GPUs. It'll just try to load the whole thing into RAM and then move everything (or most of it) into VRAM. Or maybe it's a pytorch limitation, idk, I'm not too tech savvy
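For what it's worth, the "partial split" being asked for is just a placement problem. Here's a toy greedy-placement sketch; the function, device names, and sizes are all hypothetical, and this is not ComfyUI's actual loader:

```python
def plan_device_split(layer_sizes_gib, device_capacity_gib):
    """Greedy placement: fill each CUDA device in order, spill the
    remainder to "cpu" (system RAM, i.e. classic offloading).
    layer_sizes_gib: per-layer weight sizes in GiB.
    device_capacity_gib: e.g. {"cuda:0": 24, "cuda:1": 12}.
    Returns {layer_index: device_name}."""
    placement = {}
    devices = list(device_capacity_gib.items())
    d = 0
    free = devices[0][1] if devices else 0.0
    for i, size in enumerate(layer_sizes_gib):
        # advance to the next device until this layer fits, or we run out
        while d < len(devices) and size > free:
            d += 1
            free = devices[d][1] if d < len(devices) else 0.0
        if d < len(devices):
            placement[i] = devices[d][0]
            free -= size
        else:
            placement[i] = "cpu"  # nothing fits: offload to system RAM
    return placement

print(plan_device_split([10, 10, 10], {"cuda:0": 24, "cuda:1": 12}))
# {0: 'cuda:0', 1: 'cuda:0', 2: 'cuda:1'}
```

A real implementation would also have to route activations between devices at the split point, which is the part that makes multi-GPU inference genuinely harder than this sketch suggests.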
>>
>>107718571
what's up with comfy users? why do they generate kids so often?
>>
>>107717857
Hey, I was at work, thanks for answering. Try looking into Wan Animate then.
>>
>>107718390
>le flux chin
disgusting
>>
this place is dead as fuck, hoooly.
anyway, qwen just curb stomped slam dunked z-img it seems. will find out though, ggoof still downloading.
https://x.com/Alibaba_Qwen/status/2006294325240668255
>>
>>107718988
>leaving to celebrate new year in 10min

At least it's not base.
>>
>>107718988
DFloat11 when?
>>
>>107718988
>curb stomped
They're on the same team lmao
>>107719022
>DFloat11 when?
If you have the memory to run it, you almost certainly have the memory to quantise it to DF11 yourself
>>
Low key just waiting for LTX 2 next month.
>>
>>107719076
>They're on the same team
there are no teams when it's chinese bloodsports!
>>
okay yeah this kinda blows compared to z-img. and that's a TURBO tarded not even fully trained model. that said, it's nice. better than flux at least.
>>
File: file.png (1.67 MB, 720x1280)
>>107718988
> woman lying on grass
dont bother downloading
>>
>>107719143
i mean it could just be the lightning lora isn't fully compatible
but it ain't lookin' good chief.
https://huggingface.co/lightx2v/Qwen-Image-Lightning/tree/main
>>
>>107719022
>At batch size = 1, inference is approximately 2× slower than the original BF16 model, but the performance gap narrows significantly with larger batches.
DFloat11 is snake oil. If you have low vram you're trading 30% reduction in weight size for 2x slower inference, offloading would be faster. If you have the vram to run larger batches you don't need a 30% reduction in weight size. There's no valid use case.
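The "no valid use case" argument above can be written down as a back-of-envelope decision rule. Everything here is illustrative: the helper is made up, the ~30% compression and 2x bs=1 slowdown are the figures quoted above, and the sizes are hypothetical:

```python
def df11_worth_it(bf16_gib, vram_gib, compression=0.70):
    """Rough sketch of the tradeoff (made-up helper, not a real API).
    DF11's ~2x batch-size-1 slowdown means it only pays off in the narrow
    band where the compressed weights fit in VRAM but BF16 doesn't."""
    df11_gib = bf16_gib * compression
    if bf16_gib <= vram_gib:
        return False  # BF16 (or q8_0) already fits: DF11 is pure slowdown
    if df11_gib > vram_gib:
        return False  # still doesn't fit: you're offloading either way
    return True       # the only case where DF11 alone closes the gap

print(df11_worth_it(20.0, 24.0))  # False
print(df11_worth_it(20.0, 16.0))  # True
print(df11_worth_it(40.0, 16.0))  # False
```

Even in the middle case, you'd still want to compare the 2x compute penalty against the cost of just offloading the ~30% that doesn't fit.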
>>
>meanwhile a month ago in 4 steps on z-img turbo

>>107719162
also this, in general, just offload. i've been copemaxxing with low q quants or fp8 when this entire time i could've just offloaded a few niggerbytes on a full q8 model or fp16 if it's reasonably sized. i'm running the new qwen at q8.
>>
that DoF looks horrific. oil pastel turbocharged
>>
File: ComfyUI_00062_.png (1.3 MB, 720x1280)
>>107719152
>>107719143 was without lightning lora
picrel with
>>
>>107719183
yikes buddy. well time to free up that 20 gigs. (and the lightning loras)
>>
File: ComfyUI_05184.png (2.42 MB, 2160x1168)
>>107719076
>>107719162
I was only pretending to be retarded! Q8/BF16 fits in my 4090 just fine. 2512 froze around 30 steps though and went from ~3.4it/s to 44+ for the rest of the 40 steps. Took 568s for this image. Way slower than I remember the edit models being.
>>
>>107718390
catbox?
>>
>>107719079
>just waiting for LTX 2 next month.
Same. I tried wan 2.6 on their website wan.video (they're kinda trying to be like sora) and it was disappointing. LTX2 is also looking a little disappointing but once something is out people will finetune it to be better

>>107719162
>>107719170
The valid use case for DF11 is large models run at a large batch size by corps because the inference hit is negated at a higher batch size iirc
So if you're serving an LLM, or you're Google serving a massive autoregressive edit model it makes sense

Most people should use q8_0 for most tasks on most models (vae excluded, text encoder preferably excluded). In fact, even if you can fit bf16 in memory you should still use the q8_0 if you value your time
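The "q8_0 is basically free" claim is easy to sanity-check yourself. Below is a simulation of ggml-style q8_0 (per-block absmax scale, symmetric int8) on random weights; the block size of 32 matches ggml's layout, but the function itself is a throwaway sketch, not llama.cpp's code:

```python
import numpy as np

def q8_0_roundtrip(x, block=32):
    """Simulate q8_0: split into blocks, scale each block so its absmax
    maps to 127, round to int8, then dequantize for error inspection."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    scale = np.abs(xp).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(xp / scale), -127, 127)
    return (q * scale).reshape(-1)[: len(x)]

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
err = np.abs(w - q8_0_roundtrip(w)).max()
print(f"max abs error: {err:.5f}")
```

The worst-case per-element error is half a quantization step, i.e. absmax/254 per block, which for normally distributed weights is a rounding error relative to the weights themselves; that's the intuition behind preferring q8_0 over bf16 when it halves your load times.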
>>
File: ComfyUI_00212_.png (1.09 MB, 752x1392)
>>107719311
>because the inference hit is negated at a higher batch size iirc

aaahh i see.
>>
>>107719183
>mfw i lay on the grass and realize the back of my head is now full of shit stains
>>
File: 00008-2054810216.png (2.4 MB, 1824x1248)
>>107719079
i have mixed feelings toward ltx-2 now that multiple Chinese models have come out this month with combined audio generation and 10-15 second prompt adherence. Even seedance 1.5 has combined video+audio generation and it's uncensored. LTX-2 is very pricey to use online with Lightricks' own credit system, and it's censored with the ken/barbie doll bodies. hopefully LTX-2 isn't too vram hungry and will run on 24-32gb of vram. This is make-or-break for Lightricks to get their shit together. The chinks are competing for 3rd and 4th place in the ai video generation market with kling2.6, minimax hailuo2.3, wan2.6 and seedance 1.5; runway, luma, vidu and pika are also in the running. LTX-2 needs to be really good from the start to overthrow the dominance wan 2.2 has over the open source scene.
here are some ltx 2 video if anyone's curious
https://files.catbox.moe/gz8lao.mp4
https://files.catbox.moe/hn9uyw.mp4
https://files.catbox.moe/7vweqv.mp4
https://files.catbox.moe/clx4i3.mp4
https://files.catbox.moe/abb430.mp4

here are some seedance 1.5 video(closed source)
https://files.catbox.moe/b71h5f.mp4
https://files.catbox.moe/adkp98.mp4
>>
>>107719433
Is wan not coomer friendly / too resource intensive? Is that why i barely see stuff?
>>
this dude gets off to asmr and you know it
>>
>>107719311
Still not convinced. There's a point where taking the batch size higher yields diminishing returns due to compute constraints, and overall response time becomes a factor: what's the point of squeezing a few extra responses into the batch if it makes every response take longer than users will tolerate?
>>
>>107719447
holy esl
you're too late, the peak wan era was this summer

>>107719433
yeah I have nothing to look forward to with the audio in these two models. Sad, but a sora competitor will exist sometime in 2026 for sure

>>107719575
>Still not convinced.
Just taking what I read from the orange sneddit discussion on DFloat11 https://news.ycombinator.com/item?id=43796935

But there's also people in that thread that say shit like
>Blackwell GPUs support dynamic FP4 quantization with group size 16. At that group size it's close to lossless (in terms of accuracy metrics).
which sounds too retarded to believe. I'll check that myself using something like t5-small in a few hours
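You don't even need t5-small to get a first read on that claim: you can simulate group-wise FP4 (the e2m1 format, 16 representable values, absmax-scaled per group of 16) on random weights. This sketch is my own throwaway, not NVIDIA's actual dynamic-quantization path, and random normals are only a stand-in for real weight distributions:

```python
import numpy as np

E2M1 = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6], dtype=np.float32)
GRID = np.concatenate([-E2M1[::-1], E2M1])  # the 16 FP4 (e2m1) values

def fp4_group_roundtrip(x, group=16):
    """Group-wise FP4: scale each group so its absmax maps to 6 (the e2m1
    max), snap every element to the nearest grid point, rescale back."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % group
    xg = np.pad(x, (0, pad)).reshape(-1, group)
    scale = np.abs(xg).max(axis=1, keepdims=True) / 6.0
    scale[scale == 0] = 1.0
    idx = np.abs(xg[..., None] / scale[..., None] - GRID).argmin(axis=-1)
    return (GRID[idx] * scale).reshape(-1)[: len(x)]

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
rel = np.abs(w - fp4_group_roundtrip(w)).mean() / np.abs(w).mean()
print(f"mean relative error: {rel:.3f}")
```

On random normals the mean relative error lands in the several-percent range, which is a long way from the half-a-percent territory q8_0 lives in, so "close to lossless" is doing a lot of work in that HN comment; whether it matters for downstream accuracy metrics is the part that actually needs the t5-small test.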
>>
>>107719433
why dont you post this on reddit, fucking subhuman retard?


