/g/ - Technology
File: 1762840306016417.jpg (1.67 MB, 2190x2355)
Discussion of Free and Open Source Diffusion Models

Prev: >>107851707

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe|https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg
>>
>>107855134
Whoaa that first pic is super duper realistic o_0
>>
>>107855134
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
there is no qwen inpaint, right? you have to use mask and crop/stitch?
>>
>>107855138
They're still around it seems
>https://github.com/nunchaku-ai/ComfyUI-nunchaku
>v1.2.0 Released! Enjoy a 20–30% Z-Image performance boost, seamless LoRA support with native ComfyUI nodes, and INT4 support for 20-series GPUs!

Safe to say wan is officially abandoned
>>
>>107855189
Yes. I just used that lol.
But they are not training anything right now.
No pull requests or discussion of anything being in the works somewhere.
ZiT's PR was open for a while before they merged and released it.
>>
Sorry if this is a dumb question but I'm looking to do realistic nsfw gens with loras. I have plenty of loras for flux, do these work with chroma? What's the best chroma checkpoint? Should I be using something better than chroma? Should I retrain the loras for chroma somehow? Been kind of out of the loop for a bit.
>>
>>107855268
Did you not like the answer anon?
>>107855181
>>
File: file.png (6 KB, 326x105)
Beijing time tracker anon here
it's 7:30AM there, soon they will wake up and be preparing to drop the GLM Image model on us
>>
>>107855277
ah sorry, i didnt see it, thought it got swept up before new thread.

so SDXL is STILL the best for nsfw?? im kinda shocked, i got much better results with flux overall.
>>
>>107854980
i've been downloading basically everything since XL was released so i'm good on that. I have practically every lora for SDXL, ZIT, Qwen, Chroma, Wan2.1 & Wan2.2.
>>
File: 1757214287690012.png (1.28 MB, 1728x1149)
LTX2 is amazing. source image here:

Spongebob Squarepants grabs a rifle and says "Hey Patrick, my memory costs five thousand dollars, lets take over the data center!". Patrick says "okay, Spongebob!".

https://files.catbox.moe/r8v8q1.mp4
>>
what do we do when comfyui goes ipo?
>>
>>107855372
Short it.
>>
>>107855372
straight into the S&P500, your retirement fund will be 20% comfyui stocks
>>
https://files.catbox.moe/w8qizx.mp4
>>
>>107855372
sell immediately before comfy is allowed to
>>
>tranibake
>>
>>107855444
can you guys agree to a truce this week? two models might drop and it'd be a shame if the discussion is drowned out by nonsense
>>
>>107855453
we're not getting base so it really doesn't matter.
>>
unblessed thread
>>
File: 1739643256259775.jpg (68 KB, 978x1094)
>>107855453
>might
>>
File: unsloth vs nunchaku zit.jpg (2.96 MB, 3584x4608)
Unsloth seems to have released higher quality quants for some image models a few days ago. Certain layers are kept at slightly higher quants like Q5 and Q6 to boost quality. I think it does a good job for a Q4 quant of a 6B model. Shame the quantization implementation for diffusion models sucks: it runs slower than bf16 and q8, so the use case seems very niche. It runs similarly slower at high resolutions too, so you can't use it for that either.
Much better quality than nunchaku though. I almost wonder if they fucked up the nunchaku implementation somehow? The 32-rank one is a completely mangled, blurry, unusable mess. The 128 and 256 rank ones are better, but still blurry, while Q4_K_M is noticeably closer to the original image. Nunchaku does have the advantage of running three times faster, but that's not worth it at the current quality imo. I wonder if a schizo 1024-rank model would perform well. It should still run faster than normal quants and use less memory than q8, if my assumptions are correct.
Tested on a 3060 12GB. And no, I'm fine with bf16 for ZiT; tested this out of curiosity and to see whether it will be useful for the base model.
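If anyone wants numbers instead of eyeballing the collage, a quick PSNR check against the bf16 gen (same seed and prompt) does the job; file names here are obviously placeholders for your own outputs:
[code]
# quick-and-dirty PSNR of each quant's output vs the bf16 baseline.
# file names are placeholders; use the same seed/prompt for every image.
import numpy as np
from PIL import Image

def psnr(a_path, b_path):
    a = np.asarray(Image.open(a_path).convert("RGB"), dtype=np.float64)
    b = np.asarray(Image.open(b_path).convert("RGB"), dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

for tag in ["q4_k_m", "nunchaku_r32", "nunchaku_r128", "nunchaku_r256"]:
    print(tag, round(psnr("zit_bf16.png", f"zit_{tag}.png"), 2))
[/code]
Higher dB means closer to bf16, so the nunchaku ranks should sort themselves out immediately.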
Thanks for reading my blog.
>>107855453
>if
It WILL get drowned out by that.
>>
>>107855347
rich fucker
>>
>>107855302
Flux has higher quality, but it doesn't know nsfw.
Mixing multiple flux loras together to do nsfw is very iffy.
Chroma knows NSFW and has a far higher quality ceiling than SDXL, but it's several times slower and commonly shits out anatomy worse than SDXL.
As I said: if you want low quality but fast and reliable gens, go with SDXL; if you want to play seed lottery with slow gen times but occasionally get great gens, go with Chroma.
That's more or less all there is to it.
>>
>>107855134
Why is AniStudio not in OP?
>>
>>107855478
yeah ZiT is so small that i'm surprised people bother with quants
the video models are a different story. what's interesting is some anon in the wsg thread found that the Q8 quants differ a lot from the bf16 outputs, which usually isn't the case
>>
>>107855498
it's less than 20TB. you can buy a single 24TB HDD for $500 and fit nearly all of it. now if you wanted to keep all the loras on SSDs, then yeah, be prepared to spend thousands
>>
>>107855506
I meant to type worse than SD 1.5
>>107855523
In the current thread? Curious about precisely how.
>>
>>107855536
>In the current thread? Curious about precisely how.
yeah it's the current thread and i somehow forgot to mention i was talking about LTX. anon did a few comparison videos between different quants
>>>/wsg/6069549
>>
I almost miss working with SDXL
The better the models get, the more it's about seed lottery and "prompt engineering", and I'm tired of doing so many iterations that it's cluttering my SSD
>>
>>107855564
what was sdxl about unc?
>>
>>107855353
Spongebob speaks the truth
>>
>>107855574
fast gens, mixing loras, masking and photoshop edits, because the prompt comprehension was atrocious
newer models like Flux 2, Qwen2511 have superior comprehension but you need to do a lot of prompt engineering to unlock their potential
>>
>>107855564
for me? it's the wild variation unets offered. all the DiT models sometimes take everything too literally and leave little room for a more exciting result
>>
>>107855564
Wasn't the meta with sdxl to gen a boatload and pick the best from the monstrosities lol
>>
so since they cancelled z base, what is next on the horizon?
>>
>>107855599
ltx 2.1
>>
>>107855552
Possibly a bug?
The model is new, and major backend stuff (comfy-kitchen) was merged recently.
It's possible something is implemented wrongly and handles the quantized data incorrectly, rather than the quant itself sucking that much.
A Q8 of a 19b model differing that much from the baseline warrants a search for a decent explanation.
>>
What's the best model for pixel art?
I mean both sprites and portrait stuff, I need placeholder art for my game so prototyping is the closest thing to the final stuff.
I remember an anon uploading really good sprites here a while ago.
>>
>>107855615
lmao >>>/wsg/6071832
>>
>>107855510
see >>107855175
>>
What's the difference between a text encoder and a llm?
>>
>>107855506
so qwen and zimage aren't contenders for nsfw anymore? i'm just confused by the discourse, it seems like it should be clear what the best image generators are in each area. what about inpainting and such? i just tried the newest qwen image edit 2511 and it was still terrible, and flux inpainting still seems to be the best for nsfw? i just wish there was somewhere i could get up-to-date info on this stuff
>>
>>107855652
Nothing, same thing.
>>
>>107855652
there's generally only one LLM trained against a given diffusion model, so that LLM effectively is the model's text encoder

it might also output a strange space of concepts rather than plain text
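they're often literally the same architecture too, the difference is usage: the diffusion model cross-attends to the LM's hidden states instead of reading its decoded text. rough sketch of the idea (model name is just an example, not what any particular checkpoint actually uses):
[code]
import torch
from transformers import AutoModel, AutoTokenizer

# model name is only an example here, any causal LM works the same way
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
lm = AutoModel.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16)

ids = tok("1girl, pixel art, holding ice cream", return_tensors="pt")
with torch.no_grad():
    out = lm(**ids, output_hidden_states=True)

# the diffusion transformer cross-attends to these embeddings;
# nothing ever gets decoded back into text
cond = out.hidden_states[-1]  # shape: [1, seq_len, hidden_dim]
print(cond.shape)
[/code]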
>>
>>107855660
if you want nsfw use SDXL. Illustrious finetunes are the most coherent SDXL models. My personal favorite finetune is UncannyValley.
>>
I started sending my ai slop to some of my friends and I'm starting to worry because most of them don't realize it is AI
>>
File: 1753961513639870.png (1.56 MB, 1584x1056)
>>107855353
also for fun, a qwen edit 2511 edit

give them ak-47s:
>>
>>107855660
you can do a limited amount of nsfw with qwen, zimage, hyimage, whatever.

but illustrious/noob and chroma are the models that are more broadly nsfw trained, they understand vastly more in that regard
>>
>>107855617
not saying this is the best but you can try a few of the loras i've published:
https://civitai.com/user/n1eze
>>
>>107855660
Neither Qwen nor ZiT knows NSFW out of the box. NSFW loras either barely exist (Qwen) or are shit quality and don't mix with other loras (Z-Image).
Don't expect anyone to make a major NSFW finetune of a big model like Qwen, but people will likely try to beat NSFW into Z-Image properly once the base version releases (the current version is distilled and sucks for finetuning). We'll see if that works out, but for now Chroma and SDXL are your two options for nsfw.
I don't know jack shit about inpainting and I am tired of typing paragraphs.
>>
>>107855693
qwen can do nsfw fine if you use an abliterated text encoder
>>
>>107855694
can you make a basedjack lora
>>
>>107855706
some other anon already made one for Z image and published it, i don't have a dataset for that
>>
>>107855709
benchod
>>
>>107855714
??? idk what that means
>>
File: 1755646012483881.png (1.44 MB, 1584x1056)
>>107855690
ice cream:
>>
>>107855699
It still doesn't know what genitalia or sex acts look like.
Don't bother with this.
>>
>>107855727
sex is when penis goes into vagina
>>
>>107855696
ZIT loras are abysmal. Almost every one changes the subject too much or just gives poor results.
>>
>>107855749
Or poophole
>>
File: 1740476363934013.png (14 KB, 976x207)
What do you use to load the new LTX-2 vae by itself? Using picrel standard vae loader just gives me a black screen.
>>
>>
>>107855189
literally making nunchaku where it's the least needed
>>
the porn lora trainings I've seen for ltx2 look like the most basic stuff, without any dataset filtering to get some nice girls out of it. do you think the model will be able to generalize beyond "40-45 yo heavy smoker milf"?
>>
>>107855775
you need to use kijai vae loader
>>
anyone know a good inpainting comfyui workflow for nsfw? so far the best one i've found is still flux kontext, there has to be something better at this point isnt there? i've been fairly frustrated with this so far, if anyone can help i would really appreciate it
>>
File: 8986.png (1.21 MB, 1048x1048)
>>
>>107855808
flux kontext is not inpainting, inpainting is when you use a masking node and it's usually done with SDXL. attaching controlnets helps
>>
>>107855808
>>107855825
Flux fill I believe is the dedicated inpainting variant.
You can use kontext with masks, though I am not sure about the quality.
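Not sure it beats whatever ComfyUI workflow you settle on, but if you'd rather script it, diffusers ships a dedicated fill pipeline for flux. Minimal sketch; input.png/mask.png are placeholders, and bf16 fill is heavy so expect to need offloading on smaller cards:
[code]
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # needed on <24GB cards, it's a big model

result = pipe(
    prompt="your prompt here",
    image=load_image("input.png"),
    mask_image=load_image("mask.png"),  # white = region to repaint
    guidance_scale=30.0,  # the fill variant is tuned for high guidance
    num_inference_steps=50,
).images[0]
result.save("inpainted.png")
[/code]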
>>
>>107855798
That worked, thanks.
>>
>>107855882
any time
>>
>>107855825
i use it for inpainting and it seems to do better than flux fill, especially for nsfw

>>107855839
the kontext inpainting is the best i've seen so far, seems to outperform qwen but i do need to test a qwen inpainting nsfw lora i found


anyone have anything better?
>>
>>107854466
nice
>>
>>107855723
kek
>>
imagine contributing to software because you believe in foss and then the guy sells out
lmao
>>
At what strength do you use the detailer lora for i2v in ltx-2?
>>
saars
https://huggingface.co/zai-org/GLM-Image
>>
>>107856008
I'll wait for GoFuckem to test it first and iron out all the kinks.
>>
>>107855928
I took some kpin ok I wasnt paying atention
>>
>>107856008
Seems promising if example images aren't heavily cherry picked.
I feel like I should wait until proper cumfart support + workflows + quants appear though.
>>
>>107856008
this is some unholy qwen distill isnt it
>>
File: zimg_00112.png (1.46 MB, 960x1280)
downloadin... wish me luck
>>
an autoregressive model just flew over my house
>>
>>107856051
Should have zoomed in and looked for more than two seconds, feeling embarrassed rn.
Sloppy look.
It's DOA unless it runs very fast and/or trains extremely well.
>>
File: 1762048031180424.png (60 KB, 1376x314)
>already in damage control mode
>>
my senile uncle keeps falling for picrel type AI slop. anyone know which services people are using to make these vids? i want to make a video of him saying ridiculous shit so that maybe he'll believe me that this type of shit isn't real.
>>
plastic? check
brown tint? check
generic showcase of outdated boring prompts? check
slower yet worse? check
chinkshit? check
it's culture time
>>
https://files.catbox.moe/firn4v.mp4
https://files.catbox.moe/ue5iur.mp4
https://files.catbox.moe/swlmta.mp4
>>
>>107856051
>>107856070
Adding on: the fact that they never advertised images, plus that it's not available on an API like FAL right now, means they had no confidence in it.
Probably just wanted to be able to tell investors "we made an image model" in the next quarterly.
Will still give it a shot once the workflows are out.
>>107856081
Not local models, doesn't belong in this thread.
But to answer your question, such videos are primarily made with veo or sora.
>>
how to get less terrible sound with ltx2?
>>
>>107856081
this was generated with comfyui so it belongs in this thread
>>
File: glm image.png (1.6 MB, 1385x786)
Take the giraffe, for example. You would want an autoregressive model that "reasons and iterates on your prompt" to add coherent fancy details: decide on a nice cover for its book, pick quirky background posters that fit the anthropomorphic-animals-in-daily-life theme. And yet it's all gibberish sloppa. Its outfit isn't even wet despite being out in the rain.
>>
File: 00002-855848590.png (3.19 MB, 1536x1536)
>>107855302
yes sdxl is the best for nsfw stuff.
>>
>>107856143
Either continue from previous audio or wait for zeeb to drop the fixes
>>
>>107856143
Reroll the seed, gen at higher quality, or upscale. Honestly it will always be bad until a fix, just less bad at higher res.
>>
>>107855785
>tfw we got cultured with wanchaku before chinese culture

holy shit, i dont know whether to kek or cry
>>
File: IMG_0265.jpg (165 KB, 1206x1347)
>do you like our open source model?
>>
>>107856179
>>107856186
damn, ok
>>
ltx2 update (minor) if you got the kijai distilled q8 or whatever:

13th of January 2026 update !!IMPORTANT!!
Turns out the video VAE in the initial distilled checkpoints has been the wrong one all this time, which (of course) was the one I initially extracted. It has now been replaced with the correct one, which should provide much higher detail.

at the moment this requires using the updated KJNodes VAELoader to work correctly

https://huggingface.co/Kijai/LTXV2_comfy

https://huggingface.co/Kijai/LTXV2_comfy/blob/main/VAE/LTX2_video_vae_bf16.safetensors
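if you're not sure whether the file you already have is the old broken extract or the fixed one, hash it and compare against the sha256 shown on the HF file page (path is wherever you put it, obviously):
[code]
# sha256 your local VAE and compare against the hash on the HF file page
import hashlib

def sha256sum(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

print(sha256sum("models/vae/LTX2_video_vae_bf16.safetensors"))
[/code]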
>>
File: creations.jpg (816 KB, 2846x1438)
>>107856116
Today's creations, not many were worth posting.
>>
File: 1753201761898368.png (19 KB, 936x196)
Do people use this? It's not used in the default workflow.
>>
>Because the inference optimizations for this architecture are currently limited, the runtime cost is still relatively high. It requires either a single GPU with more than 80GB of memory, or a multi-GPU setup.
How would it even work with multiple GPUs?
>The target image resolution must be divisible by 32. Otherwise, it will throw an error.
So a 32x downsample per side, i.e. 1024 pixels per latent token? No wonder the images look like shit.
>Guidance scale rather than CFG in the example inference code
Looks like the anon who guessed Qwen distill might be right >>107856053
>>
>bro just autoregressively generate 4096 tokens and only then do you get to feed it into a diffusion model to actually make your image
Have I missed something or is this thing destined to be insanely slow? What is the tokens per second of a 9b LLM on a 5090, like 50 or something at best? It's gonna be well over a minute just for those tokens to gen, then you have a whole ass 7b diffusion model on top of that.
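napkin math with those (assumed) numbers:
[code]
# both numbers are assumptions from the post above
tokens = 4096     # tokens the AR stage generates before diffusion starts
tok_per_s = 50    # optimistic decode speed for a 9b LLM on a 5090
print(tokens / tok_per_s, "s")  # ~82s before the 7b diffusion stage even runs
[/code]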
>>
>>107856239
i downloaded this and keep running OOM because i did NOT read the manual lel
>>
File: 00186-578589517.png (2.75 MB, 1824x1248)
>>
It requires 80gb because all of these retarded inference scripts load everything at once and keep it loaded during every stage.
>>
File: 00200-4040628510.png (3.03 MB, 1728x1344)
>>
File: output_t2i.png (964 KB, 1152x1024)
i was able to gen a glm image locally with a 3090 24GB and cpu offloading (i have 128GB of system ram).

took about 2 minutes for an image though.
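for the curious, the shape of the offload trick looks roughly like this; pure sketch ASSUMING a DiffusionPipeline-style wrapper even loads this repo, the actual script from the model page may wire it up differently:
[code]
import torch
from diffusers import DiffusionPipeline

# ASSUMPTION: a DiffusionPipeline-style wrapper works for this repo at all;
# the official inference script may differ
pipe = DiffusionPipeline.from_pretrained(
    "zai-org/GLM-Image", torch_dtype=torch.bfloat16, trust_remote_code=True
)
# streams weights between system ram and the gpu one module at a time:
# slow, but that's how the whole stack fits on a 24GB card with 128GB ram
pipe.enable_sequential_cpu_offload()

image = pipe("a cat reading a newspaper", num_inference_steps=30).images[0]
image.save("output_t2i.png")
[/code]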
>>
>>107856353
I also have 128GB, what did you use? Comfy?
>>
>>107856353
Step 2 sounds like something pol would say.
>>
>>107856353
did you write all the text in the final image?
>>
File: output_t2i.png (745 KB, 768x768)
attempt two is, naturally, 1girl
30 steps, 768x768. it takes a long time before starting to gen, just beating the shit out of my ram, but this one only took 40 secs to gen after that. it clearly does not like the smaller resolution.

>>107856355
i used the inference script from the model page. attempting to drop the resolution and steps to see what kinda timing i get but it'll probably look like ass.

>>107856364
the prompt is from the official page
>>
>>107856378
OK thanks anon, I wonder if it'll be able to do llamacpp style offloading to ram in comfy.
>>
>>107856417
dropping the res also made it eat shit, going back to defaults and starting again. all the offloading stuff is already present in the gen pipeline so i don't see why it wouldn't be in comfy, it's all python. it's a big as fuck offload though.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.