/g/ - Technology

Discussion of Free and Open Source Diffusion Models

Prev: >>107846749

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
Blessed thread of frenship
>>
>>107849302
>I tried a lora and it destroyed the quality of zit
don't worry dude, once we get z-image base we'll be able to do actual loras
https://files.catbox.moe/x20yb0.mp4
>>
File: 1741770053106317.png (316 KB, 495x541)
>>107850075
why does ltx2 have the tendency to do some MewMaxxing mode on humans, as if it was only trained on gigachad or something lol
>>
File: 144727-2005253962.jpg (30 KB, 384x384)
>>107850075
>once we get z-image base
>>
File: 00071-1641228464.png (2.98 MB, 1248x1824)
>>
File: file.png (27 KB, 670x193)
>>
>>107850112
time to be disappointed by another mid image model, autoregressive models have always been really shit
>>
>>107850102
cutie
>>
>>107850112
>model support gets added
>model gets never released
chinese culture
>>
>>107850119
b-but, random twitter chinese man said it's gonna be released this week!1!!11
>>
>https://github.com/bytedance/ATI
>wan ati
use case?
>>
>>107850112
I have github commit fatigue
>>
>>107850126
he did not say that blackie, he said soon, learn to read
>>
>>107850144
>he said soon
he said "next week" last week brownie
https://xcancel.com/bdsqlsz/status/2009911175019168215#m
>>
>>107850146
>he did not in fact specify which model
hm
>>
>>107850089
kek >>>/wsg/6071667
>>
>>107850148
I love chinese culture
>>
>>107850112
is there any image output made with that model? we don't even know what it's capable of
>>
File: n_Kr6l7Y.mp4 (1.64 MB, 480x640)
>>107850177
>>
File: 62103526204.jpg (545 KB, 960x960)
>>107850075
Chinese culture
>>
File: img_00162_.jpg (791 KB, 1520x1728)
>>
>>107850195
some speculation from the previous thread:
>>107847181
>>107847267
>>
>>107850112
>>107850117
>Autoregressive models have always been really shit
the glm team is far from being a mid company, their LLMs are really really good, if they can compete against Alibaba (Qwen) on that, I think they also can on image models
>>
File: Z_00034_.png (1.55 MB, 1024x1024)
>>
File: img_00170_.jpg (320 KB, 1520x1152)
>>
>>107850212
Anything on the size? Is it gonna be trainable without a supercomputer, or some abomination that even a 5090 can only run quantized?
>>
>>107850102
fuck skirts, women should always go like that.
>>
>>107850267
it's a 9b model
https://github.com/huggingface/diffusers/blob/6cfc83b4abc5b083fef56a18ec4700f48ba3aaba/docs/source/en/api/pipelines/glm_image.md
>Autoregressive generator: a 9B-parameter model initialized from GLM-4-9B-0414,
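For a rough sense of whether a 9B autoregressive generator fits on a consumer card, back-of-the-envelope VRAM math (weights only; activations, the text encoder and any diffusion decoder come on top, so treat this as a floor):

```python
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory footprint in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 9B parameters at common precisions
fp16_gb = weight_gb(9, 16)  # 18.0 GB
q8_gb   = weight_gb(9, 8)   # 9.0 GB
q4_gb   = weight_gb(9, 4)   # 4.5 GB
```

So bf16/fp16 weights alone eat most of a 24 GB card before overhead, while Q8 still leaves real headroom on a 32 GB 5090.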
>>
>>107850135
git commit -m 'suicide'
>>
>train influencer lora of girl who has somewhat crooked or unique teeth
>picks up everything else about the likeness but the teeth are the normal perfect ZIT teeth.
I didn't caption it, thinking it'd simply pick it up along with other things. Do I have to caption for it? Or is it too small a detail to be picked up? It learned other things that showed up in even just one image, like hair styles and such.
>>
File: Untitled.png (19 KB, 736x229)
>>
>>107850330
don't caption it, add a few close-up images with teeth visible, train at rank 64 and train longer, maybe until you get it
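For reference, that advice maps onto an ai-toolkit style LoRA config roughly like this (key names are from memory and may not match your ai-toolkit version; paths and step counts are placeholders, treat it as a shape sketch, not a drop-in file):

```yaml
config:
  name: "influencer_lora"
  process:
    - type: "sd_trainer"
      network:
        type: "lora"
        linear: 64          # rank 64 as suggested above
        linear_alpha: 64
      train:
        steps: 4000         # "train longer": raise this if the teeth still don't stick
      datasets:
        - folder_path: "/path/to/dataset"  # include a few close-up crops with teeth visible
          caption_ext: "txt"               # leave the teeth out of the captions entirely
```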
>>
>>107850332
>Redditors not understanding Chinese Culture
not surprising
>>
File: img_00192_.jpg (443 KB, 1264x1672)
>>
File: 00096-482262277.png (2.64 MB, 1824x1248)
>>
>>107850399
>>
why is this retard lodestones training Z-Chroma on base ZIT and not the dedistilled version
>>
>>107850474
be respectful
>>
>>107850474
The man has no forward thinking. Someone slips him a script and he runs it.
>>
File: img_00232_.jpg (648 KB, 1264x1672)
>>
File: hayao3-250953126.jpg (65 KB, 672x672)
>>107850370
can someone explain this aspect of chinese culture to me? there's no point in keeping Z SAAS-only, because while it's good compared to local models, the other SAAS models are much better, so it wouldn't even be a competition. So why not release it? after all, ZIT is proof the base model exists. there's no point in keeping it private other than blueballing randos on the internet, I don't get it
>>
>>107850492
it takes time to train models and training a model to be good at editing is much different and harder than finetuning a model to be good at realism
>>
>https://github.com/Tencent-Hunyuan/HY-WorldPlay
has anyone tried this?
>>
>>107850506
>training a model to be good at editing
Z-Base is not an edit model
>>
>>107850513
Omni can do edit
>>
>>107850516
Where did they say that? I remember them saying they are separate.
>>
File: img_00239_.jpg (667 KB, 1624x1840)
>>
>>107850492
What this >>107850506 user doesn't understand is that the base model was done long before zit was even released. What we are seeing now is an aspect of Chinese culture I suggest you all get familiar with: they cannot release the model even though they probably intended to, and now they are incrementally rolling back expectations to save face. That's why we get periodic inference code updates, so nobody outright confronts them on their bullshit. But this is classic Chinese face-saving behavior.

If you're familiar with izzat, it's like that but less... destructive and more trouble avoidance.
>>
>>107850532
what's the point of their team posting things like "Your patience will be rewarded" then, that's the opposite of rolling back expectations
>>
(-`ω´- )人 wafu
>>
>>107850549
moron
>>
https://huggingface.co/Kijai/LTXV2_comfy/tree/main/VAE

Heads up. LTX shipped a fake and gay vae with ltx distilled and kijai uploaded the good one here.
>>
>>107850491
which model is that?
>>
>>107850541
>what's the point of their team posting things like "Your patience will be rewarded" then, that's the opposite of rolling back expectations

There is no point. It's just buying time until they can wash their hands of the burden altogether. What part of saving face don't you understand?
>>
>>107850554
is that link the best place for quants?
>>
>>107850554
it's the same, they're just separated. makes for better memory management though. and you have to load the audio one with kijai's VAELoader KJ node
>>
>>107850532
Yeah, I mean the reason why they aren't releasing the base, not the face-saving shenanigans. It obviously exists and is finished, so why not release it? Did the CCP forbid them or what?
>>
>>107850559
dude, they merged a commit on diffusers and modelscope. if they wanted to say no they wouldn't have made all this effort. they didn't pretend anything when they ended up not releasing Wan 2.5, they just went on with their lives
>>
File: 1757655596819636.png (2.06 MB, 1440x1563)
>>107850578
yu don nied mowe than Qwen Image gwello!
>>
File: Untitled.png (35 KB, 879x160)
>>107850568

Either I'm misunderstanding something or you are, but the file was just uploaded 20 minutes ago and the size is different from the previous separated vae.
>>
>>107850585
are they fucking serious? lmaoo
>>
>>107850578
All I can say to you is Chinese culture.
>>
>>107850585
oh shit you're right dawg, he switched out the file with the newer one
sorry i jumped to conclusions because i was already using his separated VAE but the older one
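Since the new file silently replaced the old one under the same filename, a checksum is the only reliable way to tell which VAE you actually have on disk (compare against a fresh download; no official reference hashes are assumed here):

```python
import hashlib

def sha256_file(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB VAEs don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```

Run it on your local copy and on the re-downloaded file; differing digests mean you're still on the old VAE.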
>>
>>107850589
you're courting death
>>
>>107850492
>the other SAAS models are much better
z-image turbo being a distilled model limits it in terms of finetuning and loras, but not in usage, at least not compared to SAAS models. It may not know some concepts, but on SAAS models you can't prompt celebrities, and you get censor-slapped in the face when you try to generate smut. That's even true for Grok, which was pretty lewd but is now getting censored more and more every day, since normies on twitter found out you can undress thots with it right on their profile and went batshit crazy about it.
>>
File: 1756774900506439.png (3.41 MB, 2261x1131)
>>107850585
left is the new version of the vae and right is the old one
>>
>>107850600
Actually huge.
>>
File: img_00254_.jpg (613 KB, 1520x1824)
>>107850556
ZImageTurbo
>>
>>107850600
kek, how did they not notice this
>>
File: 1755544431186849.png (42 KB, 924x381)
>>107850585
https://www.reddit.com/r/StableDiffusion/comments/1qbq4mz/updated_ltx2_video_vae_higher_quality_more_details/
>EDIT : You will need to update KJNodes to use it (with VAE Loader KJ) , as it hasn't been updated in the Native Comfy VAE loader at the time of writing this
with this node then? if it's not compatible with native comfyui I guess they changed the architecture a bit?
>>
>>107850600
right is so much better
>>
reposting

I have spent hours trying to get a WAI controlnet looking right, but got very few usable results because loras and tags for each character end up conflicting and creating messy results.

Just to experiment, I tried using Qwen3 Image Edit, giving it reference images for each character and a natural language description of what I want. It produced some very reasonable results, except the original artstyles get lost and replaced with generic anime baked into the model.

But then I took one of those outputs, passed it into a WAI image2image workflow using an artstyle lora, along with all the tags that would be associated with this base image. Results actually came out really well, obviously with some detailing work that needed to be done. Much more consistent than the controlnet workflow.

I was under the impression that image2image wasn't recommended for multi-character compositions, but for me it worked.
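Why the img2img pass preserved the multi-character layout: with denoise/strength below 1, only the tail end of the noise schedule is re-run, so large-scale composition from the Qwen edit output survives while the style lora reworks the details. A sketch of the arithmetic (assumption: this mirrors how diffusers-style img2img pipelines pick a start timestep; the exact ComfyUI node behavior may differ):

```python
def img2img_steps(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (steps actually denoised, index of the first scheduler step used)."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start, t_start

# At strength 0.5 with 30 steps, only the last 15 steps are re-denoised,
# which is why the base image's composition carries through.
steps_run, start = img2img_steps(30, 0.5)  # (15, 15)
```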
>>
>>107850576
>Did the CCP forbid them or what?
Porn is illegal in China.
>>
>>>/wsg/6071705
>>
>hunyuan video 1.5
verdict?
>>
>>107850639
It was already considered dead on release.
>>
>>107850629
>I was under the impression that image2image wasn't recommended for multi-character compositions
That's just incorrect.
>loras and tags for each character end up conflicting and creating messy results.
You have to use regional prompting and lora clip.
>>
>>107850584
love this meme. Now give Yuu a CCP uniform and write "Chinese culture" on it to make it perfect.
>>
File: 1757562481214563.png (69 KB, 951x487)
>>107850585
>>107850600
https://files.catbox.moe/oi2b95.mp4
it definitely looks more detailed and less slopped. and btw, you can run this new vae with comfyui's native vae loader, it still works fine
>>
remember to pull KJNodes if you want to use the new vae, there was a commit 45 min ago that you need, otherwise you will get a terrible result
>>
>>107850630
But it can't generate porn, and chinese teams also released other image or video models, so why ban this one
>>
>>107850645
>You have to use regional prompting and lora clip.
I was.
Tags and loras applied to each region still conflict with each other. One lora used for one region ends up heavily affecting the artstyle and quality of both regions (and the entire image). Also tags applied to one region sometimes get applied to characters in a different region. For example, character 1 would be wearing gloves, and character 2 would not be wearing gloves, even though only character 2 had the gloves tag.
This is all using the rentry guide workflow, except with "Load Lora" integrated.
>>
>>107850639
it was slightly worse than Wan 2.2, so it's completely useless
>>
>>107850657
https://files.catbox.moe/men7yy.mp4

I'll never doubt you again.
>>
>>107850658
>But it can't generate porn
It can. Or can be made to.
>chinese teams also released other image or video
Chinese fuck things up. Are you new to this planet? Why would you assume that when they announce a model release, they'll actually follow through? Chinese culture.
>>
>>107850666
>One lora used for one region ends up heavily affecting the artstyle and quality of both regions
Ideally, properly made SDXL loras shouldn't do that when using lora CLIP, but you're right, this is an issue. Regional lora usage actually exists, but I've never needed it. https://blog.comfy.org/p/masking-and-scheduling-lora-and-model-weights
>tags applied to one region sometimes get applied to characters in a different region. For example, character 1 would be wearing gloves, and character 2 would not be wearing gloves, even though only character 2 had the gloves tag.
I don't have that issue; try using non-overlapping regions. Overlapping regions are a common cause of this kind of bleed, I think.
>This is all using the rentry guide workflow, except with "Load Lora" integrated.
I don't know what you mean exactly but "Load Lora" does not use Lora clip and therefore causes bleed (fully applies without need for keyword)
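A cheap way to honor the non-overlap advice is to sanity-check your regions before wiring them into conditioning masks. A hypothetical sketch (plain rectangle math, not any specific node's API):

```python
def rects_overlap(a, b):
    """Axis-aligned rectangles as (x0, y0, x1, y1); True if their interiors intersect."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

# Two character regions splitting a 1024x1024 canvas down the middle:
char1 = (0, 0, 512, 1024)
char2 = (512, 0, 1024, 1024)
assert not rects_overlap(char1, char2)  # clean split, no shared pixels to bleed through
```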
>>
>>107850666
That's because the loras apply to the whole latent space and not just the regions, satan.
>>
>>107850685
>Why would you think that they announce the release of a model and then don't.
Black Forest Labs did that with Flux Video

G E R M A N C U L T U R E
>>
>>107850689 (me)
>"Load Lora" does not use Lora clip
I was tripping there, disregard this part it's incorrect.
>>
File: img_00266_.jpg (690 KB, 1520x1824)
Media Assets panel keeps shitting itself. It stops showing new gens after a while.
>>
File: silent.mp4 (1.1 MB, 548x720)
>>107850491
>>107850608
>>
File: 1614090364424.jpg (60 KB, 576x581)
>>107850711
>>
>>107850692
>Flux Video
It's still in development. They didn't say "next week" or something. They also didn't release a distilled demo or something. Instead they went with Flux 2.
Spot the difference?
>>
>>107850711
most of the plastic skin is gone, that's good
>>
>>107850711
fuck you for removing the text after the first few frames.
>>
>>107850723
>It's still in development.
you're still coping? it's been almost 2 years you know
>>
>>107850732
>anon can't memorize 4 spots
lol
>>
>>107850689
I was actually asking about overlap yesterday but didn't get an answer at the time. If two characters' bodies are overlapping in a reference image, should I only mask the visible parts of the character being overlapped? If two arms are overlapping directly onto each other and only a thin section of character 2's arm is visible, should I be chiseling a thin mask around it in the mask editor and being careful not to mask the arm in front (character 1)?

I was under the impression masking in a controlnet isn't required to be precise and that I should just focus on the general region, but I suspect it's not that clear for more complex poses with a lot of crossover and overlap.
>>
>>107850711
why is she smiling on Q6 and not Q8 at the end? did you prompt her to do that?
>>
>>107850744
Nope, all the same prompt:
"The girl sucks juice out of the cup using a straw"
>>
>>107850733
So? They don't have 1.5 billion Chinks in Germany.
>>
>>107850744
Same seed too obviously
>>
>>107850740
fuck you for making me try
>>
SDXL until Earth gets swallowed by the sun growing into a red giant
>>
>>107850740
anon OOMs in real life, cursed gtx 1060 gene
>>
>have a 4090 and 64gb lying around as backup
>5090 in current pc
>a new system would only be 2k usd

I am cockblocked when I gen stuff for work, but if I get a second pc, I'd just use that to gen as well. Conundrum.
>>
>>107850748
Israel has less people than Germany yet they made LTX2 kek, I guess the aryans are ultimately the inferior race compared to the kikes, kek
>>
File: overlap.png (319 KB, 392x594)
>>107850741
Yeah, for something like pic related you'll have a lot of trouble giving a glove to the character you want, normally you'd be making a region out of this whole arm + the hand underneath and just tag what you want the pov arm to be, + "holding hands". If you have a region for the girl on her back then the model in most cases should be smart enough to understand that e.g. "tan skin" on the pov arm region would not apply to the girl's hand, but something like a glove, that can be trouble.

You can try making more exact masks (in an image editor perhaps, transparent pixel = mask), but when you do that, you're limited to using higher ControlNet strengths otherwise it's kinda useless.
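The "transparent pixel = mask" idea above, as a minimal sketch over a raw RGBA pixel grid (illustrative only; a real workflow would do this with PIL or numpy on the actual image):

```python
def alpha_to_mask(rgba_rows):
    """RGBA pixel rows -> binary mask: 1 where the pixel was erased (alpha 0)."""
    return [[1 if a == 0 else 0 for (_, _, _, a) in row] for row in rgba_rows]

# 2x3 test image: left column erased in an image editor
rows = [
    [(0, 0, 0, 0), (10, 20, 30, 255), (10, 20, 30, 255)],
    [(0, 0, 0, 0), (10, 20, 30, 255), (10, 20, 30, 255)],
]
mask = alpha_to_mask(rows)  # [[1, 0, 0], [1, 0, 0]]
```

The resulting thin, pixel-exact masks are what forces you toward higher ControlNet strengths, as noted above.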
>>
>>107850740
Context too small, sry


