/g/ - Technology


Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107529397

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>all these 1girls
WE WON!
>>
Wretched thread of mental illness
>>
Blessed thread of 1girl posting
>>
>>107536415
wow, this thread desperately wants to be /adt/ but sucks at making kino
>>
https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-community-works
>SGLang-Diffusion brings SGLang's state-of-the-art performance to accelerate image and video generation for diffusion models, now supporting Z-Image.
more snake oil, or is it actually useful?
>>
>>107536436
>page 7
why don't you go there?
>>
>>107536443
sglang is just an engine, it's pretty popular with LLMs (for enterprises, like VLLM)
>>
>>107536443
Chinese culture. Anything but the base model.
>>
>>107536436
>kino
Is that a codeword for pedophilic images?
>>
Arousing thread of 1girl gooning
>>
>>
>>107536443
a z image gen takes like 10 seconds on modern cards. how fast do you need it to be?
>>
>>107536436
>oh no! the niche and hyper-specific thread of 3 troons feels threatened when someone posts anime in god's chosen general
Are you that insecure?
>>
>>107536469
we won't be using turbo forever; lots of steps + CFG (2x slower) will come back with base
>>
>>107536471
you say this but this thread was baked by a dramatic troon
>>
>>
File: z-image omni base.png (53 KB, 369x387)
Possibly posted here before, I don't check every thread but they seem to have updated their website
https://tongyi-mai.github.io/Z-Image-blog/
Unified t2i and i2i? Wtf does this mean?
I am inclined to believe that they aren't doing a complete rug pull and will release something, but I have no idea what that is shaping up to be.
>>
>>107536436
go back there, why are you here seething?
>>
>>107536498
I like how the team didn't do free advertisement for cumfart
>>
>>107536498
if I understand correctly, base will be able to do edits, while Z-Image-Edit will be a finetune that's really good at that?
>>
>>
File: WHEN???.png (94 KB, 224x224)
>>107536498
>the base model will be able to do edit as well
you guys have no idea how powerful this shit will be: an unslopped model that can make realistic shit and edit, Apache 2.0, small. This is literally the dream model, those fuckers brought the fire on me, WHEN RELEASE???
>>
>>107536498
So, if you finetune such a model, do you have to train it on both plain images and edit-pair examples?
>>
>>107536519
I am thinking it might mean something like that, but it doesn't make sense to beat that into the base model. Edit models have a different training loop, so why burn money doing that, degrading the finetuning and non-edit capabilities of the model, if you are already going to release a dedicated edit model?
>>
>>107536511

>107536471 (You)
>107536436
>Samefag btw

Too tired to take a screencap, and then you'd just claim I used inspect element in the console
>>
>>107536530
Hehe. You're about to be a victim of Chinese culture.
>>
>>107536498
that's the first time they've changed the readme to give some news about base; it's a big sign it'll be released soon
>>
>>107536537
If that's what they are implying then yes.
>>
>>107536538
maybe they found a way to not kill the edit capabilities if you only finetune on imagegen, I know that's naive wishful thinking but I like that approach
>>
>>
File: z-image_00156_.png (2.34 MB, 1408x1408)
Is it really worth it /g/?
>>
>>107536555
Is it powered by her piss or shit?
>>
>>107536562
yes
>>
File: Wanimate_00145.mp4 (903 KB, 544x960)
>>
>>107536391
>>107536426
>>107536477
>>107536511
actual samefag btw
>>
>>107536537
>>107536545
I guess it would be possible to finetune on just images, if you don't mind frying the edit capabilities away.
We have no idea what it actually is so it's just a guessing game at this point.
>>
>>
>>107536449
>Anything but the base model.
not the right time to say that kek >>107536498
>>
>>107536498
that's the first time I've heard that base is actually able to do edits; I thought you put the edit capability in a model through finetuning, not during pretraining
>>
>>107536582
I am quietly confident the model will not be coming. You can kekkaroo increasingly nervously as the weeks go by, but I just understand the culture better than you.
>>
>he spams reports on different ips
reminds me of that randall (the Jewish snitch) from recess
>>
File: comfydogfart.png (56 KB, 943x433)
He could've just posted "the inpaint part is currently missing and will be implemented later"

but nooo he had to add his petty opinions too
>>
>>
File: 31313.png (489 KB, 1587x888)
first time trying 3d. it made an object good enough for 3d printing on the first try, from only one reference image. very impressive
>>
File: 1761923227031455.png (74 KB, 279x181)
>>107536594
>You can kekkaroo increasingly nervously as the weeks go by
you're the one being nervous, a lot of signs point to an actual release, we're so back
>>
File: image.jpg (60 KB, 630x630)
>>107536622
Mind sharing workflow?
>>
is cumfartorg and this thread some kind of mental asylum that uses shock therapy to make you gay?
>>
>>107536622
>>107536633
desu, I just use the script from the GitHub. cumfart is too annoying when comfy breaks everything. workflows suck
>>
File: Doom guy.png (319 KB, 445x404)
>>107536498
>A foundation model designed for easy fine-tuning
That's Chinese-culture speak for "you'll only be able to finetune through the API", trust the doom.
>>
>>107536612
He's probably pissed at having to implement any new models. Just doing it to keep up appearances. Normal ZIT is fucked up too, genning times are all over the place and require a restart after a while.
Probably wishes they would all go API, much easier to handle.
>>
>>107536656
>He's probably pissed at having to implement any new models.
instead of hiring UI jeets who are obviously doing nothing but pretending to work by removing stop buttons, he should train them to implement new models
>>
>>107536498
Actually it seems they put this up 10 hours ago but no one noticed kek
https://github.com/Tongyi-MAI/Z-Image-blog/commit/e67bafb673fa19d301f903ac62de26c48b4cc1c4
If you scroll down, they added hints about the difference between the base model and the dedicated edit one? (It has better prompt adherence and it is more creative?)
>>
>>
File: 1755630714880557.png (67 KB, 997x865)
>>107536498
it's coming; they hadn't touched the blog for 2 weeks, the model is probably finished
>>
>>107536656
he just can't understand new models at all anymore. everything recently has been an improper implementation and actively ooms on dumb shit
>>
>>107536612
Slower = increased cloud cost = less profit
They need to do the needful and retrain it immediately
>>
I feel so conflicted. I want to believe base will be released. It's like being interested in a girl and getting mixed signals.
>>
>>107536436
>post some 1girl, anime in highlights
>entire general is now meaningless
>>
>>107536698
Embrace the understanding of Chinese culture.
>>
File: 1758643090553139.png (2.14 MB, 3202x1422)
>>107536666
>If you scroll down, they added hints about the difference between the base model and the dedicated edit one? (It has better prompt adherence and it is more creative?)
you can see it on the blog yeah
>>
>>107536703
Can you define this for me? I see people saying it all the time but I don't know what they mean.
>>
File: ComfyUI_00001_.png (1.15 MB, 1024x1024)
>>107536498
>>107536666
Anyway here is the 1girl of celebration.
Doomers on their last supply of copium.
>>
>>107536699
some lost /adt/ posts are the most prevalent images made itt
>>
>>107536710
it's not that deep, he's saying that chinese people are snakes and that "chinese culture" is actually the default way of acting for them (lying, cheating and so on)
>>
File: 4444444444.png (97 KB, 293x416)
>>107536633
I just used this template and dragged in the png. just wanted to try it and it worked much better than expected
>>
>>107536720
I see. Thank you for the explanation.
>>
>>107536721
does it do texture extraction? would be ultrakino
>>
>>107536720
Basically, but they will string you along as long as necessary to achieve their own ends at the lowest personal cost to themselves. This is why you need to leave no wiggle room for cheating when doing business. If they can go back on a condition of a deal and get away with it, they will.
The concept of good-faith business is a joke to them.
>>
File: Can I hope?.png (228 KB, 640x377)
>>107536498
>>107536666
>>107536707
I want to believe boys...
>>
You can test with control_refiner_layers for the noise_refiner hints, and it's only marginally faster; it's generally slower because of the increased control-embedder dimensions and the hints being added to more layers, and it also requires more steps. They probably ran an initial experiment with control_refiner_layers for noise_refiner, found it doesn't work as well as just using control_layers twice, then forgot to remove the code, so the untrained weights are in the released checkpoint.
Also, you have to concatenate zeros to fill the expected dimensions to make t2i work anyway, so not implementing inpaint doesn't even make sense, considering it's just concatenating the init image and mask instead of zeros.
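The zero-padding described above can be sketched generically. This is only an illustration: the channel counts and layout here are hypothetical, not the real Z-Image checkpoint's.

```python
import numpy as np

# Hypothetical channel counts for illustration only; the real
# model's latent/conditioning widths may differ.
LATENT_C = 16
COND_C = LATENT_C + 1  # init-image latent channels + 1 mask channel

def pack_model_input(latent, init_latent=None, mask=None):
    """Build the channel-concatenated input the post describes:
    zeros fill the conditioning slots for plain t2i, while inpaint
    concatenates the init-image latent and the mask instead."""
    b, _, h, w = latent.shape
    if init_latent is None:
        cond = np.zeros((b, COND_C, h, w), dtype=latent.dtype)  # t2i: pad with zeros
    else:
        cond = np.concatenate([init_latent, mask], axis=1)      # inpaint: image + mask
    return np.concatenate([latent, cond], axis=1)
```

Either way the model always sees latent + conditioning channels of the same total width, which is why skipping the inpaint branch saves essentially nothing.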
>>
File: ComfyUI_00003_.png (1005 KB, 1024x1024)
>>107536710
>>
>>107536498
Do you think they delayed the release because they wanted to make the base model unified? It's the first time they ever said base would be something like that
>>
>>107536718
Hope your next post is proof of what you're saying. Also, why are you shilling your general here when we don't set foot in yours?
>>
File: 1753105486026549.mp4 (1.45 MB, 720x1080)
>>107536744
>>
>>107536756
no I am just ashamed of the quality around here. allowing slopstyle is a farce
>>
>>107536588
did you forget that flux2 can do that? I don't blame you if you did, desu
>>
File: Wanimate_00146.mp4 (1.3 MB, 528x832)
>>
>>107536759
>instabitch makes a prediction
kek, now there's a 100% chance it'll be released
>>107536767
flux 2 isn't a base model though, it's a finetuned model, like Qwen Image Edit
>>
>>107536721
Thanks, will try some anime girl with white background
>>
File: 1750052244981548.png (86 KB, 1781x366)
>>107536666
>https://github.com/Tongyi-MAI/Z-Image-blog/commit/e67bafb673fa19d301f903ac62de26c48b4cc1c4
>Single-to-Single
as expected, Z-Image-Edit will only be able to handle one image input, but they could do what they did with Qwen Image Edit: finetune it further to make it able to edit multiple images
>>
>>107536763
And the photorealistic ones aren't 'slopstyle' too? Why the fixation with anime gens here? Also, if anons here post kino anime gens, wouldn't that take away the little sense your general has left?
>>
>>107536799
>plastic skin isn't slopstyle
>>
>>107536707
I meant it in the sense that they also changed the edit model's description; that's why I highlighted the commit
>>107536744
Enjoy the hopium!
>>107536754
Likely, yeah.
There aren't many good explanations for why you'd distill an unfinished model otherwise.
>>107536798
Actually single-to-single got deleted
That means they might already be finetuning the edit model to do that.
But that might be too much hopium.
>>
>>107536823
>There aren't many good explanations for why you'd distill an unfinished model otherwise.
since turbo can't do edit, it means it was distilled from an early version of base that couldn't do edit as well
>>
>>107536812
There's an entire thread waiting for you, you can go there.
>>
>>107536833
Anything to cope with the fact that base is done and they either can't or won't release it.
>>
>>
File: 1414.png (537 KB, 1582x1105)
>>107536731
don't see any texture. took four screenshots of a very complex model on sketchfab and it remade it pretty well. at least good enough as a base mesh, or to get proportions right in your modelling
>>
>>107536861
Now try an API img 2 3D model and die of hopelessness.
>>
>>107536861
Cool, which other anime girls can this model do? Can I export it to 3D rendering software? I need 4 different angles, right?
>>
File: Wanimate_00147.mp4 (1.14 MB, 960x544)
>>
File: 1736073357907263.png (437 KB, 3802x1518)
>>107536842
They literally changed the GitHub to say it's a base model made for finetuning and you still don't think it'll be local?
>>
>>107536861
Can it turn some random SDXL gen >>107536856 into 3d?
>>
>>107536707
No photorealistic tag on base? So will it be shit at realism or better at weebshit? Why would they drop this tag
>>
>>107536897
>and you still don't think it'll be local?
Yes. I do not think it will be released.
>>
>>
>>107536908
you don't expect a base model to look as good as a finetuned model, it's just not happening; if base models came out like that we wouldn't need finetunes in the first place
>>
>>107536889
>>107536902
you need four angles for the best results. find models on sketchfab, for instance, and just take screenshots. or you can make one image and use the template to get more views based on that image
>>
File: Z-image turbo.png (1.45 MB, 1280x720)
>>107536744
>I want to believe
billions must believe
>>
>107536912
i am very english too good morning
>>
who is paying ran overtime for being a dramafaggot 24/7?
>>
>>107536666
The training pipeline and model variants were already described like that in the technical report (https://arxiv.org/abs/2511.22699, section 4.3) from its first version in November. Omni pre-training covered both image generation and editing. Both Z-Image-Edit and Z-Image-Turbo (which is actually called "Z-Image" in some parts of the report) branch off from the base model after that stage. The editing variant had more pre-training specifically for editing (section 4.7).

This means there's a chance LoRAs trained on base will work on the editing model, but it's not guaranteed.
>>
>>107536957
esl
>>
I really wish base gets released and the porn-niggas that made biglust and lustify fine-tune it. The slop will be so good. I hope and wish.
>>
>>107536979
this. and chinese noob niggas
>>
>>107536951
How did you prompt this style?
>>107536979
For me it is the BigASP guy moving on to Z-Image Base
>>
>>107536979
>I really wish base gets released
I'm more hyped now knowing it can do edit as well, I thought I would have to wait additional months (and the release of z-image edit) before doing that lol
>>
>>107537006
>How did you prompt this style?
I got that style prompt from the LLM rewriter >>107531631
>>
>>107536958
Animanon = Debo
Ran = Debo
It's all the same faggot
No other pathetic fuck wakes up every day, sees his dead general with the same avatartroons he himself despises but pretends to like, then seethes and starts shitposting in other generals
>>
File: kewl.png (1.37 MB, 832x1216)
I'm mostly happy with how my latest lora turned out
>>
>>107537044
Poor guy, you have to understand him, he hit the jackpot with the worst avatarfags from /g/
>>
File: Wanimate_00149.mp4 (2.41 MB, 960x544)
>>
>>107537044
ani is a real person. comfyanon and catpissanon met him irl. I don't think he shits up the thread but mentally unhinged schizos try to make it appear otherwise
>>
>>107537061
What is the theme of the lora, retro? SDXL or ZiT? I want to use it!
>>
Is it possible to generate 16-bit PNGs in Comfy for massive dynamic range, or is the gen itself limited in a way?
>>
When will the onetrainer nigga update his shit
>>
File: 1737924715806510.jpg (525 KB, 2473x1452)
>>107536498
https://arxiv.org/pdf/2511.22699
I have a question though, what version of base will we get? the one on the far left?
>>
File: yallJEALOUS.png (65 KB, 671x287)
>>107537102
surrrre
>>
File: qwen_edit.png (645 KB, 857x655)
>>107537102
Sorry Debo, it must be rough watching time pass and seeing your justice league of avatarfags stay the same, and even worse, they don't improve, like they have some kind of mental illness, right?
>>
>>107537145
https://github.com/FizzleDorf
here is ani's GitHub. show us yours
>>
File: woman__.png (2.18 MB, 1024x1024)
>>
File: erm.png (2.11 MB, 832x1216)
>>107537103
It's based off of 100 illustrations by an artist named systemst91
The more I use it, the more I realize it's pretty flawed, and might need to be remade
I might not have the skill (or, let's be honest, the patience) to make a lora that is actually worth uploading anywhere
>>
>>107537061
link?
>>
>>107537188
bruh >>107537182
>>
>>107537182
What model is this? If it is Noob show me training settings and I might suggest some changes.
For SDXL especially, though, bad anatomy can also stem from "confusing" images in the dataset.
>>
>>107537182
You can always share your failed lora gens in the official /Stable DiffsuionTM general/ , I'm sure they'll be above average there.
Right? >>107536958
>>
File: 00037-2881979531.png (1.94 MB, 832x1216)
>>107537205
I'm probably going to get laughed out of the thread for using a model that is universally regarded as shitty, but...
I trained the lora on Illustrious Hassaku
>>
>>107537182
if you are using noob vpred it just looks like this. try Wai or plantmilk
>>
File: file.png (2 KB, 159x40)
Imagine being so far up your own ass that you add an entire thing called "broken" to your code instead of just not using the part you know isn't used
>>
>>107537225
>Hassaku
based, Ikena is an honest dev
>>
>>107537225
>Illustrious Hassaku
why? you are supposed to train on base illustrious then it's compatible with every other ill model
>>
File: ComfyUI_01773_.png (1.26 MB, 1440x816)
>>
>>107537225
Training on specific checkpoints can be good if you want to squeeze maximum quality from your lora at the expense of compatibility with other checkpoints, though training on shitmixes comes with the same caveats as using shitmixes.
You want to train on a base model like Illustrious XL v2 or, better, Noob v-pred v1 for SDXL anime.
That said, you likely fucked up some parameters or have overly weird images in the dataset to mangle anatomy that much.
>>
File: file.png (7 KB, 354x112)
Even if they retrained it, unless they changed something else, you'd have no way of toggling broken
>>
>>107537231
Ignore this troll.
>>
Ah, I understand now. I guess my stocking lora sucked cock because i stopped it at under 3,000 steps. This is my personal look for a (realistic) peach at 3,000 steps i let train while i slept.
last time i tried this, it failed to learn certain aspects of her attire. This time i went in with a more diverse dataset (still 20 images) and on top of that anon's suggestion of keeping all the training settings at default, it trained 100% of the character and i can change all of her attire.
oh and i can strip her nude because of the dataset, and the nudity accuracy is like 99% there.
yeah z-img turbo is really as good and trainable as everyone says. Damn. Base and edit are gonna light this scene on fire. 10/10 do recommend giving it a shot.

>>107537115
onetrainer is fucking DEAD nigga you're gonna have to get ai toolkit.
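As a sanity check on the numbers above, the steps-vs-dataset-size relationship converts to epochs like this (batch size 1 assumed, since that's the usual lora-training default):

```python
def epochs_seen(steps, dataset_size, batch_size=1):
    """How many full passes over the dataset a training run makes."""
    return steps * batch_size / dataset_size

# the run described above: 3000 steps over a 20-image dataset
print(epochs_seen(3000, 20))  # 150.0 passes, plenty for a character lora
```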
>>
>>
File: zimg_0007.png (2.09 MB, 1080x1440)
necroresponse: someone asked about training a lora at 512px. the likeness is not bad (zoey luna). it's kinda weird that i can get results at 750 steps
>>
>>107537166
ran doesn't do anything constructive to the thread or society so of course she doesn't have one
>>
File: ComfyUI_00005_.png (2.02 MB, 1024x1024)
>>107537284
I was gearing up to train on turbo but now I think I will just wait for base
>>
>>107537145
What is this screenshot supposed to be about?
>>
File: derp.png (1.61 MB, 832x1216)
>>107537238
cuz I'm a retard and didn't know that
>>
>>107537305
I would still wait for base, it's not perfect. Just because it's trainable doesn't mean it's as good as it can be. It would be great if I could get it to not force the style of the dataset, but that may be a flaw of training a distilled model.
>>
>>107537284
What training settings?
>>
>>107537234
Why are people hating comfy for his behavior here?

There are way too many fucking grifters trying to capitalize on the Z-Image hype at any cost. A different team inside Alibaba itself trained a controlnet for Z-Image, on the distilled model (?), not once but twice, released just a few days apart. The second version has a literal blatant typo that runs part of the model wrong, but of course, being an ML model, it will adapt to whatever you trained it with, even if suboptimal. It is clearly broken, and comfy just made the code handle that case explicitly and called it out.

Z-Image has an epidemic of shitty loras, shitty controlnets, shitty half-assed everything that people are rushing out because they want to jump on the hype train.
>>
>>
File: zimg_0010.png (2.04 MB, 1080x1440)
>>107537295
512px isn't terrible, just kinda flat on the detail i guess. 40 mins on a 3090 (1500 steps). i might push this to see if i can actually train a likeness lora in 750 steps at 512px in 20 mins
>>
>>107537295
>the likeness is not bad
Flux vae preserves details a lot better even at low res, so the model can actually learn the likeness.
It still looks desperate or Indian to train at 512p though. (Not that I should judge too much as a vramlet, but that doesn't make it untrue)
>>
>>107537307
ani is more respected and talented than tRan which is why she has frequent melties and spitebakes
>>
>>107537324
Forgot your avatar image
>>
File: gggsgweweg.jpg (2.54 MB, 5000x3042)
There's no alternative to seedvr2 is there? tiled upscaling with zit itself?
>>
File: pees.png (1.64 MB, 832x1216)
>>107537267
This is helpful to know!
So, I'm guessing that a small dataset with very clear images is probably going to produce much better results than a large dataset with a lot of clutter in each image?
>>
>>107537324
>Z-Image has an epidemic of shitty loras, shitty controlnets
calm down that model is less than 2 weeks old, let people master this architecture
>>
>>107537324
Newbie was a lumina tune and comfy should be labelling his own code broken considering people are still memory leaking zit
>>
>>107537323
defaults as i said.
-open ai toolkit
-change paths and lora settings as needed
-change steps as needed
-start.

>>107537295
>>107537328
yep that was me. Still blown away you could even train a lora at all with that resolution. Picrel was trained at 1024.
>>
File: 1738340391810037.png (1.84 MB, 1271x1347)
>>107537121
even the Z-image devs are shilling rewriting your prompts into a boomer prompt with LLMs
>>
>>
>>107537331
trvke
>>
>>107537342
For SDXL style loras, you typically want a lot of images.
The quality-over-quantity route works for characters, but SDXL struggles to learn a style without too much noise from a small dataset. (It can still be done with know-how and luck, but you don't have the former.) Your initial 100 mark is good enough. I guess you can remove some low-quality images, but don't remove more than a few.
>>
what happened to the vae replacements? how come the new models don't use them?
>>
File: ComfyUI_01774_.png (1.09 MB, 1360x768)
>>
>>
>>107537423
when are you releasing the Yakub ZiT lora?
>>
>>107537382
NTA I agree, 100 images is great for IL style loras. However I've had to use datasets that only had like 30 images and managed to get pretty decent results so don't be discouraged if your artist doesn't have a lot of art online or something.
>>
>>107537414
Do you refer to lodestone's claims about pixel-space diffusion?
His model is yet to (and not going to) converge into anything worth a damn to convince anyone outside of his discord.
And while not perfect flux vae is good enough in terms of quality.
>>
File: ComfyUI_01775_.png (1.04 MB, 1360x768)
>>
cumfartorg is simultaneously toxic positivity culture and toxic corpo culture that reached the boiling point with a garbage ui library they doubled down on. 2026 really is going to be the year it all falls apart
>>
>>107537453
Yeah, whatever sad schizo
>>
DRAG AND SHOT
>>
File: Cefurktuber.png (924 KB, 1011x889)
>>
>>107537447
no, just the papers that came out a while ago. supposedly the replacements are much lighter on vram, faster, and reduce noise in the output (higher quality)
>>
>>107537463
Inquiry if i may,
What if comfynigger drug and shot you?
>>
>>107537465
4h33m51s

proper documentation my furkan
>>
>>107537465
I hope this gets memed into reality. furk is a great storyteller I'd love to have my kids listen to for what these times were like
>>
File: cute couple.png (1.08 MB, 1024x1024)
>>
>>107537486
get tf outta here with that slop

sdg is that way
>>
>>107537472
>papers that came out a while ago
Well there a lot of these.
Link to which ones you are talking about?
>>
>>107537465
>"he made a deal with the Jewish devils"
>"big mistake"
>>
File: gggsgweweg2.jpg (3.1 MB, 8000x4866)
>>107537339
I don't know why I didn't think that tiled upscaling would work. Then just top it off with seedvr2.
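The tiled trick above generalizes: split the image into overlapping tiles, run the model on each tile, and blend the overlaps so seams average out. A model-agnostic sketch (plain averaging of overlaps; real workflows usually feather the overlap weights, and the tile/overlap values here are arbitrary):

```python
import numpy as np

def _starts(length, tile, step):
    # Tile start offsets that fully cover [0, length).
    s = list(range(0, max(length - tile, 0) + 1, step))
    if s[-1] + tile < length:
        s.append(length - tile)
    return s

def tiled_process(img, fn, tile=64, overlap=16):
    """Apply fn to overlapping tiles and average the overlaps.
    fn must return a tile of the same size; a real tiled upscaler
    would also scale the output coordinates, omitted for brevity."""
    h, w = img.shape[:2]
    tile = min(tile, h, w)
    step = max(tile - overlap, 1)
    acc = np.zeros(img.shape, dtype=np.float64)
    wgt = np.zeros((h, w) + (1,) * (img.ndim - 2), dtype=np.float64)
    for y in _starts(h, tile, step):
        for x in _starts(w, tile, step):
            acc[y:y + tile, x:x + tile] += fn(img[y:y + tile, x:x + tile])
            wgt[y:y + tile, x:x + tile] += 1.0
    return acc / wgt  # every pixel is covered at least once
```

With the identity function for `fn`, the blend reconstructs the input exactly, which is a handy way to verify the tiling math before plugging in a model.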
>>
>>107537486
>>>/g/adt/
>>
>>107537495
>>107537508
What's wrong with it
>>
>>107537496
there were quite a few but this one came out early
https://arxiv.org/html/2510.15301v1
there are a lot looking into it but I can't link them all
>>
>>
File: ComfyUI_01776_.png (1.12 MB, 1360x768)
>>
>>107537517
Maybe you are a newfag, but there is a specific anime general thread for that. It will be better received there than here.
>>
>me waiting for OneTrainer to implement ZiT
>>
why is he samefagging again
>>
>>
>>107537508
It's not anime though
>>107537518
Let me skim through.
Judging by its 17 Oct 2025 release date, though, I'd say it's too new even if correct and worthwhile. It would probably take a few more months before it gets used in any finished model.
>>
>>107537166
How does that prove that trAni is not an unhinged faggot and should fuck off forever?
>>
File: Z-image turbo.png (1.94 MB, 1280x720)
>>
>>107537545
it's interesting they tested on sdxl. we might see it get life support in some new ill-like model, which I would be fine with. unet is still a great arch that needs to be explored
>>
>>
File: ComfyUI_01777_.png (1.19 MB, 1360x768)
>>
>>107537557
ani works at contributing and sharing his work with others. you shit your diaper and screech in the thread every day. I wonder which anon people want around?
>>
>>107537583
you forgot to mention ran, sonic and ben10 "anon"
>>
>>107537583
trvth nvke
>>
File: random chinese woman.png (1.27 MB, 896x1152)
>>
>>
File: zimg_0012.png (2.07 MB, 1080x1440)
>>107537353
it's an interesting experiment, for higher concepts i think you need a ton of steps, i think refining something the model already knows is very fast. i couldn't get a new concept in there with the default settings, i had to really change the LR, etc
>>
>>107537583
ani jacks off to shota and waits for people to implement basic features in his wrapper more like
>>
File: ComfyUI_01778_.png (1.2 MB, 1360x768)
>>
NO you don't understand, everyone at alibaba's internal team and the reviewers on their public repo missed the TYPO, no way it was done on purpose, can't be, I'm right and EVERYONE ELSE IS WRONG
>>
File: Z-image turbo.png (1.53 MB, 1280x720)
>>
>>107537626
link to the shota collection?
>>
>>107537626 >>107537583 >>107537557
Did you know Ani from AniStudio (Ani from Anime) is a /adt/ regular? You should post and discuss his stuff there instead, it's more relevant to that thread!
>>
>>107537640
Kek
>>
>>107537626
ani sounds pretty based ngl
>>
File: demo2_00006_ copy.jpg (3.01 MB, 4480x6000)
How could flux blunder so hard? China nr 1.
>>
>>107537667
sorry but animanon is in the OP so it's relevant to the thread
>>
>>107537583
based
i wish schizo just stopped harassing anons who actually try to contribute. ani is the good guy here
>>
When will I be able to create 3D characters and 3D clothing/objects from generated images in a simple pipeline, then manage them in a DAZ3D-like editor and use these characters, clothing, backgrounds, and poses to auto create control nets + prompts for Z Image finetune?

I mean, can't we replace the writing somehow and make the generation process more playful?
>>
>>
>>107537632
are you 12? you seem to have a lot of free time
>>
>>107537694
bad idea since the topology is bad for anything that isn't a static object
>>
>>107537694
2 more years. 3D models are the final frontier of this tech to be honest; there are far more complexities going on there than with 2d.
which is why you never hear artists crying and pissing and shidding themselves about the 3d modeler jobs, no one cares about them.

(remember the industry hated them first)
>>
>>107537685
Sorry but his UI is focused on Anime, and there is a specific Anime general for that, you have to talk about him there.
>>
>>107537640
Comfy's ego is so high he genuinely believes that yeah
>>
>>
File: church.png (1.5 MB, 1216x832)
>>107537678
Baby making sex
>>
File: ComfyUI_00179_.png (2.21 MB, 1520x1040)
beta57 is alright
>>
>>107537708
I think you interpreted my thoughts differently than I intended.
Auto-remesh it down to a few hundred/thousand polygons instead of 1 million and you're done. I don't see how topology would play a role here in any case.
The 3D model is more for visual feedback, like what you currently have in the prompt, and it gives you a control net. You are also welcome to separate the character and background.
>>
>>107537771
thank you for convincing me that the dpm samplers are overhyped and not alight
>>
File: ComfyUI_01779_.png (1.28 MB, 1360x768)
>>
File: three women.png (1.41 MB, 1216x832)
>>107537771
I am a dpmpp_2m beta guy.
Though that causes issues for zit sometimes.
So I am using euler ancestral ddim uniform for now.
No idea what I will use when base releases because I hate ancestral samplers.
>>
File: lol lmao.jpg (439 KB, 1600x896)
>>107537794
>No idea what I will use when base releases
>>
File: Z-image turbo.png (1.63 MB, 3096x1527)
Way closer than I expected lol
>>
>>107537777
the poly count isn't the problem, it's how the topology works with skinning+rigging. generated 3d models are already retopo'd but it's just evenly spaced quads which will just look like clipping garbage when it's used in a rig. some edge cases might be static objects like a hair ornament or a belt buckle but for the character model itself it's terrible other than for 3d printing or maybe a 3d statue in a scene
>>
>>107537794
res2m seems to be the go to sampler, with beta/beta57 scheduler.
>>
>>107537803
Can you share the whole prompt for the options thingy?
>>
>>107537725
>3D models are the final frontier
world models are because it removes the need for 3d altogether
>>
File: img_00004_.jpg (558 KB, 1332x1776)
>>
File: ComfyUI_00030_.png (1.4 MB, 832x1216)
Convenient censorship when you didn't ask for it.
>>
File: Z-image turbo.png (1.38 MB, 1280x720)
>>107537815
Sure
https://files.catbox.moe/nvl4h1.txt
>>
File: ComfyUI_00185_.png (2.19 MB, 1520x1040)
>>107537794
>switched to dpmpp 2m without changing the prompt
>it removed "sde" from the text
bros?...
>>
>all this 1girls
Mistress /ldg/, can this worthless anon coom?
>>
>>107537803
Does it run local models or is it hardwired to HF repos? Fuck the cloud nodes.
>>107537834
Can you try i2i upscale if the beta scheduler is still blurring it?
>>
>>107537843
DON'T CUM!
>>
>>107537853
>Does it run local models is it hardwired to HF repos?
it only runs on local models, you put your gguf in a folder and you're good to go
https://github.com/BigStationW/ComfyUI-Prompt-Manager
>>
>>107537834
Lol
Also the excessive grain you see in the image is the problem I was referring to.
No idea why that happens with zit, it works fine on many other models.
>>
File: lel.png (529 KB, 3040x1255)
https://emma-umm.github.io/emma/
>We did it SAAR we beat Bagel!
Really? In front of my end of 2025?
>>
>>107537870
>No idea why that happens with zit, it works fine on many other models.
since they only trained on real data, there was probably a lot of compressed jpg images in there
>>
i have a text encoder in this format

qwen_2.5_vl_7b_fp8_scaled.safetensors
its a 9gb file

when i try to find abliterated versions of qwen 3 vl 2b or whatever, i cannot find a single file, instead there is the whole folder with configuration files and what not

what is the difference, and how do i make use of the version that has multiple files in a folder in comfy ui? right now i load the qwen_2.5_vl_7b_fp8_scaled.safetensors using the load clip node

i want to use it with qwen image edit
>>
File: img_00008_.jpg (935 KB, 1332x1776)
>>
Are you using the correct text encoder?
Why do you want abliterated or 2b? The latter won't work at all, and the former will give worse results, since the model wasn't trained on it.
>>
>>107537976
Cute!
>>
>>107537992
99% chance a coomer said it'll make better booba
>>
>>107537992
Meant to tag >>107537911
>>
File: samurai child.png (1.37 MB, 832x1216)
If ZIT can figure out that she has to close her eye because the scar is going through her eye, not some random part of her face, it would be great.
>>
>>107537992
>>107538007
yes better bob and vagene

so its not like ill get better results with say qwen 3 4b vl ect instead of qwen_2.5_vl_7b_fp8_scaled.safetensors? im new to local ai
>>
base more like a never-ending maze
>>
>>107538034
try prompting it better. like, boomer prompt. >>107537355
>>
>>107537257
>>107537423
>>107537448
>>107537525
>>107537632
>>107537787
agartha needzs your powa frien
>>
>>107537803
Is this simply zit? It understands regional prompting like that?
>>
>>107538037
No it doesn't give better bob and vagene sir.
The text encoder isn't censored, the diffusion model can't draw bob and vagene, because it simply wasn't trained on bob and vagene.
Qwen image (edit) isn't really a coom model. There is no coom edit model yet. There are some loras for flux kontext and qwen image edit that you might use (most got jannied so you need to use civarchive) but they don't work well. Might get something usable through seed lottery though.
There are some API models that can do okay bob, but that's outside the scope of this general.
>>
so it seems, from my testing, if you can train your character lora with some nudes, absolutely do it. it 100% fixes the lack of titty training in turbo provided you gave it enough steps to work with.
down to the color of the nipples trained with that character, even. very nice.
oh and pubes.
>>
File: img_00012_.jpg (787 KB, 1332x1776)
>>107538001
tyytyy
>>
>>107538078
it's using a visual LLM to rewrite your prompt and describe your characters from the image input, then with that prompt you put that on ZiT >>107537868
>>
File: zimg_0018.png (2.07 MB, 1080x1440)
20 minute loras aren't great, but they sure as hell aren't bad, no wonder civit is full of low effort crap
>>
>>107538078
Pretty much any semi-decent text encoder (t5, qwen) understands regional instructions. (Though even they will occasionally blend stuff.)
CLIP of SDXL and before days couldn't, because it is simply too retarded not to blend concepts from different regions together.
Flux, chroma, qwen image, etc. can all do this, nothing special to zit.
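The "regional instruction" here is nothing more than explicit text; a minimal sketch of assembling one (the phrasing is illustrative, there is no fixed syntax):

```python
# Build one explicit "boomer prompt" that spells out each region.
# Modern encoders (T5, Qwen) can usually keep the attributes apart;
# CLIP-era models tend to bleed them together.

def regional_prompt(scene: str, regions: dict[str, str]) -> str:
    """Join per-region descriptions into a single explicit prompt string."""
    parts = [scene]
    for position, description in regions.items():
        parts.append(f"On the {position} of the image, {description}.")
    return " ".join(parts)

prompt = regional_prompt(
    "Three women standing in a park.",
    {
        "left": "a woman with red hair wearing a green coat",
        "right": "a woman with black hair wearing a yellow dress",
    },
)
```

Whether "green coat" actually stays on the left woman is down to the encoder, not the string format.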
>>
>>107538111
20 minutes on which GPU? 5090?
>>
File: zimg_0023.png (2.18 MB, 1080x1440)
>>107538133
a 3090?
>>
>>107538108
>>107538122
I'm doing llm for zit already, but was unaware of the regional thing, cool.
I'm gonna have fun with that node, thanks.
>>
>>107538139
no fucking way you can train a lora in 20min even on a 3090... capppppp
>>
>>107538139
I am curious what shortcuts you used to get it converge into something halfway usable that fast on a 3090.
Mind sharing your training settings?
>>
>>107538167
20min is pretty low but still XL is really fucking slow, more modern models learn way faster
>>
>>107538182
wait thats an SDXL lora?
>>
>>107538139

bruh WTF my output of the IRL I trained on was so dogwater compared to your lora and gens.
>>
File: z-image_00824_.png (1.58 MB, 1152x2048)
>>107533983
use a black image and lower denoise a bit
>>
File: Untitled.png (98 KB, 1224x697)
>>107538167
this is literally the whole point of the experiment to see how shit my training can be

>>107538174
- ai-toolkit
- z-img default settings with training adapter v2
- rank 64
- 18 images cropped square (various resolutions)
- no captions
- trained at 512
- 750 steps
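Back-of-the-envelope on those numbers (the ~20 min figure is from the post being replied to):

```python
# sanity check: 750 steps in ~20 minutes on a 3090
steps = 750
minutes = 20
sec_per_step = minutes * 60 / steps  # ~1.6 s/step at 512px, rank 64
```

So nothing magic: ~1.6 s/step, which a 3090 can plausibly hit at 512px with no captions.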
>>
>>107538188
no, I meant in comparison: XL is really slow compared to newer models when it comes to training
>>
>>107537803
question, can this be done without hooking up a thinking LLM to my regional prompting workflow? what node do i use instead of that prompt generator?
>>
>>107538092
i was able to denude quite a few celebrities with 2509 though
although nothing amazing for coomerbrains, works for me (for now)

but i was just wondering what were the differences, theres a billion models out there
>>
>>107538205
Interesting. I thought it needed 3k steps or so.
>>
>>107538205
>356W
Just an idea if you want to experiment but I remember reading some studies that said you could undervolt your card by like 60% and get very minimal impact on inference speed.
>>
>>107538255
>theres a billion models out there
Irrelevant for you.
Diffusion models only work with the text encoders they were trained on.
>>
File: img_00018_.jpg (676 KB, 1332x1776)
>>
>>107538255
qwen will give generic parts but you can always feed the image back into a diffusion model for inpainting after.

SAM3 can detect tits easily and for a vagina you can say "mouth".
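The detect-then-inpaint patching described above reduces to mask compositing; a pure-Python sketch on tiny images (real pipelines do the same with numpy arrays, and the detector/inpainter outputs here are stand-ins):

```python
# After a detector (e.g. SAM) gives you a binary mask, pasting the
# inpainted region back into the original is a per-pixel select.

def composite(original, inpainted, mask):
    """Where mask is 1, take the inpainted pixel; elsewhere keep the original."""
    return [
        [inp if m else orig for orig, inp, m in zip(orow, irow, mrow)]
        for orow, irow, mrow in zip(original, inpainted, mask)
    ]

def dilate(mask):
    """Grow the mask by one pixel so the patch seam lands outside the region."""
    h, w = len(mask), len(mask[0])
    return [
        [1 if any(mask[y][x]
                  for y in range(max(0, j - 1), min(h, j + 2))
                  for x in range(max(0, i - 1), min(w, i + 2))) else 0
         for i in range(w)]
        for j in range(h)
    ]
```

Dilating before compositing is the usual trick to hide the seam between the two gens.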
>>
>>107538299
>and for a vagina you can say "mouth".
sounds about right.
>>
>>107538205
interesting. Im trying to make a body type lora, of this ferraira woman, but even at 3k steps, nada. fuckall results.
>>
>>107538299
lmao patchworking tits and vagene in
>>
Is there a node that uses Qwen 3 4b to refine the prompt before passing it to z-image? Seeing as you have to load Qwen 3 as the text encoder it seems like the slowdown wouldn't be too bad. I think it would work well for creating cohesive scenes with wildcards.
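Lacking a dedicated node, the same idea can be sketched against a locally served Qwen behind an OpenAI-compatible endpoint (the URL, port, and model name below are assumptions for illustration; only the chat payload shape is standard):

```python
import json
import urllib.request

# Hypothetical local server (llama.cpp / vLLM style); adjust to your setup.
ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"

def build_rewrite_request(user_prompt: str, model: str = "qwen3-4b") -> dict:
    """OpenAI-style chat payload asking the LLM to expand a terse/wildcard prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's image prompt as one detailed, "
                        "coherent scene description. Output only the prompt."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

def refine(user_prompt: str) -> str:
    """Send the payload and return the rewritten prompt text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_rewrite_request(user_prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The rewritten string then goes into the normal z-image text-encode node; since Qwen 3 is already loaded as the encoder, a shared server would avoid double VRAM.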
>>
File: Untitled-5ffffffffff.jpg (387 KB, 4800x1792)
>>107538197
It's either on or off for me. Left is .9 denoise, right is .91.
>>
File: ComfyUI_temp_lufha_00004_.png (2.93 MB, 1280x1600)
>>
>>107538343
lmao
>>
File: 1752428534985169.png (2.21 MB, 1120x1440)
>>
File: ComfyUI_temp_lufha_00005_.png (2.87 MB, 1280x1600)
>>107538343
LMAO, just realized that the model generated indian men pissing in the background, I didn't prompt that, based Xi, I kneel
>>
File: ComfyUI_temp_lufha_00006_.png (3.19 MB, 1280x1600)
>>
>>107538359
>>107538343
Needs more indians around her. Totally takes me out of the immersion with so few trying to take pictures and demand the bobs and vagene for the pay cards, saar.
>>
>>107537808
yeah res2m seems to work pretty damn well in zit .. i usually use uniform_pc for the scheduler with it
>>
>>107537803
this is very very cool

>>107538259
>>107538307
really depends how many images you are using for your dataset and the quality of them, along with your captions. if you're using default settings it shouldn't be absolutely nothing.
>>
File: zit_kmshiftest.jpg (285 KB, 3064x1024)
>>107537870
>excessive grain
if i understand this right, shift is essentially a control for when zit shifts from low noise to high noise. the "grain" is high noise, try lowering shift. I haven't extensively tested shift values against samplers and schedulers but it definitely has an impact on shitty skin texture graininess.
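For reference, the usual flow-matching shift is a remap of the whole sigma schedule rather than a hard low/high switch; my understanding of the formula (worth checking against the ComfyUI model-sampling node source):

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching timestep shift: remaps sigma in [0, 1].
    shift > 1 bends the schedule toward higher noise, so the sampler
    spends more of its steps on composition and fewer on fine texture."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)
```

shift = 1 leaves the schedule untouched; the endpoints 0 and 1 are fixed either way, only the middle moves.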
>>
>be trani
>see MIT licenced work with over 40 contributors
>"wait, comfy became a winner when he made an ui!"
>lightbulb.png
>vibecode a wrapper that barely works and even misses trivial features
>"now is my time to shine thehe~"
>closes shota folder
>injects the last dose of hrt juice
>slap a commercial licence on top of the MIT licenced stuff
>"it's basically as if i build everything myself, i'm such a genius"
>"they will never make fun of me again thehehe~"
>release it and spam all threads for months
...
>no one cares
>>
File: img_00022_.jpg (893 KB, 1352x1776)
>>
>>107538421
maybe this trani person will stop doing whatever it is you don't like if you don't bring them up out of nowhere
>>
>>107538415
I'll tell u exactly how many and what kind and captions and shit.

40 images, head cropped out. insta size mostly. ie 1350.
captioned with body description.
>>
File: 4928.png (2.4 MB, 1310x1310)
>>
>>107538205
>- rank 64
Isn't that too strong for Zit? You already run loras at sub-0.50 because of the distortions when having 2+
>>
File: ComfyUI_temp_lufha_00018_.png (2.71 MB, 1088x1856)
>>
Please halp. My NVMe drive loads clip SLOWER than my SSD drive. It takes 5+ minutes to load wan, where on the SSD it's almost instant.

I have the same setup on my regular Crucial SSD and my NVMe M.2 drive (which is supposed to be faster): same speed boosts and models. The only difference is there are more nodes on my NVMe drive.
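One way to rule the drive in or out is a raw sequential-read timing on the same model file from each disk (paths are yours to fill in; stdlib only):

```python
import time

def read_throughput_mb_s(path: str, chunk_mb: int = 64) -> float:
    """Sequentially read a file and report MB/s; run once per drive and compare.
    Note: the OS page cache inflates repeat runs on the same file, so use a
    fresh boot or a file that hasn't been read recently."""
    chunk = chunk_mb * 1024 * 1024
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)

# example (hypothetical paths):
# print(read_throughput_mb_s("D:/models/clip/qwen_2.5_vl_7b_fp8_scaled.safetensors"))
```

If the NVMe reads fast here but loads slow in Comfy, the bottleneck is elsewhere (antivirus scanning, a dying link/driver, or nodes doing extra work), not the disk.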
>>
File: ZiMG_01375_.jpg (611 KB, 1344x1728)
gday fellas
>>
>>107538441
don't caption the body, caption everything except the body. captioning the body basically tells the trainer to learn everything else instead.

ie. if i want to train a body i want to caption the setting, the clothes, the jewelry so it learns only the body and not that stuff.

try again with no captions.
>>
>>107537803
Welp.
>>
>>107538435
>maybe this trani person will stop doing whatever it is you don't like
if you mean existing sign me up champ!
>>
>>107538508
you have to put the mmproj next to the gguf you tardburglar
>>
>>107538497
Hm ok, ill try once more with no captions at all.
>>
Cute wannabe butthurt schizo
>>
>>107538517
Pretend I'm retarded.
>>
>>107536612
Well, he is correct. Not surprising, since this Z-Image control net is not from the guys who make Z-Image.
>>
>>107538529
Now check the console. The mmproj filename must start the same as the model itself and end with "mmproj"
>>
File: ZiMG_01382_.jpg (514 KB, 1344x1728)
>>107538490
>>
>>107538477
maybe your nvme is kill
>>
fresh bread
>>107538552
>>107538552
>>107538552
>>
>>107538541
damn that's great, gj anon
>>
>>107536897
Also they say it's a community model over and over; the Flux 2 shills are desperately trying to pretend this won't be released.

What's even more hilarious is that the chinks will release this great undistilled model before BFL can get their shitty small distilled Flux Klein model out.

BFL is dead, they peaked at Flux dev which was ok for art styles but 100% slopped humans and super censored.

Thanks for playing.
>>
>>107538541
kino gen
prompt?
>>
>>107536908
A base model that is primarily made to be further trained on should have as little aesthetic and caption bias as possible.

It should have strong foundational knowledge of practically every concept, and then people can finetune the model to be extremely good at particular concepts, anime, NSFW, art styles, etc.
>>
>>107538508
you have to read this
https://github.com/BigStationW/ComfyUI-Prompt-Manager?tab=readme-ov-file#image-inputs
>>
>>107538107
catbox/prompt for this style please?
>>
>>107538312
>patchworking tits and vagene in
It's actually a good technique when you want a specific boob shape. it's really hard to prompt for a slim body with massive tits or a fat body with tiny tits.

And for some reason boob tags massively affect the way a face will look.


