/g/ - Technology


Thread archived.
You cannot reply anymore.




A Giant Wake Up Call Edition

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107374545

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe
https://github.com/ostris/ai-toolkit

>Z
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://comfyanonymous.github.io/ComfyUI_examples/z_image/

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: DONT STAY CALM.png (59 KB, 440x396)
ARE YA READY???
https://www.youtube.com/watch?v=xb2fjZa_L74
>>
Chroma..
>>
>>107375744
inb4 untrainable 30b blob
>>
>>107375744
Nothing to be ready for, until the danbooru fork drops I've got nothing to do with Z
>>
File: 1763396847622155.jpg (37 KB, 948x699)
>>107375744
>i guess
>>
File: 1747331540355386.png (116 KB, 339x380)
why arent these included in training data? a close up of a pussy i can understand, but why not teach it what a nipple look like?
>>
>>107375744
in my experience, teased releases never happen. AceStep 1.5 will likely never get released. It's on the "roadmap".
>>
>>107375759
Nipples look fine 90% of the time.
>>
File: what.png (248 KB, 860x709)
>>107375744
>before this week end
>we are already in the week end
>>
>>107375759
Nipples don't exist, it's an oversexualized fantasy
>>
comfy should be dragged out on the street and shot
>>
>>107375759
otoh it's useful to have models that are ignorant of certain things, like you know they won't produce certain things.
>>
File: OH MY GOD IM.gif (1.53 MB, 480x360)
>>107375744
>>
File: Mog of the year.jpg (2.77 MB, 4626x1971)
Remember that Z-Image Turbo's text encoder can handle up to 256k tokens, so don't hesitate to yap it likes it lool.
>>
File: s-l1200.jpg (344 KB, 900x1200)
>>107375744
I hope they won't do this.
>>
>>107375767
the end of the weekend
>>
File: 1760025741273159.jpg (170 KB, 590x759)
>>107375776
>implying i need more than "a photograph of a sexy woman"
>>
>>107375744
Base on what
>>
>>107375790
out of 10
>>
File: 2511.png (15 KB, 449x293)
QIE 2511 should be coming in the next 12 hours if it hasn't been cancelled
>>
>>107375759
In zimage? I've seen proper nipples.
It's flux 2 that has no idea mammals have nipples.
>>
>>107375801
>QIE 2511 should be coming in the next 12 hours if it hasn't been cancelled
I have a feeling the 2 teams will release their models at the same exact time just for the fun of it lmao
>>
>>107375801
"this weekend" means "next weekend" in mandarin
>>
File: file.png (35 KB, 464x494)
i've joined the z club
>>
Reminder that it is Chinese culture to go back on promises if doing so will cause enough emotional pain.
>>
>>107375822
>fp8
poorfag
>>
>>107375818
2511 means November 2025 anon
>>
File: file.png (2.5 MB, 1024x1552)
>>107375744
>>
>>107375826
i am in your club, theres nothing you can do about it
>>
>>107375822
>stable_diffusion
that's lumina 2 no?
https://comfyanonymous.github.io/ComfyUI_examples/z_image/
>>
File: 1759912877895480.png (558 KB, 705x573)
>>107375805
>I've seen proper nipples.
>>
>>107375844
i have no idea what im doing, im just following along this video

https://www.youtube.com/watch?v=itOSk0woXo8
>>
>>107375852
just download his workflow (his image) and load it on comfyui anon >>107375844
>>
>>107375850
I remember one time I hooked up with this girl and her nipples were no shit as long as my thumb. Like she had literal udders.
>>
File: 00017-790582.png (2.74 MB, 1536x1280)
>>
>>107375822
Better off running the Q8
No i will not elaborate
>>
>>107375864
i downloaded both, i'll try that next
>>
>>107375744
what if the loras trained on base don't look any better than the ones trained on turbo
>>
>>107375842
You're in our lobby.
>>
File: 1746193676702888.jpg (481 KB, 2048x2048)
This thing is cool as hell.
>>107375876
I'll cry
>>
>>107375873
>3 toes
in any case, mind posting workflow or is it just the default?
>>
>>107375880
same difference
>>
>>107375873
lord have mercy.
>>
how do I use torch compile on clip in comfyui?
>>
>>107375873
>nipples are more than fine
i think they trained on infected pimples instead
>>
You can tell z image is going to be big because all the SDXL vramlets who have basically had to change nothing for the last two years are suddenly coming out of the woodwork asking retard questions.
>>
>>107375873
>the WAN 2.5 rugpull traumatized the AI community so badly kek
well, it's the same company that rugpulled us with wan 2.5 that is supposed to give us that base model so...
>>
>>107375780
:)

the tit adder is coming, you can't stop it.

>>107375837

oh no it fluxes the backgrounds
>>
>>107375898
wait for the sd1.5 neanderthals
>>
>>107375863
dog. a new fantasy king has arrived.

I only see a modest amount of toe and hand issues (bottom left foot, redhead's right hand). it gets the prompt.
>>
>>107375898
Z-Image is more realism focused. Unless we get an anime fine tune similar to illustrious/noob, then they won't be migrating.

I don't think illustrious is going to abandon SDXL until 3.5 gets funded(never happening)
>>
>>107375898
>implying there's anything better than SDXL
Unless you're just prompting realistic stuff in which case your gay
>>
File: I mean...png (466 KB, 720x720)
>>107375906
>wait for the sd1.5 neanderthals
keek, I thought they died of old age
>>
>>107375916
Hasnt anime already been perfected?
>>
>>107375915
that was with that extended prompt from >>107375634 the first attempt looked like this >>107375678
>>
>censored
>can't change camera angle
>understand only basic poses
and they said z img is good
>>
File: zturbo.jpg (2.49 MB, 3072x3072)
First test training of a Z-Image Turbo lora, used Diffusion-Pipe and good old adamw 0.0001 LR for 100 epochs

You can most likely finetune the settings more, but the result was good (hopefully you can see who it is!), particularly for training on a distilled model like Turbo and only training at 512 pixel resolution

Training was really fast, ~50 minutes for 25 images * 100 epochs on a 5060 Ti 16gb

Anyway, here's the lora if anyone wants it: https://files.catbox.moe/4pfomp.safetensors

No 'trigger' needed, lora strength seems best at ~0.75-0.8 from my tests
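
For anyone sizing up a similar run, here is a rough back-of-the-envelope sketch of the throughput those numbers imply. It assumes batch size 1 with no gradient accumulation, which the post does not state; the function name is made up for illustration.

```python
def training_throughput(images, epochs, minutes, batch_size=1):
    """Return (optimizer steps, seconds per step) for a training run.

    batch_size=1 is an assumption; the post does not say what was used.
    """
    steps = images * epochs // batch_size
    return steps, minutes * 60 / steps


# Figures quoted in the post: 25 images, 100 epochs, ~50 minutes.
steps, sec_per_step = training_throughput(images=25, epochs=100, minutes=50)
print(f"{steps} steps, ~{sec_per_step:.1f} s/step")
```

With a larger batch size the step count drops and the per-step time rises accordingly, so treat the per-step figure as an estimate only.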
>>
>>107375927
Of course not. Regional prompting(multiple unique characters) sucks with SDXL and using booru tags is simply inferior to natural language. The QUALITY of anime images is near perfect, sure, but that's not enough.
>>
>>107375938
Does the lora make the anatomy worse?
>>
>>107375950
Nothing that I've noticed, hands etc looks just as good as before
>>
>>107375949
meh, anime is all soulless slop to me. even the non ai shit the chinks produce
>>
>>107375873
go back to /b/ xd
>>
File: 989797.jpg (39 KB, 468x655)
Hello anons, I want to train a lora of my waifu using Z Image Model, what's the bare minimum dataset size I need, software and VRAM? First time training, complete noob here, is there any rentry?
>>
>>107375966
>prompting for realism
Failed normalfag behavior
>>
I'm not downloading shitty turbo loras
>>
>>107375978
don't lose your head over all the details
>>
frieren is overrated trash hyped by zoomer tourists. fuck off
>>
>>107375978
wait for the base model its not out yet
>>
>>107376003
zoomer tourists hobby
zoomer tourists general
zoomer tourist board
zoomer tourist website
>>
>>107375978
12gb vram and ai toolkit
>>
>>107375978
>is there any rentry?
Retard. The DEMO of the model, not the full model but a distilled aesthetic tune, released literally two days ago. And the base model isn't even out yet. Fucking calm down.
>>
>>107375862
>I remember one time I hooked up with this girl and her nipples were no shit as long as my thumb. Like she had literal udders.
should have put a ring on that


>>107375888
>3 toes
yeah toes absolutely fucking suck on Z image turbo. I pray its just a turbo thing
>in any case, mind posting workflow or is it just the default?
it is the default workflow with the only two """objective""" "enhancements" that most people agree with right now: the TAEF1 encoder and un-bypassing the shift node with a value of 7


>>107375893
>lord have mercy.
literally 30% of slightly overweight persian girls have the exact same face as the one on the right, if that's what you're into.

>>107375897
>i think they trained on infected pimples instead
they're like 96% good enough. its not triggering my uncanny valley personally. I am getting nipples poking through bikinis often and bad nipples aren't uncommon either maybe 25% of nipples are bad (not 25% of gens with nipples, there is a 25% chance of a "nipple ngmi" occuring so thats 31% chance of a gen having good nipples with 4 nipples which is why I do a batch size of 4)
>>
>it's another Chinese company promises to release a model then doesn't episode
>>
File: 4524542.jpg (9 KB, 225x225)
>>107376004
It's a good time to practice around with the turbo model right now, plus I'm a VRAMlet anyway
>>
File: z-image_00070_.jpg (1.25 MB, 1792x2176)
any tips on making the scene dark
>>
File: 1750765936031539.png (3.1 MB, 1920x1080)
>>107375776
this is so close on being nano banana tier in terms of manga, the text is readable for the most part, maybe the base model will reach the next model and make comic rendering usable
>>
>>107376021
>the TAEF1 encoder
huh, isn't that just for previews? where exactly are you using it
>>
>>107376021
Nah she had a boyfriend. It was kind of a one night thing. I'm a certified architect too and she was pretty slim.
>>
>>107376028
funs FOR senpai
>>
>>107376026
youre a self admitted newfag whose never trained before you have no idea what youre talking about
>>
VRAMlets, has our time come?
>>
File: 1738513956652053.webm (202 KB, 1911x1379)
>>107376021
>they're like 96% good enough
it's not even 50% good enough. It's literally just this. there's no detail at all, just a brown circle.
>>
>>107376026
>I's a good time to practice around with the turbo model right now
it's not. you can train using ram. you will have to do everything again when the base model drops (extremely soon)
>>
>>107376026
Please stop avatarfagging or go to /sdg/
>>
File: 00022-918163840.png (1.91 MB, 1120x1440)
>>107375814
I had to change your prompt to actually get it to work
>>
The fun thing about doing an infographic that compares 75 images from each model is genning the Z-image images and knowing after the first 3 what every single image is going to look like
>>
>>107376061
we fixed this issue 2-3 threads ago
>>
Noooo I still have ideas for SDXL image sets it can't be over so soon nooooooo
>>
>>107376026
once again proving frieren fags are fucking retarded zoomers.
>>
>>107376048
>you will have to do everything again
i will have to change the ai toolkit job model from z image turbo to z image base and click the start button AGAIN?>????? its so over
>>
>>107376067
Nta but that was not a fix unless your goal was to prompt 1 girls. It kills complex prompt adherence
>>
I'll wait for Onetrainer to implement ZIT
>>
WE ARE FVCKING BACK BROSSS
>>
>>107376073
you are the kind of ai person who does more harm to imggen than anti ai pple
>>
File: copertina-3881647358.jpg (306 KB, 1800x1800)
>>107376086
>>
>>107376086
based
>>
>>107376067
it was a shitty band-aid solution that only works conditionally.
>>
File: z-image_00073_.jpg (1.28 MB, 1792x2176)
i can't stop
>>
>>107376067
Tell me the fix and I'll throw that in as another column.
>>
>>107375898
yup, it's sad we take the opinions of vramlets seriously here.
>>
>>107376090
im not the initial zoomer retard that you replied to
>>
>>107376090
Not than anon but why are you being such a prick over a guy who just wants to train a LoRA? It's his time to spend how he pleases and you're acting like a total faggot over it.
>>
>>107376099
>it was a shitty band-aid solution
it works well you're just yelling at clouds now >>107372870
>>
Let's face it there won't be any good Zbooru models until 2026. There's still time for one last SDXL hurrah
>>
File: 1747164686578558.png (147 KB, 970x672)
>>107376027
we're gonna be able to AI generate an entire magazine with articles and stuff too very soon, like you can do it now with effort and photoshop but soon it'll all just be from a prompt

people discussed making dark stuff recently. starting with a black image and img2img is one thing, i forget the rest

>>107376031
>huh, isn't that just for previews? where exactly are you using it
as the vae. picrel
there's a guide on reddit

>>107376036
>I'm a certified architect too and she was pretty slim.
based but the point of being an architect is to architect her yourself like how master architect pierce brosnan did but i understand

>>107376047
lol alright niggy have a (You) because the video was slightly funny
>>
>>107376106
model vramlets can run=lively ecosystem and constantly developing
model vramlets can't run=DOA
>>
>>107376113
>>107375873
>the WAN 2.5 rugpull traumatized the AI community so badly kek
>>
File: 435671987.png (1.62 MB, 1600x896)
>>107375876
They will look better, what I'm wondering is how well they'll work on the turbo model.
>>
File: 1740883189857305.png (1.69 MB, 1024x1024)
>>107376028
now it's ESL tier but readable lool
>>
>>107376128
yeah i just came up with a workaround
>>107376105
basically this but the second clip should be empty
>>
>>107376028
FUCK if this gets SDXL levels of danbooru support I'm going to SHIT myself
>>
>>107376158
i already shat myself
>>
File: z-image_00075_.jpg (1.23 MB, 1792x2176)
>>
>>107375787
where is that picture from. is this from part of a magazine or webcomic or something wtf lol
>>
File: 1035010167.png (1.56 MB, 1600x896)
>>107376090
He's right though.
>>
>>
>>107376167
https://en.wikipedia.org/wiki/Sega_Power
i assume its from this
>>
>>107376164
what are your good/bad feet percentages asian foot anon? how many bad gens before a good gen like that
>>
>>107376102
Do some "Vogue" covers to mix it up.
>>
File: file.png (1 KB, 104x44)
damn, 80s/it on my rx6600
>>
>>107375787
Why's he killing the Italian dog though?
>>
>>107376185
for z image
>>
what model is anon playing with now?
>>
File: 1745800973724999.jpg (42 KB, 720x689)
>>107376193
he's tired of this shit
>>
>>107376171
It still runs pretty slow on older GPUs, it seems to be optimized for RTX4+, runs much faster there even with the same VRAM
>>
>Anime finetune of z-image
>Wan2.2
The finetune doesn't even have to be danbooru style. This is it. I can feel it, the stars are aligning. Thank you China.
>>
>>107376146
cute macorot
>>
>>107376150
Prompt editing to force variety (you could get even more with a wildcard in the first step, like in the old days) is a form of cheating on this kind of test.
>>
>>107376171
kek
>>
>>107376207
Nah doesn't do it for me unless I can prompt my favorite artists, I need muh booru forks
>>
File: z-image_00076_.jpg (1.26 MB, 1792x2176)
>>107376178
1 out of 3 have good enough feet
>>107376179
vogue has jewish vibes fuck that

ill change it up
>>
File: 1748786444395331.jpg (577 KB, 1280x2048)
>>107376172
>i assume its from this
so you're telling me they put a nazi sonic child's drawing into a UK magazine? no vey
>>
>>107376207
2.2 still can't do anime style animation
>>
>>107376227
>1 out of 3 have good enough feet
you've found a good pocket of training data because for me it's like 1/8 or even rarer
>>
File: z-turbo_00096_.png (1.76 MB, 1024x1536)
>>
>>107376158
Imagine the prompt following. Imagine having more than 75 tokens usable before the quality takes a dip. Holy fuck.
>>
File: 56546.jpg (28 KB, 452x263)
>>107376026
>>107375978
Well?? Will you help me?
>>
>>107376229
>no vey
holy shit it did lmao
>>
So far I've generated images for SD3.5 medium, Chroma, and Z-Image. What other local models should be in the comparison? (For photo images, not anime)
>>
>>107376256
Qwen Image
>>
>>107376249
>tranime brown avatarfag zoomer retard needs to be spoonfed every button click
read and shut the fuck up >>107376018
>>
File: ComfyUI_08125_.png (1.14 MB, 944x1280)
>>
>>107376256
Pony v6, BigASP SDXL
>>
>>107376185
Use lower quants.
>>
File: ComfyUI_08104_.png (1.62 MB, 944x1280)
>>
File: ComfyUI_08081_.png (1.5 MB, 944x1280)
>>
File: 1750042188807213.png (3.69 MB, 1920x1080)
>>107375776
>Just write a giant prompt bro the model will understand.
China is amazing.
https://files.catbox.moe/yy9o5m.txt
>>
File: ComfyUI_08079_.png (1.52 MB, 944x1280)
>>107376341
>tfw accidentally the rape demon
>>
File: z-image_00078_.jpg (1.09 MB, 1792x2176)
>>107376300
https://www.youtube.com/watch?v=sFl5rKWlOS8
>>
File: 1747860739559971.png (2.34 MB, 1168x1752)
>>
>>107376158
Wouldn't the model degrade in quality if you finetune it with danbooru tags/dataset? Like how chroma's finetune made hands, feets and limbs worse than the base model
>>
File: 1751317114721013.jpg (336 KB, 1920x1080)
>>
File: z-turbo_00106_.png (1.8 MB, 1024x1536)
>>
File: ComfyUI_00727_.png (1.43 MB, 1440x720)
>>107376380
top jej
>>
>>107376380
>Mario and his big ass nose
>"Jude"
sounds about right
>>
File: 1763466559503297.png (3.94 MB, 1920x1080)
>>107376351
>>
>>107376249
the dataset is very important. make sure you cut it down to only the most salient features. like the head for example. cut off the head of all your photos before you train. then all of this data has to be customized to your available vram. scaled, if you will. if you're not careful, a model that seems very small can suddenly grow very large, and then it's all over.
>>
>>107376410
how did china produce SOTA text with a tiny local model
>>
File: z-image_00079_.jpg (1.17 MB, 1792x2176)
>>
File: 1751444325956178.png (3.6 MB, 1920x1080)
>>107376410
>>
File: 1739642805082326.png (2 KB, 106x47)
>>
god I need -edit right now
>>
>>107376425
this is a great question, the text encoder is only a 4b model, this is black magic to me, I have no other explanation
>>
File: 1736698366567033.png (3 KB, 210x69)
>>107376444
king. what are you training?
>>
>>107376271
>fp8 is 19gb
>most I can do is probably Q4_0 with offloading
This'll be tough but I'll give it a shot.

>>107376312
I haven't even done base SDXL. There's no way BigAsp will do well at this sort of image so that seems like a waste of time.
>>
>>107376379
SDXL only got better the more it got raped
Vanilla SDXL is kinda shit
>>
>>107376444
unless you have 4k+ images, I have no idea why you'd be doing that many steps.
>>
>>107376425
they distilled from a mysterious bigger model that only they have access to
if chinese llms are anything to go by, this means that they simply trained on gens stolen from the western proprietary SOTA
>>
>>107376425
too much is left on the table, i mean imagine these models in 5-7 years, omnimodal small model you can tell what you want with words and it just instantly changes the image. you can have something similar right now if you glue things together but not a natively multimodal model.

but as long as scaling is cheaper than r&d people will bruteforce that as the safe route towards progress until less and less companies can afford to scale more and more and then those will focus on research, get a breakthrough, and the cycle begins anew
>>
>>107376474
everyone is saying Z-Image-Base is only 6B though, according to the paper that another Anon insists on mentioning.
>>
>>107376425
This is a big misconception, 6B parameters with a 4B LLM is already huge. You can only scale so much before you reach diminishing returns, and most researchers are just lazy with their data curation.
Read the paper.
>>
File: flux2_qccc_c-0037.jpg (221 KB, 1600x1600)
>>
my gpu keeps dropping from 100% usage to 0% usage, what's causing this? the temps look ok
>>
>>107376494
>6B parameters with a 4B LLM is already huge
not compared to other modern models tho no?
>Read the paper.
i glossed as much as i could understand
>>
>>107376474
A seething jew, on my /ldg/ ?

More likely than you think
>>
>>107376486
>scaling is cheaper than r&d
But US companies are spending literal trillions on scaling as Chinese are outperforming them out of moms basement with r&d
>>
>>107376494
I think it will be difficult to reach higher resolutions without scaling the model size. We've been stuck < 4MP for years now.
>>
>>107376507
>i glossed as much as i could understand
ask gemini to make a summary for you kek
>>
>>107376486
>>107376512
read the paper
>>
Total ComfyUI Victory
>>
File: 7000 steps examples.jpg (3.9 MB, 2355x5888)
>>107376459
Z Image Turbo DALL-E 3-like Girls style lora that I had the dataset ready for
>>107373741

>>107376467
The lora keeps getting better so I keep training, I'll train until it breaks completely for multiple checkpoints in a row
>>
what Z-Image are you guys using?
>>
>>107376507
You don't have to speculate, try it yourself. What's better, Qwen Image or Z Turbo?
>>
>>107376421
Thanks, do you recommend training my lora on SDXL first?
>>
>>107376526
its like burning the lora is a feature
>>
File: 1761319970944223.png (3.94 MB, 1920x1080)
>>107376433
>>
File: 1745835874100362.png (202 KB, 401x354)
Okay. Hear me out. X-image
>>
>>107376341
>>107376357
catbox / prompt?
>>
>>107376185
Another Vramlet with a GTX 1080 here, our cards should be about equally fast for games, but AMD sucks a bit more at AI.
At what resolution, what CFG and Quant are you using?
With Q5_K_S and CFG at 1 i get about 4s/it at 512x512 and 14s/it at 1024x1024
With Q5_K_S and CFG at 2 i get about 7s/it at 512x512 and 28s/it at 1024x1024

Can anyone else post their results?
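
A quick sanity check on those four timings, as a sketch only. The usual expectation is that going from 512x512 to 1024x1024 quadruples the pixel count, and that CFG above 1 typically adds a second model pass per step, so each change should cost somewhere up to 4x and 2x respectively. The dictionary below just restates the quoted s/it figures; the helper name is made up.

```python
# s/it quoted above for a GTX 1080 with Q5_K_S, keyed by (resolution, cfg).
timings = {
    (512, 1): 4.0,
    (1024, 1): 14.0,
    (512, 2): 7.0,
    (1024, 2): 28.0,
}


def slowdown(slow, fast):
    """Ratio of seconds per iteration between two (resolution, cfg) settings."""
    return timings[slow] / timings[fast]


for cfg in (1, 2):
    ratio = slowdown((1024, cfg), (512, cfg))
    print(f"CFG {cfg}: 512 -> 1024 costs {ratio:.2f}x")

for res in (512, 1024):
    ratio = slowdown((res, 2), (res, 1))
    print(f"{res}px: CFG 1 -> 2 costs {ratio:.2f}x")
```

The ratios land between 3.5x and 4x for resolution and between 1.75x and 2x for CFG, which is about what you would expect if the card is compute-bound rather than running out of memory.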
>>
>>107376546
makes you think, what happened to Y-image?
>>
>>107376474
>stolen from the western proprietary SOTA
who cares? they released open weight model, and proprietary SOTA models are also going to benefit from this
>>
>>107376558
>what happened to Y-image?
the same thing that happened to Iphone 9 and Windows 9
>>
>>107376546
ZA-image
>>
File: file.png (112 KB, 1443x244)
>>107376474
>this means that they simply trained on gens stolen from the western proprietary SOTA
they did the opposite actually
>>
>>107376526
looks good, any loras?
>>
File: zimg_00001_.png (2.23 MB, 1280x1280)
>>107376552
thats what i generated
i did a 512x512 test with 25.83s/it
>>
>>107376572
>any loras?
? I'm training it
>>
god damn I slept through two and a half threads
is the base model out yet?
>>
File: 1760361455166918.png (2.26 MB, 1696x1208)
>>
>>107376587
if it was it wouldn't have been 2.5 threads
>>
>>107376587
two more weeks
>>
>>107376590
shitty samplers
stop it
>>
File: 1736436354283062.png (3.7 MB, 1920x1080)
>>107376544
>>
File: 1738656581639657.png (72 KB, 730x579)
more like zzzzzzzzzzzzzzz-image
>>
File: AGHH HOLY FIRE .gif (2.84 MB, 492x276)
>>107375776
>>107376351
>>107376410
>>107376433
>>107376544
>>
File: 1751836155956182.jpg (407 KB, 1280x2048)
>>
>>107376600
I think he used that lora
https://civitai.com/models/2175050/vhscommercial?modelVersionId=2449356
>>
File: 1736789275474619.png (3.84 MB, 1280x1920)
>>107376351
nice morklow
>>
>>107376645
thanks, and based piercel kek
>>
Zimage lora from pic rel?
>>
>>107376661
>Zimage lora from pic rel?
once we get z-image edit we won't need character loras anymore, can't wait
>>
>>107376580
What program, quant and CFG did you use for that one?
>>
if i'm not getting any speed up from using lower quants of zit then the bottleneck is in my old vramlet gpu cores, not the memory?
>>
File: 1764464291682961.jpg (16 KB, 576x324)
when they said zimage is uncensored they werent kidding wow
>>
>>107376645
that's fucking crazy good text how many steps and how many attempts did this take
>>
File: next level shit.png (2.04 MB, 896x1152)
>>107376351
>>
File: log_challenge.jpg (586 KB, 1536x2048)
>>
File: 1743009616847011.png (2.8 MB, 1168x1752)
>>107376600
>>107376636
yeah its the lora
>>
File: file.png (23 KB, 1059x170)
>>107376670
comfyui, i dont know what a quant is i just started generating again since 2024, cfg 1, 8 steps
>>
Anyone managed to produce good results with WAN i2v on 8Gb VRAM? I got it working, but the videos are slow-mo garbage with some lethargic motion.
>>
>>107376677
it doesn't know penis and the vagoopers and nipples look a bit odd, but that's not censoring they just didn't include alot of it in their training
>>
>2mins per image for a mere 20 step gen
Qwen is going to take 2.5hrs to generate the images I need. Painful. I'll have to do this overnight.
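
That estimate checks out if the batch is the 75-image set mentioned earlier in the thread. That count is an assumption here, since this post does not restate it. A minimal sketch:

```python
def batch_hours(images, minutes_per_image):
    """Total wall-clock hours for a sequential batch of generations."""
    return images * minutes_per_image / 60


# 75 images is assumed from the earlier infographic post; 2 min/image is quoted above.
print(batch_hours(images=75, minutes_per_image=2))
```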
>>
File: flux2_qccc_c-0033.jpg (298 KB, 1600x1600)
>>
>>107376681
2 more weeks and we'll be able to finish Berserk all by ourselves
>>
File: 1764449088543062.png (2.84 MB, 1168x1752)
>>
File: 1746233903765228.png (3.91 MB, 1280x1920)
>>107376679
30 steps, did 5 batches of 2 at a time while tweaking the prompt
>>
>still no real long video for wan
>every long video lora, node or workflow still relies on last frames
>all produce sudden jerky movement

The color shift doesnt even bother me, there must be a way to "smooth" out transitions between each generated video on the fly? The painter long vid is a good starting point https://github.com/princepainter/ComfyUI-PainterLongVideo if any smart anons can remedy this.

I tried wan windows context nodes and even riflexrope, no luck :(
>>
>>107376707
I already finished Berserk by prompting Griffith as my wife and Casca as my dog. Miura would approve.
>>
>>107376691
Dunno how well Comfy can make use of your AMD card, maybe someone with one can comment on that and give his s/it or it/s for comparison.
>>
File: 1757823750829174.png (3.24 MB, 1920x1080)
>>107376681
>>
>>107376695
it feels a bit like the model thinks peepees and virginas are the same thing and it tries to build one amalgamation out of the two
>>
>>107376737
the 6000 series cards are done i think

the new triton and miopen updates arent supported for 6000 series
>>
File: 1751837134236494.png (204 KB, 549x411)
>i have a dream, that one day models will never again be trained on futas, so that they know that a woman should never have a penis
>>
File: 1733007198768391.png (299 KB, 759x456)
>sir we've found the man who invented bokeh, you know, that background blu-AAAAAAAAAACKKKKKKKKKKKK
>>
File: 1753418185768264.png (355 KB, 1010x761)
>>107376782
>ve found the man who invented bokeh
god?
>>
File: ComfyUI_08050_.png (2.55 MB, 1280x2048)
>>
The last time I proompted was nearly two years ago. This is fun.

anything I should change from the default comfy workflow?
>>
>>107376791
truly the best proof this world is made by an evil deity and hell itself.
>>
>>107376767
not familiar with amd cards running on comfy, but i faintly remember something about them running on linux with comfy, if you are on windows then WSL (Windows Subsystem for Linux) might be an option, but you gotta research that yourself. good luck fellow vramlet
>>
File: 1761908297528218.png (1.72 MB, 1024x1024)
>>107376774
kek
>>
>>107376717
>yeah its the lora
yes, it makes sense. interesting
>>
File: 1756507037178275.png (1.33 MB, 1024x1024)
>>107376782
>>107376791
got this shit in 30 seconds, this is magic dude lmao
>>
>>107375938
hot damn
>>
File: ComfyUI_ZIT_00012_.png (1.63 MB, 1024x1024)
>>107376552
1080 here too, at Q6_K and CFG 1 i get 19s/it at 1024x1024
>>
File: 1760107337605136.png (1.97 MB, 1024x1024)
>>107376845
lmaoo I didn't expect Claude to accept that prompt rewrite
>>
>>107376694
Most people use lightning loras and apparently that's where the slow-mo comes from. Still, 8GB, that's rough buddy.
>>
File: flux2_qccc_c-0052.jpg (388 KB, 1600x1600)
>>
>>107376872
how long is that prompt?
>>
>>107376879
https://files.catbox.moe/iuiw1m.txt
>>
File: z-turbo_00145_.png (2.39 MB, 1152x1536)
>>
File: 1746165959445610.jpg (373 KB, 1280x2048)
Z makes everyone with light skin a little asian
>>
File: 1747207374895220.png (985 KB, 1752x1168)
>>
>>107376886
this is nuts. i've been writing 70 token prompts
>>
>>107376929
Ikr, this model is fucking insane
>>
>>107376876
that's pretty neat
>>
>not enough compute to run image model, captioning model, and llm all at the same time
:(
>>
File: 1746242811583985.png (1.5 MB, 1024x1024)
>>107376872
>>
>>107376942
so sequential
I'd love to pipe my zit outputs straight to wan but that ain't happening
>>
>>107376943
>Interview: Tongyi-MAI vs Black Forest Labs
oh I definitely know where this is going
https://www.youtube.com/watch?v=VIjlkGu_RwA
>>
File: 1764343347090301.png (506 KB, 736x813)
>>107375729
Would a 3090 be enough to run img2vid comfortable locally or you need a 5090 shit? I currently have 3060 12gb and get a disconnecting error in ComfyUI when using higher quants or things like that. It just needs enough vram to load in memory right?
>>
>>107376942
cant you use the already loaded qwen model for textgen?
>>
>>107376960
i wish but it doesnt work like that
>>
File: skull.jpg (643 KB, 1536x2048)
>>
>>107376954
>It just needs enough vram to load in memory right?
no the vram amount can be larger or lesser depending on the resolution you are genning and some other things too

you should buy a 5090, if you can, if you want to be serious with local AI. and hopefully you have at minimum 32 preferably 64gb of ddr5 ram as well
>>
>>107376954
i run everything on 8gb vram + 64gb, q8 wan i2v and t2v ggufs, bf16 zimage
>>
File: 1747803486139242.png (1.01 MB, 1000x800)
>>
File: 1735505208048597.png (2 MB, 1752x1168)
>realistic images
>>
>>107376954
I have a 3090. It's probably the bare minimum for worry-free video generation. You could probably get away with a little less, but that is the minimum without having to use low vram tricks like lightning Lora.
>>
>>107376978
how many hours left before it's officially monday in china?
>>
>>107376985
it's sunday morning, 9:45am
>>
>>107376983
there isn't a little less, it's 16gb or 24
>>
>>107376994
So we only have 14 hours left before we can doom, grim...
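The countdown is easy to check. A rough sketch, assuming the 9:45am Sunday figure from the post above is local China time; the function name and the calendar date (picked only because it falls on a Sunday) are made up for illustration:

```python
from datetime import datetime, timedelta


def hours_until_monday(now: datetime) -> float:
    """Hours from `now` until the next Monday 00:00 in the same local time."""
    # weekday(): Monday == 0 ... Sunday == 6. If it is already Monday,
    # count to the following Monday instead of returning a negative value.
    days_ahead = (7 - now.weekday()) % 7 or 7
    next_monday = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_monday - now).total_seconds() / 3600


# Hypothetical Sunday at 9:45am, matching the time quoted in the thread.
sunday_morning = datetime(2025, 11, 30, 9, 45)
print(hours_until_monday(sunday_morning))  # 14.25
```

So "14 hours" is right, give or take fifteen minutes.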
>>
>>107376996
Why no 20 gb cards tho.
>>
>>107377003
good fucking question
>>
>>107376979
is that bald notch lmao
>>
File: 1758626379152068.png (577 KB, 1021x746)
>>107376978
>>
fellas do you know if it's possible to merge noobxl and rouwei clip together? rouwei clip has more styles and characters. noobxl has anatomical and compositional advantage
>>
Z Image Turbo DALL-E 3 style lora 11500 steps in, keeps going.

I wonder if my previous version of this lora on Qwen Image wasn't training this good because I used double the learning rate and trained until it broke down during training for the first time but now I kept going and it fixed itself. I'll have to retrain the Qwen one. https://civitai.com/models/2093591

I only briefly tried to train some basic things on flux and sdxl once before but still to me even Turbo Z Image seems quite incredible for training in comparison, it just gets what it needs to do.
>>
File: pain...png (64 KB, 259x194)
I want to sleep, but I don't want to wake up after everyone has celebrated the release of Z-image base...
>>
>>107377043
i dont even call it sleep anymore. i call it lora training
>>
>>107376872
>straight path is pink
>>
>>107377038
waow
>>
Going outside good training pockets and getting slop is worse than blueballs I need base
>>
File: ComfyUI_00211_.png (1.18 MB, 1024x1024)
it does minecraft pretty well
>>
>>107377064
How well does it know 4chan?
>>
File: 1734141708694338.jpg (322 KB, 1280x2048)
chinese century
>>
>>107377064
>it does minecraft pretty well
forget about minecraft, did you notice in your image it can also render Hatsune Miku? Damnnn, best model eva!
>>
>>107377074
the century of humiliation for the US has officially begun
>>
>be me
>going through hundreds of my training lora sampled images
>suddenly the next image is blurry across the entire face of the woman
>FUCK FUCK FUCK FUCK WHICH IMAGE IN THE DATASET WASNT TAGGED PROPERLY.png
>realize that the blur moves and that it's actually my eyes that are blurring from a whole day of barely blinking
phew, the gens are safe.
>>
>>107377073
Not at all
>>
>>107377043
probably not coming until after /next/ weekend
>>
File: 1743188342728806.png (2.02 MB, 1168x1752)
>>
>>107377038
Her leg doing something weird
>>
File: ZIMAGE_00350_.png (473 KB, 512x512)
>>107375906
That's me back from my 2yr break.
And no, I will never upgrade from my GTX 1060. Whoever's maintaining that python cuda bullshit stopped supporting my card, but I figured out how to get the old cuda running.
>>
>>107377087
Good.
>>
File: 1756200571175582.png (856 KB, 1069x1012)
pony v7 > z
>>
File: 1738335998114618.jpg (383 KB, 1280x2048)
>>107377081
>the century of humiliation for the US has officially begun
i might start learning Mandarin. i dont give a shit about chinese people killing chinese people on june 1989 and the porn ban doesn't affect me as long as I have AI. I for one welcome the replacement of our Jewish overlords with Chinese ones. and I never really had a problem with Chinese immigrants either, just the tourists
>>
>>107375784
>the end of the weekend
at no point did they say "the end of the weekend"
https://github.com/Tongyi-MAI/Z-Image/issues/7#issuecomment-3586493968
>>
>>107377134
based, they're the lesser evils at this point
>>
File: 1745646442392637.png (1.89 MB, 1752x1168)
>>
>>107377134
I'm waiting for a speech recognition model that isn't just speech2text2speech so that he can precisely teach how to pronounce words before starting
>>
File: 1748580317854685.png (2.62 MB, 1432x1432)
>>107377132
>>
>>107377137
I never bothered to check into the claims but I assumed the whole weekend claim was bullshit.
Chinese people have weekends too.
>>
>>107377134
Unironically China has more freedom than my shithole country at the moment.
But at least houses cost over a million dollars, that's a good thing right? ...right?
>>
>>107377134
tianamen? it wasn't the state that did the violence.. it was the student protesters
>>
>>107371032
How come it's not possible to use this thing on Comfy?
>>
>>107377205
>vladmandic/sdnext
This is like the worst possible, most bloated UI you could support. No neo, no Comfy... It's insane.
>>
>its the end of november
>no ltx2

i weep
>>
i can say without a doubt z image shits on any model i tested previously on my rx 6600

its insane, it just generated images i thought it could never generate, no details on what was generated

this is the best model out there right now. i think we're going to get nano banana pro levels of detail with image editing
>>
File: 1757184924736568.jpg (627 KB, 1280x2048)
>>107377234
they literally said mid-december
but they're also kikes, so who nose

>>107377199
>tianamen? it wasn't the state that did the violence.. it was the student protesters
did i stutter nigger? i just said I don't give a shit
>>
>>107377107
that's not her leg anon...
>>
File: Comparison.jpg (3.45 MB, 3072x1536)
I have an increasing number of minor gripes with Z-Image but for what it is at the speed it does it, it's hard to fault
e.g. fairly similar results here between it and Flux.2, I think both are good, Flux.2 of course took a zillion years longer though
>>
>>107377333
I wish it could do anything but the most basic of poses.
>>
>>107377333
Try 8 steps using euler with the normal scheduler instead, the comfy workflow adds a bunch of extra noise for no reason.
8 steps + euler normal is basically what z-image used in their reference setup.
>>
>>107377266
"huehue my country sucks i should move to another country, but i really don't care about my previous programming where other country=bad"

ya you'll get far
>>
is it normal for my gpu to not be 100% utilised with max power draw at all times when generating shit
>>
What comes first, Z controlnet or edit model? Img2img is too finicky, changes too much
>>
>>107377363
Not unless you’re genning videos or with insufficient vram.
>>
>>107377370
so i need to repaste the heatsink it keeps jumping from 100 to 0% utilisation
>>
>>107377333
Lost the texture on the right, or was it without the Ben Day dots from the beginning?
>>
>>107377366
>What comes first, Z controlnet or edit model?
Z-Base
>>
>>107377363
no, something must be bottlenecking it
>>
>>107377394
the temp on the hotspot sometimes quickly reaches 80 degrees c
i am being cucked by the thermal paste i think
>>
>>107377420
your thermal paste is fucking your wife in front of you?
>>
>>107377420
Yeah could be, I had to RMA one of my cards because it would overheat/throttle and even freeze during gens, this one I've got can go all day at full tilt.
>>
>>107377428
its gotten bad lately
>>
Is there a node out there for wan 2.2 to do its generation in 1 go instead of high noise, pausing, then low noise? I know phr00ts wan exists but it's a little too slopped
>>
File: file.png (107 KB, 246x230)
someone bake a new cake i need to show my beautiful image to everyone
i generated just now it took me 1300 seconds
>>
>>107377445
Are you using a Voodoo 2 or something?
>>
>>107377350
Yeah. In any case, while Flux.2 is a very very good model both editing wise and T2I wise, I think that BFL building both capabilities into the same model was a giant blunder. There's no way around it leading to the perception of it having a bad "size to quality" ratio, given that you still have to load the entire thing no matter what. And I also feel that there are TEs that BFL could have used that are smaller with equivalent or better performance than Mistral-Small-3.2-24B-Instruct-2506 (maybe GLM-4.1V-9B-Thinking for example).
>>
>>107377458
the prompt was quite hefty
>>
>>107377355
??? I used 9 per pass based on the Z huggingface which states 9 inference steps actually amounting to 8 proper DiT steps. NOT doing the upscale and trying to go right to 1536 in one go would for sure be worse if that's what you meant, that wouldn't prove anything / wasn't the point here at all.
>>
>>107376243
nice. catbox?
>>
>>107377458
It's all math at the end of the day, you could spend your entire life diffusing an image by hand
>>
>>107377420
If your temperature delta between core and hot spot is more than 14C, you probably have thermal paste pump-out. A phase change pad like PTM7950 will fix that.
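That rule of thumb, as a quick check you could run against whatever your monitoring tool reports. Only a sketch: the function name and the sample readings are made up, and the 14C threshold is just the figure from this post, not a vendor spec.

```python
def pump_out_suspected(core_c: float, hotspot_c: float, max_delta_c: float = 14.0) -> bool:
    """True when the hot spot runs more than max_delta_c above the core temperature."""
    return (hotspot_c - core_c) > max_delta_c


# Hypothetical readings in degrees C, not measurements from anyone's card.
print(pump_out_suspected(core_c=60.0, hotspot_c=80.0))  # True: 20C delta
print(pump_out_suspected(core_c=70.0, hotspot_c=80.0))  # False: 10C delta
```

The core reading matters here: a hot spot at 80C on its own says little without the core temperature to compare it against.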
>>
>>107377478
>1gril, iphone_selfie_lora_9999.safetensors, very generic, (uninteresting:1.5)
>>
Fresh

>>107377493
>>107377493
>>107377493
>>107377493

Fresh
>>
>>107377488
you've done him
>>
>>107376797
>antanholorry
>>
>>107377467
You could've just posted it on catbox or something


