/g/ - Technology

File: collage.jpg (3.73 MB, 3678x4768)
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107344153

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe
https://github.com/ostris/ai-toolkit

>Z
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://huggingface.co/Comfy-Org/z_image_turbo

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 1736221473679440.png (3.6 MB, 2560x1396)
For those who weren't there on the previous thread, increase your shift to get rid of the noise.
>>
File: 1031976155576479.png (1.05 MB, 896x1152)
>>
god imma cum

>oooh husbant you want pray street fighta 2 for tha supa nintendo??
>>
Kijai's Torch Compile node can now handle zimage.
>>
File: 1755838262794823.png (67 KB, 1063x537)
Is there a way to input a dimension value, then a ratio (2/3 9/16 etc) + rounded to x (16/32/64) instead of typing everything?
>>
File: bruh.png (969 KB, 1080x675)
left is Z-image and right is Flux 2, I mean, come on... what the fuck did the bfl fags do to make it so bad?
>>
>>107345792
No but you can use scale latent if you already have the ratio dialed in right and you just want to make it bigger/smaller.
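If you'd rather compute it yourself, the arithmetic is small. A minimal sketch in Python, not an existing ComfyUI node: `dims_from_ratio` and `round_to_multiple` are made-up names, and it assumes the ratio is given as width/height and that the starting dimension is the width.

```python
from fractions import Fraction


def round_to_multiple(value, multiple: int) -> int:
    """Round to the nearest multiple (halves round up), never below one multiple."""
    return max(multiple, int(value / multiple + 0.5) * multiple)


def dims_from_ratio(width: int, ratio: str, multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) for a 'W/H' or 'W:H' ratio, both snapped to `multiple`."""
    num, den = ratio.replace(":", "/").split("/")
    aspect = Fraction(int(num), int(den))  # width divided by height
    height = width / aspect
    return round_to_multiple(width, multiple), round_to_multiple(height, multiple)


print(dims_from_ratio(1024, "2/3", 16))
print(dims_from_ratio(720, "9:16", 16))
print(dims_from_ratio(1000, "2/3", 64))
```

Snapping both sides to the multiple means the final ratio can drift slightly from the one requested, which is usually the trade-off you want for latent-friendly sizes.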
>>
Can someone explain to me how they got the turbo model before the base model? I thought they made the turbo model from the base model (with that teacher distillation shit)
>>
>>107345792
Wait until swarmui implements native support for Z, it has an excellent dimension value tool
>>
>>107345765
Thanks, made a big difference.
>>
Anyone else fucking around with the img2img? what're your go to settings so far?
>>
Bloatmaxxers, have we been blown the fuck out?
>>
File: 355076932789882.jpg (450 KB, 2048x2048)
>>107345809
Safety
>>
>>107345765
>b-but Z-image has muhh jpg artif-ACK
kek
>>
File: file.png (2.18 MB, 1424x1008)
Why does Anon love cheese pizza so much?
>>
also thank you >>107345607
>>
>>107345868
IT'S TASTY AND RIGHT-OUT-OF-THE-OVEN
>>
>>107345815
our based chink overlords released the turbo model first, they're still sitting on the base model until you get your social credit score higher
>>
>>107345898
the twitter chink said that the base model is "bad", so if turbo got distilled from this "bad" model and we got this kino, does it means we'll get the final boss later? lmao
>>
>>107345898
i mean he wanted an explanation on how it works in the nitty gritty, but lol true our social credit scores are in the shitter

especially the simpsons anon since he turned into a pedo THROUGH z-image. funniest shit i've ever seen in these threads.
>>
File: z_mod_00040_.jpg (650 KB, 1408x1952)
>>
>>107345910
prompt? I like that anime style
>>
>>107345765
I'm not sure but it might be better to keep the noise version then run a 2nd pass ksampler on it for like 2 steps.

I do something like that with Illustrious, where I intentionally create a shitty version (with RescaleCFG) then clean it up on a 2nd pass, because the shitty version has more intelligent composition.
>>
File: AnimateDiff_00001.mp4 (1.27 MB, 720x720)
"the video is shot like a first person view and the viewer is the camera-man.
the demon woman crawls up to the camera man with seductive motions as the foliage on the ground gives away as she moves on it her hair swaying with the movements with realistic physics and the camera follows her as she crawls closer to the camera and places her hands on the camera mans thighs as she smiles seductively and looks up to the viewer as the camera now looks down on her face and the forest ground as the background, point of view style."

How do I prompt for the camera being an actual pov, first person view?
>>
>>107345916
80s and 90s retro anime style illustration
>>
>>107345918
there's a node for that, it lets you use rescaleCFG for a certain amount of time
https://github.com/BigStationW/ComfyUi-RescaleCFGAdvanced
>>
>>107345909
Nobody is born deviant.
>>
File: file.png (1.82 MB, 944x1456)
>>107345909
Is that z-image?
How did you make her thick but not fat?
>>
File: Untitled.png (145 KB, 1384x950)
>>107345928
I used to use that one but I made a better version of it
>>
File: z_mod_00045_.jpg (783 KB, 1408x1952)
God damn it does small detail so well when other models completely shit themselves.
>>
>>107345939
img2img my nigga
>>
>>107345909
I like this.
>>107345939
But this... is perfection
>>
>>107345956
lame
>>
>>107345954
you'd think they used a godlike tier vae but it's still the regular flux vae lmao
>>
>>107345961
>the virgin "husband let's go running in the park!" vs the ascended "ooo husbant my chowestoral levelrs arr too high.."
>>
>>107345954
it's humiliating for the rest of the field to be this much pwned by a 6b model desu lmao
>>
>specify working out in the gym
>adds visible sweat beads
S O U L
>>
File: file.png (2.1 MB, 1152x1152)
>>107345961
I see...a fellow architect
>>
Why is wan2gp slow as shit vs comfy? I downloaded it specifically to avoid dealing with comfy's shit, tried to generate a 5 second video, and it claimed it'd take over 2 hours. Comfy would have had it done in ~10 minutes on my shitware.
>>
File: 1758769546314259.png (2.31 MB, 1280x1280)
>>107345921
>>
>>107346018
>Why is wan2gp slow as shit vs comfy?
Comfy has the best back end of all of them, not even close
>>
>>107345964
IMAGINE IF THEY... FINETUNE IT.
THE POSSIBILITIES OF EVEN MORE DETAILS... IMMMMMMM CUUUUUUMMIIIIIIIIIIIIIIING I'M CUMMING I'M CUMMING I'M CUMMIIIIIIIIIIIIIIIIIIIIIIIIIIIIINGGGGGGGGGGG
>>
File: 1756555264585654.png (2.46 MB, 1280x1280)
>>
It deeefinitely doesn't have much knowledge on nudity. But it doesn't seem censored though.

>>107346044
wwoooaaahh
>>
*insert pigeon gooner with lots of spilled milk around*
>>
File: Grok_vs_Zimage.jpg (853 KB, 2352x2336)
top row is Grok
bottom row is Zimage
Zissies, how are we coping with this one?
>>
>>107346002
can you catbox that so I can see what kind of prompts you're using? I have no idea how to prompt this shit.
>>
>>107346080
Oh wow, didn't know Grok improved so much
>>
>>107346080
nice
>>
File: z_mod_00055_.jpg (869 KB, 1328x1848)
>>
>>107346085
>photo of character
>brief desc of what the character is doing in what setting

and that's it basically. Should probably feed that shit into gemini or something to improve it, make it way more detailed. But it works.
different variations of photorealistic tags work, like amateur photo or "Polaroid SX-70 manipulation photograph" like one anon was using in his txt2img.
>>
>>107345964
>>107346044
someone tell lax about this so he can tell the chinks to finetune it so we can have even better details
have you ever seen nai v4.5's details? well they finetuned flux's vae and it looks better
>>
File: ComfyUI_temp_mtebu_00003_.png (3.71 MB, 1360x1760)
>>
>accidentally gen a 1080p 16:9 wan video at 81 frames
>only half of my 32gb vram is used

???
>>
>>107346080
Backgrounds have better architecture in all three Z-Image.
Women are more beautiful/cute in all three Z-Image.
Western image generators only make sluts and avant-garde (LGBT-style).
>>
detailmaxxing
>>
zimage being viable in the long term will all depend on how trainable it is
>>
File: 1762626946186974.png (2.38 MB, 1280x1280)
>>
>>107346098
The skin of the girls looks noisy
The rest of the pic doesn't
>>
>>107346080
>Huge cloud-only SAAS model is slightly better than a 6b local model
Yeah, how on earth will I cope...
>>
>>107346095
How are people getting it to do characters? I asked it for mercy from overwatch as a photorealistic girl and it gave me totally unrelated slop that looked like it was a 3D rendering of a chink mobile knockoff game of a knockoff.
>>
File: 5550928536.png (1.42 MB, 832x1216)
>>
>>107346124
you guys should stop complaining about this, it's much better that an AI model is slightly too noisy instead of the opposite. there are 50 different ways to reduce noise.
>>
>>107346126
>6b local model that's a turbo distill of an unfinished checkpoint
>running on steps as low as 4-8
honestly its like apples to slightly better apples running on datacenter PC's ran by jeets and musk. all things considered we're eating pretty fucking good.

>>107346129
we're img2img maxxing now son.
>>
File: file.png (368 KB, 500x558)
>>107346080
>>
File: 1752986592437849.png (7 KB, 706x32)
comfyui nodes 2.0 verdict?
>>
File: file.png (362 KB, 795x762)
hows the gooning for z-image so far? i have yet to see one
>>
>>107346113
Keanu Reeves doesn't look like he knows his way around a piano
>>
>>107346101
Glitch in the matrix
>>
>>107346149
look at the catboxes of the previous threads
>>
>>107346112
It's a distilled model so not so much from the get go.
>>
>>107346138
I fucking hate comfy and don't know how to do this shit like I would be able to in forge. Help a retard out with a workflow?
>>
>>107346135
Don't bother responding to retards, these are the kind of morons who want shitty encodes where they remove all film grain making everything uncanny.
>>
File: 1760836489283257.png (2.53 MB, 1280x1280)
>>107346113
>>
Instead of using Qwen3 as text encoder can I just use Gemma or something else? Already have tons of other models but not Qwen.
>>
File: 1021504501130804.png (949 KB, 832x1216)
>>107346157
They will supposedly release the non-distilled base model weights too.
>>
>>107346146
Haven't tried them yet, hopefully they add more basic functionality which you need third party nodes for now.
>>
>>107346160
I'll help out my nigga retard, no problem.
https://files.catbox.moe/ec22tr.png
adjust denoising to 0.3 or lower depending on the image, i'm not sure what exact settings i was using for this image but set it to 8 steps euler simple to start with if it isn't already.
https://files.catbox.moe/ec22tr.png
>>
File: z_mod_00059_.jpg (1.03 MB, 1848x1360)
>>
>>107346181
that coffee can kick in any day, preferably right now so i don't do silly things like re link the same catbox

>>107346179
god i love her face here, very cute.
>>
>>107346094
very nice
>>
>>107346149
Decent, but of course the true potential will only be tapped once you can train loras.
>>
File: trash_resized.jpg (3.88 MB, 3827x5172)
Hey hey Anon, Anon here.
Some more from the mill.
I'm still sorting and checking, there's a lot of things that work absolutely great or decent for these prompts.
res_2m, euler, sa_solver
simple, ddim_uniform
Those seem to be favorites so far. Fucking model is a beast, most of it is at least decent, while some of them are pretty mind blowing for a small turbo model.
China, man. Fucking China.

Full sized of all prompts:
Frilly titties: https://files.catbox.moe/ue2243.jpg
Trash girl: https://files.catbox.moe/3u1zwe.jpg (Resized version picrel)
Rat girl: https://files.catbox.moe/cpz3uy.jpg
Mixed media: https://files.catbox.moe/jtfu1h.jpg
Oil painting: https://files.catbox.moe/6kvktz.jpg
>>
>>107346144
kek
>>
>>107346194
>Rat girl
>click
>actually human
Most disappointing thing that happened today.
>>
>>107346181
>>107346185
So what is the second image? The workflow is kind of moot until we know what the input was.
>>
>>107346207
If it stays like that it seems like your day will be great.
>>
>>107346103
A worthy cope
>>107346129
You can try "cosplay of mercy (overwatch) ..."
But I don't know if it knows mercy natively
>>
>>107346179
I fully expect something bad to happen because only gay and retarded things are allowed to happen in this world. Euther they won't release the weights or the nondistilled model will suck, there is always a catch
>>
>>107345765
I made this change and it created way more blocky artifacts in my image. For reference what sampler/scheduler/step count do you use anon?
>>
>>107346156
>>107346188
yeah i checked on civit ai, it doesn't know what pussy or dick looks like but atleast its uncensored, ill just have to wait for some legend to properly train it
>>
>>107346224
the regular euler + simple, 15 steps
>>
>>107346221
You already posted this.
>>
>>107346227
>i checked on civit ai
can you show one such link I can't find them
>>
>>107346227
why not both?
https://files.catbox.moe/vdwyr0.jpg
>>
>>107346237
I will keep doomposting until proven otherwise
>>
File: 1746222104077924.png (245 KB, 940x600)
>>107346221
>I fully expect something bad to happen because only gay and retarded things are allowed to happen in this world.
nothing ever happens
>>
>>107346230
Thank you, will experiment more. Maybe it was because I was generating 2D art rather than photo style, or maybe I just wasn't using enough steps.
>>
File: AnimateDiff_00001.mp4 (3.72 MB, 1904x1072)
>>107346153
Definitely. I guess it's because there's not much detail, just flat 2d.
>>
File: 1750186156670850.jpg (1.16 MB, 1536x2048)
>>
>>107346256
me in the pic
>>
>>107346238
>https://civitai.com/models/2168935/z-image?modelVersionId=2442439
or you can filter by z-image, on the images tab

>>107346243
bleach now.
>>
anyone been writing down actor/celebrity knowledge for Z?
>>
>>107346256
GOD he's literally me
>>
>>107346256
lmaooooo
>>
A note from a VRAMlet: loading the full model in Comfy with one of the fp8 weight modes seems to make the outputs notably more artifact riddled than loading q8 GGUF or even q6.
GGUFs are a little slower for me though but everything is slow anyway since my GPU is old so whatever, if I'm gonna wait anyway might as well get something good.
>>
how does shift work in how it scales up/down? is there a point where i SHOULD expect it to start breaking down from being too high? Like 10?

because i think 10 might be better than 7 kek
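
For context, a sketch of what shift does to the schedule, assuming it is the SD3-style flow-matching timestep shift. `shift_sigma` is an illustrative name, not an API, and whether Z-Image's sampling uses exactly this remap is an assumption. Each sigma in [0, 1] is remapped so that more of the schedule is spent at high noise as shift grows:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """SD3-style timestep shift: remap a sigma in [0, 1] toward the high-noise end."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Compare how the midpoint of the schedule moves for a few shift values.
for shift in (1.0, 3.0, 7.0, 10.0):
    print(shift, round(shift_sigma(0.5, shift), 4))
```

Nothing in the formula has a hard cutoff: shift 1 leaves the schedule unchanged and higher values keep pushing sigmas toward 1, so the point where quality breaks down still has to be found by testing.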
>>
File: 1735689453830036.png (2.64 MB, 1280x1280)
oopsies
>>
File: 1754262415707842.png (2.73 MB, 1216x1680)
>>107346261
https://civitai.com/images/111748988
damn there's some good images in there
>>
>>107346282
if 10 works for you, then what the fuck is the point of your question? just experiment until you find what works for you, fuck you.
>>
>>107346296
this is the type of shit you let your therapist know about bud. autism shouldn't be this hostile.
>>
>>107346282
>is there a point where i SHOULD expect it to start breaking down from being too high?
AI models are black boxes so the only answer comes from experimentation, see by yourself
>>
>>107346213
i tried widowmaker and it produced a regular brunette with no ass
>>
Reminder that BFL tried to make a video model.

How big do you think that would have been? 500B? 1T?
>>
>>107346300
hostility towards people of lesser intellect is always valid
>>
>>107346315
he's smarter than you though
>>
https://civitai.com/images/111700885
kek
>>
base model really needs to be released so I can train a big ass lora
>>
>>107346322
me on the right
>>
File: 1746840361518719.png (2.4 MB, 1280x1280)
>>
>>107346221
>nondistilled model will suck
this one, they delayed the release of the base model because of how bad it was, and are trying to unfuck it, see previous threads
>>
Reminder that generating CSAM is illegal and is punishable with jail time
>>
File: z_mod_00073_.jpg (753 KB, 1320x1704)
>>
>>107346332
THE DONKS... IT WILL BE A GOOD FUCKING ERA
>>
how bad is the q3 quant?
t. gigavramlet
>>
>>107346321
i doubt it
>>
>>107346332
>>107346364
>genning fat doinks in Z-image

>>107346360
w-would..
>>
>>107346339
>not 2B's skull with a blindfold and white hair
tch
>>
>>107345765
sovl vs sovless
>>
File: 1760059471422529.png (150 KB, 458x250)
>>107346368
what is wrong with you people? go for bigger quants and offload... >>107345353
>>
>>107346332
if it actually releases and it trains well ill become a member of the CCP
>>
>>107346344
only in cucked countries

and how will they know
>>
>1male, male focus
>get a vagina on the man
epic
>>
>>107346245
>nothing happens
>gay and retarded thing happens
>good thing happens
order of probability
>>
>>107346398
based zimage showing you what you are for genning men.
>>
>>107346398
Pooners are men chud
>>
File: ComfyUI_temp_jcjyg_00160_.png (3.81 MB, 1088x1920)
>>107346332
Just how big are we talking?
>>
>>107346398
>prompting model trained with natural language with tags
>>
>>107345792
this?
>>
>>107346421
BIIGGGEERRR
>>
>>107346421
100 epochs minimum
>>
File: ComfyUI_00001_resized.jpg (3.8 MB, 7297x3154)
Starting some photo style tests.
Euler/Simple, 9 Steps, shift of 3.

Prompt prefixes used:
An image taken on a 1990s analog disposable camera of
An amateur candid photograph taken on an iphone of
An amateur candid photograph taken on Sony Cybershot from Flickr of
Anaesthetic 5, a still from a movie scene of
A professional 35mm film photography of
A 1960 technicolor film still of
A classic cinema film still of

So far, Chroma outshines it, but there's a lot of prompts to go through. Pardon the spam for a bit.
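
For anyone who wants to repeat this kind of test, a sketch of building the prompt list in Python. The prefixes are the ones listed above; `build_prompts` and the example subject are hypothetical, and feeding the prompts to a sampler with a fixed seed is left to whatever frontend you use.

```python
PREFIXES = [
    "An image taken on a 1990s analog disposable camera of",
    "An amateur candid photograph taken on an iphone of",
    "An amateur candid photograph taken on Sony Cybershot from Flickr of",
    "Anaesthetic 5, a still from a movie scene of",
    "A professional 35mm film photography of",
    "A 1960 technicolor film still of",
    "A classic cinema film still of",
]


def build_prompts(subject: str, prefixes=PREFIXES) -> list[str]:
    """Prepend each style prefix to the same subject so only the style varies."""
    return [f"{prefix} {subject}" for prefix in prefixes]


# Hypothetical subject, swap in your own.
for prompt in build_prompts("a woman sitting in a diner"):
    print(prompt)
```

Keeping the subject and seed constant is what makes the grid comparable across prefixes and across models.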
>>
File: 1740198374518358.jpg (866 KB, 1248x1824)
this is good
https://civitai.com/images/111711391
>>
File: 1735577359892787.png (2.67 MB, 3536x517)
>>107346445
>So far, Chroma outshines it,
really? I don't get that much difference on chroma when prompting for those kind of styles
>>
>>107346344
>decided to wan2 some images with lolis without any anime lore
>mfw
>>
>>107346445
thank you for your service once again, and don't apologize, this stuff is super helpful.
>>
>>107346435
That is pretty big. I'm very much looking forward to the tunes.
>>
To be fair, you have to have a very high IQ to understand danbooru tags. The precision is extremely subtle, and without a solid grasp of the English language most of the tags will go over a typical prompter's head.

The prompters understand this stuff; they have the intellectual capacity to truly appreciate the depths of these tags, to realize that they're not just jeets describing a woman- they tag something deep about 1girl. As a consequence people who dislike booru tagging truly ARE idiots- of course they wouldn't appreciate, for instance, diffraction_spikes, ben-day_dots, or tenshi_kaiwai. I'm smirking right now just imagining one of those "natural language" simpletons scratching their heads in confusion as they desperately google what the name of the light patterns that appear at the bottom of a pool is called. What fools... how I pity them. And yes by the way, I DO have a danbooru platinum account. And no, you cannot see it. It's for the ladies' eyes only- And even they have to demonstrate that they have a positive feedback score within 5 points of my own (preferably lower) beforehand.
>>
File: 1735137457107347.png (2.39 MB, 1280x1280)
>>107346382
oh shit it knows 2b, nice
>>
File: ComfyUI_00001_.jpg (3.61 MB, 5825x3670)
>>107346462
Yeah, I think so. Picrel is a plot of the same thing on Chroma V46 I did back in July.
>>
>>107346445
I'm getting really sick of your "chroma is better" posts bro, no one is going to use chroma so shut the fuck up. It has no future.

seriously, do you really see a future for it? no.
>>
>>107346386
You don't even need that node, ComfyUI offloads the required amount by default if you load a model that won't fit in VRAM alone. I tested it and it was not any faster to manually specify a GB amount with the DisTorch2 loader compared to just using the built in UNET loader. I guess it might depend on setup. But in any case yeah just offload some.
>>
>>107346477
this but almost fully unironically
>>
>>107346483
why not? it's completely normgroided
>>
>>107346445
Chroma is definitely better when it comes to style variety, but it's also like 5 times slower.
>>
File: 1745683519567149.png (2.13 MB, 2024x746)
>>107346487
the 3 images are the same thing (looks like a seed difference)
>>
for an ancient vramlet (2070 super 8gb) chroma is fucking slow. z-image is also fucking slow. this is pain
>>
File: z_mod_00084_.jpg (755 KB, 1944x1184)
>>
>>107346445
Z-Image is smaller and not slow as molasses, that's the reason why it already BTFO chroma. It doesn’t need to be better, it just needs to be good enough, a significant improvement over XL, and fast.
>>
>>107346491
>ComfyUI offloads the required amount by default if you load a model that won't fit in VRAM alone.
it doesn't work well for me: it doesn't put enough in RAM, and when it starts to run it OOMs
>>
>>107346515
How slow are we talking
>>
>>107346504
And twice as large. If Z-Image responds well to training it will own image generation
>>
>>107346519
>It doesn’t need to be better
but the funny part is that it is better, it doesn't have that oversaturated color and the details are much better (better anatomy too)
>>
>>107346488
I'm not shilling for Chroma, just saying that in terms of photography styles, from all the models I have tested with it (Krea, Chroma, Wan, Qwen, Hunyuan) it has been the only one so far that got a lot of styles at least partially correct, while others consistently shit the bed.
The Z images are still leagues beyond what Krea/Qwen/Wan/Hunyuan managed to do.
>>107346508
Seed is the same on all gens. But yeah, you're correct. Chroma was still very much in my mind as the one model that actually got close to what you'd be going for.
>>
>>107346533
9 s/it on z-image
>>
>>107346244
i will be happyposting. im happy and content with what we got. immense research paper as well. 640,000$ for a model of this magnitude, imagine how much xAI/grok spent for their model, which is only slightly better than >6b model
>>
>>107346550
SAME. This is a GOOD VIBES general for the REST OF THE YEAR. We're EATING GOOD.
>>
>>107346281
have you tried the bf16 weights? i get images in only 25-30s (text encode included in 30s) on a 3060 12gb
>>
>>107346550
>i will be happyposting.
based, it's rare we get good news so I'm gonna make the most of the moment
>>
open source keeps winning lads

>OpenAI has been hacked

>ChatGPT maker OpenAI confirms major data breach, exposing user's names, email addresses, and more
>>
>https://huggingface.co/ChenkinNoob/ChenkinNoob-XL-V0.1
looks like there's a new NoobAI based finetune
>>
>>107346545
Could be worse... could be 20 s/it!
>>
>>107346592
More good news? And on this day a thanksgiving day? I could not be happier.
>>
File: ComfyUI_00002_.jpg (3.81 MB, 9885x2143)
>>107346445
Here's a silly one testing camera types and brands.
As always, not too interesting, but it's kinda hilarious how accurately it captured the camera models themselves whenever it put them in the images.

A Polaroid instant photograph from the 1970s of
A Kodak Brownie box camera photograph from 1950 of
A Hasselblad medium format photograph of
A Leica M6 rangefinder photograph of
A Canon AE-1 35mm photograph of
A Nikon FM2 photograph from the 1980s of
A Pentax K1000 student photography of
A Minolta X-700 photograph of
An Olympus Trip 35 vacation photo of
A Fujifilm X100 street photography of
A Lomo LC-A photograph of
A Diana F+ toy camera photograph of
A Holga 120N photograph of
A Pinhole camera long exposure of
>>
>>107346592
>ChatGPT maker OpenAI confirms major data breach, exposing user's names, email addresses, and more
bruh, how is that good news, a lot of people are using ChatGPT, even people here
>>
File: 882828715382545.png (1.64 MB, 896x1184)
>>107346160
>>107346181
Slightly cleaner img2img workflow with resolution control if you'd like.

https://files.catbox.moe/ix9cek.json
>>
ive been gone 12 hours.
how big did Z image blow up?
>>
>>107346594
Yeah, and it's a proper noob instead of waislop.
>>
File: 1749976789053810.png (1.98 MB, 1280x1280)
>>
File: ComfyUI_00059_.png (990 KB, 1024x1024)
>>
>>107346611
>how big did Z image blow up?
biggest blowup in the history of /ldg/, and I was here during the blowup of Flux and Wan
>>
>>107346603
>Kodak moment
Thanks anon
>>
>>107346610
thanks bud.

giving you the biggest thumbs up i possibly can IRL.
>>
>>107346621
What do you mean?
>>
File: ComfyUI_temp_pdidk_00003_.png (3.2 MB, 1280x1984)
>>
>>107346599
and how fast are vramchads?
>>
they should've called it zeus image or something for how GODLY and lightning FAST it is
>>
>>107346611
>>107346621
>work with most popular frontend for day 0 support
>small model that literally anyone with any gpu can run fast
>limited in seed variance but still sota for ootb realism, ip knowledge, celeb knowledge, anatomy, camera control, text etc
simple formula
>>
>>107346642
I run at around 20 miles per hour.
>>
>>107346603
you know it has an effect when it changes the default asian type to caucasian lmao
>>
>>107346642
You can roughly calculate the speed, if SDXL has 2B parameters and z-image has 6B it's 3-4 times slower than SDXL. Extrapolate this to the hardware in question.
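A sketch of that back-of-the-envelope estimate in Python. It assumes per-step time scales linearly with parameter count, which ignores offloading, precision, resolution and attention cost, so treat it as a floor rather than a prediction. `estimate_seconds_per_step` is an illustrative name and the 0.5 s/step baseline is a made-up number to replace with your own SDXL measurement.

```python
def estimate_seconds_per_step(baseline_s_per_step: float,
                              baseline_params_b: float,
                              target_params_b: float) -> float:
    """Scale a measured per-step time by the parameter-count ratio (naive)."""
    return baseline_s_per_step * (target_params_b / baseline_params_b)


# Hypothetical baseline: 0.5 s/step measured on SDXL on your own card.
sdxl_s_per_step = 0.5
z_s_per_step = estimate_seconds_per_step(sdxl_s_per_step, 2.0, 6.0)
print(f"~{z_s_per_step:.2f} s/step, ~{z_s_per_step * 8:.1f} s for 8 steps")
```

The 2B and 6B figures are the ones quoted above; the plain parameter ratio gives 3x, the low end of the 3-4x range.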
>>
>>107346604
ChatGPT is gay
OpenAI is gay
People who use it are gay
>>
>>107346614
Did you test it already? any comparison pics?
>>
>>107346194
ty for your cervix
>>
File: ComfyUI_00003_resized.jpg (3.78 MB, 8804x2668)
3.78 MB
3.78 MB JPG
>>107346603
Here's a few fun ones.

A 1920s silver gelatin print of
A 1930s depression-era documentary photograph of
A 1940s wartime press photograph of
A 1950s Life magazine photograph of
A 1960s fashion photography of
A 1970s National Geographic photograph of
A 1980s glamour shot of
A 1990s grunge photography of
A 2000s digital camera photograph of
An early 2010s Instagram filter photograph of

And to compare, here's the old Chroma V46 doing the same thing:
https://files.catbox.moe/dq7kqp.jpg

I still think Chroma did a better job, but I imagine with better prompting and explicitly describing the photograph (sepia, torn, whatever) Z will do fine.
>>
>>107346592
>>OpenAI has been hacked
So they got sora 2???
>major data breach, exposing user's names, email addresses
GAY
>>
>>107346657
eeeh... sdxl is 15 sec (28 steps). ZIT is 80 sec (8 steps).
>>
>>107346669
please stop with the 1girl comparisons already, that's basically personal taste. do something more objective.
>>
>>107346674
That's the ballpark.
>>
>>107346669
if you add characters like "a woman disguised as 2b from nier automata" then the style doesn't work anymore. desu the prompt adherence of this model is not that great, which is expected since it's a turbo model. I'm really hyped for the base model now
>>
>>107346515
ggufs are very slow, have you tried bf16 with ram offload?
>>
Hey anons, is a 12 gb 3060 good enough for Z-image turbo? I wanna try it out after a great run with Illustrious
>>
>>107346712
just try it out, doofus
>>
>>107346712
it uses a bit more than 12gb, so offload >>107345353
>>
>>107346705
yeah, but rtx 20xx can't do bf16. and if you do fp16 dtype compute, it's not faster than ggufs
>>
File: 1119122198172702.png (1.14 MB, 832x1248)
>>107346712
I'm using it on that GPU, it's good enough. About ~3s/it, this image took 30 seconds
>>
>>107346719
>if you do fp16 dtype compute, it's not faster than ggufs
yeah but fp16 is still the best quality so it's still better to use that
>>
>>107346690
Quit treating objectives as 1girls.
>>
I already got bored of its limitations
>>
>>107346712
Try Chroma first, then it will feel lightning fast
>>
>>107346594
Tried it, couldn't gen characters with 2k pics in their dataset released this year, so I don't think it's worth it
>>
all we need now is a new UI to make everyone drop cumfart and my wish list this year is complete
>>
Is Zimage inference compute bound or memory bandwidth bound?
>>
File: ComfyUI_00013_.png (3.12 MB, 1280x1920)
fun model
>>
is z-image going to give my computer some kind of chinese zombie virus?
>>
>>107346712
yeah, use the default wf, works out the box
offloads automatically, no need for custom nodes
>>
>>107346778
all image/video gen is compute bound
>>
File: giphy.gif (3.96 MB, 480x270)
>>107346746
try putting pic related on your second monitor
>>
>>107346773
Just vibe rawdog it in python. Easy peasy.
>>
>>107346781
no, don't be cra-我爱北京天安门,
>>
>>107346781
New Comfy update deleted all my files...
>>
File: 1738242318623342.png (2.3 MB, 1280x1280)
>>
File: ComfyUI_00004_resized.jpg (3.71 MB, 8768x2951)
>>107346669
A Daguerreotype from 1850 of
A Cyanotype blueprint photograph of
A Sepia-toned albumen print of
A Tintype photograph of
A Wet plate collodion photograph of
An Autochrome color photograph from 1910 of
A Hand-colored photograph from 1890 of
A Glass plate negative photograph of
A Stereoscopic 3D photograph of

It did way better again than any other Chinese or BFL model. Based.
I'll be done in two more after this and then you'll be free again.

>>107346703
Haven't played around with the prompt adherence that much, but it does seem that longer, more fleshed out prompts start to deviate from the style more (especially if the subject is a poor fit for it).
>>107346690
You're supposed to look at how well the model gets the style, not swoon over the girl.
>>
>>107346796
we already knew comfy is spyware and malware. the model is fine though
>>
is z turbo just flux 2 but for poors?
>>
>>107346796
rm -r dirname
mkdir -p dirname

It happens.
>>
File: 1742375492312154.png (1.15 MB, 1080x770)
>>107346817
>is z turbo just flux 2 but for poors?
it's way better at realism
>>
>>107346817
>flux 2
flux 2 is barely better than flux 1 kek
>>
>>107346817
flux 2 is just z for retards
>>
What about art gens? Can z-poop do some oil paintings or something?
>>
https://files.catbox.moe/bou48h.png

is nudity on z just cooked until we get loras?
>>
File: 189950154694814.png (1.11 MB, 832x1248)
>>
>>107346854
it looks pretty good as it is, the loras won't have many issues making the real deal
>>
>>107346854
yes sadly. so, we gotta wait ((2 more weeks)) for the noob trained base model.
>>
What are the recommended sizes to use? Can I just use anything?
>>
>>107346869
2k seems to be the limit
>>
>>107346748
i laughed at this but it hurts that it really takes forever with chroma
>>
File: 1747413706968574.jpg (1.83 MB, 1536x2048)
>>107346845
Yes, but a bit worse than Chroma. Still very impressive.
>>
>>107346817
>bigger model=better
That meme died yesterday
>>
Someone remind me what nudity / explicit nsfw looked like when SDXL had just released.
>>
>>107346879
this
>>
>>107346879
it's been dead a while, multiple large chinese models came out that were all pretty bad
>>
>>107345765
Isn't the shift dependent on the resolution? Like if you go for 1024x1024, shift 3 is fine, but if you go for something bigger you have to increase that shift. I've seen that somewhere...
>>
File: 1749345848618501.png (463 KB, 500x628)
>>107346895
>multiple large chinese models came out that were all pretty bad
Ilya was right
>>
>>107346883
what was the exact date?
>>
>>107346896
If that's the case, someone really should make a custom node that automatically adjusts the shift based on resolution.
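
A minimal sketch of the math such a node could do, assuming the square-root scaling heuristic described for SD3-style flow models (shift grows with the square root of the pixel-count ratio). The base point (shift 3 at 1024x1024) is taken from the post above; the function names and the choice of heuristic are assumptions, not how ComfyUI or Z-Image actually handle it.

```python
import math

def resolution_shift(width, height, base_shift=3.0, base_pixels=1024 * 1024):
    """Scale the sampling shift with the square root of the pixel-count ratio.

    base_shift / base_pixels are assumed reference values, not model constants.
    """
    ratio = (width * height) / base_pixels
    return base_shift * math.sqrt(ratio)

def shift_sigma(sigma, shift):
    """Apply the common flow-matching time shift to a sigma in [0, 1]."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

if __name__ == "__main__":
    for w, h in [(1024, 1024), (1536, 1536), (2048, 2048)]:
        s = resolution_shift(w, h)
        print(f"{w}x{h}: shift={s:.2f}, sigma 0.5 becomes {shift_sigma(0.5, s):.3f}")
```

A real node would feed the computed shift into whatever sampling node the workflow already uses; whether sqrt scaling is the right curve for this model would need testing.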
>>
I told you, you fucks. I told you that you didn't really need an insane amount of params just to make a good, smart model. Models actually were incredibly inefficient and Z-image proves it. But nooooo
>MUH PARAMS MUH +100B PARAMS LOCAL CANNOT COMPETE
Retards.
And I bet you could even reduce those 6B params to 4, 3 or even 2 and still be as good as the 6B one, if done right.
>>
>>107346877
Not bad.
>>
the thousands of hours of abuse we have inflicted upon our GPUs for a model to drop that just kinda works. Excited to get back to lora training, but man what a waste
>>
File: ComfyUI_00005_resized.jpg (3.84 MB, 8528x2871)
>>107346810
Second to last one.
Not too great with TV & Video aesthetics, sadly. But perhaps that needs more prompt wrangling. I would have expected it to get CCTV camera footage, at least.

1980s VHS camcorder footage of
1990s Hi8 video camera footage of
Early 2000s MiniDV footage of
Security camera CCTV footage of
Dashcam footage of
GoPro action camera footage of
A Webcam screenshot from 2003 of
A Public access television still from 1985 of
An MTV music video still from 1999 of
>>
>>107346912
July 2023 apparently.
>>
>>107346925
Based Nostradanon
>>
File: 1739963043799745.jpg (1.34 MB, 1536x2048)
>>
>>107346925
>And I bet you could even reduce those 6B params to 4, 3 or even 2 and still be as good as the 6B one, if done right.
I get your point, but desu a bigger model will always be better than a small model if both are trained the same exact way, those alibaba fucks could release a 18b model (a size equivalent to Qwen Image) and the 24gb vram chads could still run it and the quality would definitely be close to something like Grok or Nano Banana Pro
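
Rough arithmetic for the sizes mentioned here, as a sketch only: it counts the diffusion model's weights and nothing else (text encoder, VAE, activations and framework overhead are all ignored), so real usage is higher. The helper name and dtype table are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}

def weight_gib(params_billions, dtype):
    """Approximate size of the weights alone, in GiB."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 2**30

if __name__ == "__main__":
    for size in (6, 18):
        for dtype in ("bf16", "fp8"):
            print(f"{size}B @ {dtype}: {weight_gib(size, dtype):.1f} GiB")
```

By this estimate an 18B model only fits inside 24 GB at fp8 (or with offloading); at bf16 the weights alone exceed the card.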
>>
>>107346931
i found a correlation with other models that when it shows the actual camera it doesn't get the reference to a style, not sure if you would agree with that. this one also seems to avoid fisheye compositions
>>
File: 149000189928115.png (1.18 MB, 1344x768)
>>107346929
True, but some of the knowledge will carry over.
>>
>>107346961
18b is still waaaaaaay smaller than a +100B model so my point stands firm.
>>
File: 1746187161153886.png (2.83 MB, 1280x1280)
>>
>>107346932
July 26, according to github.
>>
File: AnimateDiff_00001.mp4 (2.6 MB, 1072x1408)
Neat.
>>
File: z-image-two.png (3.97 MB, 1568x2048)
>>
>>107346994
Nice 6 fingies Z-image!
>>
File: 126876680738349.png (1.25 MB, 896x1184)
>>
>>107346925
I've been saying similar: for a model to take off locally it must fit comfortably on a current midrange GPU or it's DOA. People were cooming over each new bloatmodel, forgetting that these models are pretty much static because no one has the hardware to modify them, so as soon as you run into the limitations of the model it's a dead end
>>
>>107347019
I love how Nvidia was like "see? we're helping you run this giant flux 2 model by making it fp8 and adding some offloading!"

Uhh what about making cards with a lot of vram affordable so that we can put the whole thing in it in the first place?
>>
File: 1753040222356726.jpg (459 KB, 832x1216)
>>107346663
Well it's kinda hard for me to say right now. It certainly didn't sabotage the artist styles, they are still as prominent as they are on the base noob and there's no waishit vaseline, but the outputs are significantly different compared to the epsilon noob from a year ago.
>>
File: 1736869314198628.png (2.41 MB, 1280x1280)
>>
File: 1740115278456451.png (3.64 MB, 1904x1504)
I still think the Siege of Pale is the best way to see how far an image gen can go and it's pretty decent.
>>
File: gamer.png (2.79 MB, 1288x1657)
>>
What, isn't the point of a refiner to use a different model?
>>
>>107347042
Afaik Noob 1.1 was worse than the original epsilon 1.0 noob.
Not sure what this means in this context.
>>
File: charts.png (5 KB, 760x376)
what happened to 4stats charts?
>>107346987
i remember the sdxl 0.9 beta leaked earlier
>>
File: 1750255339093622.png (2.8 MB, 1280x1280)
it's pretty good at styles if you do some boomer prompting
>>
File: ComfyUI_00006_resized.jpg (3.84 MB, 8654x2385)
>>107346931
And here's the last one. Sadly, another rather disappointing one.

A Lomography experimental photograph of
A Double exposure artistic photograph of
A Light leak experimental photograph of
A Photogram without a camera of
A Scanner photography of
A Disposable underwater camera photograph of
A Redscale film photograph of
A Solarized photograph of
A Chemigram abstract photograph of
An Expired film photograph with light leaks of
A Cross-processed slide film photograph of

Largely ignored all the quirky kinds of photography styles. Weird.
You're free from me now, Anon. Good bye, may your 1girls be beautiful.

>>107346810
I found an old version of this plot that I did with Chroma flash. Here it is:
https://files.catbox.moe/782xla.jpg
While it's of course blown out to shit, the styles are pretty good.

>>107346962
Yeah, Qwen did the same thing. It added action cams and CCTV cameras constantly but never got the actual style at all.
>>
File: 1746907136139971.jpg (134 KB, 1024x559)
>>107347057
vs Nano Banana
>>
File: 1751609670069050.png (2.25 MB, 2525x988)
>>107347104
>>107347057
I'm sure it can be closer to nano banana if we have the base model + reasoning
https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/picture
>>
>>107347070
>sdxl 0.9 beta leaked earlier
True, I just don't remember if it could do nsfw or if that only came with tunes / loras or whatever else was around at the time.
>>
>>107347103
shank you in the cervix
>>
>>107347057
>>107347104
Well shit, if the sorcerers were more expressive i would prefer Z to nano banana.
>>
File: 1752350715374088.png (1.24 MB, 2556x558)
https://www.reddit.com/r/StableDiffusion/comments/1p8462z/z_image_tinkering_tread/
>it actually works with cfg above 1, despite being a distilled model, but it also requires more steps. For now I tried cfg 5 with 30 steps and it looks quite good. As you can see it's a little on the overexposed side, but still ok.
interesting
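
For context, the standard classifier-free guidance mix is sketched below on plain lists so it runs without dependencies. At cfg 1 the unconditional (negative prompt) term cancels out, which is why negative prompts only start to matter above 1, at the cost of a second model pass per step. The function name is illustrative, and this is the generic formula, not anything confirmed about Z-Image's internals.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: push the prediction away from uncond toward cond."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

if __name__ == "__main__":
    cond = [1.0, 2.0]     # prediction for the positive prompt
    uncond = [0.0, 1.0]   # prediction for the negative/empty prompt
    print(cfg_combine(cond, uncond, 1.0))  # same as cond, uncond has no effect
    print(cfg_combine(cond, uncond, 5.0))  # difference amplified five times
```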
>>
File: file.png (2.71 MB, 1248x1824)
>>
>>107347151
z looks better stylistically but nano adheres to what i wanted more
>>
>>107347019
Flux had a lot of research and snake oil released for it though. More than any other previous model and that was right on the edge of consumer hardware.
>>
File: 1762019865983060.jpg (1.81 MB, 1331x2000)
>>107347196
supposed to look like this ultimately
>>
File: 1750713509275737.jpg (1.66 MB, 2048x2048)
>>
>>107347153
Yes, this model just keeps on giving, negative prompting even

A shame, even if expected, that prompting in Chinese gives the best results though
>>
>>107347153
>It's best to use chinese prompt where possible. Gives noticeable boost.
>it's actually fucking true
i can't believe it. huge boost too, in every way including face detail. that said, i've only tested this with an already azin subject so results could vary, doing everything in chinese might make westoid girls turn bug.
>>
File: z_mod_00088_.jpg (670 KB, 1288x1936)
>>
File: flux1_0008.png (2.11 MB, 832x1216)
>>107347065
yeah but if you switch vaes you need to translate your latent
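
One reading of "translate" here is a round trip through pixel space: decode with the first model's VAE, then re-encode with the second model's VAE (in ComfyUI terms, a VAE Decode node feeding a VAE Encode node). A sketch with toy stand-in functions, since the real decode/encode calls depend on the library; every name below is hypothetical.

```python
def translate_latent(latent, decode_a, encode_b):
    """Move a latent from model A's latent space to model B's via pixel space."""
    image = decode_a(latent)   # latent in A's space -> pixels
    return encode_b(image)     # pixels -> latent in B's space

# Toy stand-ins that only rescale values; real VAEs are neural networks.
def toy_decode_a(latent):
    return [v / 0.5 for v in latent]

def toy_encode_b(image):
    return [v * 0.25 for v in image]

if __name__ == "__main__":
    print(translate_latent([1.0, 2.0], toy_decode_a, toy_encode_b))
```

The round trip is lossy with real VAEs, so some detail is lost before the refiner pass even starts.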
>>
>>107347228
Qwen3 is a Chinese LLM. It understands Chinese really well. The text encoder is responsible for the initial vectors.
>>
>>107347251
>Qwen3 is a Chinese llm model.
but it knows a shit ton of languages though
>>
>>107347251
Yeah i guess it should be a no brainer but. i don't know. i'm used to my burger language being the standard for everything.

not complaining though, i'll take what i'm getting for fucking free. even if it means passing every prompt to gemini to translate to winnie the pooh.
>>
File: 1750219558680430.png (2.16 MB, 1280x1280)
>>107346382
you got it boss
>>
File: comfy_00002_.png (3.67 MB, 1655x1839)
Quick test with my huge-ass concept bleeding boomer prompt.
Easily trades punches with Qwen in prompt adherence (some details are of course wrong but still) and the concept bleed between the subjects is basically 0, haven't seen that before on any model.
That's pretty damn impressive.

Prompt:
https://files.catbox.moe/6fmtfx.txt
>>
>>107347271
>not 9s holding 2b's skull during his psychotic hackerman phase
tch
>>
>>107347246
Translate how.
I2i so far works nice to get the pose.
>>
File: ComfyUI_00108_[1].png (1.51 MB, 1024x1024)
It can do a pencil sketch but sometimes it really wants to not understand my prompt. Tried both "her eyes are obscured by her long hair" and "her hair is covering her eyes" and it just keeps giving me blindfolds lmao.
>>
File: 1751431328306316.jpg (1.59 MB, 1536x2048)
>>107347278
The model knows 2b but struggles with 9S for some reason.
>>
File: 1753232597961913.jpg (1.1 MB, 2560x1828)
>>107347153
>>107347228
>>107347270
bruh...
>>
File: 1760031102316855.png (1.4 MB, 1024x1024)
>>107347103
Aerochrome worked but not redscale, sadly.
>>
>>107347287
>struggles with 9S for some reason
dang.
>>
>>107347298
>to xi or not to xi, that is the question, and we have our answer
>>
this model is so based it made everyone realize comfyui is fucking garbage and it deserves a better UI kek
>>
>>107347323
>realize comfyui is fucking garbage
we didn't need a new model to realize this
>>
>>107347262
Chinese and English are its primary languages.
Small models are not that great with smaller languages or even something like German in some cases.
>>
>>107347323
meh, I think comfy has a good UI. the backend is what's kinda shit.
>>
File: 1585-4154179236.jpg (1.48 MB, 3840x2367)
FOSS killed local models, all UI's are trash, Comfy, SD, Forge, SwarmUI, even Stability Matrix that encompasses all of them isnt free of the blight of FOSS mental disease. None of them work. Have fun with your models of choice cause you aint getting more.
>>
File: ComfyUI_00137_.png (2.87 MB, 1488x1872)
>>107347282
Lovely.


