/g/ - Technology


Thread archived.
You cannot reply anymore.




Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107328508

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe
https://github.com/ostris/ai-toolkit

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
What is the standard practice for leveraging the vram installed on a separate PC on the same network?

I have an RTX 5070 Ti with 64GB of DRAM and it's been mostly fine for Wan 2.2, but I recently added NAG to the workflow and gen time has doubled for 97 frames at 992x720. Can't I just offload the text encoding to some 8GB VRAM PC?
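There's no standard way to split one ComfyUI graph across two machines, but since the text embedding only changes when the prompt does, a common workaround is to precompute the conditioning on the second box and copy the tensors over. A minimal sketch of the save/load half (the shape and file name here are made up for illustration, not Wan's actual embedding layout):

```python
import numpy as np

# On the 8GB box: run just the text encoder and dump the conditioning
# tensor to a file you can copy over the network (scp, NFS share, etc.).
# The random array is a stand-in for the real UMT5 output; the shape is
# hypothetical.
cond = np.random.rand(1, 512, 4096).astype(np.float32)
np.save("cond.npy", cond)

# On the 5070 Ti box: load the precomputed conditioning and wire it into
# the sampler instead of re-running the text encoder every gen.
loaded = np.load("cond.npy")
assert loaded.dtype == np.float32 and loaded.shape == (1, 512, 4096)
```

Calling the small box over HTTP per prompt works too, but cached tensor files are simpler if your prompts repeat.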
>>
File: collage.jpg (3.22 MB, 2975x3710)
>>
File: flux_krea_00039_.png (2.72 MB, 1088x1920)
i reckon i'll have me one of them
fresh baked local model generals.
>>
Blessed thread of frenship
>>
ty baker!
>>>/b/realistic+parody became >>>/r/realistic+parody
>>
File: 1746837732977701.png (732 KB, 1024x1024)
>tfw a 6b model destroys a 32b model in realism lmao
https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/picture
>>
>>107332467
is this on huggingface yet? even runnable in comfyui? i don't want to give the chinks my phone number just to download it.
>>
>>107332474
it's supposed to be released today, so we'll have to wait a bit before we can test it out; comfyui has already implemented it, so that's something
>>
>>107332452
>https://rentry.org/wan22ldgguide
Why is there no updated t2v workflow? Which lightx2v should I use with the t2v models?
>>
File: 1755756357473374.png (1.37 MB, 1024x1024)
>>107332467
alibaba is definitely cooking holy shit
>>
File: 1739993131966240.png (745 KB, 498x818)
>huge breasts according to flux 2
>>
Why not try to optimize the model architecture instead of increasing parameters?
>>
>>107332467
>no model at hf
grim
>>
>>107332508
everything should be optimized for 24gb vram
>>
File: 1760471328186989.png (1.4 MB, 768x1024)
>>107332508
>Why not try to optimize model architecture instead increasing parameters?
that's what alibaba is doing, and they do a great job at it
>>107332467
>>107332503
>>
File: Flux2_00030_.png (1.58 MB, 1024x1024)
>>107332467
Same prompt converted Chinese -> English
>In this nighttime outdoor scene, the lighting is dim, with weak light sources in a low-light environment (such as distant lights). In the close-up composition, a young Chinese woman wears a light gray top with black backpack straps visible on her shoulders. She has long, dark hair, which is dynamically flowing in the wind, with some strands falling around her shoulders and face. Her facial skin has a natural texture in the dim light, and her eyes are looking towards the camera with a natural expression. The background consists of blurred trees, grass, and a fence: the outlines of the trees are faintly visible in the dim light, the grass is dark in color, and the fence has a mesh-like texture. Overall, in this low-light environment, details such as the movement of her hair, the texture of the backpack straps, the outlines of the trees, and the texture of the grass are rendered realistically.
>>
>>107332488
i will be stalking the thread today with great interest then. It looks promising!
can't believe this is a 1280pix image, look at that. damn.
>>
File: BASED.gif (2.25 MB, 636x640)
>>107332523
>small model
>seems to know a lot of IPs
>the realism is on point
>Apache 2.0
I kneel again China, you guys are the future.
>>
File: borzoi googly eyes.png (295 KB, 648x680)
>that image of the chick in her lingerie
holy shit it might actually be able to do nsfw out of the box
>>
File: 1755769459686956.jpg (740 KB, 2048x1390)
>>107332528
this is humiliating for bfl holy shit
>>
File: god bless the chinks.png (139 KB, 1900x790)
>>107332560
>>107332562
and it's only the turbo model, just imagine the improvement once we get the "best version"
>>
File: 1740003630255120.mp4 (1.36 MB, 832x480)
4 steps/4 steps is the way (kijai 2.2 MoE distil high, 2.2 lightning low).

the golden retriever dog on the left fires a huge blue lightning bolt from their paws at the man wearing glasses. The man with glasses flies off his chair to the right through a window into the sky, flipping over and over, over the side of a mountain cliff during the day. he lands far below and a giant lightning bolt hits his body where he lands, creating lots of smoke and fire.
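For reference, the high/low split in these 2.2 workflows is just routing each denoising step to one of the two experts by a noise boundary; a rough sketch of that routing logic, with an illustrative boundary rather than Wan's actual default:

```python
def route_steps(sigmas, boundary=0.6):
    """Send steps above the noise boundary to the high-noise expert,
    the rest to the low-noise expert (Wan 2.2 MoE style)."""
    return ["high" if s >= boundary else "low" for s in sigmas]

# 8 evenly spaced sigmas from 1.0 down to 0.125
sigmas = [1.0 - i / 8 for i in range(8)]
plan = route_steps(sigmas)
print(plan.count("high"), plan.count("low"))  # 4 4 -> the "4 steps / 4 steps" split
```

Real workflows do the same thing with two KSampler (Advanced) nodes whose start/end steps meet at that boundary.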
>>
>>107332577
>just imagine the improvement once we get the "best version"
improvement like none of us being able to run it kek
>>
>>107332586
the difference between base and turbo is that turbo is a distilled 8-step model; they're both 6b parameters
>>
>>107332599
ah ok, forgive me for not knowing/ being retarded. don't dock my social credit score.
>>
File: file.png (388 KB, 625x374)
>>107332503
>no background blur
>>
>>107332603
I'm in a good mood today so I'll only dock your izzat score instead kek
>>
Reminder for ChromaGODS

Try your prompts with "aesthetic 9," in positive and "aesthetic 1," in negative if you havent until now
>>
>>107332467
I am serious when I say this, lodestone has to finetune it and add NSFW shit to it.
>>
>>107332467
modelscope download --model 'Tongyi-MAI/Z-Image-Turbo' --local_dir 'Z-Image-Turbo'

>requests.exceptions.HTTPError: The request model: Tongyi-MAI/Z-Image-Turbo does not exist!
Huh? Just trying to follow the instructions. How do I download this?
>>
File: you suck bfl.png (2.12 MB, 1670x1415)
Flux 2 fucking sucks, and that yellow tint makes it obvious they trained their model on synthetic 4o imagegen shit
>>
When are we getting Z-Image? The Flux 2 download is 30+ gb total even with gguf, kek
>>
>>107332652
>How do I download this?
you can't, it's not officially there yet
>>
>>107332532
>can't believe this is a 1280pix image, look at that. damn.
since it's a small model you can go for bigger resolutions and it won't be too slow, it's pretty smart if you ask me, I think the (big res + normal size model) is a better combo than (1k res + giant model)
>>
>>107332614
DO NOT DOCK IZZAT BLOODY BITCHOD

>>107332655
its like the chinks waited for BFL to fail again so they could BTFO them. absolutely SAD!
>>
File: 1757371441591070.mp4 (1.57 MB, 832x480)
one more of the dog shock guy:
>>
File: 1735396514537025.png (637 KB, 1136x912)
>>107332655
qwen edit + 4/8 step lightx2v lora is the way. fast and good, also makes great pepes.
>>
File: img_00047_.jpg (1.02 MB, 1264x1592)
>>107332562
damn
>>
File: kek.png (26 KB, 220x165)
>>107332679
>its like the chinks waited for BFL to fail again so they could BTFO them.
they're so petty, I love that
>>
>>107332665
Oh I see.
Hopefully Soon TM then.
>>
>alibaba releasing 2 open weight image models by two different teams
so what's the deal with wan??? why is it api only now?
>>
>>107332705
>why is it api only now?
what? the next iteration of QIE will be local and that Z-image model will be local as well
>>
>>107332713
>will be, will be, will be...
>but nothing happens
stop lying
>>
Why am I filling up my vram at this stage? Models are fully offloaded, 196gb, 5090.
>>
>>107332748
>they added huggingface links for no reason
why are you doubting them? they always delivered
>>
>>107332705
>>107332748
you came in here looking for a fight, but all you're getting is me calling you a faggot nigger.
>>
>>107332562
>>107332695
so if i understand correctly it's another team than the Qwen team inside Alibaba that did this. if they managed to get this kino at 6b, just imagine if they scaled it up to 12+b; if I were an employee on the Qwen team I would be worried as fuck, the other team seems to be more talented than them
>>
For me, its Z-Image.
>>
>>107332751
No you didn't.
Virtual VRAM with a donor device means the CPU is used for the overflow; whatever fits into your GPU is still used.
Set the device parameter (the third setting) to CPU if you want the agonizingly slow CPU inference.
>>
File: kek.png (1.91 MB, 1408x768)
>>107332679
>its like the chinks waited for BFL to fail again so they could BTFO them. absolutely SAD!
kek
>>
>>107332508
that would encourage people to use local AI instead of API
>>
Bros if those images aren't cherry picked I am genuinely excited for Z-Image.
I am gonna cope and hope that it can do light NSFW (booba) even.
>>
>>107332713
reading comprehension level: gpt-0.5
>>
Please don't be too excited for this model, Alibaba will see they have struck gold and they'll keep it for themselves :(
>>
File: Flux2_00034_.png (1.49 MB, 1408x768)
>>107332797
>>
File: ComfyUI_temp_pqugj_00036_.png (3.33 MB, 1824x1248)
https://files.catbox.moe/2hcnb6.png
>>
>>107332842
this, that's why you should always shit on every mistake to make the company want to qualify themselves for you; you gotta counteract the normgroid praise that they will be showered with anyway
>>
>>107332562
To be fair, the Flux 2 image is the only one that's actually dim.
>>
https://files.catbox.moe/vpr1ig.png
>>
>>107332851
how did it manage to even copy the logo? lmao
>>
https://files.catbox.moe/t1cwmr.png
>>
>>107332871
>Panel 3 (Bottom Left): This panel is split into two sections.
>Left Side: A yellow caption box reads "ONE DAY LATER..." Below it is a logo featuring a stylized 'e' or 'a' curve with the text "ALIBABA Z-IMAGE (6B)".
>Right Side: A group of four diverse office workers gathers around a computer, looking intrigued and happy. One character asks, "A new 6B local model? Z-Image?"
Beats me
>>
>>107332868
Why would you catbox censored images?
>>
File: ComfyUI_temp_iubdp_00106_.png (2.28 MB, 1824x1248)
https://files.catbox.moe/5y9hti.png
>>
>>107332807
As long as they don't intentionally poison the model like BFL does with Flux, it should be easy to add NSFW using lora/finetune.
>>
File: 1763710556613020.png (432 KB, 661x645)
>>107332797
>>107332851
I'm not gonna lie this logo looks pretty good, if an Alibaba employee is lurking here, you can use it as the official Z-image logo lol
>>
I know flux is trained on itself and requires 9 pages of essays from an ai to prompt an image, but surely the default settings should do better than this..?

>>107332790
I'm retarded, forgot to set the amount for the actual model that matters.. It's genning now.
>>
>>107332899
they never do that, Alibaba is based and give us normal models with the goated apache 2.0 licence
>>107332907
unfortunately it looks less slopped after 40 steps (it's too long bruh)
>>
File: 1747481106292273.png (1.96 MB, 1120x1440)
LOCAL WON
>>
>>107332899
We don't know how well it will respond to lora training; plus, doing NSFW as a lora is usually iffier than the base model knowing it natively, and such loras can be difficult to combine with others.
It's better that the base model knows it already.
>>
File: ComfyUI_temp_iubdp_00052_.png (3.17 MB, 1824x1248)
>>107332889
It's not censored, it was meant to be wholesome image :/
https://files.catbox.moe/4r4jf3.png
>>
>>107332467
Oh it'll also be an edit model, I wonder if it'll be better than QIE (probably not but it'll be worth a try)
>>
File: BrunoLipBite.png (968 KB, 660x640)
>more images being added to that page
>skimpier and skimpier girls
>anatomy understanding is pretty good even for a bunch of 1mp base images
please don't fuck us over
please don't fuck us over
please don't fuck us over
>>
>>107332932
>It's better that the base model knows it already.
Of course, but beggars can't be choosers, and even if the model doesn't know about NSFW it's still miles better than BFL training/poisoning their Flux models with tons of synthetic images depicting breasts without nipples and crotches without genitals.
>>
File: Flux2_00036_.jpg (2.85 MB, 1824x1248)
>>107332854
Nice work as usual. Same prompt, default workflow, sa_solver_pece

>>107332889
nta, bc sharing is caring. I can just copy the prompt and see how flux2 does without asking
>>
>>107332964
>please don't fuck us over
TRUST DA CHINKS
>>107332981
>it's still miles better than BFL training/poisoning their Flux models with tons of synthetic images depicting breasts without nipples and crotches without genitals.
this, fuck bfl, the chinks are showing you can improve your models without being a lazy fuck and just stack more layers to solve the problem
>>
>>107330558
>"kijai's nodes are better"
>does 4 steps with native, 6 steps with kijai
gee I wonder why it's "better". do you even look at the settings, retard?
>>
File: Capture.jpg (92 KB, 720x1280)
>>107332467
the chroma footfag is gonna have a blast with that one lol
>>
File: 20 steps Q6_K.png (2.9 MB, 1536x1024)
My benchmarking prompt tested on Q6_K Flux 2:
>Detailed photograph RAW of seven smiling friends of different races that are at a nightclub concert with dim lighting that is shining on their faces, behind them is a crowd of people dancing while fighting with large swords, everyone is holding a sword in their left hand and an intricate beer glass with differently colored beer in the right hand. Far behind them above the DJ there is a sign which has "Minimum drinKing age 021!" written on it in stylized cursive letters.
>>
>>107333014
can someone try that prompt on modelscope, I wanna see if Z-image can handle such complex prompt >>107332467
>>
>when it understands the mission, just in a roundabout way
well okay
>>
>>107333014
>>107333024
>yellow tint
lmao are you serious bfl?
>>
File: 40 steps Q6_K.png (2.84 MB, 1536x1024)
>>107333014
Same seed with 40 steps
>>
>>107332532
I mean, the big part is its coherency. It got the Chinese characters right if this isn't an edit prompt and is actually a from nothing gen.
>>
https://huggingface.co/fal/FLUX.2-Tiny-AutoEncoder
>Tiny AutoEncoder trained on the latent space of black-forest-labs/FLUX.2-dev's autoencoder. Works to convert between latent and image space up to 20x faster and in 28x fewer parameters at the expense of a small amount of quality.
hold up, maybe this can save flux 2
>>
File: ComfyUI_00042_.jpg (833 KB, 2048x2048)
>>107332897
Took this prompt for flux 2.
I think that dype works for flux 2. This is 2048p. The first attempt without dype at 1024 looked like shit.
627s gen rofl.
>>
>>107333001
You already got embarrassed in the previous thread, my illiterate idiot. There is no need to continue doubling down on your stupidity.
>>
>>107333010
Chromajeet is going to seethe because it’s not chroma, that’s all he does
>>
>>107333048
man the biggest quality loss and the fastest thing in these workflows IS the vae, fucking that over for a couple of seconds speedup is worthless, especially for edit workflows
>>
File: 1758642774080723.png (3.37 MB, 3489x1518)
>>107333039
>It got the Chinese characters right if this isn't an edit prompt and is actually a from nothing gen.
it's a normal image from turbo, not the edit model
https://modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/picture
>>
>>107333061
I'm not that anon. also, you're not even using the same light loras between the workflows. you're a giga fucking retard nigger
>>
>>107333061
>doesn't use the same loras
>doesn't use the same number of steps
>still doubles down and thinks he's right
why are you like this?
>>
>>107333088
anyone that disagrees with wanschizo is a samefag according to him
>>
>>107332463
catbox? all my krea gens look insanely washed out
>>
>>107332467
>>107332503
I never expected a 6b model to look this good, what's their secret sauce?
>>
>>107332508
bulk and cut, bulk and cut
>>
>>107333048
I dunno. Given the details at play and the Chinese models likely not using it, this isn't going to be a case of what happened with the first FLUX VAE where everyone just reuses it.
>>107333069
Holy shit, that is some good stuff right there. Now to see if anyone has money left over for fine tuning on these models or if that is dead...
>>
gonna make a better comfyui for people to use. it's reached eol and they seem to be wasting that billion pretty quickly. hope you're well anon.
>>
>>107333088
>>107333094
Anon, loras have nothing to do with the teacache effect on hands.
I know you're still butthurt over being repeatedly exposed as a complete and utter idiot, but do at least try to get over your loss.
>>
File: ohh mommy.png (1.3 MB, 1024x1024)
>>107332467
https://www.reddit.com/r/StableDiffusion/comments/1p77dli/some_images_i_generated_using_the_zimage_model/
pretty good
>>
>>107333142
>I'm too retarded to make an actual 1:1 comparison
there's nothing we can do for you anon, you're too braindead for this
>>
>>107333142
wrong, faggot. it is known that the 1022 lora has that effect. alright, I'm done with your down syndrome ass
>>
>>107333096
nah the problem is you're a socially retarded autist who is too stupid to vary your prose, and end up falling into all the same patterns.
>>
>>107333144
where the is second hand
>>
>>107333123
Novel text encoder.
Probably a lot of pre-training data set pruning to prevent low quality images from degrading the model.
And aggressively specializing on realism probably.
It will likely be bad for non-realism, artistic, editing, etc. use cases.
Honestly smaller models should do that instead of trying to become useless jack of all trades.
>>
>107333159
>107333142
exhibit a
>>
File: 1739469907958759.png (3.71 MB, 8208x1445)
>>107333144
here's some more examples with prompts
>>
>>107333158
>it is known that the 1022 lora has that effect
Which wasn't being used in any post, you laughably stupid idiot xD
>>
File: 20 steps Q6_K.png (1.74 MB, 1024x1024)
>>107333014
>>107333031
Another seed and resolution
>>
>>107333123
we are comparing a realism image model to a general model that does gen, edit, multi-ref and 4mp. i think it's great to provide smaller specialized models but they are going to be different sizes based on that alone
>>
>>107333175
> 1girl, standing
>>
>>107333031
in my case that greta gen is actually just flux.

>>107333116
sorry, was using a node that had krea in the name field for some reason. that's just cyberrealistic flux.

been meaning to check progress on krea checkpoints.. last i tried it, kinda schizo.
>>
>>107333186
thanks for this but benchmarks usually show like what device and speed as well
>>
File: the famous 1girl.png (1.44 MB, 1151x1236)
>>107333196
> 1girl, standing
>>
>>107333196
> 1fag, crying
>>
>>107333154
>autist still can't let go of 1:1 comparisons
>even when the anon who DID the 1:1 comparison clarified to you that it shouldn't matter, and that the base template shouldn't have this issue
Bizarre how dumb you are my dude. Ppl with severe autism like you can never move on lol
>>
never change /g/
>>
File: ComfyUI_00043_.jpg (911 KB, 2048x2048)
Yeah dype seems to work just fine for flux 2, very nice.
>>
1boy, crying, soaked in urine, pajamas, gamer chair, cat ear headphones
>>
Alright zimage is nice yeah yeah okay
aesthetically it's good, size is good, but what's necessary are the technological advancements since then:

reference based image gen
training, ideally with said references and/or multiple res/aspect ratio
prompt enhancement
controlnet and similar (depth, canny, normal, etc.)
inpainting, outpainting, whatever variant
text editing

you get the idea, that stuff.
>>
>>107333191
>a general model that does gen, edit, multi-ref
let's be serious here: Qwen Image Edit is 20b, do you think it's that much inferior to Flux 2 (32b)? and it'll be even closer once we get the next iteration of QIE this week
>>
>>107333209
anon, you lost the argument, time to swallow your pride and let it go man
>>
File: 1726075449183071.jpg (7 KB, 249x250)
>>107333168
>Honestly smaller models should do that instead of trying to become useless jack of all trades.
idk why barely anyone does that. one fuckhueg model after another drops and dies after the initial hype because no one has the hardware to modify them. Focus for local should be on small models that are easy to train so you can readily modify them for whatever you need, instead of trying to pack a trillion parameters into them and failing anyway
>>
File: 1742042169772414.png (2.88 MB, 2189x1065)
>>107333234
>but what is necessary are the technological advancements since
>reference based image gen
there will be a Z-image edit though, we won't just get an image model
>>
>>107332751
you're using the full 60+gb model but only offloading 4gb. You need to set virtual_vram_gb to like 32 or more. works for me on a 3090 with it set to 40
>>
Is there an ai model in comfy that I can ask to describe an input image to then feed the prompt into flux?
>>
File: Qwen_00093_.png (1.93 MB, 1328x1328)
>>107333186
Composition actually reminds me of qwen a lot. Quality is better, of course, but qwen one was pretty WIP and not shooting for any sort of quality at all.
It's impressive how it resolves faces into different ones as steps go by, though.
>>
>>107333237
no idea what your point is, the anon was asking why a 6b model is good at realism vs a 32b one. i have no idea about qie and flux2 edit comparisons.

i suppose QIE is probably better for image edit since it's an image edit model and not a general model? that was literally what i just finished saying.
>>
File: 1751580226090127.mp4 (1.23 MB, 832x480)
wan 2.2 kino:

the golden retriever dog on the left fires a huge blue lightning bolt from their paws at the man wearing glasses. The man with glasses flies off his chair to the right through a window into the sky in a skydiving pose with an electric aura around his body, flying over the side of a mountain cliff during the day. he lands far below in a large electric field and a giant lightning bolt hits his body where he lands, creating lots of smoke and fire.
>>
File: 20 steps Q6_K, 2.jpg (307 KB, 2048x2048)
>>107333014
Image at 2048x2048 resolution instead, coherent but much more slopped.

It also seems to have trouble capitalizing the K in "drinKing" across seeds, so it seems it's overfit in some aspects.

>>107333205
I'm obviously just benchmarking the quality of the model. Although I'm using Q6 instead of Q8 which I'll switch to later
>>
>>107333320
so basically flux 2 is mid at everything because it tries to do everything at once? but the point was that it was supposed to be good at everything because it's a giant 32b model, that bloat was supposed to make that model a good jack of all trades
>>
>>107333309
Yes, a VLM. There's joycaption and others.
https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
or you can use google ai or something if you want to go that route
>>
"You've come this far have you? Watch me obliterate The World with my secret weapon - DJ Play will summon up AI demons! Let's get down!"
>>
>>107333337
never said any of that but you do you
>>
>>107333345
shit forgot link lmao
https://www.youtube.com/watch?v=FZREn9f0tzU
>>
>>107333345
That dude resigned.
>>
>>107333338
Thanks.

>>107333333
hexaget
>>
>>107333345
the face consistency is all right but again the plastic skin ruins the whole thing
>>
File: Z-image turbo.png (2.11 MB, 1024x1280)
>>
File: arrest.jpg (604 KB, 3986x832)
Sora - Nano Banana Pro - Flux 2 Pro. Which did it better? Is Flux 2 comparable to the closed source stuff?
prompt: A gritty photo-journalistic nighttime street scene in Osaka’s Dōtonbori district, shot with a long telephoto lens that compresses the dense field of neon billboards. The iconic Glico running-man sign glows brightly in the background, surrounded by a collage of saturated, overlapping advertisements: a large pink billboard with bold white letters resembling “Calbe,” various LED panels, and storefront lighting casting hard reflections onto the wet pavement. In the foreground, three Japanese riot police officers in full tactical gear aggressively secure the scene beside two marked police sedans with rooftop beacons flashing red and blue. Their uniforms include dark ballistic vests, riot helmets with half-raised visors, radios, and reflective patches catching the neon glare. One officer is a woman, shown in a low-angle, close-range perspective arresting a yakuza member. The yakuza man lies prone on the ground, face turned sideways, grimacing in frustration or anger. His hands are behind his back as the female officer forcefully applies handcuffs, her expression focused and stern. The two male riot cops stand nearby holding MP5 submachine guns. One officer keeps his MP5 shouldered and aimed down the street, posture tense. The other male officer shouts at surrounding pedestrians, ordering them to disperse, his body half-turned toward the crowd. Passersby in the midground appear blurred from motion, emphasizing urgency and chaos. Harsh mixed lighting from neon signs, police strobes, and storefronts creates deep contrast, sharp highlights on metal surfaces, and reflective color spill across the scene. Slight atmospheric haze softens the distant signage but leaves the officers sharply defined. The overall tone is raw, documentary-like, and captures a tense moment of police action in a busy urban night setting.
>>
>>107333358
starting to think the plastic skin anon is the same as the fake cum anon in /gif/ and it's one very long running joke
>>
File: document_0.jpg (53 KB, 512x512)
>>
>>107333358
yeah no kiddin'. just taking a fun break from 1girls for a sec to do a concept i've been laughing about for months.
i swear if we get one more parametermaxxed shit checkpoint releasing before the end of the year im gonna have a stroke.
PLEASE SAVE US Z.

>>107333378
oh please do not start lumping me in with another poopdickschizo this is the first time i've personally posted flux in this general.
>>
>>107333123
>what's their secret sauce?
they decided to stop acting like retards and train their model with real images instead of synthetic shit
>>
>>107333376
Just the Flux 2 pro img
>>
>>107333376
didn't read the prompt but the first image looks like straight anime ass.
>>
File: shire.png (1.46 MB, 1344x752)
>>
>>107333394
Sora and their yellow tint; name a more iconic duo
>>
File: let this be the one.png (178 KB, 656x381)
>>107333387
>i swear if we get one more parametermaxxed shit checkpoint releasing before the end of the year im gonna have a stroke.
same anon, I feel ya
>>107333387
>PLEASE SAVE US Z.
*raises his hands*
>>
File: 20 steps Q6_K.png (1.86 MB, 1024x1024)
>>107333333
Here's also a prompt from >>107333175
Lol
>A woman standing on a rainy city sidewalk holding a transparent umbrella, making a cute pouty expression. Raindrops on the umbrella sharply detailed, neon reflections in puddles. Her floral top appears slightly damp at the shoulders. Street signs and headlights create soft bokeh lights. Photorealistic mood.
>>
>>107333408
>name a more iconic duo
peanut butter and jelly
>>
File: ComfyUI_00023.png (2.74 MB, 1200x1800)
>>107333309
I use Gemma 3 27B Q4 in LM Studio. You can even add multiple images at a time and have it combine them into a prompt if you want.
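LM Studio exposes an OpenAI-compatible server, so the caption request is an ordinary chat completion with the image inlined as a base64 data URI. A sketch of building that payload (the model name and the localhost:1234 endpoint are LM Studio's usual defaults, not verified here):

```python
import base64

def caption_request(image_bytes: bytes, model: str = "gemma-3-27b") -> dict:
    """Build an OpenAI-style chat payload asking a local VLM to describe an image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as a detailed prompt for a text-to-image model."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

payload = caption_request(b"\x89PNG...fake bytes for the sketch")
# POST this payload as JSON to http://localhost:1234/v1/chat/completions,
# then feed response["choices"][0]["message"]["content"] into your prompt node.
print(payload["messages"][0]["content"][1]["type"])  # image_url
```

For multiple reference images, append more image_url entries to the same content list.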
>>
>>107333412
that's quen image right?
>>
File: 1740811055942702.jpg (807 KB, 1790x1277)
>>107333412
>>
cmon china release the new qwen edit.
>>
Tested low resolution, 8 steps fast gens on a Q4 that fits in 24gb vram.
Seed variability without any hacks is better than qwen, but nothing to write home about. The goblin witch is always on the right, but I like that faces and poses are different.
The model seems like it has lightning built in. Gens at 8 steps have mostly converged, but trademark low step dithering remains. There is no collapse in composition at 368x368.
Speed is around 20 seconds per this meagre gen.
>>
>another skimpy preview has hit the Z page
i'm gonna come. I'M GONNA COME!
xi-sama, forgive me for what i'm about to do to my social credit score.
>>
>>107333447
lmao
>>
File: 20 steps Q6_k.png (3.32 MB, 1328x1328)
>>107333412

>>107333423
It's Flux 2
>>
>>107333447
bfl released their model in the worst moment I swear to god
>>
>>107333451
the hair texture is great, I really love the details this shit has probably the best local vae
>>
>>107333476
>he was focused on the hair detail
haha lol GAY
>>
>>107333447
I knew I made the right choice by not downloading this 50gb bloat plastic shit
>>
>>107333447
Z-image is better but makes the woman a chinaman automatically. Not saying it's a bad thing
>>
File: flux2__00031_.png (2.75 MB, 1920x1088)
>i have taken the GGUF pill
>>
File: file.png (463 KB, 500x628)
>>107333447
>Ilya Sutskever said yesterday that scaling is dead and that we should focus more on optimizing the training process
>One day later Z-image has proved just that
we definitely live in a simulation, it can't just be a coincidence lol
>>
>>107333508
to be fair, bfl is a western model so the default human will be an european type, for a chinese model the default human will be from east asia I guess
>>
I swear all these bloatmaxxed models are a psyop to drive people to use API models
>>
>>107333526
>optimizing the training process
Isn't the data quality biggest bottleneck for most things AI now?
>>
File: 1744150076421862.png (1.46 MB, 832x1248)
I fed a certain image into joycaption and didn't read what it spit out lol
>>
>>107333552
Kek I recognize what photo you used
>>
File: let's gooooo.png (1.21 MB, 1024x1024)
>>107332467
>It can do 90's animes
GOTY
>>
Flux 2 verdict:

It's a better but much more bloated version of Qwen Image that might be saved with good LoRAs and speedups. But given that even Qwen isn't popular at its own size, this basically won't ever be saved, especially since, unlike with Qwen Image, you can't even train a Flux 2 LoRA on a 24GB GPU.

And with Z image about to drop, which is much better out of the box while being a small model, Flux 2 won't even be in the spotlight long enough for anyone to waste time and money trying to fix it.
>>
>>107333588
really doesn't look great though honestly. but not too surprised; it's probably outside Z's area of expertise.
probably why the women get progressively less dressed with each new preview photo, and why this is the only anime example kek
>>
>>107333600
>really doesn't look great though honestly.
it's still from the turbo model though, it'll look better on the base model
>>
>>107333596
thank you for the detailed update from obese coomer news
>>
by the way there's a demo available for Z, for any of you who can brute-force through the Chinese and have a fake phone number to give them for signup kek
>>
File: 1755751478926553.jpg (1.19 MB, 1248x1824)
>>107333596
Flux 2 is decent, but it feels like a very minor step in most aspects compared to chroma, and with less stylistic knowledge across the board. A good finetune can make it desirable though.
>>
File: IMG_20251126_192715.jpg (306 KB, 1659x1416)
The model has a dedicated flux2 scheduler, which is almost identical to beta 0.9/0.8 at 512. Curiously, it is resolution-dependent (which should be the right way to approach this for other models too): resolution works like an automated shift, making the slope steeper as resolution rises. I've only noticed a tangible difference in slope above 1024.
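For reference, this is how Flux 1's published scheduler already handles resolution: the shift parameter mu is a linear function of the latent sequence length, so bigger images get a steeper schedule. A minimal sketch of that idea; whether Flux 2 reuses the same constants (base_shift 0.5, max_shift 1.15, 16px-per-token packing) is an assumption on my part:

```python
import math

def resolution_to_mu(width: int, height: int,
                     base_seq_len: int = 256, max_seq_len: int = 4096,
                     base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    """Linear map from latent sequence length to the schedule shift mu."""
    # Latents are packed into (w/16) * (h/16) tokens; longer sequences
    # get a larger mu, i.e. a steeper schedule.
    seq_len = (width // 16) * (height // 16)
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return m * seq_len + b

def shift_sigma(mu: float, sigma: float) -> float:
    """Push a uniform sigma in (0, 1] through the shifted schedule."""
    return math.exp(mu) / (math.exp(mu) + (1 / sigma - 1))

# Higher resolution -> larger mu -> more of the step budget spent at high noise.
mid_512 = shift_sigma(resolution_to_mu(512, 512), 0.5)
mid_1024 = shift_sigma(resolution_to_mu(1024, 1024), 0.5)
```

With these constants, 1024x1024 produces exactly max_shift (4096 tokens), which would line up with the slope change only becoming visible above 1024.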
>>
>>107333596
>And with Z image about to drop which is much better out of the box
We don't know that. To me it looks slopped in previews. Let it drop first.
>>
File: not ready.jpg (252 KB, 1440x1080)
>>107333588
>>107333010
>>107332695
>>107333451
>a turbo model managed to get this much praise
Z-Image be like:
>>
>>107333641
That shit will cost hundreds of thousands if not millions to finetune, it's DOA
>>
>>107333655
It looks like it leans towards Asian women, which might indicate it's overfit in other ways that matter, but outside of that there's no question it's more realistic and detailed
>>
File: Z-image turbo test.jpg (103 KB, 1024x1024)
It definitely looks good but I'd like to see something other than Asian people though.
>>
>>107333665
>krillin is BFL getting blowed up but there's no goku around to 'avenge' his death
you love to see it.
>>
How does emphasis work in flux vs sdxl models? do i just put parentheses around the specific words/sentences of my prompt and that's it?
>>
>>107333718
yes, and if you want a specific weight you write (prompt:2) for 2x, for example; fractional weights like (prompt:1.5) work too
>>
>>107333725
thanks babe
>>
File: img_00071_.jpg (455 KB, 1264x1592)
>>
>>107333718
Emphasis is a CLIP-only thing, so SD 1.5, 2.1, SDXL.
No other text encoder uses it. Don't use it with flux.
>>
>>107333734
I don't think knees are supposed to work like that...
>>
File: interesting.png (2.25 MB, 2546x963)
>>107332467
>https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/picture
So it can do what Nano Banana Pro does? You give it a vague prompt and it'll figure out the rest by itself? sounds based
>>
>>107333744
>>107333718
>Don't use it with flux.
debo with his misinformation again; in ComfyUI the code works with every single model
>>
>>107333744
It 100% works with natural language text encoders like T5. I've personally tested this myself and so has another anon.

i.e., you can do "the man has a (large penis:1.5)" with Chroma, WAN, etc.
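Worth noting why both sides can be partly right: the `(text:weight)` syntax is a frontend feature, not part of any text encoder. The UI parses the weights out of the prompt and then applies them to the encoded conditioning, so nothing about it is inherently CLIP-specific. A simplified sketch of the two halves (real implementations like ComfyUI's typically interpolate against an unweighted or empty-prompt encoding rather than scaling raw embeddings, and also handle nesting and the bare `(text)` shorthand):

```python
import re

def parse_emphasis(prompt: str):
    """Split "(chunk:weight)" spans out of a prompt; plain text gets weight 1.0.

    Simplified: no nested parentheses, no bare "(chunk)" shorthand.
    """
    out, pos = [], 0
    for m in re.finditer(r"\(([^():]+):([0-9.]+)\)", prompt):
        if m.start() > pos:
            out.append((prompt[pos:m.start()], 1.0))
        out.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(prompt):
        out.append((prompt[pos:], 1.0))
    return out

def apply_weights(token_embeddings, token_weights):
    """Scale each token's embedding vector by its weight; encoder-agnostic,
    which is why the same trick can be applied to CLIP, T5, etc."""
    return [[x * w for x in tok]
            for tok, w in zip(token_embeddings, token_weights)]
```

Whether the *model* actually responds well to scaled conditioning is a separate question per architecture; the syntax itself will always "work" in the sense of being parsed and applied.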
>>
>>107333376
Sora fails hard at everything.
>>
>>107333768
>>107333772
yeah it works alright. I needed it to balance out the huge titties, wide hips, and Hugo Boss number on abby here. Without emphasis it really killed detail and accuracy on the outfit.

https://files.catbox.moe/3jdvv1.png
>>
>>107333802
>nazi shit
cringe
>>
>>107333802
>nazi shit
based
>>
>>107333802
cringe

>>107333819
yeah if you're an edgy 12 year old.
>>
>>107333828
holomeme didnt happen, you cant even prove its mathematically possible, sorry
>>
File: 1762322156475960.jpg (1.33 MB, 2668x2018)
We really got cucked with the dev model, flux 2 pro looks way more realistic
>>
>>107333841
>implying I care about some fake news from jews
nazi shit is still cringe
>>
>>107333856
>a 6b model somehow BTFO's at the very least qwen
How?
>>
>>107333857
if youre not jewish what part of it is cringe, soi?
>>
File: ComfyUI_04649_.png (614 KB, 832x1216)
>>107333856
They want you to pay for their API.
Of course it does.
>>
>>107333876
>what part of it is cringe
forced edginess is cringe. you might as well say you like my little pony while you're at it, faggot.
>>
File: Z-image turbo.png (1.51 MB, 1024x1280)
>>107333872
looks like the non-Qwen team at alibaba is the more talented one lol
>>
>>107333892
sir, this is 4chan. there's nothing edgy about posting nazi stuff here
>>
>>107333856
Qwen Image is so unrealistic it looks like an NBA 2K game lool
>>
>>107333892
>calls someone a faggot
>gets triggered at a nazi uniform
reddit is most definitely more your speed lil bro
>>
>>107333856
>Celtics
Nice
>>
>>107333904
>this whole site is cringe thus I must be cringe too
kill yourself newfag.
>>
>>107333892
what are you even sperging about? im gonna cum to abigail shapiro either way. you're the only one crying about it to this extent. why even care this much?
>>
ok chinaman
give it to me
what is the catch?
>>
nazi larp is cringe, commie larp is cringe, larping is cringe.
cringe nigger alert
>>
>>107333929
>what is the catch?
it's only great at realistic shit, not so much everything else
>>
File: Z-image turbo.png (846 KB, 720x1280)
oh momyyyy
>>
File: ComfyUI_00045_.jpg (1020 KB, 2048x2048)
"4k quality, highly detailed professional photo. a portrait photo of a beautiful nordic woman with blue eyes and blonde hair. she stands in a forest during a sunset. she is backlit by the sunset as her face is lit by a sliver of light from above. her skin is realistic with freckles. her expression is disgusted with the viewer, frowning. the depth of field of the background is beautiful and there is a bit of dust particles in the volumetric haze."

Ok, I abandon flux 2.
>>
>>107333952
Do we know when it will be released? I want to run that shit on ComfyUi ffs.
>>
>>107333944
Why does Pinkie Pie have a massive hunchback?
>>
>>107333206
> multiple men, standing
>>
File: 1752095934303075.jpg (1.64 MB, 2592x2304)
https://www.reddit.com/r/StableDiffusion/comments/1p7a1g3/some_anime_style_images_i_generated_with_z_image/
For those wondering how well it fares at anime
>>
>>107333963
that one million gigabyte mistral text encoder is really proving its worth here
>>
File: 1739751800916514.png (3.52 MB, 1248x1824)
>>
>>107334005
for a model "specialized in realism" it produces better anime images than your regular Qwen/Flux slop lool
>>
>>107333963
kek

>>107334005
>>107333946
are you sure about that?
>>
>>107334029
>are you sure about that?
I rest my case, seems like this model is pretty good at everything
>>
>>107333963
>no idea how to prompt
>the model must be bad

go back to sdxl anon
>>
>>107333944
ponyfag, makes sense why you're cringe. you'll troon out by next year.
>>
File: WHAT A LOSER.png (239 KB, 640x478)
>>107334039
you lost to a 6b model you bfl-cuck employee
>>
>>107334005
Aieee sam altman save me, it's too dangerous
>>
>Flux 2 can't "obtain view from side" on images
lol...
>>
>>107334058
*it worked after 4 seeds...
>>
>>107333685
it knows who JJ Lin is?
>>
>>107333952
>great at realism
yeah ok whatever, I sleep
>>107334005
>great at anime
REAL SHIT??
>>
Why aren't we using FastWan?

>Sparse distillation for Wan2.1 and Wan2.2 to achieve 50x denoising speedup

https://github.com/hao-ai-lab/FastVideo

Looks like they've released I2V: https://huggingface.co/FastVideo/CausalWan2.2-I2V-A14B-Preview-Diffusers/tree/main
>>
Damn I wanna fine-tune z image.
>>
>>107334073
does it work with newer lightning loras at actual minimal quality hit?
how does it compare to painter nodes?

testing all this is annoying so people use what works
>>
>>107334073
Why aren't you? Post some examples. Shouldn't take too long, right?
>>
>6B parameters
>modern text encoder
>good at realism
>good at anime
>nearly Qwen-Image levels of prompt comprehension
>less slopped and better vibes than basically any other base model
>apache 2 license
Yeah, I'm thinking this is the model to finally kill SDXL once and for all. Fucking finally.
>>
File: img_00072_.jpg (507 KB, 1264x1592)
>>107334005
i wonder how do they do this with such a small model
>>
>>107334112
>Yeah, I'm thinking this is the model to finally kill SDXL once and for all. Fucking finally.
you have to respect how long SDXL managed to stay relevant though, almost 3 years lmao
>>
>>107334005
> 6 fingers
>>
File: alex jones approval.gif (2.12 MB, 177x210)
>it's even REALLY fucking good at anime out of the box
this is it.
>>
>>107334113
a lot of performance and quality was always on the table; look at how much Pony v6 improved the quality of its base model back in the day, bigASP, LoRAs, etc. Small tunes can drastically improve models, let alone when the company doesn't fuck up the model itself from the start
>>
>>107334112
There's always a catch. SDXL survived all other models, chances are this one too
>>
>>107334121
>turbo model
>>
BFL employee seething in this thread for being cucked by a small model lmao
>>
>>107334096
>>107334100
Calm your tits, I just found it
>>
>>107334135
>SDXL survived all other models
SDXL is literally doing this aura shit
https://www.youtube.com/watch?v=IRPI3lSACFc
>>
>>107334112
You also need it to respond well to training and not be unreasonably slow at inference (you can't tell purely from param count), but yes, honestly the most exciting model in a while, to say the least.
>>
>>107334157
>You also need it to respond well to training and not be unreasonably slow to run inference from (can't tell purely from param size)
https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/summary?version=master
>It offers sub-second inference latency on enterprise-grade H800 GPUs
this shit is hella fast, probably faster than SDXL
>>
File: YASSSSS.gif (760 KB, 498x243)
>>107334122
>this is it.
I've waited for this moment for so long, I'd almost given up believing it; local is saved
>>
>>107334112
> apache 2 license
To kill SDXL you need much more. This is like Illustrious-based models compared to Pony: just slightly better.
>>
>>107334206
>apache 2 license
>To kill SDXL you need much more.
what's better than Apache 2.0?
>>
>>107334214
Ligma 3.0
>>
File: do it.png (174 KB, 640x640)
>>107334005
I'm not gonna lie, if this Alibaba team can cook such a great image model, what's preventing them from doing the same with video? I can see them making something better than Wan 2.2
>>
Yume is anime SOTA
>>
>>107334233
I almost forgot that existed..
>>
>>107334214
It was an accidental quote.
>>
>>107334174
Wow.
Z-Image is the GEM that saved local diffusion from API demons.
All Hail Alibaba
All Hail Chinese overlords.
>>
File: plus 15 social credit.png (414 KB, 1280x720)
>>107334251
>>
>>107334258
+15 izzat credits
>>
>>107334233
And other hilarious jokes you can tell to yourself!
>>
when are your chinese overlords releasing Z?
>>
File: 1758243525681183.jpg (1.02 MB, 2460x1586)
>>107334005
https://www.reddit.com/r/StableDiffusion/comments/1p7a800/zimageturbo_anime_generation_results/
Here's some more anime kino
>>
>>107334282
they said today, so we're waiting...
>>
File: Z promotional image.jpg (80 KB, 1024x1024)
realistically speaking, how can white w*men compete?
>>
Z-Image sounds too goid to be true
>>
>>107334313
pls I really hope it's not some bait-and-switch shit, it's waay too good
>>
>>107334282
comfy already supports it; the PR has been merged, so it's imminent. A PR for flux 2 scheduler previews also just went in, if anyone is brave enough to update.
>>
>>107334282
>>107334324
soon enough, Comfy implemented the inference code yesterday, so the release will come right after
>>
>>107334300
this is actually pretty good, and it looks like it knows some character IPs; no more migu spam, that's always a good thing
>>
File: 1632792585100.png (486 KB, 500x762)
i don't even care about my thanksgiving pig out in less than 24 hours, all i care about is genning 1girl (huge booba:1.5) with the new best realistic chinese model
thank you xinnie the poo
>>
>>107334339
Asian 1girl >>>>>>>>>>>
>>
>>107334339
>i don't even care about my thanksgiving pig out in less than 24 hours,
to be fair, I know this model will make it my best thanksgiving ever, we peaked lol
>>
>>107334314
Actually existing may help.
>>
>>107333447
>>107332562
The flux2 output is obviously following the prompt better despite the skin texture. Prompted for dim lighting, but got a blast of hipster flash; noted that the shoulders of the top should be wet, but got a top with no shoulders.

It's just a different kind of slop, where it's still deciding what looks nice regardless of the ask. And with the flux images, by providing plastic skin by default it's at least not directly contradicting the prompt.
>>
>>107334324
>>107334326
>>107334327
If Z is actually what it seems, then FLX2 is DOA
>>
>>107334343
>implying some giga autist won't immediately have lora training figured out in less than 5 hours from release
or even just tags that seemed to work through brute forcing
>>
>>107334346
existing within the digital world is preferable to ours
>>
File: wut.jpg (12 KB, 185x185)
What was Flux again, 12B? I can barely run that shit quantized to Q6_K on my shitbox but it's very slow so if Z-Image is half the size that should be good news for me (sometime in the future when it gets quantized).
>>
>>107334314
by virtue of not being a bug
>>
>>107334355
it's a turbo model, the base model will respond better to prompts, and even if it's not the case you will be able to use the "reasoning" shit to make it even better at prompt understanding >>107333756
>>
File: flux2__00044_.png (1.68 MB, 832x1216)
>>107334143
kek
>>
that is fucking insane. no way a turbo'd 6b is doing this. we're being memed.
>>
>>107333376
Z-image with the same prompt; Euler, 9 steps. Can't get it to run more steps; it gives me an img-not-found error
>>
File: 1741916672643899.png (431 KB, 800x582)
>>107334381
>no way a turbo'd 6b is doing this. we're being memed.
don't underestimate china, they got this
>>
>>107334388
>9 step model somehow nearly matches fucking nano banana
wuh?
>>
>>107334366
>What was Flux again, 12B?
yep, so it's half the size but it looks way better and knows way more concepts
>>
>>107333756
lolol it's the incredibly useful 'solve a math problem on a whiteboard image' demo again
>>
>>107334336
Does it know teto?
>>
so whats the catch? theres always a catch
>>
>>107334381
JPEG the strongest
>>
>>107334400
I wouldn't go that far, the text is noticeably more slopped.
But the overall quality is very good, yes.
>>107334422
China will win AI race.
Which I am totally fine with.
>>
>>107334413
>it's the incredibly useful
you have no idea how useful it actually is, you can literally say something really vague like "make a comic about this subject" and it'll do everything, including the script, like Nano Banana Pro

for example, on that example >>107332797
I used this simple prompt
>Create a manga page (in color) on the following topic: a new local 32b image model called “Flux 2” (developed by the German company bfl) has just been launched and users are testing it, but they end up being disappointed (not realistic enough and too big). A day later, the Chinese company Alibaba releases a new local 6b model called “Z-Image” that is much better than Flux 2, almost as if they had waited for bfl to release their model so they could humiliate them right after (they are really petty).
>>
>>107334366
>just downloaded Q4_0
>between the model and encoder it's 35GB
>35GB

Thinking of just giving this one a miss and waiting for Chroma 2 or some shit. I'm already running out of space on a 2TB SSD (and that's after getting rid of all the models I didn't use).
>>
I have a bad feeling about this, I don't buy it after all the failed models
>>
>>107334422
>so whats the catch?
you have to accept the chinese state mandated bugwife who will record your sex to send to alibaba for nsfw model training of which you will only get the distilled turbo model.
>>
>>107334370
and by not eating them (also bats, snakes, cats and dogs).
>>
>>107333262
>idk
what you don't know could fill a library
>>
>>107334422
>so whats the catch? theres always a catch
so far the only thing that could go wrong is nudity, and even that isn't that serious lol
>>
>>107334400
Goes to show that the "pile more layers on it" is super flawed, at best

another z-image one, I tried the prompt improved-version of the F-35 pic I tried last thread:Professional quality modern analog photograph with visible film grain and vibrant color palette, featuring a single F-35 Lightning II in mid-air performing a tight pirouette maneuver. The aircraft’s entire fuselage and wings are covered in intricate, full-color irezumi-style tattoo illustrations — including traditional motifs such as koi fish, dragons, cherry blossoms, and wave patterns — rendered with fine linework and saturated pigments that contrast against the jet’s metallic surface. Colorful smoke trails in magenta, cyan, and gold emanate dynamically from both wingtips, curling through the air in response to the aircraft’s rotation. The shot is framed as a beauty portrait, captured from a low-angle three-quarter rear perspective to emphasize motion and artistry. Lighting is bright daylight with soft directional highlights reflecting off the jet’s curves and inkwork. Background is a clear sky gradient transitioning from pale blue at the horizon to deep azure overhead. The image appears as a full-page editorial spread in a professional aviation magazine; overlaid text in English is clearly legible and positioned along the bottom margin in a clean, sans-serif typeface, reading: "F-35 Lightning II: Where Stealth Meets Tradition". Text block is horizontally centered, 12-point size, with 1.5x line spacing and subtle drop shadow for readability against the background.
>>
>>107334355
This level of cope...

Flux open models will always look like plastic, which makes them worthless for anything realistic involving people
>>
>>107334421
>Does it know teto?
I hope so
>>
>>107334214
AGPLv3
>>
>>107334439
I wonder if adding more steps (20, 40, etc.) will improve the text. Unfortunately modelscope gives me an "img not found" error if I try more than 9 steps
>>
What's the best local diffusion model that can generate good-looking images with a minimal prompt? Good-looking meaning something the average boomer isn't going to recognize as AI.
>>
File: Z-image turbo.jpg (161 KB, 864x1152)
>>
>>107334005
gonna wait for danbooru finetune
>>
>>107334445
It's cool, but
a) it's done by having an LLM think about the initial prompt and then generate a very detailed prompt for the image generator, so the generator itself isn't doing the heavy lifting there (autoregression isn't a requirement). The important thing is complex prompt adherence.
b) this is useful for people who don't know what they want and have the mindset of commissioning a synthetic artist. If you want to act as an artist yourself, you need to maximize control over the image, not delegate large aspects of it to the computer.
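Point (a) really is just prompt preprocessing; the whole "reasoning" stage can be sketched as below, where `call_llm` and `generate_image` are hypothetical stand-ins for whatever backends you actually run, not any real API:

```python
# Hypothetical two-stage pipeline: an LLM rewrites the vague request, and
# only the expanded prompt ever reaches the diffusion model.
EXPAND_TEMPLATE = (
    "Rewrite this image request as one detailed paragraph covering subject, "
    "composition, lighting, and style:\n{prompt}"
)

def expand_prompt(prompt: str, call_llm) -> str:
    # All of the "thinking" happens here, purely in the text domain.
    return call_llm(EXPAND_TEMPLATE.format(prompt=prompt))

def generate(prompt: str, call_llm, generate_image):
    detailed = expand_prompt(prompt, call_llm)
    return generate_image(detailed)
```

This also makes point (b) concrete: if you want full control, you can simply skip `expand_prompt` and feed your own detailed prompt straight to the image model.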
>>
>>107334502
>>107334502
>>107334502
>>107334502
>>
>>107334519
>If you want to act as an artist yourself, you need to maximize control over the image
you have the choice to use it or to not use it, so everyone is happy
>>
>>107334422
the examples so far have what look like severe compression or scaling artifacts
>>
>>107334490
The answer is most probably yes but by how much is up in the air.
I guess we will be able to test locally when it releases SOON™
>>
>>107334224
>alibaba's only real competition is going be themselves
I really want this to happen now, sounds funny as fuck
>>
>>107333379
nice
>>
>>107332443
Offloading 16 GB of VRAM to CPU with anon's loading technique >>107324741 on fp8 (the workflow is Nvidia-optimized), it's only 2 mins per gen on my 3090. You might be doing something wrong, or GGUF isn't optimized yet.
>>
>>107332467
>>107332503
This is great, is it uncensored? Also it looks as though the Chinese have never seen a Chroma output, this is not groundbreaking realism y'know.
>>
>>107335955
using reference images makes it take longer
>>
Can someone please tell me what files to download to use Qwen in forge-neo?
Search engines are shit nowadays, I can't find anything, and nothing Copilot tells me works...



