/g/ - Technology




I'd Rather You Start Schizoposting Edition

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107339853

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe
https://github.com/ostris/ai-toolkit

>Z
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://huggingface.co/Comfy-Org/z_image_turbo

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
>>107341058
>>
First for z-image sucks.
>>
why the fuck is the Nvidia offloading on by default?
>>
total bloatmodel death
>>
>>107341081
how do you turn it off
>>
>>107341081
nobody needs anything else other than flux2
>>
>>107341081
i guess for tranny vidya games?
>>
File: 1764206337705971.png (742 KB, 1503x1664)
Why is the official ComfyUI page shilling APIShit and not mentioning z-image at all?
>>107341064
I thought Comfy was local-first?
>>
Is this new z image model open to finetunes, including nsfw ones, or is it doa like bfl models outside of flux schnell?
>>
File: ComfyUI_08887_.png (1.42 MB, 1152x1152)
>>
reminder that paid cuckmodel shills are itt right now
>>
>>107341103
>flux schnell
That was also DOA
>>
>>107341081
>teehee your entire computer just slowed to a crawl and you don't know why
>teehee
>>
>>107341101
Maybe because the last piece of news was 2 days ago, before the release of z image?
>>
>>107341101
for comfyorg
flux2 > zimage turbo
>>
>>107341113
I meant license wise.
>>
..And after the premiere of Von Braun, we also have a new retelling of Schindler's List!
>>
File: ComfyUI_276397_.png (1.08 MB, 1280x720)
>>107341101
People are on vacation for thanksgiving so the blog post might be up Monday.
>>
File: zimage_c-20-0036.jpg (449 KB, 2048x2048)
Z image artist knowledge is poor. It only knows the most popular artists, and for those it doesn't know it uses the same fallback as Qwen: it treats the name as a hint for the ethnicity of the subject, or for a style bucket (i.e. Italian name -> generic Renaissance style approximation).
>>
>>107341107
as a clean feet lover, this is some good shit, especially with a cute girl
>>
>>107341125
shaniqua's list
>>
>>107341125
Can you make an obese guy in a concentration camp with a Judenstern saying "FLUX 2" on his clothing?
>>
>>107341127
why aren't you commenting on all the complaints racking up? why do you still make the absolute shittiest templates?
>>
>>107341129
>cake the image in a layer of ugly noise instead of actual brushwork
new meta just dropped
>>
File: ComfyUI_08893_.png (1.45 MB, 1152x1152)
>>
>>107341146
100% wf issue
>>
>>107341150
is it also good at non asian girls?
>>
>>107341152
Is this the part where you blame comfy instead of the model?
>>
File: ComfyUI_07550_.png (1.53 MB, 944x1280)
>>
>>107341162
who cares
>>
>>107341136
sides caved in reading this. thank you.
>>
>>107341172
ded
>>
I did not lose respect for comfy when he started associating himself with avatarfags.
I did not lose respect for comfy when he started enshittifying the UI.
I did not lose respect for comfy when he started adding api nodes.
I lost respect for comfy when someone posted that promo video and we learned that he is a fat fuck irl.
>>
z-image danbooru finetune might be the best thing for /hdg/ troglodytes
>>
>>107341172
Can you get rid of the blur using prompts?
>>
>>107341127

Reminder ComfyUI still:
>doesn't remember queue if you crash/quit
>can't stop the queue if one gen OOMs, it will go through all the queued gens and OOM on them all instantly, forgetting all sent workflows in case you don't have each saved
>can't be scheduled so it only gens within a particular time or at least begin at a set time
>doesn't have "precompute text encodings of all queued gens and throw the encoder out of vram forever" toggle, speeding up bigger models by double digit % if genning multiple images

These wouldn't be as bad if they weren't trivial to add, at least in a quick and dirty way before implementing them properly, for the people who know the codebase well. And it wouldn't be that bad if there weren't literally dozens of memory leaks and bad memory allocation code that, no matter what, will OOM you during video gen every once in a while, despite unloading all models for every single gen and having 24GB VRAM, 128GB RAM and a dynamically managed pagefile that sometimes gets filled to 170+GB.

Even the basic feature of just being able to stop the gen midway through a step, instead of having to wait multiple minutes for it to finish for high quality video gens, wasn't implemented until a few days ago, and ONLY after some guy created a node to do it first, proving that it's obviously possible. Only after that did comfy write what were essentially two unique lines of code that added this basic feature.
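
For what it's worth, the first two complaints (queue lost on crash, queue burning through every job after one OOM) can be sketched in a few lines. This is only a rough illustration, not ComfyUI's code or API: `PersistentQueue`, the JSON file layout and the `run_job` callback are all made up for the example, and `MemoryError` stands in for whatever out-of-memory exception the real backend raises.

```python
import json
import os
import tempfile
from pathlib import Path


class PersistentQueue:
    """Hypothetical job queue that rewrites a JSON file on every change."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload jobs left over from a previous (possibly crashed) run.
        self.jobs = json.loads(self.path.read_text()) if self.path.exists() else []

    def _save(self):
        # Write to a temp file and rename it, so a crash mid-write
        # cannot leave a half-written queue file behind.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent))
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
        os.replace(tmp, self.path)

    def push(self, job):
        self.jobs.append(job)
        self._save()

    def run_all(self, run_job):
        """Run jobs in order; on MemoryError stop and keep the rest queued."""
        done = []
        while self.jobs:
            try:
                done.append(run_job(self.jobs[0]))
            except MemoryError:
                break  # pause here instead of failing every remaining job
            self.jobs.pop(0)
            self._save()
        return done
```

The failed job and everything behind it stay in the file, so reopening the same path after a restart picks up where the queue stopped.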
>>
>>107341181
/hdg/ would be too busy irony posting to even care
>>
This is a major upscale for Seedream 5, I just know Bytedance won't let themselves be beat by open sores
>>
>>107341180
I never respected comfy and was a spaghetti hater since day 1
Total spaghetti death
>>
>>107341182
the blur *is* part of the prompt famalam.

A dramatic movie poster for 'Shaniqua's List'. A serious-looking, very fat overweight African American woman with curly hair, wearing a black nazi uniform with a red armband featuring a white circle and black swastika on her left arm, stands prominently in the foreground. In the blurry background, numerous people in striped prisoner uniforms are visible in a somber, industrial setting with dark, muddy ground. The top left corner shows the NETFLIX logo in red, followed by 'Presents' in white. The title 'Shaniqua's List' is at the bottom, in a classic, slightly distressed white serif font. The lighting is dim and dramatic, casting a serious tone.
>>
>>107341168
wheres the jpg artifacting in images like >>107341150
>>
File: ComfyUI_08895_.jpg (443 KB, 2048x2048)
>>107341089
Kek, forgot to set it to 2k
>>
>>107341193
oh ok I thought it was forced
>>
>>107341186
>stop the gen mid way through a step
That issue stems from AI researchers usually being dogshit programmers, and pasting reference code as-is into the node is much easier than fixing it.
>>
Reminder that comfyui sends all your data to their cloud for analysis and can format your hard drive anytime if they want.
>>
>>107341186 (me)
>Even the basic feature of just being able to stop the gen midway through a step, instead of having to wait multiple minutes for it to finish for high quality video gens, wasn't implemented until a few days ago, and ONLY after some guy created a node to do it first
I copied this from my previous posting of this complaint, and this part is no longer true, this happened weeks ago by now instead of few days.
>>
>>107341214
cool i hope the glowniggers see this one that's cracking me up
>>
>>107341195
most obvious at the ends of the hair, but whole picture looks like it has noise reduction at 200%
>>
File: zimage_c-20-0040.jpg (160 KB, 1328x2048)
>1girl, sculpted by michelangelo
>>
WHERE IS THE LORA SUPPORT???
>>
File: ComfyUI_23498_.png (3.1 MB, 1280x2048)
>>
You have to update comfy to use z, but I remember just a few days ago someone showed that the latest update deliberately made the up worse and hid a bunch of shit. For those of you who bit the bullet and updated, how bad is it?
>>
File: Z-Image-Turbo_00022_.png (3.56 MB, 1280x2048)
>>
>>107341242
Hagbros eating good
>>
>>107341237
Made the UI worse*
>>
>>107341237
ComfyUI is terrible, rendering nodes tanks it to 20FPS. And it's not a model loading issue because if I just move the viewport away from the nodes it spikes up to 240 FPS.
>>
File: ComfyUI_08896_.png (1.67 MB, 1152x1152)
>>107341129
Shame, it does do stylized photos quite well. Flux.2 however does seem to know every artist style I threw at it, but I have some hopium that this is due to distillation and when they give us base model it will be better since Z knows a few of the ones I tested from Flux.2.
>>
File: ComfyUI_01590_.png (1.18 MB, 1024x1024)
>>
Comfy should be dragged out on the street and shot
>>
>>107341255
when previewing images, resize them down to like 32px
it's some retarded approach to rendering texture arrays.
>>
File: ComfyUI_08897_.png (1.82 MB, 1152x1152)
>>
Can Z-Image do upside down faces?
>>
Has anyone else noticed that every big SAAS model is starting to look the same? I feel like there was more variety in a batch of Dall-E 3 gens than in Flux 2, Seedream, Nano Banana Pro, any of this shit.
>>
File: ComfyUI_08898_.png (1.69 MB, 1152x1152)
>>
>>107341283
the famous "upside down on grass" test?
>>
>>107341285
They're probably being trained on each other's slop
>>
>>107341301
I think they're just plagiarizing each others' methods
>>
>>107341285
The money being put into AI wants it to be replicating a certain thing instead of doing its own unique thing like what a creative type would want
>>
Hello everyone, how have threads been?
Is the poop dick schizo still at large?
>>
File: Z-Image-Turbo_00024_.png (3.52 MB, 1440x2048)
>>
File: ComfyUI_08902_.png (1.22 MB, 1152x1152)
>>
>>107341303
Why not both
>>
File: zimg_0131.png (1.33 MB, 832x1216)
>>
>>107341313
Feet look like rubber, very bonerkilling
>>
File: ComfyUI_08903_.png (1.83 MB, 1152x1152)
>>
Lumina2 + few step distil + realism lora
chinese revolution
>>
>>107341312
Very unsafe gen
>>
..And we're back with LDG news. Breaking story, PoopDickSchizo is back, and this time he's like a hydra with many unkillable heads.
>>
File: wan22__00003.mp4 (238 KB, 480x480)
>>
File: ComfyUI_276420_.png (2.31 MB, 1536x1536)
>>107341186
frontend issues like queue stuff will probably get fixed at some point.

If you have a workflow that is OOMing go make an issue with it on the repo.
>>
File: ComfyUI_08904_.png (1.52 MB, 1152x1152)
>>
File: ComfyUI_08905_.png (1.4 MB, 1152x1152)
>>
>>107341285
The more data samples, the closer the results will look between models, even if the datasets are different.

Similar to how a live poll's results will swing back and forth with the initial votes, but then 3,000 votes in and the percentages barely move anymore. They stabilize around a particular result.
>>
>>107341354
I don't buy it.
>>
So will it be possible to train loras for turbo? It was still possible with Flux dev and schnell despite being distilled. Though the loras all wrecked the anatomy
>>
File: wan22__00004.mp4 (506 KB, 480x480)
>>
File: 1761139341590710.jpg (267 KB, 1013x1449)
added sdxl, so basically z image is almost twice as big as sdxl
>>
>>107341336
is that z image?
>>
>>107341354
unless they're all training on the same shit that doesn't make any sense
>>
fucking great model. based alibaba
>>
anyone tried the z image edit model? is it even out?
>>
>>107341369
this chart is something you made?
>>
Man I must be tripping because this model feels like every other chinese model.
Like I've already seen these images before.
>>
File: ComfyUI_01629_.png (2.84 MB, 1280x1920)
step right up, step right up
>>
>>107341395
used perplexity because I was annoyed at never remembering what model was what size
>>
>>107341369
thank you based chartautismo

>>107341385
it's fantastic, way better than i could have expected.

>>107341397
might be a flux trauma symptom, many such cases.

>>107341398
kino
>>
>>107341369
Z-image runs fullsize on 12 gigs perfectly fine. 4070S here.
>>
>>107341380
I think he is treating "different data" as different large samples from the same set of data, that set of data being "all the data that exists"
>>
>>107341336

Other ComfyUI QoL things:
>doesn't have a native "fuzzy match model names in loader nodes" feature that automatically resolves paths to model files that were moved since last usage, or, god forbid, just finds those models anywhere they might be by their unique hash
>doesn't have "widget control mode: before" as default seed changing behaviour, which is much more intuitive
>goes to first workflow tab when closing the currently active one instead of going to the one that was last used or at least the one next to it, which would help people with many active tabs
>doesn't allow you to drag and drop a workflow tab anywhere on the bar to the right but instead you have to drop a workflow on top of another
>doesn't have a quick swap button for height/width of image resolutions on nodes
>can't forward image dimensions from load image node to nodes that use dimensions
>when loading a workflow and running it, even if you have seed set to randomize, the first run will silently use the seed in the workflow instead of randomizing it
>workflow search doesn't have fuzzy matching or anything similar, if you search "wan lora 1" it won't find your "wan 2.2 lora 1" workflow
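
To illustrate the last point: a plain substring search fails on "wan lora 1" because "2.2" sits in the middle of the saved name. A minimal sketch of an ordered-token match that would find it is below. `fuzzy_match` and `search` are hypothetical helpers written for this example, not anything from the ComfyUI frontend.

```python
def fuzzy_match(query: str, name: str) -> bool:
    """True if every query token appears inside some name token, in order."""
    name_tokens = iter(name.lower().split())
    # any() consumes the shared iterator, so each query token has to be
    # found after the previous one; extra tokens in between are skipped.
    return all(any(q in t for t in name_tokens) for q in query.lower().split())


def search(query, names):
    """Return the names that fuzzy-match the query, keeping their order."""
    return [n for n in names if fuzzy_match(query, n)]


workflows = ["wan 2.2 lora 1", "flux upscale"]
print("wan lora 1" in "wan 2.2 lora 1")  # plain substring test fails
print(search("wan lora 1", workflows))   # ordered-token match succeeds
```

The same idea would apply to model names in loader nodes, though matching by file hash as suggested above would be the more robust fix for moved files.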
>>
>>107341369
>2x as big as SDXL
>2x as slow as SDXL
>2x the resolution of SDXL
>4x res VAE compared to SDXL
>at least 10x better than base SDXL
this is the scaling that we need. not shit like hidream with 4 text encoders that's 4x bigger and 2x slower for images not even 10% better than flux 1.
>>
>>107341402
would be neat to see with a bunch of models going back to SD1 and before. i wonder how accurate perplexity is with shit like sana or other obscure models
>never remembering what model was what size
same
>>
Drag and shot the comfy
>>
>>107341369
Turns out license wise, the only time bfl ever cared was with Schnell.
>>
>>107341412
bahaha.. uh ok.. well that also doesn't make sense but sure
>>
>>107341363
>>107341380
Imagine if you took one trillion completely random photographs in the world, for one model. Then another trillion random photographs in the world, for another model. They would start to look the same; the randomization ends up being less random the more data samples there are.

You can scale up this thought experiment:
Imagine if you took infinite photographs of the world (essentially building a simulation of earth) for one model. Then took infinite photographs of the world for another model. The two models end up being the same.

You can scale down this thought experiment:
Imagine if you took 1000 photographs of the world for one model and 1000 photographs of the world for another model, the two models would be much more randomized than the trillion version is.
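
A toy simulation of the thought experiment, under one big assumption: both "datasets" are drawn independently from the same distribution, which is exactly the part the replies dispute. Each "photograph" is reduced to a single made-up number in [0, 1), and a "model" is just the average of its dataset. The function names are invented for the example.

```python
import random


def sample_mean(n, rng):
    # A "dataset" of n photos, each reduced to one hypothetical scalar feature.
    return sum(rng.random() for _ in range(n)) / n


def average_gap(n, trials=50, seed=0):
    """Average absolute difference between two independent n-sample 'models'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(sample_mean(n, rng) - sample_mean(n, rng))
    return total / trials


for n in (10, 1000):
    print(n, average_gap(n))
```

The gap between the two "models" shrinks roughly with the square root of the sample count. If the two datasets came from different distributions, the gap would settle at the difference between those distributions instead of going to zero.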
>>
File: ComfyUI_08906_.png (1.57 MB, 1152x1152)
>>107341326
Yh unfortunately only a few come out non slopped enough for my tastes.
>>
File: 1569698720378.jpg (99 KB, 448x537)
a WAI-style nsfw tune of this gonna go hard.
>>
>>107341425
give me a few and I can try
sd1.5?
>>
File: ComfyUI_08907_.png (1.74 MB, 1152x1152)
>>
File: Z-Image-Turbo_00025_.jpg (1.75 MB, 2048x2048)
>>
>>107341440
top of feet, so nice
>>
BLACK FOREST LABS POOL CLOSED
>>
>>107341430
they never cared. schnell was a pseudo-local release designed to bait 'developers' into thinking the model could be salvaged. meanwhile it was giga-slopped, hyper-distilled, and sabotaged to the point where it wasn't finetuneable. chodestone spent $100k of his budget de-distilling it and rebaking from scratch only to be left with a mess of melted limbs and nonsense artifacts.
harsh lesson to never attempt serious development on anything BFL releases, it's bloated dated garbage designed to shill API. flux 1 was only relevant because it was the first natural language local model, though scholars at the time correctly pointed out that it was miles behind dall-e 3.
>>
File: uni_pc_z-image__00003_.png (1.81 MB, 1024x1024)
>>
This model really changed my mind that bigger models were the only way forward. I am a "size don't matter" believer now
>>
File: ComfyUI_08908_.png (1.65 MB, 1152x1152)
>>
>>107341466
any improvement that happens with a small model will be better on a bigger one, hardware will be the bottleneck for years to come until we get a small model that will be able to do everything
>>
File: 1752087299450009.png (27 KB, 150x150)
>>107341449
>>
File: ComfyUI_08911_.png (1.62 MB, 1152x1152)
>>
>>107341453
A bunch of
Blacks
Farting
Logs
is the best evropa can do now... really makes you think.
>>
>>107341446
1.5, 1.4, VGAN+CLIP, Pixart Sigma, Lumina 2, both Schnell and Dev Flux, uh.... fuck i swear there are more
>>
>>107341444
>wai
stop it, slopped
wai is just a shitmerge
now noob finetune on other hand
>>
>>107341462
I guess you're right, especially concerning how hard it was to finetune schnell.
Kind of sad, but it seems they don't really care about the competition and just do their own thing.
>>
Is Zimage compatible with any of the speedtraining snakeoil methods?
>>
File: ComfyUI_08912_.png (1.66 MB, 1152x1152)
>>
>>107341444
>WAI-style nsfw tune
Once again, not a tune.
>>
>>107341466
>I am a "size don't matter" believer now
it's more that there were plenty of optimizations to be had at any size, so this shows the 20B+ models could look even better if they did the same
>>
File: ComfyUI_08913_.png (1.61 MB, 1152x1152)
>>
>>107341127
Hi comfyanon, can you stop bloating up the frontend? It's dropping frames baka
It's supposed to be litegraph, but now it's bloatgraph
>>
>>
>It doesn't know natalie portman
welp, at least it got every detail right anyway kek
>>
>Comfy 3.75
Why does middle mouse wheel now clone the entire fucking work flow and how do I un-keybind this shit.
>>
>>107341451
I've got canal paths near me that look just like this. Majestic.
>>
>>107341484
ok
>>
File: ComfyUI_08914_.png (2.04 MB, 1152x1152)
>>
>>107341508
how many mouse wheels you got?
>>
File: ComfyUI_08915_.png (1.7 MB, 1152x1152)
>>
>>107341466
Big models were always memes because big=static, you can't/barely can modify them with local hardware. That's why XL is so popular, every retard can play around with the models, meaning lots of interesting things get done. Bloatmodels might be able to do more out of the box but once you run into a limitation it's basically over because the model is static for all intents and purposes.
>>
>>
>>107341476
>hardware will be the bottleneck for years to come until we get a small model that will be able to do everything
Not if we're still using attention/transformers. In 4 years the perspective will change and a "small model" will be one that fits onto an entry level 48gb GPU

>>107341488
Kek what competition? They are the weights-available leaders in the West
>>
File: ComfyUI_08916_.png (1.46 MB, 1152x1152)
>>
To our new golden age. Cheers fellas.
>>
How good is zimage in following a complex prompt?

Example, the one used for qwen :

A vibrant, warm neon-lit street scene in Hong Kong at the afternoon, with a mix of colorful Chinese and English signs glowing brightly. The atmosphere is lively, cinematic, and rain-washed with reflections on the pavement. The colors are vivid, full of pink, blue, red, and green hues. Crowded buildings with overlapping neon signs. 1980s Hong Kong style. Signs include:
"龍鳳冰室" "金華燒臘" "HAPPY HAIR" "鴻運茶餐廳" "EASY BAR" "永發魚蛋粉" "添記粥麵" "SUNSHINE MOTEL" "美都餐室" "富記糖水" "太平館" "雅芳髮型屋" "STAR KTV" "銀河娛樂城" "百樂門舞廳" "BUBBLE CAFE" "萬豪麻雀館" "CITY LIGHTS BAR" "瑞祥香燭莊" "文記文具" "GOLDEN JADE HOTEL" "LOVELY BEAUTY" "合興百貨" "興旺電器" And the background is warm yellow street and with all stores' lights on.
>>
>>107341520
>>107341531
What's the surefire way to get rid of the pseudo-compression? These look good.
>>
>>107341520
>uuuuohhh husbant...
>>
>>107341528
>Kek what competition? They are the weights-available leaders in the West
Competition in general, it's not like it matters if the weights are Chinese, German or whatever.
>>
File: Z-Image-Turbo_00027_.jpg (1.68 MB, 2048x2048)
>>107341511
>>107341451
>I've got canal paths near me that look just like this. Majestic.
Lucky. My canal paths have gators
>>
File: euler_z-image__00004_.png (1.5 MB, 1024x1024)
>>
>>107341528
>4 years
>48GB entry level GPU
What are you smoking
>>
https://www.reddit.com/r/LocalLLaMA/comments/1p4urm7/we_are_considering_removing_the_epstein_files/
>>
>>107341545
i have a subway near me that looks just like this!
>>
>>107341421
>>2x as slow as SDXL
this is with extreme distillation. Even FLUX1 could run in 3s on a 3090 with distillation.
>>
If this is what z image turbo can do I can't wait to see what the full model does
https://files.catbox.moe/7nvqib.png
>>
>>107341534
that certainly pushes its limit
>>
were there genuinely retards itt who thought the bloated slow crap like flux/chroma/qwen/hidream/neta would actually take off? Nobody wants to wait 30+ seconds for subpar 1024x gens on a 5090. there is a reason SDXL remained winning for so long. z-image is the first model since SDXL with the actual potential to both dethrone it while being a full upgrade, not a sidegrade.
>>
>>107341560
whoooaaaa...it can do 1girl...standing there?!?!
>>
File: ComfyUI_08917_.png (1.46 MB, 1152x1152)
>>
>>107341542
If the weights were german, the model would be worse for cooking /ldg/ asian 1girls, baka
>>
>>107341573
I see you are new here but this is a base model. Even sd 1.4 wasn't this uncensored to start
>>
>>107341560
apparently the full model is worse currently
>>
>>107341573
sd3 couldn't kek
>>
>>107341542
>Competition in general
BFL is German, 100% they're trying to get contracts where the fact they're not a Chinese company is relevant here

And even without adjusting the goalposts I'd say the local competition is all equally in a state of "don't use this for actual work ever unless you have a really good reason like privacy or compliance" right now in terms of generative AI


>>107341550
>What are you smoking
Mostly distillate with terpenes but I'm thinking of splurging on one of those gimmicky disposable vapes with two strains in one, it's half live diamonds and the other half live resin

But this is just a napkin math guess based on current scaling + china catching up + the fact that 40gb A100s should be cheapish by then

I should have said "entry level AI GPU" because I also think the future of inference is dedicated discrete cards
>>
>>107341573
and its prompt following is insane if that is what you are after, that has already been shown off, and its skin detail is the best from any base model as well
>>
>>107341067
Wtf Bruce Lee? How could you hurt migu?
>>
File: file.jpg (813 KB, 2048x2048)
>>107341565
Actually better than what I expected, see what Qwen does.
>>
>>107341589
its not out yet so what insider knowledge are you claiming?
>>
>>107341285
I think it's because of how they all clean up your prompt with LLMs now.
>>
>>107341569
It was mostly anons flexing that they can run them I guess, it was obvious they won't take off given their sizes
>>
>>107341589
probably not worse but not good enough for what they want as its still training
the guy who leaks shit on Twitter who said that speaks through a language barrier
>>
File: ComfyUI_08923_.png (1.33 MB, 1152x1152)
Seed variety is really bad on some prompts, like it locks in like Qwen.
>>
>>107341592
>the future of inference is dedicated discrete cards
I've been hearing that since 2022.
>>
>>
>>
>>107341615
yea, that is the trade off with strong prompt adherence, but fine tuning for more creativity instead will be easy
>>
File: 1738215698424927.png (165 KB, 2040x893)
>>107341604
>>
So Z Image is done by Alibaba but not by the Qwen team? Why do they have multiple teams on this
>>
>Prompt enhancer with z-image-turbo might be better . System prompt is on its way!
https://xcancel.com/srameojin/status/1993793896397320193#m

THIS ISN'T EVEN THEIR FINAL FORM
>>
>>107341636
sounds like a bad translation if anything to me
>>
>>107341636
it's a bit weird that the distilled is better
>>
>>107341599
phew. wow. that's Qwen huh? there really isn't a model left out there that Z isn't assraping.
>>
>>107341569
Z and neta run at a similar speed on my system so
>>
File: lcm_z-image__00007_.png (1.49 MB, 1024x1024)
>>
>>107341560
>actually proper genitalia
that's already so much better than flux nonsense on this
>>
>>107341543
>Lucky. My canal paths have gators
Exotic.
>>
>>107341633
>fine tuning for more creativity
is that a thing? any finetuning just removes the creativity iirc.
>>
>>107341639
So they can decide which one gets promoted to API-only
>>
Maybe it's because it's Turbo but this model seems very deterministic. Using the same prompt generating 4 images with different seeds results in almost identical images a lot of the time. Even if the prompt is very vague like "A monster under the bed" which could look like fucking anything.
>>
>>107341657
i asked it for a penis and got a disgusting mass of flesh.
>>
>>107341640
>Prompt enhancer with z-image-turbo might be better . System prompt is on its way!
care to explain what are these?
system prompt is for LLMs, so what does it have to do with zimage?
and prompt enhancer?
>>
>>107341647
Qwen FP16 yeah.
>>
>>107341659
No, finetuning is just changing parts of the weights, it's already been shown that this works for qwen in making it less deterministic.

Also there are samplers that inject more noise between steps that helps too
>>
>>107341661
it's an issue for me as well. i believe qwen was like this too
>>
File: snake codec 1.gif (146 KB, 256x438)
>>107341669
>care to explain what a system prompt would do with an image gen model?
>prompt enhancer? now you're really not making any sense.
>>
>>107341661
will be fixed by loras when they release the base model. Even just a lora that is mostly noise would do wonders. For now just use a prompt enhancer or add random stuff to the prompt
>>
>>107341661
For quite a while now I've been of the opinion that no one should ever use a base-model, loras are ALWAYS essential (that's the human-input that makes your outputs look different than someone else's outputs). It's way too easy for any two people's prompts to be the same.
>>
File: ComfyUI_00049_[1].png (1.59 MB, 1024x1024)
OMG it Migu!
The watercolor effect on the background is decent but on Miku herself it's a little iffy. Looks more like normal anime style with some artifacts but could definitely be worse.
>>
>>107341661
You have to manually prompt the variation and be specific
If you don't want straight-on photos just tell it you want photo from the side or from behind etc.
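
If you don't want to type the variation by hand every time, something like the sketch below could do it per seed. It is just a rough idea, not a ComfyUI node: `vary_prompt` and the `ANGLES` list are made up for the example (the first two entries come from this post, the rest are arbitrary).

```python
import random

# First two entries are from the post above; the others are arbitrary examples.
ANGLES = ["photo from the side", "photo from behind", "low-angle shot", "overhead shot"]


def vary_prompt(prompt, seed, extras=ANGLES):
    """Append one extra phrase chosen deterministically from the seed."""
    rng = random.Random(seed)  # same seed -> same phrase, so gens stay reproducible
    return f"{prompt}, {rng.choice(extras)}"


for seed in range(4):
    print(vary_prompt("A monster under the bed", seed))
```

Tying the choice to the seed means a gen can still be reproduced later from the prompt and seed alone.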
>>
File: 1736000191955432.jpg (993 KB, 2048x2048)
>>107341439
makes sense to me
>>
File: seeds_2_z-image__00003_.png (1.42 MB, 1024x1024)
>>107341702
im hard now
>>
>>107341514
that monogatari live action looking good
>>
Can Z-Image do XI Jinping or Winnie the Pooh?
>>
>>107341618
>I've been hearing that since 2022.
You don't need to hear it, just look at the inference speeds you can get on custom hardware with stuff like Groq. Also both Nvidia and AMD are dedicating more resources to discrete NPUs etc

Here's another perspective: The gaming GPU is almost dead in favour of the APU at this point. Consumers have developed the learned helplessness about not affording high end GPUs and have always enjoyed consoles. Couple this with the fact that Nvidia and AMD make buckets on data center compared to consumer, the fact that consumer GPUs have already reached that point e.g. the 5090 is just a VRAM gimped RTX 6000 Pro Blackwell, and the fact that eventually you have to cave and do unified memory/soldered memory to get the higher bandwidth speeds you need like Apple Silicon or DGX clusters do, I really don't see discrete GPGPU being a thing for very much longer
>>
>>107341709
Yes >>107341620
>>
>>107341709
yeah
>>
>>107341709
you got it boss
https://youtu.be/PGa3xmdVvMM?si=xM9T_UzurJx1nh2y
>>
>>107341722
fuck yeah
>>
>>107341661
I wonder if this will help making an animation workflow later on.
>>
File: ComfyUI_temp_trrch_00001_.png (1.89 MB, 1280x1024)
>>
now we just need LTXV-2 to release so we can be freed from the shackles of WAN
>>
>>107341692
>(that's the human-input that makes your outputs look different than someone else's outputs)
>t. promptlet
>>
File: ComfyUI_08928_.png (1.51 MB, 1152x1152)
>>
>>107341741
kandinsky 20B already did that but it's fucking fat and comfy still has not merged kijai's implementation because comfyui hates torch compile, which it apparently needs
>>
File: ComfyUI_08927_.png (1.49 MB, 1152x1152)
>>
>>107341741
No, we need ltx2 so alibaba will release wan 2.5 out of spite
>>
File: ComfyUI_08931_.png (1.73 MB, 1152x1152)
>>
>>107341748
>kandinsky 20B
link?
>>
File: ComfyUI_08934_.png (1.8 MB, 1152x1152)
>>
https://files.catbox.moe/m3zlcd.png
Naked Frieren standing in a fantasy forest setting, her breasts and vulva exposed, vulva, pussy, 2d, anime screenshot, masterpiece, high resolution, very aesthetic
>>
>>107341742
It's not really about me, it's about the million other users who only type ONE sentence for their prompt. There's only so many ways a person can type one sentence, and they're all colliding, all producing the same result and feeling like a retard when they see their album cover as a video game asset in someone else's product. This is going to be sadly common in the future unless everyone adopts standards like using loras for the sake of uniqueness.
>>
2 Years ago I would not have questioned if these amateur photos are real. It's so over.
>>
File: ComfyUI_08937_.png (1.46 MB, 1152x1152)
>>
File: vbrobsdehb301.jpg (28 KB, 499x373)
>>107341764
>>
>>107341769
fuckin hot
>>
>It's not X, it's Y
stop letting your chatbots in here
>>
File: 1746800384476897.jpg (443 KB, 2069x1141)
>>107341484
man sd1.5 was so small
>>
>>107341767
https://huggingface.co/collections/kandinskylab/kandinsky-50-video-pro

https://github.com/kijai/ComfyUI/tree/kandinsky5

20B does porn out of the box btw. But it's big and slow as fuck
>>
File: ComfyUI_08938_.png (1.42 MB, 1152x1152)
>>
File: uni_pc_z-image__00009_.png (1.36 MB, 1024x1024)
>>
File: ComfyUI_08939_.png (1.02 MB, 1152x1152)
>>
>>107341784
nice feminine energy, next time quote me bitch.
>>
the step-by-step inference on Z Image doesn't seem very fast at all to me DESU, so far. 8 steps Z doesn't seem any faster than 25 steps of NetaYume at the same res
>>
>>107341805
nice OL
>>
>>107341784
You are absolutely right!
>>
File: ComfyUI_08936_.png (1.32 MB, 1152x1152)
>>
>>107341787
>20B does porn out of the box btw. But its big and slow as fuck
Yeah I won't be running that shit unfortunately. Sad to hear that something like that out of the box already exists but is that big
>>
>>107341774
>It's so over.
Alternatively, intellectual property is dead and the children (think of them!) have been saved

>>107341773
Why do people forget that we already were in a slop lack-of-creativity attention economy culture war doomspiral before AI

Also, your point about millions of NPC prompting the same thing is another point for SaaS, because those same NPCs don't delete or turn off conversation history sharing so the Service can adjust the prompt slightly based on your preferences inferred from your past conversations
>>
File: ComfyUI_08940_.png (1.76 MB, 1152x1152)
>>107341748
>Hunyuan 1.5*
We just need its NSFW tune.
>>
z image does 2k res perfectly btw, you are not limited to the 512x512 to 1024x1024 range; 1440x1920 works too

>>107341823
nah, sorry to say hunyuan 1.5 is way more censored
>>
this is kandinsky 20B but at super low res / steps so it does not take 2 hours
https://files.catbox.moe/6pdai4.webp
>>
>>107341786
It gets some of the details incorrect. VG+C ran on the CPU, couldn't do videos (unless you count animations of it generating), 1.5 could be run on 4GB cards same with XL, and there's probably some other stuff that I don't realize because I'm retarded and don't wanna read papers right now. A chart like this with even more models, like Kandinsky and any others anon can think of, would be a really cool resource to have.
>>
>>107341838
how'd you run it?
>>
>>107341844
>>107341787
>>
>>107341844
OFFLOADING
F
F
L
O
A
D
I
N
G
>>
>>107341838
The clit isn't supposed to be at this place
>>
>>107341838
Any workflows for this model? May fuck around and try to run it
>>
>>107341851
yea no shit
here is this as well btw
https://huggingface.co/Ada321/Kandinsky-5.0-T2V-Pro-sft-5s-Q8.gguf

https://github.com/Ada123-a/ComfyUI-Kandinsky
>>
File: heunpp2_z-image__00006_.png (2.92 MB, 1536x1536)
>>
>>107341854
>>107341857
16GB vram is minimum btw, even with offloading the latent is about 8GB ish
>>
File: ComfyUI_08943_.png (1.75 MB, 1152x1152)
>>
File: ComfyUI_08944_.png (1.81 MB, 1152x1152)
>>
>>107341787
How slow we talking? Depending on how good it is, I'd be willing to wait up to 10 minutes with a 3090.
>>
What's actually "so over" is that this board's traffic when /ldg/ is popular is just "/ldg/ and friends".

>>107341786
>Z image is 6B
Huh guess I'll get off the couch and try it out. In my opinion Z.ai have been the most consistent Chinese lab of 2025, they feel like a Chinese version of Anthropic

>>107341851
>The clit isn't supposed to be at this place
That is a very, very reasonable mistake that models make (like double assholes or turning pussy lips into testicles or vice versa) and that example actually makes me more hyped because you need to have an understanding of the anatomy of the vagina and have seen enough clits to make a mistake like that

>>107341864
I am assuming that 64gb of ram is the minimum with 16gb of vram because I couldn't even run the q4_ks with just 32gb. The best I could do was a one frame 128x64 image that took 5 minutes to generate.
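
For a rough sense of why a 20B model is tight on 32gb of RAM, here is a back-of-the-envelope sketch of the memory taken by the weights alone. The 20B parameter count comes from the model name; the bits-per-weight figures are approximate assumptions for GGUF-style quants (about 8.5 for Q8_0, roughly 4.5 for Q4_K_S), and `weight_gib` is a hypothetical helper, not part of any tool mentioned in the thread. The text encoder, VAE, activations and video latents all come on top of this.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores the text encoder, VAE, activations and latents, so real
    usage will be noticeably higher than this number.
    """
    return params * bits_per_weight / 8 / 2**30


# Assumed values: 20e9 parameters, approximate bits per weight per quant.
q8_weights = weight_gib(20e9, 8.5)   # Q8_0-style quant (assumption)
q4_weights = weight_gib(20e9, 4.5)   # Q4_K_S-style quant (rough assumption)

print(f"Q8 weights: ~{q8_weights:.1f} GiB")
print(f"Q4 weights: ~{q4_weights:.1f} GiB")
```

Treat the output as a lower bound: it only says how much has to live somewhere (VRAM or offloaded to system RAM) before a single frame is generated.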
>>
>>107341878
lol, lmao even.
an hour on a 5090 using cache
>>
File: ComfyUI_08946_.png (1.47 MB, 1152x1152)
Body horror when prompting for certain yoga poses that Chroma HD Flash nails first try.
>>
File: ComfyUI_07488_.png (2.84 MB, 2048x1280)
>>
File: 1750450262792550.png (2.34 MB, 1120x1440)
z image struggles to mix 2d and IRL, qwen beats it at this (for now)

>>107341811
it's not really faster, the distill is just way higher quality than we're used to. With CFG, it's similar speed to chroma for me too.

The answer is we need to ditch CFG and use something like this.
https://github.com/AMAP-ML/S2-Guidance
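
On the speed point above, a minimal sketch of why step count alone doesn't tell you much. It assumes classifier-free guidance costs two denoiser passes per step (conditional plus unconditional) and a guidance-distilled turbo model costs one; `forward_passes` is a hypothetical helper for illustration, and per-pass time still depends on model size and resolution, which this ignores.

```python
def forward_passes(steps: int, uses_cfg: bool) -> int:
    """Count denoiser forward passes needed for one image.

    Assumption: with CFG every step runs a conditional and an
    unconditional pass; without CFG every step runs a single pass.
    """
    return steps * (2 if uses_cfg else 1)


# Step counts taken from the thread; CFG usage is an assumption.
turbo_passes = forward_passes(8, uses_cfg=False)
cfg_passes = forward_passes(25, uses_cfg=True)

print(f"8-step distill, no CFG: {turbo_passes} passes")
print(f"25 steps with CFG:      {cfg_passes} passes")
```

Multiply the pass count by seconds per pass for each model to compare wall-clock time; a bigger model with fewer passes can easily end up no faster than a smaller one with more.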
>>
File: ComfyUI_08949_.png (1.64 MB, 1152x1152)
>>
File: ComfyUI_07569_.png (3.16 MB, 2048x1280)
>>
File: ComfyUI_08954_.png (1.62 MB, 1152x1152)
>>
>>107341885
thing requires a fucking h100 cluster then lmao im good. surely we'll get something as uncensored as it that ISNT fuck-you big
>>
>>107341885
Why the fuck can’t we have anything nice. Imagine z-image prompt adherence but for i2v. I’m tired of tard wrangling the hell out of wan and still not getting anywhere close to what I want.
>>
File: ComfyUI_08957_.png (1.73 MB, 1152x1152)
>>
>>107341912
>>107341916
with a good 4 step distill it would prob get down to like 5 mins
>>
>>107341882
>In my opinion Z.ai have been the most consistent Chinese lab of 2025, they feel like a Chinese version of Anthropic
What else should anon know them from?
>>
>>107341882
It's not Z.ai btw, this is still alibaba
>>
File: heunpp2_z-image__00008_.png (2.71 MB, 1536x1536)
>>
>>107341882
dumbass kid talking out of his ass lol
>>
please care about flux 2
>>
File: ComfyUI_07608_.png (1.31 MB, 944x1280)
>>
>>107341950
would
>>
File: ComfyUI_08960_.png (1.74 MB, 1152x1152)
>>
>>107341950
can u fill the box with cum plox
>>
File: tay.png (3.11 MB, 1824x1248)
Flux 2 had better artistic potential while the China model is cheap yet functional slop machine spewing out cheap gimmicky memeslop clogging up all the threads, how fitting
>>
>>107341885
>an hour for 5 seconds
bro..
>>
is this option deprecated or just hidden now
>>
>>
>>107341967
it's "streamlined"
>>
please care about flux 2
>>
File: ComfyUI_08962_.png (1.34 MB, 1152x1152)
Kek, doesn't understand fellatio that well unfortunately, or maybe it's skill issue on my part.

https://files.catbox.moe/kair61.png
>>
I don't get it. It's just slop. The whole model is a slop machine. It's basically the same shit as flux 1.
>>
>it can't do the part where her body is made out of the pancake syrup but nails literally everything else including perfect anatomy
flux would just fuck up the anatomy constantly, same with chromasome. wow.
https://files.catbox.moe/9nan6k.png
https://files.catbox.moe/c4mktw.png
>>
>>107341959
I’d probably be willing to wait even that long so long as it had god tier prompt adherence which it probably doesn’t. Imagine waiting for an hour excited for your 5 second goon gen, and then something is glitchy or your prompt was almost ignored cause you were just slightly too undescriptive. Long gen times make experimentation hard.
>>
File: ComfyUI_08963_.png (1.46 MB, 1152x1152)
>>
>>107341997
Rent an 8xH100 cluster and get those 2 min gen times buddy
>>
z-image sucks in the same way SDXL sucks
>>
File: ComfyUI_01690_.png (2.82 MB, 1280x1920)
>>
>>107342018
it feels like SDXL-2 in a lot of ways
>>
>>107342018
I remember being blown the fuck away when sdxl came out but could barely run it on my 1080 ti.
>>
File: ComfyUI_08964_.png (1.62 MB, 1152x1152)
>>107341995
Yeah, Chroma has substantially better prompt following, NSFW concept and anatomical knowledge, in addition to higher res photos, way less slop and more variety. It's just that this is obviously a 6B Turbo model, and for what it can do out of the box (about 60% of what Chroma can) it's impressive. We'll see how its base model with full prompt following fares, plus tunes (especially a large scale tune) will bring the best out of it.
>>
>>107342018
no, sdxl was safetyslopped when it first came out
>>
File: z-turbo_00041_.png (985 KB, 720x1280)
>>
File: 1740597350809949.png (2.03 MB, 1120x1440)
ok this works, I just have to start the prompt with the description of the 2d character.
>>
>>107342018
>>107341945
>>
File: ComfyUI_08966_.png (970 KB, 1152x1152)
>>
Need a local audio model that can do NSFW shit and dialogue too
>>
File: z-turbo_00043_.png (1.05 MB, 896x1152)
>>107342082
that's enough asian beauties for today, anon
>>
File: ComfyUI_07622_.png (3.02 MB, 2048x1280)
>>
>>107342088
vibe voice not good enough for you?
>>
>>107342100
no anything outside of dialogue is very gacha
>>
File: ComfyUI_07597_.png (1.89 MB, 944x1280)
>>
>>107342088
we need something where you can clone specific plap sounds
>>
File: 1737523168720720.png (1.06 MB, 832x1248)
>>
File: 1759972772319413.jpg (413 KB, 904x904)
>>107341934
>What else should anon know them from?
Their only other stuff is /lmg/ territory: the GLM series is considered a cheaper but genuine alternative to Claude. We have Claude 3.25 at home because of them

>>107341937
>It's not Z.ai btw, this is still alibaba
... Okay in that case Alibaba is the best across all modalities for local, text image video a complete and total moggening

>>107341945
>dumbass kid talking out of his ass lol
I am probably younger than you, that is true, but I have an idea of what's going on in the industry. The actual "talking out of the ass" comes from the fact that who knows what decisions these companies would make if DRAM and compute weren't so bottlenecked. Pretty much every chip that gets released always has some SKU defined in the spec with double the VRAM
>>
File: ComfyUI_08968_.png (1.66 MB, 1152x1152)
>>
File: ComfyUI_08969_.png (2.19 MB, 1152x1152)
>>
>>107342052
I haven't tested this model yet (I am on a trip), but from the outputs I've seen so far, z-image has fewer mangled anatomy problems than Chroma, and this is coming from a guy that defended Chroma for a long time. I think the base model with loras will look just as good as Chroma for photorealism, considering the base for z-image is already devoid of the plastic skin and buttchinfest Flux had
>>
File: ComfyUI_08970_.png (1.59 MB, 1152x1152)
>>
Z Image seems to work with Qwen3-4B 2507 Instruct too. Haven't tried Thinking yet. Possible gains to be had there anyways since real finetunes of Qwen actually exist, unlike for T5-XXL.
>>
File: ComfyUI_01600_.png (1010 KB, 1024x1024)
>>
>>107342143
i can't fully illustrate how happy i am knowing i don't need to put (cleft chin) in my negatives anymore thanks to the chinese.
>>
File: 1748558057824913.png (1.77 MB, 1120x1440)
>>107341995
chroma, especially spark chroma definitely beats it at this prompt. pic is as far as I can get it with proompting. maybe the prompt enhancer can fix this when they release it.

z image is better than chroma overall, but won't replace it yet. ironically, chroma has better survivability against Z than qwen or flux because it has NSFW and more flexibility.
If the SPARK chroma guy finishes his work and then we get a Chroma distill with the same quality as Z, chroma will still be competitive.
>>
File: ComfyUI_00040_.png (1.3 MB, 1024x1024)
>>107342161
not even remotely what the fuck is this kek, the body is supposed to be like picrel.
>>
File: ComfyUI_01602_.png (1.15 MB, 1024x1024)
>>
File: ComfyUI_08971_.png (1.71 MB, 1152x1152)
>>107342143
>less mangled anatomy problems than Chroma
Chroma HD Flash fixes all problems from original Chroma. This model however has very little seed variety (similar to Qwen) and can do less than Chroma out of the box. I would know because I've prompted all of these things in Chroma (though some of them are not first try on Chroma, there's still more seed variety). If a model cheats by beating its seed variety then it's not really better anatomy than Chroma imo.
>>
>>107342145
I need to look into how these instruct models work with the new gen of Chinese models. If you give it too illegal of a prompt, won't the instruct model just write "I'm sorry, I can't assist with that" instead of enhancing/encoding your prompt properly?
>>
Fresh

>>107342183
>>107342183
>>107342183
>>
File: ComfyUI_08972_.png (1.95 MB, 1152x1152)
>>107342179
Also I'm paying very close attention to the prompt following, and it's also not up to par. But this is just the Turbo model with reasoning turned off after all.
>>
>>107341840
Yeah I didn't run a second pass but I probably should to check sources.
>>
>>107342181
no, that's not how it works, none of these models are being used in a way where it's possible for them to "refuse" a response, they only activate the "model understanding" state layers. There's no typical chat context going on at all
>>
Chroma still produces monstrosities more than often... anyone saying otherwise is just a fag




All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.