/g/ - Technology


Thread archived.




Did They Just Ban Him? Edition

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107364548

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe
https://github.com/ostris/ai-toolkit

>Z
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://comfyanonymous.github.io/ComfyUI_examples/z_image/

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
Z-Image is unsafe and violates Chinese law. The developers need to be more careful for future releases. Perhaps consider making the more powerful models API-only so prompts can be adequately filtered
>>
File: ComfyUI_08015_.png (1.29 MB, 944x1280)
>>107366162
>>
>>107366162
bghira is that you faggot?
>>
>>107366162
Perhaps you should polish my balls.
>>
>>107366162
this but unironically
>>
>>107366172
holy SHIT it's NIGGER bird anon! so glad to have you back.
you tell that fucking nigger!
>>
Luckily ComfyOrg offers a solution through API Nodes. Access the full safe power of Z-Image 2 through ComfyCloud's API.
>>
Chinese girls want to fuck NIGGERS they're RICE BUNNIES they want BIG BLACK COCK

this post was sponsored by Chroma™
>>
File: flux2_bf16_c-0200.jpg (357 KB, 1600x1600)
>>
also to whoever was saying z doesnt understand breast sizes, it does, and it understands types of breast shapes as well. you just need to know some good terms. look up a cheat sheet or something kek. it just works.
>>
File: 1764384517235851.jpg (106 KB, 1200x983)
I applaud chinamen for obliterating censor trannies
>>
>gen nsfw 1girl w/ chroma
>compose w/ flux 2 references
I thought flux 2 is useless but it's actually quite good for this
>>
what resolutions does chroma support without throwing anatomy demons?

currently using 832x1216
>>
>>107366230
>without throwing anatomy demons?
if you gen at 256x256 you probably can't tell. does that count?
>>
File: ComfyUI_00247_.jpg (538 KB, 1280x2048)
Z image has fixed all my issues I had with WAN not making jungle women ethnic and dirty enough and I am very happy

>>107366194
i would love to know how to get breasts larger than this. it feels like this is the maximum size for Z
>>
File: 1750220829391573.png (1.94 MB, 1024x1024)
>no prompts
>gets a woman lying on grass
lmao they sure didn't want to miss that one!
>>
>>107366230
It's hard to avoid the anatomy demons. I could be wrong but I follow the same rules as sdxl, multiples of 128 in each dimension so like 1024x1024, then 896x1152, then 768x1280, and so on. I think generally for best accuracy you want to keep it close to 1 megapixel.
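the multiples-of-128 near-1-megapixel rule is easy to enumerate yourself; here's a minimal sketch (the 512-2048 range and the ±15% megapixel tolerance are my own assumptions, not part of the rule):

```python
# Enumerate resolutions where both sides are multiples of 128 and the
# pixel count stays near 1 megapixel, per the rule of thumb above.
TARGET = 1024 * 1024  # ~1 MP

def sdxl_style_buckets(step=128, tol=0.15):
    out = []
    for w in range(512, 2049, step):
        for h in range(512, 2049, step):
            if abs(w * h - TARGET) / TARGET <= tol:
                out.append((w, h))
    return out

for w, h in sdxl_style_buckets():
    print(f"{w}x{h}  ({w * h / 1e6:.2f} MP, AR {w / h:.2f})")
```

this spits out 1024x1024, 896x1152, 768x1280 and friends; 832x1216 only shows up if you drop the step to 64.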
>>
>>107366237
yeah yeah, the spooky bad hands and feet of chroma
>>107366238
>Just don't go above 2048
alright, i was asking because i got elongated torsos and legs at 1600x1600
>>
What are the best settings to train zimage loras? Or is it similar to training Illustrious or Noob?
>>
>>107366261
I deleted the post because I thought you asked about Z. I didn't expect anyone to be using chrome in this day and age.
>>
>>107366265
>What are the best settings to train zimage loras?
Wait until Base.
>>
is 80 seconds per image normal for chroma? im using a 5090.
>>
File: ComfyUI_00226_.png (3.82 MB, 1280x2048)
im getting extra toes very often on foot gens with Z turbo, both from above and from the soles

>>107365456
>catbox me boss. you have activated my jungle fever. and fevor.
ah shit im sorry i stepped away, the prompt was

>A National Geographic photograph from the 1970s, three extremely voluptuous very dark-skinned Brazilian indigenous women with primitive features, dirt-covered bodies now wet and glistening, long matted black hair plastered to skin, gold hoops, extremely plump lips, grass bikinis clinging to massive breasts and huge buttocks, thick thighs, standing under makeshift outdoor shower spray, left woman has mouth wide open with tongue extended and neck craned upward catching water droplets, center woman rubbing water over her dirty skin with hands on her enormous breasts, right woman squatting low with thick thighs spread, eyes closed with expression of intense pleasure, hands behind head exposing armpits and underarms, water cascading over her huge buttocks, candid moment, 1970s documentary film quality
>>
>>107366255
>I follow the same rules as sdxl
that makes sense, i'll just use the sdxl table then. thanks.
>>107366268
>I deleted the post because I thought you asked about Z. I didn't expect anyone to be using chrome in this day and age.
chroma's good, i think its the best. i use noob for anime though.
>>
>>107366281
thanks king, your fucking gens are jewcy

and nah i dont think the tits CAN get bigger, its just turbo's limited training.
>>
>>107366275
Seems a lil slow, depends on how many steps you're using and what workflow. I think I get about 30 seconds with the default workflow which is 26 steps of euler on a blackwell 6000 pro (which should be about the same speed as a 5090). Maybe make sure you're not running out of VRAM?

I usually use 40-50 steps with uni_pc or res_multistep which usually takes more like a minute.
>>
>>107366331
im doing 30 steps with heun/sgm uniform at 896x1152 with upscaling. i forgot to mention that. im not getting oom errors. ill try the samplers you mentioned.
>>
File: 2957904234.png (874 KB, 896x1152)
>>
File: ComfyUI_09244_.png (1.97 MB, 1488x832)
Z is very good with cars
>>
>>107366390
where's the 1girl in this image, what is this heresy?
>>
File: ComfyUI_09246_.png (1.68 MB, 1488x832)
>>
File: ComfyUI_00273_.jpg (500 KB, 1280x2048)
this one came out alright if you pretend its a small rock or really big grain of sand

>>107366288
>chroma's good, i think its the best
no one cares about my opinion and I missed the model war discussion last thread but personally I never found a desire to even download chroma (even when SPARK came out) because the increase in gen time and setting all that up for realism just wasn't worth it to me compared to using WAN, which I could use for videos to at any time

and now that Z image turbo is out, and from last thread it seems that loras train well on the turbo, which implies that the base model will be even better, I can say with absolutely zero sour grapes that I do not regret skipping Chroma entirely

but again my opinion doesn't matter. i mostly just wrote this out because I'm waiting for my series of gens where the researchers take a couple of tribewomen into society and funny lewd hijinks ensue. thanks for reading this far kek
>>
>>107366402
sir please put a 1girl in the image to adhere to /ldg/ guidelines
>>
File: 1742540566071198.png (3.72 MB, 2048x1280)
>>107366308
>your fucking gens are jewcy
i have more types of brown women to gen first before sexualizing the jews but we'll get there

>and nah i dont think the tits CAN get bigger, its just turbo's limited training.
yeah I think so too. but I'm sure a macromastia lora will exist by the end of the year

>>107366415
from what i remember from my cars phase when i was younger, the hot asian 1girl is supposed to be in the passenger seat of the lamborghini
>>
>>107366407
>thanks for reading this far kek
i read the whole thing, you need to lay off the natural language prompting and just say "I just didn't think Chroma was worth it but I think Z-Image is worth it".

Me likey Chroma but the gen times are rough. Still prefer it to Z-Image for now.
>>
>>107366432
so you liked hot asian women when you were younger and now you like fat aboriginal women as an old man?
>>
>>107366390
toyota gt-r supra r34
>>
I liked hot asian women when I was younger and now I like them with cocks too.
>>
File: 1740686019587113.png (1.52 MB, 1248x832)
>>107366390
indeed
>>
File: ComfyUI_09253_.png (1.8 MB, 1488x832)
>>107366398
Of course anon, here's the 1girl
>>
File: ComfyUI_temp_vpsbk_00017_.jpg (628 KB, 1344x1728)
>>
File: ComfyUI_00312_.png (3.19 MB, 1280x2048)
Z handles reflections noticeably worse than WAN. one of the advantages of being a video model with a temporal understanding and training data of videos with mirrors I guess. or maybe it's also because of turbo, we'll see on sunday

>>107366445
>you need to lay off the natural language prompting and just said
well if you read the whole thing you'd know why I was blogposting lel
>the gen times are rough.
the gen times are the only reason I'm addicted to Z image right now. It's actually faster than my ancient SDXL workflow since that required a hiresfix while this just does it in a single sampler pass


>>107366449
I can appreciate pretty much every form of female before menopause except indians, honestly

The chinese/asian beauty standard is not for me though. I like westernized looking asians more. too bad a lot of them have tattoos now

maybe that's why I like tribal women, either no tattoos or the tattoos are curvy and sexy not retarded shit like dogpaws and butterflies that are dead (99% of butterfly tattoos have the wings open too much like they're in a biology textbook aka dead, once you learn this you can't unsee it)
>>
>>107366461
same, huge veiny cocks with sagging balls or you're a bitch.
>>
What is the best method for NL captioning SFW datasets? TagGUI is outdated shit that still recommends Florence 2. Any good SaaS API that can handle 100 or so images without paying?
>>
>>107366486
yeah, the gen times of zim are extremly nice but it doesnt have the knowledge that chroma has so ill stick with it until i see what zim base can do.
>>
File: 1745779756376225.png (2.25 MB, 1752x1168)
>>
I hate z fags so much
they are basically the former chroma fags
I hate that underdog thinkpad mentality
>>
>>107366470
>no huge flaccid dong peeking from under her skirt
tch, z-image SUCKS
>>
File: 1757807473532475.png (2.51 MB, 1752x1168)
>>
>>107366551
I honestly wonder if the model has been overfit on its RLHF data, it does cars WAY too well
>>
File: 1757206198607237.png (2.42 MB, 1752x1168)
>>107366557
Didn't they only do that for Turbo and not Base? I don't remember.
>>
File: this is good.png (1.54 MB, 1280x1519)
>>107366551
lmaoooooo
>>
>>107366580
do one with e621
>>
File: ComfyUI_00330_.jpg (380 KB, 1280x2048)
>z fags
cmon anon, zigger was right there

>>107366557
>it does cars WAY too well
if it's trained on portraits maybe its trained on girls posing with cars? I don't think it does cars *way* too well btw, just really good
>>
>>107366594
my zigger gens always come out blurry or with noise. i know adjusting the shift addresses some of it but they dont come out this clear. did you edit the workflow?
>>
File: 1738833726376168.png (1.46 MB, 1280x720)
China, the master of AI coom
>>
File: 1748816975617014.png (2.33 MB, 1752x1168)
>the word "MissAV" with "Miss" being in black and "AV" being in pink
>>107366586
I doubt it knows any logos more obscure than something like PornHub.
>>
File: zit.png (169 KB, 1112x732)
>>107366609
>did you edit the workflow?
I didn't change anything other than using TAEF1 instead of the normal VAE which changes the image less than 0.1% so that shouldn't be the difference. I'm bypassing the shift node like the original workflow was though
>>
>>107366627
i must be messing something up, prob a prompt skill issue.
>>
File: just saying.png (120 KB, 319x376)
>>107366625
>MissAV
that one is good, but the goat is ThisAV desu
>>
>>107366641
the image you referenced still has the typical z compression artifacting
>>
>>107366505
joycaption or torigate
>>
File: ComfyUI_00356_.jpg (350 KB, 2048x1280)
i hope zbase is better at pov hands reaching out
maybe i'll be able to reach the endgame of full-on gum inspection videos with squelchy audio by 2028
>>
>>107366653
I find turning off the bypass for aura flow, setting it to 7 and increasing steps to 12 cleans up most of the noise at the expense of being slightly slower.
Currently messing around with switching schedulers half way which is giving interesting results.
>>
File: ComfyUI_00360_.jpg (299 KB, 2048x1280)
grab her by the pussy

>>107366681
>I find turning off the bypass for aura flow, setting it to 7 and increasing steps to 12 cleans up most of the noise at the expense of being slightly slower.
show us a comparison?
>>
why is z image so good at controlling the age of a girl?
>>
File: ComfyUI_00371_.jpg (291 KB, 2048x1280)
hands reaching out seems to work better in landscape than portrait, which makes sense if you think about the portrait dataset used to distill and where most pov hand images involving grabbing ass come from (landscape porn videos)

anyways thats enough brown women for now

>>107366697
>why is z image so good at controlling the age of a girl?
same reason WAN is so good at controlling the age of a girl. They got Gemini to guess the age and include it in part of the caption I'm assuming, or maybe the AI really is that smart. Either way I don't think they did something very new for Z image captioning compared to WAN
>>
File: 1762450705416891.png (3.05 MB, 1920x1080)
Look at him go!
https://civitai.com/models/2174416/technically-color-z?modelVersionId=2448632
>>
File: 1760044107714229.png (2.24 MB, 1752x1168)
It keeps doing "/ldg." instead of "/ldg/"
>>
>>107366747
Is that a supra
>>
>>107366751
its a sigma
>>
>>107366725
here ya go tribal girl enjoyer gramps
https://www.reddit.com/r/StableDiffusion/comments/1p9f6it/humans_of_zimage_races_cultures_and_geographical/

ill have to test this shit out tomorrow morning for sure
>>
File: 1742590755905482.png (1.7 MB, 1168x1752)
>>107366751
That one was supposed to be a Chrysler but I'm just realizing I spelled it incorrectly.
>>
File: file.png (1.83 MB, 1024x1536)
>>
>>107366775
JK I spelled it correctly nvrmind.
>>
How do I gen longer wan 2.2 videos than 81 frames?
Can I simply just increase the amount of frames or is there some hoops I have to jump through? I sometimes see vids on Civit that are way longer than 5s, so there must be a way that I'm missing.
Slightly related, how would I go about upscaling the vids during generation?
>>
>>107366194
what it really doesnt know is areola size but i wont hold that against it kek
>>
>>107366798
For simple prompts that involve a stationary camera and a character doing repetitive motions, yes you can just increase the frames. For anything more complex like the camera moving, a character appearing from offscreen or any drastic changes in pose the video will likely loop, but you can usually get away with 101-113 frames without looping. Other than that you can use painter long video nodes which will generate multiple videos using different prompts and it will combine them for you.
>>
File: Z-image turbo.png (2.9 MB, 1920x1080)
>>
>>107366798
Set frames higher. It's 81 for 5 sec and 113 for 7 seconds. Wan is designed for 5 seconds anyways so any longer, the video just loops. A good workflow can upscale and grab last frame and extend scene.
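the 81 and 113 numbers follow from the 16 fps those durations imply plus the frame count needing to be 4n+1; quick sketch (the helper name is mine, assuming 16 fps output):

```python
FPS = 16  # frame rate implied by 81 frames = 5 s, 113 frames = 7 s

def wan_frames(seconds):
    """Nearest valid frame count (must be 4n + 1) for a target duration."""
    n = round(seconds * FPS)
    return (n // 4) * 4 + 1

print(wan_frames(5))  # 81
print(wan_frames(7))  # 113
```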
>>
File: file.png (275 KB, 937x456)
what can i do with these? i have not genned any images in like 2 years. can i make videos?
>>
>>107366864
train an actual finetune
>>
>>107366864
You can play Minecraft with raytracing
>>
>>107366864
Give them to me.
>>
File: Z-image turbo.png (3.51 MB, 1920x1080)
>>107366851
>>
>>107366892
What I find interesting is that it makes the people in the background look really diverse and different, but as soon as it comes to doing portraits they have sameface.
>>
File: ComfyUI_07929_.png (1.43 MB, 944x1280)
>>
File: flux2_bf16_c-0234.jpg (541 KB, 1920x1536)
>>
File: ComfyUI_00049_.png (1.16 MB, 1024x1024)
Z Image is fucking incredible holy shit
>>
File: 1748439707078160.png (1.71 MB, 2046x951)
>>107366931
funny you say that
>>
File: ComfyUI_00056_.png (1.27 MB, 1024x1024)
Yeah I'm thinking this Z image model is based.
Also what the fuck, this thing runs quickly on my 3080 10gb, is this actual witchcraft or some shit?
>>
>>107366971
>>107367033
lol'd
>>
did /ldg/ confirm that prompts in chinese work better than english?
>>
>>107367084
Gacha.
>>
man, lumina models (z-image) are fucking horrible on turing gpus. no bf16, sage attention doesn't work properly (it doesn't work with fp32). i'm sad
>>
>>107367084
Only one image comparison but yes.
>>
>>107366102
Chroma was only possible by breaking Flux's distillation basically. Plus Lodestone discarded Flux's useless parameters. It's a modified Flux Schnell. Allegedly the tune itself cost like $150k, and was only possible due to community funding and sponsors like Pony.
>>
>>107367084
It's not 100% but something to try if your prompt isn't working.
>>
>>107367132
>Plus Lodestone discarded Flux's useless parameters
now he wants to expand z-image into a 32b MoE
>>
>>107367159
can he do something useful?
>>
File: 4454412154.png (1.03 MB, 762x729)
>>107367019
>>107366931
Kek, you're right.
>>
>>107367159
Visionary
>>
File: ComfyUI_09269_.png (2.12 MB, 1152x1152)
>>107367159
>expand z-image into a 32b MoE
Source? I mean, I remember this frankenstein experiment https://huggingface.co/blog/segmoe

It did have cool results
>>
File: 1754078086078883.jpg (80 KB, 800x594)
I have no fucking clue what I am doing. Is it possible you guys could train a retard on this matter? I would prompt slave away for anyone's help for like a month straight. I have comfyui installed and that's about it. I read through the rentry stuff but it's all just going right over my head.

Here is my computer.
AMD Ryzen 9 9950X3D
ASUS GeForce RTX 5090 Astral
MSI MPG X870E EDGE TI WIFI
SAMSUNG SSD 9100 PRO 4TB
DOMINATOR® TITANIUM RGB 96GB (2x48GB) DDR5 DRAM 6000MT/s CL30
>>
>>107366864
You can make 4 720p videos at a time at full precision full steps with WAN 2.2 which should take about 15 minutes

I would recommend vibecoding a webui for maximum comfiness assuming you have access to this compute permanently/for a while

>>107367084
>did /ldg/ confirm that prompts in chinese work better than english?
Actually I thought we confirmed that the Chinese prompt enhancer converts the prompt to English. It's just that qwen is trained really well for both languages (an anon even tested russian and it worked too) so the outputted embeddings are similar enough, but may be different enough to do a concept in one language that it can't in another. So actually, you should keep trying your prompt in different but well-represented languages (maybe even Spanish too) until you either get what you want or rewrite the prompt and try again.

>>107366767
>here ya go tribal girl enjoyer gramps
Thanks. Amazonian looks very colonizable but I think qwen basically converged on that facial structure with Brazilian + tribal but I'll definitely try with it too next time as well. Some interesting islanders too. Filipinos are too light skinned and that's probably the portraits they trained on since Asians are more obsessed with being white than white people are (unfortunately)

>>107367108
>man, lumina models (z-image) is fucking horrible on turing gpus. no bf16, sage attention doesn't work properly (it doesn't work with fp32). i'm sad
No one is using sageattention for Z right now, and Q8_0 will get you 99.97% of the way to bf16 so stop complaining and be happy that solutions for your vramletness even exist
>>
>>107367198
What do you want to do, for starters. Gen images? Videos?
>>
>>107367198
>I have no fucking clue what I am doing.
I am unable to sleep and I am unable to gen so I am willing to help you out for about 2 hours

First of all you need to tell me what you want to make with AI, since I have no clue what you are doing either

Second of all you need to tell me your OS and your level of experience with Python and the Command Prompt
>>
>>107367205
Truth be told, right now I just want to make a test video of a vaporeon running around a field of sunflowers.
>>
>>107367159
It's crazy that no ones beat scale-at-all-costs. Could you imagine? Zimg being... hah... something small? Like 6B? Absurdity. No way.
>>
>>107366230
Which version of it are you using? If you want significantly less limb errors could always switch to Chroma Flash HD.
>>
>>107367211
The true aim is to make hyper niche personalized porn for the wife and I to laugh at and get off to. Maybe just make shit to shitpost with too. Windows 11. And I used to be really familiar with both python and command prompt but I also haven't fucked around with either in like 2 years.
>>
>>107367198
how are you struggling to understand the basic concepts when you have enough money to buy that pc?

just download the smoothmix t2v workflow for smoothbrains from civit and prompt your vaporeon porn
>>
>>107367227
im using spark, flash looks like shit whenever i use it and keeps giving me anime randomly, which pisses me off.
>>
>>107367244
Well, spark is a meme tune. Fix your Flash settings then.
>>
>>107367239
I just fuck and exercise. I gave my brain away long ago. Thank you for the insight though. I greatly appreciate it.
>>
>>107367218
>>107367234
Okay cool I won't bother with technical stuff since I'm assuming you're intelligent enough to ask Claude for help with python fuckery


The other anon might help you directly get to video, but I'd want you to learn the basics of comfyUI first and actually generating a basic image and starting to use the program

So how about you start using Z image which is the cool new text to image model we're all using right now

Read this
https://github.com/comfyanonymous/ComfyUI_examples/tree/master/z_image
It's very simple. Ask for help if you get stuck. I hope to see a vaporeon image from you in under an hour assuming you have fast internet

If you can figure this out, honestly I'm sure you can figure out the video example which is here

https://github.com/comfyanonymous/ComfyUI_examples/tree/master/wan
This is wan 2.1 which is the older text to video model. But it's much easier to setup. This will get you to second-to-state-of-the-art vaporeons

And then you can fuck around with wan 2.2 which requires a second sampler and to use two models in the same workflow. Which is twice as complicated
https://github.com/comfyanonymous/ComfyUI_examples/tree/master/wan22


This is my advice. Ask for help if you get stuck at any step. Please set up basic image generation first, because there's more to talk about with video models to speed them up so you're not spending 30 minutes per video on a 5090. Good luck anon
>>
>>107367254
spark works fine for me, how is it a meme and would you share your flash settings with me?
>>
Why it no werk
>>
>>107367256
your brain should still be working if all you do is fuck and exercise. also, download the i2v one too in case you also want to animate a vaporeon image instead of making one with the text2video one.

you should be able to figure out the rest, the workflow includes notes as well.
>>
File: ComfyUI_08020_.png (1.49 MB, 944x1280)
>>
>>107367269
Yeah I was also gonna say if you want vaporeon Pokemon stuff you probably want to generate an image with an image model first and then give that to wan 2.2 with image to video. Wan text to video isn't the best or most consistent for anime
>>
lodestone is based for asking for the pretuned versions and aesthetic tuned versions to be released separately but shut the fuck up about pixel space and let them cook
>>
File: file.png (947 KB, 1024x1536)
>>
the details on radiance are so fucked everyone knows it grainy as hell
>>
respectfully
>>
>>107367294
does pixel space even work or is it more bitnet fake hypeshit that doesn't amount to anything? it reminds me of that "unlimited detail!!!!" thing that claimed games would have unlimited atomically-deep graphics with no performance cost.
>>
File: 1764389183939090.jpg (174 KB, 1024x1024)
>>
>>107367320
who knows
lode got further than ostris but still not all the way
play with the idea for the edit model but leave base alone or at least release base before fucking around with it further
>>
>>107367320
It works but it's inferior to latent space and I don't really understand the reason to pursue it. latent space is more like analog film while pixels are discrete like digital video

The value proposition is that you don't have to encode and decode the latent but that shit is basically instant with state of the art tiny AEs.
>>
>>107367320
It works in that it produces an image. Look at any radiance gen and see how much noise is still there.
>>
File: ComfyUI_08040_.png (3.09 MB, 1280x2048)
>>
>>107367384
This is why I am on finasteride
Do people actually use computers like this? Even women sit with their laptops on their... lap
>>
just turn z image into a pixel-space x0pred 32b MoE model trained by merging 3 separate 512x512, 768x768, and 1024x1024 bakes without testing any of these ideas individually. all that was missing from chroma was compute, $200k simply wasn't enough!
>>
>>107367370
the value isnt speed but rather getting rid of a lossy operation
its trying to solve the "we asked AI to NOT change this image over and over again and it turned the cute asian 1girl into a fat brown 1girl" problem
>>107367378
>Look at any radiance gen and see how much noise is still there.
this
>>
>>107366078
pretty good but the face looks off, something about the features being out of place
>>
>>107367395
>getting rid of a lossy operation
Huh? How is decoding to latent space lossy? You can't do anything with latent space in terms of consuming the image except decode it. Calling that "lossy" would be like calling the observer effect of using a voltmeter "lossy", like yes you're technically changing the voltage by measuring it but there's no other way to actually measure the voltage (pixel space and latent space wouldn't deterministically converge to the same thing so it's not an apples-to-apples comparison)

Or should I actually start the fast.ai course on stable diffusion because I am missing some fundamental knowledge here
>>
>>107367262
It's clearly working but why isn't it put into text/string?
>>
File: file.png (985 KB, 1024x1536)
>>
>>107367422
Anon no one fucking cares, export your workflow as a json and open your ComfyUI folder in Visual Studio Code and use your free Claude Sonnet 4.5 and give it your JSON and ask it this question
>>
>>107367429
That's not very local of you, anon.
>>
>>107367419
you can observe the effect yourself by encoding and decoding a latent multiple times; it's more pronounced on earlier models
you might remember the purple splotches that would arise occasionally with 1.5 which was partially solved with external VAEs but not entirely
latent space itself is compressed pixel space
sure modern VAEs have gotten REALLY good at minimizing this but its still there and with enough en/decodes it shows up
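a toy version of that experiment (a crude average-pool stand-in plus a little noise for the encoder's stochastic sampling, NOT a real VAE; it just shows why error compounds across repeated en/decodes):

```python
import numpy as np

def roundtrip(img, rng, noise=0.01):
    # "encode": 2x2 average pool; noise stands in for the VAE encoder
    # sampling from a latent distribution; "decode": nearest-neighbour
    # upsample. Each pass through this is lossy, like a VAE round trip.
    h, w = img.shape
    lat = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    lat = lat + rng.normal(0.0, noise, lat.shape)
    return np.repeat(np.repeat(lat, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
original = rng.random((256, 256))

img, errs = original, []
for _ in range(16):
    img = roundtrip(img, rng)
    errs.append(np.sqrt(np.mean((img - original) ** 2)))

print(f"RMS error after 1 pass:    {errs[0]:.4f}")
print(f"RMS error after 16 passes: {errs[-1]:.4f}")
```

same idea as the 1.5 purple splotches: one pass barely matters, but the drift from the original only ever goes up.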
>>
>>107367446
Okay connect copilot or your favourite Viscose extension to your local OpenAI compatible or KoboldAI endpoint and use that then.
>>
but again the important part is no one i2i's with modern (non edit) models enough for it to show up so the benefits do not outweigh the cons of its current status which is grainy gens
maybe we will figure it out with zedit but theres simply no reason to do it with base right now
>>
File: ComfyUI_00401_.jpg (736 KB, 2048x2048)
>>107367453
Turns out the combination of the two different prompts actually work, but it doesn't show up in the prompt window, the prompt that is in the prompt node gets frozen and displays forever until unhooked.

Thanks open source.
>>
>>107367428
Prompt?
>>
>>107367452
>sure modern VAEs have gotten REALLY good at minimizing this but its still there and with enough en/decodes it shows up
But we're not doing multiple encodes and decodes so this is basically irrelevant right?

Oh I guess hiresfix or other multiple pass workflows this matters.

Or maybe it doesn't since you can just use Upscale Latent

Maybe I will make a test for this (and make another slop rentry for it) because I am interested in how well 16 vs 32 vs 64 channel VAEs compare. I have a feeling this is a negligible concern at 64 channels and probably even 32 though for the average workflow
>>
>>107367484
>Thanks open source.
You're only meeting open source halfway with an attitude like this, the cool part is that if you found the bug, you can make a pull request to the GitHub repo of that node and get it fixed
>>
>>107367498
And in open source I have the freedom to not report it and let others suffer.
>>
>>107367507
based spiteful free will enjoyer. fuck everyone if you got yours already
>>
>>107367490
zimg - Sabattier effect Photography in the style of Slim Aarons. Anaglyph 3D, calligram, symmetrical composition, ascii art, surreal, abstract, geometric, futuristic neo-brutalism. japanese graphic design aesthetics. the designer's republic
>>
loads of loras up
>>
>>107367507
>>107367523
I mean sure I think doing it for free is cuck coded too I just don't want to deal with maintaining private forks or upstreams so I'd rather it get merged into master

>>107367524
Schizo prompting with z base will be so kino bwos
>>
>>107367491
if youre interested anon was posting and comparing VAEs around the time flux first released as that was when you could clearly see how much better a 4ch is to a 16 for details
>But we're not doing multiple encodes and decodes so this is basically irrelevant right?
for regular imggen you are correct but not for edit models
no one edits a specific gen more than X times because it slowly degrades which again is due to the VAE
>because I am.interested in how well 16 Vs 32 Vs 64 channel VAEs compare
iirc 16 vs 32 is not worth it due to the additional training time vs the amount of detail retained but i could be misremembering
the big leap was 4 to 16 and either no ones trained on anything higher or if they did it didnt take off

the main issue is it doesnt need to be in base right now
okay thats the last time ill say it i promise
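The generation loss being described (each VAE encode/decode pass throwing away a little detail, which is why repeated edits degrade) is easy to demonstrate with a toy numpy sketch. `fake_vae_roundtrip` here is an assumption, not a real VAE: it fakes one pass as a box blur plus a little reconstruction noise, then tracks PSNR against the original across repeated passes:

```python
import numpy as np

def fake_vae_roundtrip(img: np.ndarray, seed: int) -> np.ndarray:
    """Toy stand-in for one VAE encode/decode pass: a 3x3 box blur
    (lost high-frequency detail) plus a little reconstruction noise.
    This is NOT a real VAE, just a model of its lossiness."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    blur = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    rng = np.random.default_rng(seed)
    return np.clip(blur + rng.normal(0.0, 0.005, img.shape), 0.0, 1.0)

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # peak signal-to-noise ratio for images in [0, 1]
    return 10 * np.log10(1.0 / np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
original = rng.random((64, 64))   # stand-in "image" in [0, 1]
img = original.copy()
scores = []
for k in range(8):                # 8 encode/decode passes
    img = fake_vae_roundtrip(img, seed=k)
    scores.append(psnr(original, img))

print([round(s, 1) for s in scores])  # PSNR vs the original only goes down
```

With a real 4/16/32-channel VAE you'd swap the fake roundtrip for an actual encode/decode pair, but the shape of the curve is the point: fidelity to the original never recovers, which is why multi-pass edit workflows slowly rot even when each single pass looks fine.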
>>
>>107366147
Did ani say when anistudio's getting z-image support? i'm not gonna use anything related to comfy. hopefully ani has some free time this week.
>>
whats the best local model for horror, actual scary monsters and atmosphere?
>>
>>107367568
seconding that, not really interested in supporting comfy bullshit
>>
>>107367560
>iirc 16 vs 32 is not worth it due to the additional training time vs the amount of detail retained but i could be misremembering
Only reason I brought up 32 and 64 is because Infinity-2B uses a 32 channel VAE by default and it had like 4 different VAE pths in the repo and I was remembering that.

I forgot about edit models tbdesu, because I'm also the guy who firmly believes (from my experience fucking with Infinity) that autoregressive/edit models aren't suited for local use cases anyway, at least for now, due to the high vram requirements and the inability to quantize the activations in a trivial manner, so it literally slipped my mind
>>
samefagging this, i only use SD1.5 btw
>>
>>107367568
>>107367587
he gave up on anistudio after he was caught schizoposting with his trip on
https://desuarchive.org/g/search/username/Ani/tripcode/0gRLTHrqN2/type/posts/
>>
>>107367523
This kind of mindset is actually part of why the west has fallen.
>>
What the fuck is wrong with people who go through the effort of training a fucking lora and upload it to CivitAI only to scrub their goddamn samples of metadata?

Fuck you if you do that.
>>
>>107367603
ani didn't use secure trip and the schizo was able to mine it. he deliberately falseflagged as ani
>>
>>107367615
??? and how would you know that ``anon''
>>
>>107367616
it's obvious what happened schizo. your obsession with ani is showing
>>
>>107367608
Isn't that civitais fault for censoring shit?
>>
>>107367615
>>107367622
Are those posts also the schizo falseflagging as Ani, or are you just naturally so stupid you don't see everyone can see you refer to yourself in the third person?
>>
>>107367526
Yeah, interesting that you can get such good quality like the technicolor lora out of training the distilled Turbo version.

Makes you wonder how good training the base model will be. Then again base model trained loras might not work as well on Turbo, and if you can train on Turbo directly...
>>
File: file.png (1.15 MB, 1024x1536)
>>107367554
It feels like diet Qwen with merged in ultrareal lora.
>>
>>107367636
not ani. i just like his work and think he deserves more support from us
>>
>>107367646
Very organic, anon.
>>
>>107367595
>due to the high vram requirements and inability to quantitze the activations in a trivial manner
perhaps zedit will fare better than previous models in this respect considering the nature of their paper
>>107367641
>merged in ultrareal lora.
turbo is aesthetically tuned for maximum 1girl so youre not entirely wrong
thankfully it sounds like base isnt and if we're lucky itll be the pretuned version
>>
>>107367646
Don’t know who this Ani guy is, still a newfag (been here like a week). What kinda projects does that dude even do?
>>
>>107367568
this weekend
>>
>>107367637
If the base model lives up to the hype of general knowledge and trainability while still being relatively easy to prompt for then I'm getting a gut feeling that this might be the photoreal model accessible enough to get Anti-AI people to start taking notice

I know people said this for flux, and then said this for wan, but this is really really photorealistic really fast

On the other hand, every 2 months normies seethe at Grok for being PedoHitler and then move on to the next current thing

>>107367652
>considering the nature of their paper
What is there to consider? Is it not autoregressive?
>>
>>107367658
His biggest project is samefagging the thread, pretending anonymous posters are supporting his broken UI when it has only one user (himself; it fails to build on anyone else's machine), and ritual posting "comfyanon should be shot".
>>
>>107367658
ani was at the forefront of vidgen way before it became popular, hence the name. he was also pushing models forward while employed at a japanese studio (ikhor labs), and helping comfy with his interface before comfy betrayed him
right now he's developing a c++ ui for imagegen
https://github.com/FizzleDorf/AniStudio
much faster than python crap
>>
I am SICK AND TIRED of chinamen spamming low effort ComfyUI workflows with runninghub referrals. Civitai seriously needs to ban that shit.
>>
>ritualposting
You just made me realize this isn't a blessed thread :(
>>
>>107367258
They're the same they've always been
https://files.catbox.moe/rgk5w1.png

>spark works fine for me, how is it a meme
One of the images you posted (ldg sign next to a car) looked very broken compared to what Flash can do.
>>
>>107367603
what was the schizopost?
>>
>>107367684
oh wow, thank you for the lore dump, anonymous!
>>
>>107367684
Based if true. Should check out the interface, I honestly kinda hate python.
>>107367673
You sound unhinged anon
>>
>>107367684
wow. comfy is an asshole
>>
>>107367696
For a week or two he was double baking trying to force AniStudio in the /ldg/ OP because /adt/ removed it from theirs, and spamming the legitimate threads trying to redirect people to his own bake. He forgot to remove his trip for one of these posts.
>>
File: file.png (2.03 MB, 1024x1536)
>>107367670
There's an NB2 "it's so over" marketing push on X and 4chan rn, that will draw the fire first.
>>
>>107367704
he has a version much further than the current release but the schizo has a poopy melty across a week that ruined the threads so he's probably burnt out. if he comes back and adds zit support I'd support him
>>
>>107367670
>What is there to consider?
one could infer that the philosophy behind the entire suite of models is small but SOTA
not saying i comprehend everything in the paper but you get that from even the abstract and introduction
>>
>>107367704
>You sound unhinged anon
There's a reason /adt/ removed AniStudio from their OP and no one on /ldg/ uses bakes made with AniStudio in the OP. You're talking to the thread's biggest schizo after debo.
>>
>>107367734
debo became pretty much inactive more than a year ago. ani is a much, much bigger schizo and honestly deserves a rentry much more than debo does
>>
>>107367716
the only thing I see in your link is him trying to get anons to fill the oldest thread because you baked two more threads with and without anistudio

>>107367734
>There's a reason /adt/ removed AniStudio from their OP
haha, yeah. some schizo would split the bakes every thread and schizo spam both all day for a week. we had the exact same thing happen here because mods just can't range ban this unhinged nigger
>>
File: ComfyUI_08060_.png (2.69 MB, 1280x2048)
>>
>>107367673
>>107367684
just checked his project. weird choosing ImGui over anything else lmao. well good luck. since Forge Neo already supports Z-Image i don’t need other UIs anyway kek
>>
>>107367608
The loras on Tensor were way worse. Like 90% of the loras had hidden prompts on the sample images and many didn't even tell you the trigger word.
>>
so is comfyorg bad? I get the red hat ick when I look at what they do publicly but betraying friends too? I feel bad for using comfy for so long
>>
>>107367724
>he has a version much further than the current release
do you have a source or did you make it up 'anon'
>>
>>107367757
comfy is just a bad person
>>
So, today is Saturday 4:00 PM in China. Where is the model? They seriously pushed into Sunday?
>>
>>107367718
Kino
>>
>>107367718
>There's an NB2 "it's so over" marketing push on X and 4chan rn, that will draw the fire first.
I'm so happy I have absolutely no idea what you're talking about. I love not knowing things going on in the world. I didn't even know who Charlie Kirk was until he got necked and obviously I was still online

>>107367726
Based on the team's behaviour towards embracing noob dataset (even if they will censor) and kicking out bad actors from their discord, this is probably the most aligned team ever

I wouldn't bet money on it, but if I was forced to I'd probably bet on an Ali team being the first to get a local version of video+audio/Sora2 at home

And then I'd lose the bet because tencent would pump out some shitty garbage that's still SOTA a month before the Ali team releases their model kek
>>
>>107367760
he shared a screenshot in one of the threads but I'm not going through the schizo bakes to find it. it looks a lot easier on the eyes
>>
>>107366248
She's a chinese woman, though. Can it do an actual woman?
>>
>>107367773
I'm glad you're burned out, bro. Wish you were burned out on schizoposting as well.
>>
>>107366248
Hmm, so this is what it was RLHF tuned on I guess.
>>
>>107367684
I might try this out, C++ ui is what I wanted for a while
>>
>>107367782
>Wish you were burned out on schizoposting as well.
>t. schizo

anyways, I hope ani comes back. comfy needs a replacement because comfyorg is filled with grift chinks and indians now
>>
>a c++ ui for imagegen
c++ and python :]
>>
>>107367775
I completely understand that this woman is a 10 in Asian beauty standards, but since I don't sexually subscribe to those standards she looks like a 6 trying to be an 8, which is making me uncomfortable

Also smoking isn't hot but I think that's either a generational difference or my mouth fetish and desire for a tasty clean mouth to lick
>>
>>107367790
Yeah, yeah. You can call me the schizo. But no one ITT used your bakes with AniStudio in them. Strange.
>>
>>107367791
if we're being pedantic pytorch is c++ and python too, kinda
>>
>>107367791
I think python is only there for conan or running scripts in the app. there isn't anything else suggesting it relies on it at all.
>>
kinda wanna try training lora with actually good nudes + detailer using ostris adapter, but i’m way too fucking lazy to prep the dataset, haeughh
>>
>>107367811
vibe code it by talking into your phone microphone
>>
File: 1733189648732055.png (1.76 MB, 1168x1752)
>>107367524
>Sabattier effect Photography
neat concept
>>
>>107367788
go for it anon, it's worth it
>>
>>107367839
the last time I tried I couldn't get it to work from the release but I could build my own binaries. if it still loads the model every gen it's probably not worth using atm.
>>
>>107367775
Is that a sigma
>>
File: file.png (1.02 MB, 1024x1536)
>>107367822
>Sabattier effect Photography
I don't think it understands the concept sadly :(
>>
File: file.png (2.55 MB, 1024x1536)
>>
File: 1753026928275530.png (3.09 MB, 1168x1752)
>>
>>107367904
You could probably gen some kino playing cards with Z
>>
File: ComfyUI_07917_.png (1.42 MB, 944x1280)
>>
Kino gen hour
>>
File: 1751640236794309.png (2.73 MB, 1168x1752)
>What the ace of spades playing card would look like if designed by Claude Monet, a winter landscape painted in his signature impressionistic style.
>>
File: file.png (1.59 MB, 1024x1536)
>>
File: ComfyUI_temp_tlpbk_00020_.png (3.02 MB, 1152x1664)
if Zimg gets porn knowledge from the noob dataset, can it use it to gen 3dpd?
>>
File: 1742930840159974.jpg (983 KB, 2336x1752)
>>
>>107367953
>>107368044
nice
>>
>>107368022
You'll be able to gen more than that yes
>>
>>107367695
>One of the images you posted
that wasn't me, but thanks for the workflow.
>>
OK, Z is good. But there's a problem. Well, two. Most of the time 1girls come out looking asian even if you specify "white", "caucasian", "western", even "jewish".
Also, their crotches are always smooth, and trying to unsmooth them summons horrors.
How do we fix this?
>>
>>107368058
>How do we fix this?
By waiting for the non distilled pretuned base.
>>
File: file.png (2.66 MB, 1024x1536)
>>107368058
Add in a random western female name. I haven't tried those ethnicity but Slavic gives me white women.
>>
File: z-image-race.jpg (3.95 MB, 3129x5950)
>>107368058
caucasian should work but try something more specific, plenty to choose from
>>
File: ComfyUI_08003_.png (1.33 MB, 944x1280)
>>107368058
>How do we fix this?
Loras?
>>
>>107368069
"Jessica" gave me a Chinese woman (my IP range can't post images).
I guess I'll try Penelope Featherstonehaugh next.
>>
>>107368058
If you mean detailed vag, you can't. Only way is with a lora. Just wait for the base model drop. Tho even when the base comes out they’ll probably gut the hardcore nsfw same as the current turbo ver.
>>
File: 1740317683699058.png (2.9 MB, 1168x1752)
>>107368051
i couldnt really get the design/art to cover the entire card, it always seemed to stay mostly in that center section
prolly issue with turbo
>>
File: file.png (1.1 MB, 1024x1536)
>>107368096
Meet Marilyn Minter
>>
>>107368058
That's the cost of westernoids letting chinks win. When you use an empty prompt, most of the gens are faces of asian women, so it's pretty baked in there.
>>
>>107367779
What like a Filipino?
>>
>>107368096
Now we're talking
https://litter.catbox.moe/4kyesr6oemwqycm3.png
>>107368107
It's not surprising. As a European, I'm pretty used to United States culture being shoved down my throat, and I guess it's China's turn now to throat-fuck me. At least I've been learning the language so I can enjoy it undubbed.
>>
>>107367932
whats the prompt to gen the roastie?
>>
File: file.png (2.82 MB, 1024x1536)
>>
File: 1742154442468740.png (3.45 MB, 1440x1920)
the distilled lora training doesnt work very well. it seems to have a very hard time capturing the likeness of faces compared to flux. it works well enough up close, but as soon as you zoom out for a full body shot the face becomes almost unrecognizable.

the lora was trained on 4000 steps, 3 curated 1024 pixel buckets, around 60 images with plenty of full body photographs, rank 4 and the rest default settings.

the same exact dataset made an excellent lora for chroma, so the images themselves should be fine. i hope this is a byproduct of the sketchy distilled training method or because of some new caption rules, and not a sign of model/6b limitations etc.
>>
>>107368107
>That's the cost of westernoids letting chinks win.
Well, if it was the west winning, the default would be a black person

I'm fine with china taking the throne
>>
>>107368192
>if it was the US winning
Fixed that for you.
>>
>>107368170
I like the juxtaposition of stylistic elements.
>>
This model is OK
https://litter.catbox.moe/wtbt531izi44pq51.png
>>
I can't get z to do facesitting, it's not as easy as 'sitting on head, face between buttocks', it just won't do it

I also seem to have great difficulty getting one person to grab the crotch of another person, I can get their hand close with "reaching down" but not straight up grabbing
>>
>>107368211
After experimenting a bit it's clear to me it's a typical distilled model. Very pretty looking images, but shitty once you try to go outside of the few concepts they baked into it.
I predict that the base model is not going to be better than flux dev.
>>
>>107368173
It's certainly not perfect but way better than I expected for a distilled model, bodes well for the full one. Wish I could post my gens of it here but it will just enrage the jannies.
Settings I used in screenshot (just stolen from someone on r*ddit), only 14 images in the dataset cause I just had some laying around and wanted to see what happened. Took approx 1h 40m to train.
As for using it, a strength of 0.85/0.9 seems to work well, honestly results are better than my attempts to make loras for SDXL etc already.
>>
>>107368231
Sorry for being a noob, but what tool is this?
>>
>>107368263
ai-toolkit
>>
>>107368231
rank 32 is crazy for 14 images, how big are those safetensor files? and if it's an emma watson lora, you're training it on a person the model already knows, which in theory should make it way easier for it to learn. i guess i'll try sigmoid next, but i'm unsure if i'm going to be bothered making any more distilled loras.
>>
>>107368265
Thanks
>>
>>107368231
>Rank 32
Please don't do this
>>
>>107368269
Yeah for sure it helps it has a rough idea of who she is already.
>>
>>107368229
Z-Image Turbo is already much better than Flux dev

And as always, the true potential of a model is shown when you start training loras/finetunes, this is the reason SDXL is still so widely used
>>
>>107368269
also
>>
Could Z Turbo survive being raped by 13M images though?
>>
>>107368231
Diffusion-Pipe dev is also implementing Z-Image Turbo training, seems everyone is too eager to wait for base, can't blame them.

https://github.com/tdrussell/diffusion-pipe/issues/462
>>
File: 1746328348905172.png (1.93 MB, 984x1317)
god damn it. i thought "lying on her back" generating upside-down women was a FLUX-specific problem, but the same exact thing happens with z-image for some reason.
>>
>>107368275
To be fair we don't know where training Z-Image Turbo lands in terms of optimal Lora rank, but yes it's hard to imagine having to go over 16 when training a single person.
>>
>>107368292
I'd say turbo is better than schnell, but it suffers from the same narrowness. Maybe it's better than dev, I have not tested it enough. What I'm saying is they're hiding the flaws behind the "portrait-optimized" (1girl generator) distill, and that the base model is going to strike everyone as more mediocre. We'll see.

Do you have some good prompts for comparing Flux and Z turbo?
>>
Unless z-image can do good anime/hentai it's worthless
>>
>>107368333
It can do anime, but it looks chinese as fuck
>>
anon, your warning about that tranny censor guy was spot on, we're so fucking back. also, that chroma lode dude is in there too, discussing about his Chroma.
>>
>>107368331
I suspect the gap between the quality ceiling and floor will be widened which will cause many to think it's mediocre at least initially.
>>
>>107368346
nobody knows what youre talking about
>>
>>107368347
My dude, that's a cope if I've ever seen one. A preemptive cope no less. The truth is we just don't know yet.
>>
everyone's talking about the base model, but if it's released, is it going to be a 32b monster or the same size, like flux de-distilled? It'll just be a lot slower, requiring more steps, right?
>>
>>107368353
prev thread
>>
>>107368359
Yes, I am coping hard. Coping is all we have left.
>>
File: Comparison.jpg (1.73 MB, 2880x2880)
I'm not sure it trains better on subjects it has knowledge of. Just threw these two datasets from my wan loras at ZImage
Top no lora, bottom loras
It obviously has a decent understanding already of Rapunzel but not so much Helen Parr but it seems to have learned each equally as well.
Also the prompt was "gang sign", not "peace sign"
>>
File: zimage_00083_.png (2.24 MB, 1024x1024)
>>
does the z stand for zhang
>>
>>107368374
Looks like it picked up the characters spot on

Did it learn faster compared to Wan ?
>>
How do you correctly set a local path for the model in Ostris? I keep getting the error about the config.json not being found even though I replicated the whole huggingface directory structure for z-image turbo.
>>
>>107368403
it stands for zlop
>>
File: file.png (47 KB, 233x109)
>>107368325
>>
File: zimg_425_.png (2.53 MB, 1536x1536)
>>107368362
>but if it's released
They already stated it will be released

>is going to be a 32b monster or the same size like flux de-distilled?
It's not going to be anywhere near 32b since the whole reason for this model is to prove that you can get great quality without huge scale, at most it will be 12b, but I doubt it will even be that
>>
>>107368406
>Did it learn faster compared to Wan?
Seems so, did 2000 steps in around half the time of wan (1/4 the time if you account for both low and high noise models)
It's hard to make a direct comparison though, I train wan with musubituner
>>
>>107367932
Put the Asian waifu head on the roastie body and you have perfection.
>>
>>107368464
>They already stated it will be released

I'm 90% sure they will but this would not be the first time a Chinese company has said they will release a model after experiencing some success and then renege on that promise for unknown reasons.
>>
File: flag of the internet.jpg (1.3 MB, 3824x2144)
>>
>>107368484
When did this happen, what company and what model was officially announced to be released and then wasn't ?
>>
>>107368492
3D 2.5
Wan 2.5 was originally slated to be open source (if you believe the leadup)

There's also a few literal who video models that got to the "We are waiting for approval" stage and then ghosted.

It definitely happens.
>>
>>107368500
>3D 2.5
What was this model and by whom ?

>Wan 2.5 was originally slated to be open source
By twitter randos, nothing official at all
>>
>>107368513
Hunyuan 3D 2.5
And that was confirmed on discord by the staff.
>>
They're awake, active on HF. I trust the plan.
>>
File: zimg_437_.png (2.26 MB, 1536x1536)
>>107368466
Nice what resolution(s) did you train at ?
>>
>get back from doing core day at gym
>lie in bed immobilzed and all I want to do is gen

I can solve this with a remote tablet, can't I?
>>
File: Wan garbage 3.jpg (162 KB, 1850x871)
This is taking absolutely FOREVER. What have I done wrong?
>>
>>107366189
This
>>
>>107368530
1024 with 0.0002 learning rate
But they're a little overfitted so I'm retraining with more steps at 0.0001 lr
>>
>>107368551
>wan textimage2video
>huanyun
>2.2 vae

I dont have a reaction image for this.
>>
>>107368565
hold on, i do:
>>
>>107368565
>>107368574
There's nothing wrong with that
>2.2 vae
This is an actual thing for the 5B model, only 14b uses the 2.1 vae
>>
>>107368582
oh, i hadn't even looked at his workflow i just wanted an opportunity to call someone retarded since its been a while
HAHA RETARD >>107368565 his workflow's fine
>>
File: 1764382660487872.png (1.1 MB, 1200x1024)
Are you READY for another beautiful day of BLACKED RICE WHORES? (after I buy my groceries)
>>
File: 1747019699077567.png (18 KB, 250x250)
so how are you coping with the fact that you have to write your prompts in chinese to get descent results?
>>
>>107368551
If you need to cope with 5B then give up on video.
>>
>>107368597
youre obsessed
>>
>>107368602
just reorder the prompt
>>
File: 1764382268972453.png (978 KB, 1024x1024)
>>107368602
>descent
>not decent

Good morning Zhaar
>>
File: 1749531631868919.png (316 KB, 587x441)
>>107368612
no i'm trying to generate images of the 1995 space ship doom clone
>>
>>107368602
Erm, achksully *pushes up glasses* it's not necessary, it just helps certain prompts. Retard.

Like this one was pure english, starbucks and all. Kek i love that this fucking model has copyright knowledge, so good.
>>
>>107368618
Just a crippled chroma
>>
File: 1757108028248680.png (33 KB, 155x143)
>>107368618
>>
File: 1764384224951926.png (1.21 MB, 1200x1024)
>"Uooh, your 9B crotch smells so nice.. unlike Z-hang's pathetic 6B cock.."
>>
>>107368631
she lost a tooth from drinking too much pumpkin spice lattes every season, don't judge.
>>
File: ComfyUI_00125.png (2.69 MB, 1200x1800)
>Workflow contains unsupported nodes (highlighted red)
>Remove these to run the workflow
>nothing highlighted because I applied the fixes for them
Fuck off, Comfy... they're working. I can't believe 'ol moneybag's open sores program can't tell when something in a workflow is actually functioning or not.

>>107368374
Since Z uses Flux's VAE, you should use one of the EQ VAEs out there when you train. The results are always much better with one than without.
>>
File: 1733871205282662.png (161 KB, 250x349)
>finetunes will fix it
>>
>>107368652
What text encoder does Z use?
>>
>>107368328
>>107368275
>>107368269

Could rank 32 harm the lora or is it just the problem that its overkill?

Is rank 32 good if training a style instead of a single person?
>>
Usecase for Z-image? I'm being dead serious. It's stuck in a middle ground of good but not good enough.
>>
>>107368652
>The results are always much better with one than without.
Your image looks like it's made of fabric
>>
>>107368700
deslopper refiner
>>
>>107368687
from what i can understand, a high rank basically lets a lora learn many different things, but a high rank also becomes more diluted, meaning things like a person's likeness won't be as rigid. a rank of 1 will give you as perfect a likeness as possible, but it's going to be very limited in doing things your lora wasn't specifically trained on.

so you should use a higher rank if it's a broader concept, and a lower rank if it's a specific thing you want to generate.
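For anyone wondering what rank actually buys you: a lora stores a low-rank delta on top of each frozen weight, so both file size and capacity scale linearly with rank. A minimal numpy sketch, with made-up toy layer sizes (not Z-Image's real dims):

```python
import numpy as np

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA learns a delta W += B @ A, with B: (d_out, rank)
    # and A: (rank, d_in), instead of a full d_out x d_in update
    return rank * (d_in + d_out)

d = 512                                # toy layer size for illustration
for r in (1, 4, 16, 32):
    print(f"rank {r:>2}: {lora_params(d, d, r):,} trainable params per layer")

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # frozen base weight (never touched)
B = rng.normal(size=(d, 4)) * 0.01     # trainable pair at rank 4
A = rng.normal(size=(4, d)) * 0.01
delta = B @ A                          # the lora file stores only B and A
W_adapted = W + delta                  # applied at load/merge time
```

Going from rank 4 to rank 32 literally multiplies the stored params per layer by 8, which is why rank 32 on a 14-image single-character dataset is mostly empty capacity.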
>>
>>107368677
...anon
>>
>>107368723
makes sense, thanks
>>
Fresh thread
>>107368734
Fresh thread
>>107368734
Fresh thread
>>107368734
>>
>>107368724
Anon what nigger, is it just the same as flux on vae and text encoder?
>>
>>107368687
>Could rank 32 harm the lora
Theoretically a lower rank is more 'focused' and thus better when training a single concept (assuming it's not a lot of images, like thousands)

>if training a style instead of a single person?
Rank 16 should be more than enough for a style unless you train a huge amount of images of said style
>>
>>107368777
interesting
>>
File: file.png (1.46 MB, 1152x896)
>>107368574



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.