/g/ - Technology




Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107460114

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: qwen_00105_.png (1006 KB, 832x1248)
>>
File: ZImage_00855_.png (966 KB, 1152x896)
>>
File: is_it_really_base_2.png (2.61 MB, 1621x1579)
We might not get the pre-train base model. We might get the SFT one with slop merging.
>>
File: zimage df11.jpg (835 KB, 2048x2048)
I was bored so I messed with this a bit
https://github.com/mingyi456/ComfyUI-DFloat11-Extended
I have a 12GB VRAM card, so I typically run z-image bf16 with circa 2GB offloading for 1024p images.
In this setup df11 actually runs SLOWER than bf16 despite easily fitting into my VRAM. Apparently the decompression has an overhead.
So I tried a 1520x1520 image (relying more on offloading) and the gap got smaller. This made me hope it would run faster at 2048p, but when I tried it, it was the same story as at 1520x1520. No idea how that works, but it is still very slightly slower than bf16.
Maybe this shit is more useful for /lmg/? But I doubt anyone would bother with it over Q6_L or whatever.
TLDR: Not worth it
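The overhead makes sense if you model step time as compute + PCIe transfer + per-step decompression: DF11 removes the transfer, but if the weights already (mostly) fit, there isn't much transfer to remove. A toy back-of-the-envelope sketch (every number below is made up for illustration, this is not a benchmark):

```python
# Toy latency model: why a compressed weight format (DF11-style) can lose
# to plain bf16 when the model already fits (or nearly fits) in VRAM.
# All constants are made-up assumptions, not measurements.

def step_time(compute_s, offloaded_gb, pcie_gbps, decompress_s):
    """One denoising step: compute, plus re-uploading offloaded weights
    over PCIe, plus any per-step decompression overhead."""
    transfer_s = offloaded_gb / pcie_gbps
    return compute_s + transfer_s + decompress_s

# bf16: 2 GB offloaded, pushed back over a ~12 GB/s effective PCIe link
bf16 = step_time(compute_s=1.0, offloaded_gb=2.0, pcie_gbps=12.0, decompress_s=0.0)

# DF11: ~30% smaller so nothing is offloaded, but it pays a hypothetical
# decompression cost on every step
df11 = step_time(compute_s=1.0, offloaded_gb=0.0, pcie_gbps=12.0, decompress_s=0.3)

print(f"bf16: {bf16:.2f}s  df11: {df11:.2f}s")
# With these made-up numbers DF11 loses: the ~0.17s of transfer it saved
# costs less than the 0.3s of decompression it added.
```

The flip side is the same model predicts DF11 winning once offloading gets heavy enough that transfer time dominates, which matches the gap shrinking at 1520x1520.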
>>
>>107463254
why does ai still struggle with feet?
it seems like there hasn't been much progress on it compared to hands
>>
>>107463290
I don't care that much as long as it doesn't hurt NSFW finetuning.
>>
File: file.png (2 KB, 180x54)
$270 a few months ago btw loooooooooool
>>
>>107463257
Yes, it means that our tuning will have to waste time undoing their tuning, and will probably produce worse results due to conflicts. The SFT itself can be thought of as the 'slopification' step.
>>
>>107463308
Most photos don't show people's feet but you can see people's hands easily.
>>
Is chroma still being actively worked on by lodestone or is the model done?
>>
>>107463312
It will hurt all tuning
>>
>>107463296
If you have an Ampere card, bf16 is hardware accelerated and other stuff isn't. This is why bf16 is still faster even when it's getting offloaded.
>>
File: ComfyUI_00235_.mp4 (1.97 MB, 832x640)
>>107463308
what's wrong with the cat's feet? i dont know im not a foot fag
>>
File: 1736521275153099.jpg (165 KB, 732x1000)
>>107463318
theres millions of high quality stock feet pics
no excuse at this point
>>107463344
the cat is missing a toe
>>
>>107463316
>will have to waste time undoing their tuning,
Why? Why would we need to undo it? Isn't it just an aesthetic alignment? How do you know the base model's state is more favorable to a NSFW tune than this?
>The SFT itself can be thought of as the 'slopification' step.
Pre-train images don't look any better >>107463290
Especially the fox looks sloppier in pre-training.
>>
My preferred cope is what that other anon said in the previous thread, i.e. that they rushed out Turbo purely to shit on BFL and the base model wasn't quite finished yet.
>>
>>107463332
I see.
Makes sense I suppose.
Maybe useful to earlier GPUs then.
>>
>>107463367
this is basically what that one chinese leaker said, that base still hasnt finished training
>>
File: ComfyUI_00237_.mp4 (428 KB, 832x640)
>>
>>107463325
no its all z now
>>
>comfy cloud
>7400 credits ≈ 5.5 GPU hours per month.
>$35
imagine paying for this shit
>>
>>107463393
hosting a gpu renting service is one of the only ways to make money with ai so i cant blame them
>>
>>107463325
no idea but for now, there's chroma spark https://huggingface.co/SG161222/SPARK.Chroma_preview or https://huggingface.co/silveroxides/Chroma-Misc-Models/tree/main

i havent had the chance to try it out yet
>>
As an anon who likes genning: Turbo released a week ago and I still haven't covered 1% of it, got a lot more to try.
Why are you so anxious for base?
Did you already explore everything Turbo can do? These models are deep as hell and you need at least a year to start to know one properly.
I'll admit with shame that I've been on SDXL this whole time and still don't know it fully yet.
So I ask again, why are you so anxious for base?
>>
>>107463325
I think he is still training Radiance?
Last update seems to be 9 days ago.
Speaking of which, can anyone tell me how much more VRAM Radiance uses and how much slower it is compared to base chroma, just a ballpark figure?
I am curious about that.
>>
>>107463393
ComfyUI is SaaS adware and should be removed from the OP.
>>
>>107463353
the base model has a very wide range of possible outputs - this includes 'bad' outputs. the examples shown include some of those. sft or similar tuning constricts this range of outputs to a much smaller set that the creators think will be suitable. when we talk about 'slop' we're really referring to a model having strong preferences in its output style (implicitly ones we disagree with) that are difficult to override. to avoid slop, you want a model with the widest range of possible outputs and lowest level of baked-in preference, even though this easily allows for 'bad' output. think sd 1.5
>>
File: ZImage_00902_.png (1 MB, 1152x896)
>>107463383
cool!
>>
>>107463245
Will I be able to generate short video clips on a 4070, or is this card too weak?
>>
>>107463325
At this point it would be pretty stupid to keep working on it, might as well burn that money instead, at least it would warm your house for a while
>>
>>107463367
>>107463427
Even with a much better image quality I'd rather gen in under 30 seconds and upscale later than take 5 minutes for a 1girl.
>>
File: ZImage_00905_.png (802 KB, 1152x896)
2am, going to sleep gn
>>
>>107463440
yes but it will take 20-30 mins
>>
>>107463466
fvark
i'll just learn how to draw
>>
>>107463440
people are genning videos on 3060s brother
>>
>>107463427
>Why are you so anxious for base?
High quality COOM tunes need base, that's all.
In a vacuum I would be grateful to alibaba for just giving us the turbo for free but we NEED a sane size and good quality model to escape SDXL hell. (No, the chroma failbake doesn't count unless someone succeeds at unfucking it)
>>107463423
He stopped training it and seems to have moved on to something else.
Maybe the anons who accused it of being a grift were right, or maybe it genuinely didn't work out.
>>
>THE BASE WILL COME OUT IT JUST HAS TO YOURE JUST IMPATIENT IT WILL COME OUT IT WILL IT WILL IT WILL
it's just sad at this point honestly
>>
>>107463368
I had a table of the supported floating point/int formats but can't find it.
But anyways only the very latest cards benefit from the more exotic float/int formats.
>>
>>107463470
If it's as bad as this anon >>107463466 said then it's not worth it lol
>>
>>107463466
only if you dont add any of the speed boosts. looking at around 6-8 minutes maybe even less depending on the resolution
>>
>>107463455
>5minutes
Oh you're a VRAMLET, you're a second class citizen, your opinion is worthless. Delete it so it doesn't clog up the thread.
Thanks.
>>
>>107463431
Same speed, but the VRAM reqs go up like crazy since you have no VAE compression and are rawdogging pixels.
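For scale: assuming a Flux-style VAE (8x spatial downsample, 16 latent channels, which matches what base Chroma uses as far as I know), pixel space means roughly 12x more values per image than latent space, before you even account for attention being quadratic in sequence length:

```python
# Rough sketch of why a pixel-space model (Radiance) needs far more
# activation memory than a latent-space one (base Chroma) at the same
# resolution. The 8x downsample / 16 channels are an assumption based
# on the Flux-style VAE.

def elements(h, w, channels, downsample=1):
    """Number of values in one image tensor after optional downsampling."""
    return (h // downsample) * (w // downsample) * channels

h, w = 1920, 1088
pixel = elements(h, w, channels=3)                  # raw RGB, no VAE
latent = elements(h, w, channels=16, downsample=8)  # VAE latent

print(pixel / latent)  # → 12.0: 12x more values rawdogging pixels
```

So the weights cost the same either way, but every intermediate tensor is an order of magnitude bigger, which is why 1920x1088 OOMs on Radiance while normal Chroma is fine.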
>>
>>107463480
lol its up to you, dont let 1 random sway your pursuits, research and make up your own mind
>>
>>107463434
Makes sense.
But it should still be better than the distill for finetuning and hopefully still receptive enough.
>>
>>107463185
So instead of a yapping model, now we have one that has frozen faces because it's trained on stills kek.
>>
>>107463488
Fair enough, I'll look more into it.
>>
>>107463487
>Same speed
Really? Interesting
>but the VRAM reqs go up like crazy since you have no vae compression and are rawdogging it.
I understand that, but I was curious what this "crazy" roughly equals. Say, 16 extra gigs compared to base chroma at the same resolution?
>>
Fuck chroma is so good at making ugly fucking bastards. Might gen a shit ton and train a lora someday.
>>
File: Z-image turbo.png (1.27 MB, 1280x720)
>>
File: 00031-2447428105.png (1.16 MB, 896x1152)
>>
File: ComfyUI_00188_.mp4 (253 KB, 640x640)
>>107463254
>>
>>107463533
>pornably
>>
>>107463529
The one good thing about Chroma being made by a furry is that it produces a decent werewolf
>>
File: zturbo_00006_.png (1.55 MB, 1024x1024)
i got the nag working, i had to use some guy's fork. i have been using a1111 this whole time, switching to comfy for this. will have to install adetailer and figure out inpainting and stuff with this retarded ui
>>
Reminder that ComfyUI secretly logs your prompts
>>
>>107463556
>i got the nag working, i had to use some guy's fork.
are you using the right parameters?
>cfg 1, nag_scale 3, nag_tau 1, nag_alpha 0.25, nag_sigma_end 0.75
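For anyone wondering what those knobs do: as I understand the NAG paper, the core op extrapolates the positive attention output away from the negative one (nag_scale), clamps how far the result's norm can drift from the positive branch (nag_tau), and blends the result back toward the positive branch (nag_alpha); nag_sigma_end just limits which part of the noise schedule the guidance applies to. Rough numpy sketch (treating the L1 norm over the feature axis as my assumption about the paper's normalization):

```python
import numpy as np

def nag(z_pos, z_neg, scale=3.0, tau=1.0, alpha=0.25):
    """Sketch of Normalized Attention Guidance on attention outputs.
    z_pos / z_neg: (tokens, features) from the positive/negative prompt."""
    z_ext = z_pos + scale * (z_pos - z_neg)  # extrapolate away from negative
    # Per-token norm ratio vs the positive branch, clamped to tau
    norm_pos = np.linalg.norm(z_pos, ord=1, axis=-1, keepdims=True)
    norm_ext = np.linalg.norm(z_ext, ord=1, axis=-1, keepdims=True)
    ratio = norm_ext / np.maximum(norm_pos, 1e-8)
    z_ext = np.where(ratio > tau, z_ext * (tau / ratio), z_ext)
    return alpha * z_ext + (1 - alpha) * z_pos  # blend back toward positive

# toy tensors just to show the shapes
rng = np.random.default_rng(0)
out = nag(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(out.shape)  # (4, 8)
```

Note the degenerate case: with identical positive and negative inputs the whole thing collapses to the positive output, which is why NAG works at cfg 1 without a real negative pass being required by the sampler.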
>>
>>107463525
No, nothing like that. But on my 12 gig at q8 I OOM on radiance when genning at 1920x1088 while normal is perfectly fine.
>>
File: Zurbo_00015_.jpg (1.3 MB, 3328x1792)
Vidya
>>
>>107463552
Well, I'll admit, some furries are suspiciously talented
>>
>>107463570
Oh shit, link the code line from github so I can comment it out. Thanks anon!!
>>
>>107463586
This is radiance furina?
>>
>>107463427
The issue I have is how it wants to stick to a look. Smaller adjustments often don't get recognized at all. Sometimes it feels like it forgets a part of the prompt after a while. Maybe it's an issue with English prompting but it doesn't make it less annoying.

There's also the problem with random seeds doing very little to change the result.

If base is good for loras and finetunes then it should open up a whole new dimension to z-image.

>>107463455
>5 minutes
What the fuck. My ZIT gens take like 10-20 seconds.

>>107463440
You can do it but you'll have to compromise with time spent per gen, resolution and quality. Having done it with a 2070 I don't think it was worth it. Higher res wan genning gives so much better results.
>>
>>107463570
my network mode is set to public and security level is weak should i be worried?
>>
>>107463586
May I request something?
When moving on to the sampler node, Comfy is supposed to dump a line like
loaded partially; 8206.90 MB usable, 7981.13 MB loaded, 3758.42 MB offloaded, 225.00 MB buffer reserved
to the terminal/cmd.

While testing both at the same quant (q8 or whatever) and genning an image at the same resolution, can you show how these two values differ?
>>
https://civitai.com/models/2198268/zit-miku?modelVersionId=2475118
Finally, Miku, exactly what Z-image turbo lacked!
>>
>>107463659
*these two values differ between normal chroma and radiance
>>
File: zturbo_00019_.png (2.82 MB, 1920x1088)
>>107463573
yeah i copied the settings from the reddit post.
>>
>>107463690
pretty cool image anon
>>
>>107463570
what does logging here mean? log to where?
>>
File: 00046-1033016205.png (1.44 MB, 896x1152)
aaaah, this brings back memories...
>>
>>107463738
your prompts get logged to the comfyui server
>>
>>107463738
Andy's logs?
>>
>>107463740
When will this stupid bitch apologize for her car?
>>
>>107463753
I did a recursive grep and there is nothing.
>>
>>107463779
Apologize for her car? wat?
>>
File: Styles comparison.jpg (2.78 MB, 4095x1536)
>>
>>107463738
https://github.com/Comfy-Org/ComfyUI-Manager/issues/2193
>>
Tomorrow will be released, right?
>>
>>107463843
nobody believed the anon warning anons about this for almost a year, nobody bats an eye. some random fucking redditor says it's happening and suddenly it's a surprise.
>>
I get Forge users, I get Swarm users, but I can't wrap my head around Comfy users when they're talking about all these technical aspects of Comfy.
Are they using Swarm as a frontend, or are they using undiluted base Comfy?
>>
>>107463780
thanks, I just needed somebody who knows what's going on to lay my suspicions to rest. I've been genning ... some fucked up shit lately
>>
>>107463843
>>107463875
What are the api calls doing?
>>
>>107463843
>>107463875
how do I block this?
>>
File: 3555622322.png (1.18 MB, 896x1152)
>>
>>107463892
Using Forge
>>
File: 664076285.png (1.14 MB, 896x1152)
>>
>>107463906
>>107463893
>asians
Comfy damage control squad?
>>
comfyorg collapse 2026
>>
File: qwen edit 2509_00004_.png (753 KB, 1160x896)
>>
>>107463928
Powerfull
>>
Wansisters, is there a way to make native wan generation as fast as kj nodes? When messing around with the SVI workflows I noticed how retard fast it genned after the 1st gen.
>>
>>107463948
u mean wankers
>>
File: q8.png (22 KB, 1151x170)
>>107463659
q8
>>
File: 00051-3897548588.png (1.02 MB, 896x1152)
>>107463897
How do we know Forge isn't logging?
>>
>>107463948
Haven't tried the SVI workflows, KJ wasn't really faster for me than native tho (assuming both use lightx2v and sageattn or w/e setup you use)
>>
File: radiance.png (9 KB, 1122x98)
>>107463963
>>107463659
Radiance q8
>>
did any training ui or inference ui add longcat-image yet?
>>
>>107463897
>Gradio
>block spying
kek
>>
>>107463890
I assume it is just building a local cache of https://api.comfy.org/nodes which has 329 pages of JSON.
>>
>>107464047
makes sense
>>
>>107463958
Kek you're not wrong

>>107463998
Yeah kj nodes used to be painfully slow months ago for me but for whatever reason its now almost twice as fast as native. Yeah their workflows are lightx2v. Also yes on the sageatten but it's all woct0rdho including radialattn https://github.com/woct0rdho/ComfyUI-RadialAttn

There's SVI 2.0 now but still waiting on the 2.2 workflows to drop https://github.com/vita-epfl/Stable-Video-Infinity/tree/svi_wan22
>>
>>107463963
>>107464010
I appreciate it, merci
>>
>>107464084
>It's also recommended to install SageAttention, and add --use-sage-attention when starting ComfyUI. When RadialAttention is not applicable, SageAttention will be used.
So does that work with PatchSageAttentionKJ node? I don't like applying sage globally.
>>
>>107464084
i don't have much experience with that but i can't really see why it'd be slower if you also use radial attention and maybe also torch compile in native.

hard to diagnose I suppose
>>
File: FISH.png (1.3 MB, 896x1152)
>>
im upscaling a video with seedvr2. i was using a workflow that used this to upscale images and its really good. i dont really know what i am doing but i uploaded a video instead so it upscales each frame, then im going to combine it back into a video
>>
>>107463245
Coming back after a while, have we seriously not got anything better than SDXL for a fully uncensored base model? You can layer LoRAs but that's not exactly ideal for anything complex. I feel like slop generation hasn't progressed much and it kinda saddens me, am I missing something? SDXL feels so dated and has worse prompt understanding than an Indian
>>
File: 1960220049.png (1.41 MB, 1024x1536)
>>
File: ComfyUI_temp_qhhpa_00005_.png (3.31 MB, 1280x1280)
oink
>>
>>107464162
We are most likely getting a decent Z-Image porn tune in the following months if the Chinese don't fuck us over with the base model release.
We have Chroma which knows insane amount of stuff including NSFW but also too fucking schizo to be used reliably. If you have a powerful card you can make the seed lottery work.
SDXL has a retarded ancient text encoder so yes it is notoriously awful at understanding prompts.
>>
>>107464130
I have no idea. I do use "Model Patch Torch Settings" then enable the fp16 bit but I use that for native only.

>>107464135
Wish I knew. Its a shame because its so fast it also kinda ignores prompts even with loras, then again I havnt had the time to properly do an in depth test. Speaking of speed boosts, if this ever releases https://github.com/dvlab-research/Jenga we could be looking at image speeds for video
>>
>>107464176
>>107464180
prompt?
>>
File: 1675872417.png (1.8 MB, 1024x1536)
>>
>>107464180
*SNNNIIIFFFF, this smells like chroma
>>
File: comfyui trash.jpg (18 KB, 426x415)
>>107464160
im so sick of this fucking garbage. from now on i am going to have to save every single intermediary product from anything that is produced by this trash and have mini workflows set up that i can initiate. what a waste of fucking time.
>>
>>107464207
they should make those life sized chink bots with shells like this
>>
is this even a blue board anymore or what?
>>
File: ComfyUI_temp_qkqtm_00001_.jpg (442 KB, 1600x1152)
>>
>>107464160
>upscaling a video with seedvr2
Is it better than using an "oldschool" upscale model?
>>
File: ZIT_00631_.png (2.28 MB, 1152x2048)
>>107464273
WDYM? This is clearly blue board content.
>>
Finally got z-image to do a shot like this!
>>
File: 202134449.png (1.7 MB, 1024x1536)
>>107464264
They'd sell
>>
File: ComfyUI_temp_qhhpa_00014_.png (2.67 MB, 1040x1440)
>>
File: latina girl closeup.jpg (367 KB, 1408x966)
>>107464287
ive used upscalers in the past for images and they always fuck things up, but this is really good. anything it messes up is because there isnt enough information in the base image. its not 100% perfect, I would still try to run it through img2img to get more details. but if i take an image, upscale it, then downscale it back to the original size, it has more details, and it doesnt seem to mess up the lighting or colors very much
>>
>>107464334
You're using the ComfyUI-SeedVR2-VideoUpscaler?
>>
File: ComfyUI_temp_qhhpa_00017_.png (1.98 MB, 1040x1440)
>>
>>107464300
topkek

can't hack me head, I have the app from Ubuntu software store "Extension Manager" and the extension "Grayscale Windows" installed.

>>107464303
prompting strategy for the view?
>>
>>107464362
It didn't seem to really work until I told it WHAT she was doing? I don't fully understand it myself yet.
>>
File: 1737899093461916.png (1.71 MB, 1120x1440)
>>
File: 1737036941184105.png (1.26 MB, 1120x1440)
>>107464392
>>
>>107464357
no, i used this workflow that had the upscaling in it https://civitai.com/models/1376005/photoflow-z-image-turbo-qwen-chroma-wan-2221-sdxl-t2i-text-to-image-txt2img-workflow. i short circuited the upscaling so i could upload images directly. then i tried to upload a video and feed it through so it would batch upscale each frame then recombine the frames into videos. i wasnt aware there was an upscaling node just for videos
>>
File: ZIT_00640_.png (2.41 MB, 1536x1536)
>>107464303
>A close up, ultra wide lens angle, of a
Doesn't seem to do anything for me.
>>
File: 1659193385.png (1.71 MB, 1024x1536)
>>
>>107463296
>12gb
why not use Q8?
>>
File: ComfyUI_temp_qhhpa_00021_.png (2.56 MB, 1040x1440)
>>
Is there anything like mmaudio yet where I can add sound to existing video for local? Not talking but sounds
>>
>>107464429
I thought mmaudio was already local
>>
>>107464429
>mmaudio
That is local... I think hunyuan also released one
https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley
>>
>>107464412
sexo
>>
>>107464410
Like I said, I don't truly understand it yet.
>>
Should I buy a 3090 for slop generation?
>>
>>107464084
How much speed up do you really get with this over sage?
It seems there are no pre-built wheels available outside of windows. I am curious if it is worth compiling.
>>
>>107464458
ZIT doesn't seem to like camera control much. Maybe there's some secret sauce we need to discover.
>>
how long until they start banning local?
>>
>>107464422
I don't want to degrade quality too much, which is already decreased in the distill.
I was thinking about trying that when (if) base model releases and more steps are needed.
I can bear the current gen speed.
>>
>>107464471
Seems to like 'overhead photo'? Maybe because it was a trendy photo style at one time?
>>
>>107464464
buy a 5090 while you still can
>>
>>107464445
Heh, I mentioned local because I know some dickweed is going to mention api shit.

>>107464448
Oh it even has comfy integration, thanks!
>>
can i img2img z-image?
>>
>>107464481
I'll try that once my gpu is done with these vids.
>>
>>107464490
Yes, >>107464374 is an image to image using z-image
>>
>>107464481
like skate park fisheye
>>
>>107464464
There are better purchases but it can be worth it.
Definitely go second hand though. Not worth it first hand.
>>
>>107464501
Man, 'fish eye' really does good! Can't believe I didn't think of that!
>>
>>107464515
zoo wee mama! It works really well!
>>
>>107464409
nice did you upscale a video then? or just the image
>>
>>107464529
Prompt for this?
>>
why would you release a schnell version of an unfinished base model?
why wouldnt they release the base model if it was finished when they made the schnell version?
>>
>>107464554
"A super close up, fish-eye lens photo of a girl sitting on the ground outside of a bar at night. Fresh snow is on the ground around her. She has a cigarette between her lips, smoke coming off it. She's doing a peace sign over her eye."

Man, gotta say, learning about new things is fun!
>>
>>107464464
i just got a used one recently
>>
>>107464556
>why would you release a schnell version of an unfinished base model?
because its still better than flux2 and nuking that off the face of the planet is a big win
>why wouldnt they release the base model if it was finished when they made the schnell version?
because it wasnt
>>
File: upscaled.jpg (2.07 MB, 3046x3046)
>>107464549
no it failed >>107464231 been having problems with video combine and it sometimes shits the bed.
>>107464529
>>
>>107464468
The first generation is 2-3 minutes then hovers around 40 to 50 secs for each batch. The default workflow for SVI is at 8 steps but I set it to 4. Then again, my resolutions are tiny like 640 by 512 kek. Going from 15 minutes to generate 5 seconds to less than 5 minutes to generate 15 secs is pretty wild (obviously not counting loading the models). If you go for radial attn, make sure you use woct0rdhos sage, sparge and triton

>>107464474
After all the bullshit going on, wouldnt be surprised. Make sure you make many backups.
>>
>>107464588
Think you could upscale this?
>>
>>107464529
amazing how much better that looks in grayscale lol
>>
>>107464515
>AI is already this good
it's over
>>
>>107464485
>literally 5x the price of a used 3090
no

>>107464508
>There are better purchases
Like?
Used is the intention
>>
File: upscaled2.jpg (1.75 MB, 3046x3046)
>>107464600
no, this is the first frame. i have to figure out why the video combine fails half the time. i am thinking i can just batch up scale every single frame then put them into a video again
>>
>>107464605
Damn, you're right. I might try some greyscale images here after these videos gen.
>>107464622
I think once video gets on par with image gen, then it's over.
>>
>>107463245
I have an RX 6800 currently; my experience with it has been quite dogshit, possibly through no fault of AMD's, as I think it may be slightly defective. It works fine 99% of the time, no artifacting even, but I randomly have my screen go black, usually followed by a GPU reset. AI stuff is especially bad and can almost instantly trigger a crash unless I run it without a GUI. I'm thinking of getting a new GPU, what is you guys' experience with the newer AMD GPUs, e.g. the 9000 series? Or should I just go back to NVIDIA? Newer NVIDIA cards seem to have weird issues on Linux; my 1080 was fine though. I assume it's just my retarded card losing the silicon lottery hard, because it would have been a good experience if it wasn't for the constant crashing. I think it's probably faulty VRAM modules, just a hunch though
>>
Wen comfy?

>SGLang Diffusion + Cache-DiT = 20-165% Faster Local Image/Video Generation
>SGLang integrates Cache-DiT, a caching acceleration engine for Diffusion Transformers (DiT), to achieve up to 7.4x inference speedup with minimal quality loss.

https://www.reddit.com/r/LocalLLaMA/comments/1pg8jtk/sglang_diffusion_cachedit_20165_faster_local/
>>
>>107464643
Oh cool! Thanks! I need to figure out upscaling in SwarmUI.
>>
>>107464645
photography is crazy fun lol
>>
>>107464590
>make sure you use woct0rdhos sage, sparge and triton
These just look like prebuilt wheels of respective packages I don't think it specifically depends on these
Eh I might give it a shot.
>>107464638
5090
For used 3090 is the best value though.
>>
>>107464690
That's my primary use for this stuff! I just wish I was better at prompting, but I'm getting better. I'm so used to the old days of using tags. Natural language doesn't feel natural for this stuff lol.
>>
File: qie_00003_.png (1.31 MB, 1240x840)
>>
>>107464720
zimage is ONLY natural language?
>>
>>107464756
No, but tags don't seem to work as well as they used to. At least, from my testing. I could just be full blown retarded desu.
>>
>>107464756
>zimage is ONLY natural language?
not if you train your own/use a lora
>>
why are there so many nipple loras for zit
>>
>>107464693
>I don't think it specifically depends on these

Turns out all 4 were needed, I spent a month trying to figure out why I kept getting errors. I uninstalled regular triton and sage, then installed all of woct's stuff and it worked. Thanks to an anon many, many threads ago who mentioned not to mix them.

https://github.com/woct0rdho/triton-windows
https://github.com/woct0rdho/SageAttention
https://github.com/woct0rdho/SpargeAttn
https://github.com/woct0rdho/ComfyUI-RadialAttn

Also for the fp16 bat file --use-sage-attention --fast fp16_accumulation --disable-api-nodes
>>
>>107464756
>>107464767
It is not trained on them specifically but "A woman, standing, alone, winter..." style prompting works somewhat ok since TE is smart.
Moderate length natural language paragraphs give best results (Anatomy errors start to appear when you give too verbose prompt)
>>107464776
A lot more easier to train cunts, cocks and sex and also useful for coom
>>
>>107464776
because it cant do them and they are the basic thing for everyone genning anything even remotely nsfw
>>
>>107464776
>>107464783
funny enough all the nipple loras destroy whole model
>>
best zit nipple lora so far?
>>
>>107464795
they fuck up details but you can bring them back by just using any dual sampler workflow to fix details for the last few steps, like the one at the top of https://civitai.com/models/2093591
>>
>>107464780
Where do we get lora now if not civit?
>>
is there a nsfw version of this thread? i checked the nsfw boards and its pure sloppa
>>
>>107464777
I skimmed through commits and triton seems to be a generic pre-built but he seems to have made important changes to sage and sparge over base repos.
This makes it even more tedious since I don't want to recompile sage.
I am certain I won't bother now, but the knowledge is useful so thanks.
>>107464805
There is not a single good place whatsoever to get loras.
Some people hide based stuff on huggingface, but by design it is ass to find.
>>
>>107464676
I forgot to add their blog, it talks more about it https://lmsys.org/blog/2025-11-07-sglang-diffusion/

>We are excited to introduce SGLang Diffusion, which brings SGLang's state-of-the-art performance to accelerate image and video generation for diffusion models. SGLang Diffusion supports major open-source video and image generation models (Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux) while providing fast inference speeds and ease of use via multiple API entry points (OpenAI-compatible API, CLI, Python interface). SGLang Diffusion delivers 1.2x - 5.9x speedup across diverse workloads. In collaboration with the FastVideo team, we provide a complete ecosystem for diffusion models, from post-training to production serving. The code is available here.

>Optimize Wan, FastWan, Hunyuan, Qwen-Image series, FLUX
>Support LongCat-Video

Possible comfy coming https://github.com/sgl-project/sglang/issues/13024
>>
File: 1751828639826060.png (1.43 MB, 1120x1440)
used chroma for like a year and this is the first furry gen I made.
>>
File: 1764171647478574.png (1.47 MB, 1120x1440)
>>107464856
do furries use chroma? like on /trash/?
>>
>>107464891
I tortured myself by looking through the furfag thread and they all seem like genned on illustrious.
>>
>>107464821
/gif/ has a video thread
/aco/ threads are kinda shit
>>
>>107464821
when anon posts lewdcatbox yes
>>
>>107464676
>>107464837
call me when it's available for comfy
>>
which gguf for flux is the best in terms of lightweight to quality ratio?
>>
>>107464941
Flux 1?
Nunchaku.
>>
>>107464967
do you need to install anything extra to use nunchaku or can I just use it?
>>
File: 1755732081621441.png (1.04 MB, 832x1248)
>>
>>107464993
furshit belongs in >>>/trash/
>>
File: 1764824030033989.png (937 KB, 832x1248)
Damn it lost her with the pose
>>
File: bfsh3_00001_.png (1.44 MB, 1328x904)
>>
>>107464982
You need to install comfyui-nunchaku custom node.
>>
>>107465013
The fact that it can't consistently gen one of the most prominent furfag characters that it has almost certainly seen thousands of images of during the training, speaks volumes about Chroma.
>>
>>107463245
I'm coming back to the generating gaem and it looks like all my models were left in the dust (SD v1-5 Chad here). What are the kewl cats using now? What's the best base model? What about inpaint models? Are they still a thing?

SANKIUUUU in advance, my fellow prompterers.
>>
>>107465023
but I'm on forge
>>
File: 1741059702606007.png (1.06 MB, 832x1248)
>>107465038
These are Z. Maybe with a longer prompt I can get more hits
>>
>>107465051
You shouldn't be on forge if you don't like missing out on stuff.
But if you insist on remaining use Q8. Maybe nf4 if you are a turbo vramlet.
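For scale, the weight memory is just bytes per parameter. A back-of-envelope calc assuming Flux dev's roughly 12B transformer params, ignoring the text encoder, VAE, activations, and quantization overhead:

```python
# Rough weight-only VRAM for a ~12B-param model at common precisions.
# Ignores text encoder, VAE, activations, and quant overhead.
PARAMS = 12e9  # Flux dev transformer, approximate

def weight_gb(bits_per_param):
    """GB needed just for the weights at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("Q8", 8), ("nf4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
```

So fp16 is ~24 GB of weights alone, Q8 halves that to ~12 GB, and nf4 gets you to ~6 GB, which is why vramlets reach for it.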
>>
File: 1742601414619512.png (1.42 MB, 1120x1440)
>>107464913
bizarre. I do see it in their sticky. ironic that the furry made Chroma yet his community seems to have ignored it.
>>
File: ComfyUI_00602_.png (3.9 MB, 1432x2144)
New model from the Noob guys: NewBie-image-Exp0.1
>We are thrilled to introduce NewBie-image-Exp0.1, released by NewBieAi-Lab. This model utilizes a brand-new NewBie architecture designed on the foundation of Next-DiT. We have combined Gemma3-4B-it with Jina CLIP v2 to effectively enhance the model's text comprehension capabilities. Additionally, we utilized the FLUX.1-dev 16-channel VAE to provide richer details. The current dataset consists of approximately 12 million images (including the complete Danbooru dataset up to October 2025 and 1/4 of the e621 dataset). Trained on 8x H200 GPUs for 10 epochs (approx. 17,500 H200 hours), it now supports characters and art styles with a mean solo count of 150 on Danbooru. We sincerely thank everyone involved in the testing and training process. Thank you for your support, and we hope the open-source community continues to thrive!
Huggingface model is walled but you can download on Civit: https://civitai.com/models/2197517/newbie-image
Already supports LoRA training too: https://github.com/NewBieAI-Lab/NewbieLoraTrainer
It's only a v0.1 so it's probably not very good in its current state.
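The quoted compute figures work out to a surprisingly long run; a back-of-envelope sanity check from the announcement's own numbers (not official figures):

```python
# Sanity-check the NewBie-image-Exp0.1 training figures quoted above.
GPU_HOURS = 17_500    # total H200-hours, per the announcement
GPUS = 8
EPOCHS = 10
IMAGES = 12_000_000   # ~12M image dataset

wall_hours = GPU_HOURS / GPUS
gpu_s_per_image = GPU_HOURS * 3600 / (EPOCHS * IMAGES)
print(f"~{wall_hours / 24:.0f} days wall time, "
      f"{gpu_s_per_image:.3f} GPU-s per image per epoch")
```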
>>
>>107465052
Well should have clarified that.
Z-Image can do very few people and characters consistently, unsurprising.
Also this image is weirdly hot and no I am not a furry.
>>
>noob in december of 2025
>>
>>107465083
This isn't the Z-image mode that was promised, but has potential anyways
>>
File: 1758558632682889.png (1.22 MB, 832x1248)
Just needed a longer prompt
>>
>>107465092
well. it's fast. maybe a wf that gens and then has qie prompted permanently with "fix the hands"?
>>
>>107465071
People like ease of use.
SDXL can be run comfortably on a decade old mid range GPU, or hell even on phones with a distill lora.
Ease of use trumps quality (and Chroma struggles a lot to do that consistently despite much greater potential than SDXL)
>>
is longcat actually good?
>>
>>107465117
>Ease of use trumps quality
hence why ZIT easily overtakes Chroma
>>
>>107464939
Funny enough, I just randomly checked leddit again and apparently, there is a comfyui implementation https://github.com/xlite-dev/comfyui-cache-dit

However, it's in Chinese (use a translator Firefox addon) and hasn't been updated in 3 months, heh
>>
>>107464678
No problem!
>>
>>107465013
this is good
>>107465052
this is furfaggotry
>>107465090
seek help
>>
>>107465083
I am seeing lots of gibberish text in their examples despite the flux vae and a modern enough text encoder. This thing seems very under-trained. I suppose that's normal for a v0.1.
But maybe it will have potential, we will see.
Lumina is needlessly slow for its size though, so I expect it to get overtaken by Z-Image booru tunes if Alibaba doesn't fuck us over.
>>
>smol base model + multiple loras
or
>big base model that understands many concepts
?
>>
>>107465121
nope
>>107465152
I am attracted to the very human body in the middle, the funny head just spices it up. I am not into anthropomorphic animals.
>>
Don't forget to disable prompt logging in comfyUI. it's on by default
>>
>>107465137
>and hasnt been updated in 3 months
either it works so well it doesn't need to be updated or it sucks and nobody bothered to maintain it
>>
>>107465156
you know you can do this shit with noob, right?
>>
I'd rather have animetroons than fucking furfags
>>
>>107465166
100% the latter.
It is very difficult to teach certain concepts as loras, much less combine many of them reliably.
The only argument for the former is that smaller models tend to be easier to inference. But that stops being worth much if you are turning it into a slop factory.
>>
>>107465175
Where do you disable it?
>>
>>107465188
Judging by the fact that we are just now hearing about this massive revolutionary beast that provides 7x speed up, it's not hard to guess the answer.
>>
>>107465198
this guy gets it

furfaggots ruin everything
>>
>>107465203
but people like zit more than chroma
>>
>>107465064
I enjoy forge because it's easy to use. I have comfy too but I just use it for the stuff that forge can't do.
Thanks for the help.
>>
>>107465166
100 trillion parameter model trained on all data ever in history distilled into a 4 bit 4 step MoE that fits into 24gb vram and 64gb ram
>>
>>107465044
Someone PLOX guide me!
>>
You can tell someone is a zoomer when they post Elsa and judy Hopps porn
>>
File: eff.png (45 KB, 384x719)
How do these two differ?
>>
>>107465240
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>>
>>107465215
Chroma knows a lot things but can't actually reliably gen them in an aesthetically pleasing way.
ZiT doesn't know much but it can gen what it knows consistently.
>>
>>107465090
>Well should have clarified that.
he doesn't need to clarify anything, you just assumed incorrectly and looked like a retard.
>>
>>107465206
you can't anymore. comfyanon removed the option to turn it off
>>
>>107465188
Look at the issues; since it's in Chinese no one has heard of it, someone is complaining about Z kek https://github.com/xlite-dev/comfyui-cache-dit/issues

>>107465212
Hopefully they update it and it actually does what it says. I gotta head to bed but testing this bitch out tomorrow. The basic node apparently is 模型加速 which is "Model acceleration" according to google translate.
>>
>>107465257
where does it save them to?
>>
>>107465193
link me a noob that can do judy so i can goon
>>
File: 1741788798362176.png (1.09 MB, 832x1248)
>>107465241
Any Disney or children characters from that time really
>>
how long does wan2.2 usually take to gen videos on a 5080?
>>
>>107465083
They made some sexy architecture choices. Didn't expect this community to pick up on Jina.
>>
>>107465289
bout 3fiddy
>>
>>107465083
>>107465295
uuh when are they making it easy to use? i'm not downloading a whole other comfy for this
>>
>>107465083
>We have combined Gemma3-4B-it
would be better to use the derestricted version, no?
>>
>>107465083
I don't know about the people who made Noob. Is this really by them? Why isn't it uploaded under the account that uploaded Noob?
>>
>>107465316
Probably started training this thing well before that.
>>
File: 00989-590185063.png (1.21 MB, 800x1200)
>>107465281
I use Elsa to drain my balls on the regular
>>
looks like there is initial support now
> but the VRAM is high

https://github.com/sooxt98/comfyui_longcat_image
https://github.com/meituan-longcat/LongCat-Image/issues/8
>>
>>107465083
why would i use this instead of illustrious/noob? genuinely asking
>>
>>107465316
For the last time there is no evidence that the text encoders are censored.
If the model can't draw cunts, it's because the UNet never saw enough cunts during training, not because the text encoder doesn't tell it to draw a cunt.
>>
File: 1734727628545752.png (394 KB, 894x665)
>>
>>107465241
something is wrong when you want to fuck cartoon characters from your childhood

everyone like that is always a cringy weirdo
>>
>>107465357
And before someone acts pedantic I meant censoring for drawing images.
Of course gemma is very heavily censored when you try to chat with it.
>>
>>107465366
at least elsa is supposed to be a hot woman, not a fucking animal like judy hops
>>
File: newbie-0.1-sample-civitai.png (2.71 MB, 1432x2144)
>>107465356
clearly almost all new models have more resolution flexibility and better prompt comprehension.

maybe you wouldn't use it yet at this state of training tho
>>
>>107465275
Let us know how it goes
>>
>>107465251
It's becoming a confusing mess of "how did we get caught up in this shit show in the first place?"
>>
File: 00985-2270860513.png (1.08 MB, 800x1200)
>>107465366
the artists knew what they were doing when they designed her
>>
>>107465241
when you look at these fanbases a lot are also "disney adults", retards who are old but just follow the current thing
>>
>>107465374
>better prompt comprehension
no one here has ever taken advantage of better prompt adherence or comprehension since you're all 1girl, standing enjoyers (derogatory)
>>
>>107465371
I mean yeah but there were plenty of hot women in the cartoons of my childhood and I don't feel a particular urge to gen lewds of them

>>107465386
okay I can somewhat understand elsa but it's still weird when you do it too much

when I was using civitai regularly I'd run into accounts genning lewds of far less sexual cartoon characters and those accounts always looked the same
it got to the point where I'd see one gen and I could guess with a very high accuracy that the account would be full of that shit
>>
>>107465400
because it's all noob/illu can do, unfortunately
>>
File: 1760334363835403.jpg (47 KB, 720x803)
sampling images during any point in training?
training at lower res to test the lora first?
no thanks, i have an instinctive 'feel' the lora will be good and when to stop.
>>
>>107465241
Judy Hopps is enjoyed by furry gooners and Elsa is enjoyed by Disney adult manchildren of all ages.
There are better candidates for a zoomer alarm.
>>
>>107465404
what are you, gay?
>>
>>107465083
Wait a second. Are they saying they trained on top of a model (Next-DiT), or trained from scratch without exposing the model to any real photography and non-booru data?
>>
>>107465251 >>107465381
At least there is progress and there are reasonable options for future progress.

radiance IMO is getting better at it... slowly. but IDK if it'll reach the state where you can prompt characters like on noob/illustrious

z-image obviously has lots of potential if the large dataset finetuners get the base model

qwen is extremely capable but the GPU power it'll take to finetune that one is silly

neta-yume lumina also still is working
>>
>>107465421
nope I just don't have a weird obsession on cartoon characters
>>
>>107465427
anon...
>>
File: 1755774449905619.png (308 KB, 551x987)
>>107465414
i can already sense how the lora will turn out just by looking at the training data
>>
>>107465404
you are way too concerned and bothered with how much other men want to fuck or gen lewds of their waifus. you're probably a serious hypocrite or just a jealous homosexual

1girl
>>
>>107465430
I'm sorry but you're weird and what you gen is cringe
>>
>>107465425
>Thanks to Neta.art for fine-tuning and open sourcing the Lumina-image-2.0 base model.
>>
>>107465400
this is partly because of limits on 2girls or more things or more complex poses or w/e that are also due to prompting power
>>
>>107465436
waifuing elsa is like waifuing a whore

she's all used up
>>
>>107465437
you're on 4chan
>>
>>107465446
yeah I know that's why I'm not surprised to find your kind here
>>
>>107465439
Alright.
Was that a good model? I wasn't here for it.
>>
>>107465359
It's not a patreon link????
>>
File: 1757599086726275.png (2.37 MB, 1752x1168)
>>
File: 1745119594261791.png (522 KB, 853x1000)
>>107465434
comfyui?
i just read the Q8 weights once and can run realtime inference in my mind's latent space
>>
>>107465466
who's that supposed to be?
>>
>>107465359
>69 minutes of video to tell me how to train a thing
Couldn't this be done in a shorter video?
>>
>>107465455
Quality is alright but it is slow.
>>
>>107465475
No one in particular only the style
>>
>>107465475
1girl
>>
File: 1762545555635797.png (2.38 MB, 1752x1168)
>>
File: 1763153227261068.png (547 KB, 500x500)
>>107465474
weights? i source fly agaric outside my local methyl isocyanate chemical plant, which i then consume before i place myself down in front of a black canvas
>>
File: ComfyUI_temp_qbxgp_00003_.png (1.83 MB, 1040x1440)
>>
File: ComfyUI_08961_.png (1.6 MB, 864x1280)
>>
File: file.png (81 KB, 500x460)
>>107465454
>>
>>107465543
>not N. Higgers
tch
>>
File: file.png (206 KB, 781x493)
>>107465529
inference? i calculate the entire diffusion process by hand and take decades to make my 1girl standing looking at viewer, saves on electricity
>>
>>107465547
it is not full of things I don't like though
>>
the quality of gens in these threads is inversely proportional to the quality of the models released

you must strive to be better genners
>>
File: ComfyUI_09004_.png (1.52 MB, 864x1280)
>>107465560
N. Iggers is a different author tho
>>
Small custom node that might be of interest for ZIT:
https://github.com/ChangeTheConstants/SeedVarianceEnhancer

It diversifies outputs.
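Haven't read the node's source, but one common way to get that effect is blending the seed's initial latent with an independent noise draw and renormalizing. A sketch of the idea only, not necessarily what the node actually does; the `strength` knob and the seed-offset trick are assumptions:

```python
import numpy as np

def diversified_noise(seed, strength=0.15, shape=(4, 64, 64)):
    """Blend a seed's latent noise with an independent draw, then
    renormalize so the result stays roughly unit-variance Gaussian."""
    base = np.random.default_rng(seed).standard_normal(shape)
    extra = np.random.default_rng(seed + 1).standard_normal(shape)  # offset seed: assumption
    mixed = (1.0 - strength) * base + strength * extra
    return mixed / mixed.std()

latent = diversified_noise(42)
print(latent.shape, round(float(latent.std()), 3))
```

Small `strength` values nudge the composition away from what the seed alone would give without destroying the noise statistics the sampler expects.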
>>
File: 1752216279704758.png (2.42 MB, 1752x1168)
>>
File: 00012.png (1.03 MB, 1464x2008)
>>107463245
Trying to transfer an outfit to a picture of a character. Should I use Nano Banana Pro or Qwen Edit?
>>
>>107465632
How do you run nano banana locally? Wrong thread for that. Qwen Edit is pretty good at it tho.
>>
>>107465614
kek
>>
>>107465646
>>107465646
>>107465646
>>107465646
>>
File: Qwen_t2i_bent-b_.png (685 KB, 864x896)
>>107465539
drop the prompt
>>
>>107465654
what's up with the square pattern
>>
>>107464410
she's ultra wide, alright
>>
>>107464805
why not civit?


