/g/ - Technology






Ingrate Contrarian Dipshittery Edition

Discussion of Free and Open Source Diffusion Models

Prev: >>108027322

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
https://github.com/Haoming02/sd-webui-forge-classic/issues/671 He noticed
>>
>>108028569
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
we know you are mentally ill but could you stop including your schizo nonsense in the OP?
>>
Blessed thread of frenship
>>
>oh, a new ldg thread, surely the schizo op listened and included my wrapper!
>ctrl-f ani
>THERE IT IS HAHAHAHA IM IN THE O-
>Anima
ACKKKKKKKKKKKKKKKKKKKKKKKKK
>>
File: ComfyUI_temp_sgxha_00016_.jpg (312 KB, 2560x1440)
Interesting, large resolutions seem to work.
>>
>>108028599
>ACKKKKKKKKKKKKKKKKKKKKKKKKK
thank god you are finally hanging yourself troon
>>
File: spam.png (1.17 MB, 1152x864)
>>108028599
>it's a comfy collab
>if i spam fud in the general comfy will go bankrupt
>if comfy goes bankrupt people will buy commercial licenses for my wrapper
>>
>>108028627
what exactly is working? it looks terrible
>>
>Z-image
What went wrong?
>>
File: kj7il.png (3.57 MB, 2048x1152)
Schizophrenia
>>
>score_9

Ew, proper sloppa tag.

>>108028635
Overall the characters aren't broken.
>>
>e621 is required for a kino anime model
sad. ill wait a bit longer then.
>>
File: anima_output_121414.png (1.14 MB, 832x1216)
>>
File: Anima_00088_.png (637 KB, 896x1152)
>>108028108
0.6B bf16 is 1.2gb, which is roughly the size of clip_l and clip_g combined.
Maybe it's meant to be a true SDXL successor or some shit.
>>
>>108028721
What? You can prompt anima just fine with danbooru tags or NLP.
>>
>>108028721
e621 has never made anime models better
>>
>>108028735
danbooru isn't enough. i loathe 90% of the posts on e621 but somehow it gives models sovl.
>>
File: 877844554877.jpg (512 KB, 2138x1209)
The model truly is impressive because it's more refined in certain aspects, but in my tests it still is behind Newbie (and I'm guessing NetaYume as well) at prompt following (though it's better with text and results are very aesthetic). Newbie still has a better understanding of raw artistic style control.

Here's the prompt I gave Anima (re-formatted from the Newbie XML version):

>masterpiece, best quality, score_9, year 2025, highres, safe, 1girl, 2b_(nier:automata), nier:automata, painterly, impressionism, brushstrokes, An artistic, monochrome black-and-white illustration of 2B from NieR:Automata sitting at a restaurant table. The style is a unique blend of detailed manga linework and painterly impressionism, featuring thick, visible brushstrokes and impasto textures. 2B has her signature short white hair and black headband, leaning one hand against her chin while her other hand gently pets a cat lounging on the table beside her. In the foreground, a wine bottle and a half-filled wine glass sit next to a plate of food. The background consists of blurred restaurant windows and shelves, rendered with soft, atmospheric strokes that contrast with the sharp, rhythmic hatching of the character's clothing and the cat's fur.
>>
>>108028721
just put only score_9 and you will instantly get your ponyv6 sepia sovl
>>
2026 and we still gotta do that "\" shit before a parenthesis if there's a parenthesis in the series or artist name... fuck...
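if you script your prompts the escaping is at least trivial. minimal sketch, assuming an a1111-style parser where bare parens mean attention weights (function name is made up):

def escape_tag(tag: str) -> str:
    # escape literal parens so "2b_(nier:automata)" isn't parsed as emphasis
    return tag.replace("(", r"\(").replace(")", r"\)")

print(escape_tag("2b_(nier:automata)"))  # -> 2b_\(nier:automata\)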
>>
>>108028735
yeah it handles either much like NetaYume, it's just mostly way more stable than NetaYume so far. And faster
>>
>>108028748
yeah in my 10 minutes of testing i couldn't really get it to do painterly stuff at all sadly.
>>
Soul.
>>
>>108028748
That kinda stuff looks better on newbie because newbie can only do that kinda stuff. Almost all gens on it look smeary, which just kinda works out in this scenario
>>
So uh... what megapixel size can I gen at without getting these gay ass
"W (188, 125) should be divisible by spatial_patch_size 2"
errors
>>
>>108028731
Even then, the 1.7B model seems like a better sweetspot
>>
>>108028772
writing this down for the retrain/re-license
>>
>>108028680
Outdated mediocre model. In 2024-2025 it would be good, but in 2026 it's meh.
>>
>>108028680
Chinese eyes too small to see BRC (Big Russell Cock) coming their way
>>
Damn it, I have to go to the gym.

I need to set up a way to gen on my pc through my phone.
>>
>>108028770
wat
>>
File: Anima_00089_.png (566 KB, 896x1152)
>>108028772
Obviously a larger te would have been better. 1.7B is also still vramlet friendly, can be run under 4GB vram fine.
But still, the 0.6B one is working surprisingly well for its size.
>>
>>108028791
it puts in perspective how outdated 12B T5-XXL is
>>
>>108028770
Anima? 0.5m to 1m, it's been mostly trained on 512 pixels so far
Any other model? 1m to 2m
>>
>>108028794
what the fuck even are those errors he's getting though, like where / how is that a thing
>>
>>108028770
125 is indeed not divisible by 2 anon
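(the latent is pixel dims / 8 from the VAE, and each latent dim then has to divide the DiT's spatial_patch_size, so with patch 2 keep pixel dims at multiples of 16. a minimal sketch of the snapping, assuming the usual 8x VAE factor; the names are mine:)

def snap(px: int, vae_factor: int = 8, patch: int = 2) -> int:
    # round a pixel dimension down to the nearest valid multiple
    m = vae_factor * patch
    return max(m, (px // m) * m)

print(snap(1000))  # 1000 -> 992, since 1000/8 = 125 and 125 isn't divisible by 2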
>>
>tdrussel doesn't have a discord server

I don't think you guys understand just how much we have won here. There won't be any sabotaging by duplicitous furries.

Every time some promising model comes around, furries rock up just to poison the dataset or training. I've heard of some even lobbying for synthetic (slop) datasets, or for the addition of low-quality niche fetish content. I swear they do this intentionally too.
>>
File: 4545564456451.jpg (355 KB, 2138x1209)
>>108028748
Actually that's cut off kek
https://files.catbox.moe/sl91it.png

Here's another one, Newbie understood this prompt a bit more since it needed to split it into panels
https://files.catbox.moe/7stv4h.png
>>
>>108028805
i don't think that's a thing, Chroma is actually LESS similar to Fluffyrock than I expected, overall. Anyone who actually fell for the "muh Chroma anime" or "muh Chroma realism" meme was always retarded
>>
>>108028770
What latent size are you using? I realized that going to 1532 or beyond causes the same error. If you want a good aspect ratio, decrease the lower number instead (e.g. 1280x720)
>>
>>108028808
I'll wait for Anima lora or any finetune, as the default art style feels too “AI” for my taste.
>>
>>108028819
>muh fynetoonz
every time
>>
bloatmodel fetishists getting BTFO hard lately
>>
>>108028786
Look into Tailscale, and if you want to self host your own server Headscale is what you want.
>>
>>108028826
acckkk
>>
File: 1743112217405131.png (1.25 MB, 832x1248)
>>
>omg it's small and fast!
dalits self-owning lately so much lol
>>
>>108028833
what's a dalit
>>
>>108028816
Chroma is different because the creator was too autistic and determined for others to sabotage him.

It was also different because up until that point, all large scale finetunes were ruined by stupid decisions to cater to furries, or models were shadowdropped.

Anyone remember when noob was training? It had so much potential until some dickhead added a differently captioned non quality-filtered dataset full of furry shit from e621 and pushed for the text encoder to be finetuned too. This caused the model to freak out so hard that it couldn't do basic human anatomy without filling the negative prompt with every furry tag imaginable.
>>
weird, my nano banana pro generates images at 4k in under 10 seconds. maybe something is wrong with your comfyui workflow?
>>
>>108028584
based and Haomingd
>>
Now that we have ramtorch and similar systems, local models should be ~32B parameters; there is literally no reason to use toy models. And yes I know some thirdie will think this post is insincere
>>
>>108028738
never used noobvpred?
>>
>>108028824
yes, waiting~
>>
>>108028848
i'll ram my torch up your bum if you catch my drift
>>
>another troonjak rentries thread
>another failed garbage troon thread
why are anons so fucking retarded? why do they keep falling for ran bullshit?
>>
File: i2iUpscale_00003_ (3).jpg (2.4 MB, 3840x2144)
>upscaling with zit
Kino.

>>108028828
Thanks.
>>
>>108028845
how many ((((((((komfybucks)))))))) does it cost tho
>>
>>108028855
they don't really care that the board has a shit skidmark of unfilled /ldg/ threads. it's rude
>>
File: Anima_00098_.png (1.04 MB, 832x1216)
>>108028819
Just use @artist tag.
You can give generic style descriptions as NLP too. It works
>>108028826
Kek so true.
>my ten thousand dollar RTX 6000 pro purchase will give me superior gens over those filthy vraml-AAAAAACCCCCCKKKKKK!
>>
>>108028848
genuinely, why not just train big models and distill them properly? z-base is dogshit compared to z-turbo but when z-turbo released everyone celebrated it as the greatest thing ever. so clearly distills give you faster speed and better quality, so why not just train something big and good like qwen 2512 and distill that
>>
File: 1505741832521.jpg (26 KB, 293x251)
>distills
>better quality
>>
>>108028860
anons really are gullible as fuck. they don't stand for what they believe in (if they even believe in anything). they want cozy breads but don't do anything to achieve them
>>
anon shouldve asked tdrussell why he chose to include pony style score tags, i fucking hate them so much
>>
>>108028863
cause you still have to load the model into memory. Qwen 2512 could be better also, it's not even on Qwen 3 for the TE
>>
>>108028870
oh god i'm scooooooring
8 uppppaahahaadsdsgsgf
>>
>>108028862
>@artist tag
aighty, imma try em, thanks!
>>
>>108028872
>cause you still have to load the model into memory
ok but did anyone who wasn't brown have an issue doing that with z-image?
>>
File: file.png (1.41 MB, 1024x1024)
>>108028837
This guy is a dalit, you may also hear him called a brahman sometimes but I assure you they are the same thing.
>>
>>108028870
the censor tags are bloated too and he says they are required sometimes since the model will produce non euclidean slop without them
>>
>>108028855
>>108028860
>>108028868
Do you think anon will fall for your posts this time
>>
>>108028879
no but qwen way chungser than Z
>>
>>108028837
Another word for indian
>>
File: 1744686988958542.png (1.84 MB, 832x1248)
>>
>>108028886
i can gen pretty decently on a 4090 with fp8, 24gb is the minimum anyway and a qwen turbo would be insane. lets leave those poorfags in the dust together, just you and me
>>
>>108028855
Kill ani
>>
File: 4545456465456.jpg (311 KB, 2138x1209)
>>108028808
The fridge POV results are much better than what I could get with Newbie on average though in terms of prompt adherence and overall polish
https://files.catbox.moe/ucp7u9.png

I found this model has a way better representation of objects and backgrounds in certain prompts (though not all).
>>
File: 1745142563531500.png (1.8 MB, 832x1248)
>>108028833
>>108028882
only jeets care to differentiate jeets
>>
>>108028896
isn't this post against united states law?
>>
>>108028903
where is the noob/ill comparison? nobody actually uses newbie or neta
>>
>>108028908
That's why miscasteing makes them seethe so much :)
>>
>>108028911
It is. Ran probably doesn't care because he's spamming threads from his proxies.
>>
>>108028794
>>108028817
so im just not able to gen at a higher res than 1mp? not even upscale and a second pass?
>>
>>108028915
how would you know what makes indians seethe? sus
>>
>>108028911
Probably
Kill ani
>>
>>108028922
damage control? big izzat loss!
>>
File: z-image_00017_.png (944 KB, 1024x1024)
i don't know what z-image has against reimu but my test booru finetune so far uniquely renders her this way lol
tbf 1:1 aspect is relatively underrepresented in the dataset so maybe that's the issue. i should try a tall aspect
z-image is godly though. training on a wide dataset of booru images, it produces images that look like real drawings. anima is too much of a sidegrade from SDXL
>>108028731
i spent a while last year attempting to distillation train Qwen3 0.6B on T5-xxl as an experiment. i have a lot of output chroma images with Qwen3 0.6B. i don't think it's capable of matching T5-xxl but it's absolutely capable of producing coherent semi-prompt-following images. I wish I'd tried Qwen3 4B because i'm pretty sure it would be able to work as an alternative... 1.7B maybe too, I tried it once and it was doing an okay job.
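the gist, if anyone wants to replicate: it's feature distillation, you regress the student's hidden states onto the frozen teacher's through a learned projection. a minimal sketch only — student, teacher, loader, and the widths are all placeholders, and it assumes captions padded to the same sequence length for both tokenizers:

import torch
import torch.nn as nn

proj = nn.Linear(1024, 4096)  # student hidden width -> T5-XXL width
opt = torch.optim.AdamW(
    list(student.parameters()) + list(proj.parameters()), lr=1e-4)

for ids_s, ids_t in loader:  # same caption tokenized for each model
    with torch.no_grad():
        target = teacher(ids_t).last_hidden_state  # frozen T5-XXL features
    pred = proj(student(ids_s).last_hidden_state)
    loss = nn.functional.mse_loss(pred, target)  # pull student toward teacher
    opt.zero_grad(); loss.backward(); opt.step()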
>>
File: 1765311608330200.png (1.75 MB, 832x1248)
>>
File: z-image_00208_.png (1.79 MB, 944x1280)
>>
File: q3_20250927164457.png (567 KB, 512x512)
>>108028937 (Me)
Actually I'm going to post some of these bad Qwen 0.6B + Chroma gens
>>
File: file.png (747 KB, 1024x1024)
>>108028908 >>108028915 >>108028922 >>108028926
Call a jeet Pakistani sometime they'll freak out.
>>
File: 1752437772713222.png (772 KB, 832x1248)
>>108028937
>anima is too much of a sidegrade from SDXL
i think it'll come down to what happens first: tongyi releasing the turbo sauce / anon figuring it out OR anima being "finished". the comparatively slower speed of base seems to be a determinant for many. and "sdxl anime but with a 16ch vae" is what anons wanted for a long time.
>>
What's the token limit of the 0.6B qwen? Still 8k or smaller?
>>
File: 1766883730315627.png (940 KB, 1024x1024)
>>108028808
Seems to work if you start the prompt with "it's a 2 panel manga"
>>
File: s_mask_2_2e4.png (384 KB, 512x512)
this is the power of chroma
>>
File: Anima_00099_.png (1.24 MB, 832x1216)
This model sometimes decides that there has to be a second person or another viewing angle in your prompt and really fights you for it lol. Need to experiment more for a consistent workaround.
>>108028937
>i spent a while last year attempting to distillation train Qwen3 0.6B on T5-xxl as an experiment.
Interesting. I know this is a long shot but would you mind sharing training code? I am curious about how that's done.
>i don't think it's capable of matching T5-xxl
So far I haven't run into anything it consistently can't do that t5 excelled at. There are certain "world knowledge" limitations with small models, but it's also so much newer and better "per weight", for lack of a better word.
>>
>>108028632
literally why are you so obsessed? is it impossible for different people to be angry with "comfy" org? is it some retarded shill tactic to call anyone against "comfy" ani?
>>
>>108028963
>This model sometimes decides that there has to be a second person or another viewing angle in your prompt and really fights you for it lol. Need to experiment more for a consistent workaround.
The model inherited a lot of sdxl-style nonsense.
>>
>>108028964
Yes. That's the typical cumfart damage control. Before that it was "voldy shills".
>>
File: 5644564564565.jpg (543 KB, 2138x1209)
>>108028903
It does quite well with giantess but it's one area where Newb still edges it out a bit

https://files.catbox.moe/gr45j8.png

Another comparison (more difficult so both are unpolished, but Newbie has the basic composition a bit more correct, just with objects/streets/people less refined than Anima)
https://files.catbox.moe/mdk00i.jpg
https://files.catbox.moe/o765mf.png

These are both models with great potential once they're both fully trained. Room for improvement? Anima could use a bit more painterly style knowledge to let the DiT shine.
>>
>>108028956
You are also forgetting the modern text encoder that can handle complex natural language, and the rectified flow that anima brings over sdxl.
>>
File: 1768852714319145.jpg (1.62 MB, 2048x2048)
>comfy is getting objectively more bloated, unstable and insecure with every new release
>more and more new saascuck shit added all the time, new api shit, new comfy coin bullshit
>You're definitely shilling some other interface if you're against all of that though!!
>>
>>108028937
>z-image is godly though. training on a wide dataset of booru images, it produces images that look like real drawings
Unless there is a turbo I really don't care, waiting that long for images is a big killer for me. No point in this stuff anymore at that point
>>
File: ComfyUI_00578_.png (763 KB, 1024x1024)
>>108028960
Nice, but she should be interacting with the cat on the left panel directly, should be a bit more like this (first Newbie gen I got for the prompt)
>>
>>108028987
but we certainly don't need more efficient uis. you're crazy to suggest that
>>
>>108028956
Yeah but those of us finetuning z-image base are going to do step distillation one way or another. AFAIK the thing missing from our knowledge is the RL model they used. And i'll be honest, I think base is still better than any other local model and definitely any local model with a permissive license. But we'll find a way to match the RL model. I think the architecture itself is the important part. Which is why finetuning still produces real-looking images
Re: the image - this is an image you get when you start distillation training the Qwen3 0.6B text encoder for Chroma
>>
File: z-image_00211_.png (1.4 MB, 944x1280)
>>
>>108028993
relax. cumfart is all there is and will be forever
>>
>>108028848
> ramtorch
Does it work on AMD or Intel cards?
>>
>>108028993
Where did UI come into this? I am sure he will add support for anima on forge neo if that's your thing
>>
>>108028855
What can we do?
>>
File: 1753352763518810.png (1.72 MB, 832x1248)
>>
>>108028998
It's just a memory management strategy, it shouldn't require any special hardware
>>
>>108028998
why would you care about poothon garbage
>>
File: t4_12000_t_mask.png (491 KB, 512x512)
Here's the Qwen 0.6B Chroma image that proves you can smash precise text recognition into a tiny model
>>
>>108028999
all python uis will be memory hogs forever by definition
>>
File: 1764576137578245.png (1.86 MB, 832x1248)
>>
>>108028999
His bot broke and replied to the wrong post.
>>
alright I trained f2k and zim on the same dataset of a person with very similar settings and while both produce solid results i think f2k takes the cake simply because you can use the lora on the distill without any visible degradation, while zit completely shits the bed with a zim character lora.
also f2k seems to properly learn details like moles while zim completely ignores them. i like that zim can use negative prompts but i guess you can also do that with f2k with some workaround
zim takes about a minute for 30 steps on my 5070ti while the f2k distill took 20 seconds for 8 steps (both res_2s)
>>
>>108029014
>you're a bot if you don't like comfyui
Maybe you are a bot?
>>
>>108029018
Looking at the image comparison you posted makes me agree with you wholeheartedly
>>
File: 1744649646463269.png (1.67 MB, 832x1248)
>>
File: 1755959921656426.png (1.44 MB, 832x1248)
>>
>>108028998
>>108029006
No, it's not
> A memory-efficient linear layer implementation that keeps parameters on CPU
> and transfers them to GPU on-demand using asynchronous CUDA streams.
>
> This approach interleave compute and data transfer, making it useful for:
> - Very large models that don't fit in GPU memory
> - Scenarios where GPU memory is limited but CPU memory is abundant
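Which in torch terms is roughly this (my own sketch of the trick, not ramtorch's actual code): params sit in pinned RAM, and a side CUDA stream copies the next layer's weights up while the default stream computes the current one.

import torch
import torch.nn as nn

class OffloadedLinear(nn.Module):
    def __init__(self, in_f: int, out_f: int):
        super().__init__()
        # weight lives in page-locked RAM so async H2D copies are possible
        self.w = torch.randn(out_f, in_f, dtype=torch.float16).pin_memory()
        self.copy_stream = torch.cuda.Stream()
        self.w_gpu = None

    def prefetch(self):
        # called while the *previous* layer is still computing
        with torch.cuda.stream(self.copy_stream):
            self.w_gpu = self.w.to("cuda", non_blocking=True)

    def forward(self, x):
        if self.w_gpu is None:
            self.prefetch()
        # make the compute stream wait until the upload has finished
        torch.cuda.current_stream().wait_stream(self.copy_stream)
        return x @ self.w_gpu.t()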

>>108029007
Because AI, especially training, is Python.
>>
>>108028913
>where is the noob/ill comparison
I didn't make one because I don't test prompt following on noob/ill but it'll probably be utter humiliation because CLIP doesn't follow prompts as well as any of the newer text encoders.
>>
>>108028950
>>108028962
>>108028994
>>108029009
Not gonna spam you guys anymore but anyway, discarding the whole 0.6B Qwen Chroma thing for a second, all the z-image nonbelievers are going to feel quite stupid soon. A booru model trained on this is going to be insane. It's literally going to be print-your-own-booru-image. Luddos will scream & cry trying to identify the image as synthetic
>>
File: 1750459231438837.png (1.72 MB, 832x1248)
yeah this killed illust
>>
File: jgt6.png (2.14 MB, 2048x2048)
>>
>>108029045
You just described a memory management strategy. It's literally just layer offloading but in a way that doesn't slow down inference speed
>>
>>108029075
btw can i just say, thank fuck for musubi tuner and its dev. every other training project is shit
>>
>>108028828
>not just setting up openvpn or wireguard
>implement PROPRIETARY solution instead
lol, bunch of literal retards
>>
>>108029066
>Luddos will scream & cry trying to identify the image as synthetic
I don't get this kind of childish spite desu, why do people keep trying to shove AI art into the face of people who don't wanna see it, this only creates hostility, I don't pretend my gens are hand drawn, what do these people get out of it
>>
File: 1753878176446491.png (2.45 MB, 3138x768)
>>108029018
>>108029030
i'm still experimenting with zit because it's really stubborn. made some improvements, so now it doesn't *completely* shit the bed but it's still pretty uncanny to me
>>
File: ihu7.png (1.82 MB, 2048x2048)
>>
File: 1762954427053381.png (40 KB, 225x225)
>>108028986
>>
>>108029018
klein has a turbo lora available so that you can control how much distilled is in the base gen. Why isn't there an equivalent for z-image?
>>
How could i hate api nodes when theyre just so damn good?
>>
>>108029085
Tailscale is just Wireguard without having to open ports or fuck with config files, the client is open source and if you want to self host Headscale is an open source server (you will have to open ports for that or faff around to run it over a tor hidden service). Also the Wireguard data never goes through their servers they just coordinate clients without opening ports/behind cgnat.
You clearly have no clue what you're talking about. https://tailscale.com/blog/how-tailscale-works go have a read, if you're literate and capable of silencing your schizophrenia demons long enough.
>>
>>108029018
Yeah, that's Flux.2's VAE in action for those closeup details, plus I'm sure BFL did extra tuning to ensure it's not slopped and the colors are accurate this time around, unlike Z where colors feel washed out.
>>
>>108029149
you're still relying on a relay/coordination server outside of your control to do the discovery phase, and that's 100% closed source
tailscale is NOT local, fuck off shill
>>
>>108029120
>gretafag
Oh you should've said that initially so I could more easily disregard your opinion
>>
>>108029136
Just wait a few days until someone inevitably makes it or create the diff lora on your own.
It's not difficult.
>>
>>108029173
eh, i just needed a character to experiment with and its not like we need yet another sydney sweeney or scarlett johansson lora so whatever
>>
Which model can cope better with shitty datasets? I tried some old datasets that were small and/or blurry on ZIT and the resulting loras were pretty impressive given what it had to work with. How do ZIM and the kleins fare in this regard?
>>
>>108028850
They are just trolling/retarded, anyone with even a remote speck of understanding of this garbage knows that including e621 would colossally improve the model's capabilities, not only because e621 tagging is far superior but because it's just more datapoints; even r34 is a good source in this regard. But when you say that, the only thing a retard sees is "furry = bad". That being said, including e621 would also greatly increase training costs, which is probably why they never include it.
>>
File: file.png (422 KB, 1024x768)
>>108029161
I self host my Headscale server and mentioned Anon could do the same, it absolutely is local and 100% open source and enables me and potentially him to access local and open source diffusion models running on our own hardware while outside of our home networks.
>>
File: 4830789756.png (273 KB, 768x768)
>>108029205
I got banned for racism once for posting a image like that
>>
File: ComfyUI_09370.png (3.03 MB, 1440x2160)
>>108029120
My ZIM LoRa on ZIT needs 1.75 strength before I'd consider it usable, and 2.0 performs better when the face gets smaller or has to change direction. The overall performance though is a lot better than my ZIT-trained one (fewer occurrences of body horror and other little things a LoRa can introduce).

Z-Image using Flux.1's VAE also means you can use one of the EQ-VAEs out there for even better quality (those helped out big time on Flux). I haven't trained one with EQ yet, but that's on my to-do list.
>>
>>108029214
Some jannies also don't like Hitler gens
>>
>>108029204
I would say that a small model might degrade from a massive influx of furry slop, when there is a very limited amount of weights to hold data and you are going to get contention at some point.
It's possible that noob for example failed to learn some characters and styles it otherwise could have learned due to furry data. Though again most likely superior tagging and additional data helped more than hurt it.
Furshit is fine as long as it is tagged out clearly.
BUT I don't mind models not touching it either in the age of NLP. Anima can survive without e621 tags.
>>
>seaart pruned the loras of my waifu
sigh, is there anywhere to find loras of actresses/celebrities these days?
>>
>>108029232
Trained by yourself on your own hard drive. It's the only way.
>>
>>108029232
Probably on Chinese sites since cheeto hitler made deepfakes illegal in burger land.
>>
>>108029249
>>108029250
that's unfortunate. thanks fellas
>>
>>108029232
depends on the base model. check out /r/ realistic parody OP
i think there is a lot of zit
>>
>>108029120
>zit would just produce sameface
Time to get zim up for some hot Greta gooning
>>
>>108029197
Use edit model to unblur the images first.
>>
File: Anima_00114_.png (763 KB, 832x1216)
>>108029214
Whatever helps them feel like a woman, amirite?
>>108029232
huggingface but they get periodically jannied.
Is civarchive still around? Maybe there too.
But training your own is the best.
Most lora trainers are jeets who can't train anything even remotely passable.
>>
>>108029197
>How do ZIM and the kleins fare in this regard?
absolutely unforgiving imo
i ((enhanced)) my datasets with klein 9b
something along the lines of "remove artifacts and make high quality"
just be careful it doesnt change too much
>>
>>108029305
kek also thats a nice style anon
>>
>>108029250
What do chinks use? Does modelscope have loras? Tensorart and Seaart are both just as dogshit as civit.
>>
>>108029314
Something unpronounceable I'm sure.
>>
>>108029298
Yeah I know but I'm just interested, tells you something about a model's learning ability, and sometimes unblurring changes the image too much or introduces unwanted things etc
>>
File: 1747105793653478.png (2.41 MB, 1088x1632)
>>
https://www.modelscope.cn/models?name=z-image&page=1&tabKey=other&tags=LoRA
Oh they have. But the filtering is fucking ass
>>
>>108029075
That's from the ramtorch sources, and that's an implementation of the strategy using cuda.
>>
>>108029323
Catbox please
>>
>>108029225
Danbooru is nowhere near close enough data to saturate that model, doubt they have an anime dataset large enough to, it's just much much cheaper (because money is always the constraint with this stuff) since it's less stuff to learn.
>>
File: 1767669869294236.png (2.46 MB, 1088x1632)
>>108029343
still learning the model https://files.catbox.moe/n5vvfd.png
>>
File: 1753954352906631.png (2.92 MB, 1088x1632)
>>
>>108029204
Actual bullshit, there is no evidence e621 data made it good and way more evidence that it made it worse. Case in point: the vpred Illustrious model that was never released was way better than noob vpred, as was NAIV3, which is still the best XL anime model. Neither has a lick of e621 in it.
>>
>>108029361
Thanks.
Interesting to see so much style stuff in the negatives. Also having TT in the negatives while prompting raven is certainly a choice.
And ughh can't say I am a fan of trannies neither but I don't think schizo prompts about trans stuff in the negatives are helpful.
Just add futanari if you are afraid of the model accidentally genning dick girls.
>>
File: file.png (132 KB, 1385x1203)
>>108028964
>is it impossible for different people to be angry with "comfy" org?
>>
>>108029393
>the vpred Illustrious model that was never released was way better than noob vpred
Yes because the only difference between the two was the dataset and not any hyperparameter amirite?
>as was NAIV3 which is still the best XL anime model
Now that's a real argument, but again, how do we know NAIV3 wouldn't have been better if it included e621? Because I'm pretty sure the difference between it and the other models is not merely dataset.
>>
>>108029306
I assume 9b is better at preserving things like facial features or overall coherence than 4b right? I upscaled things with 4b and it often introduced a shiny slopped look to skin, is 9b better here?
>>
File: 642.png (1.02 MB, 768x1344)
gm ai sisters
>>
File: 1768643812126900.png (2.93 MB, 1088x1632)
>>108029399
>Also having TT in the negatives while prompting raven is certainly a choice.
its an old trick from nai/illust/noob to rid characters of their canon style. im not editing any of these old noob prompts save for the primer
>I don't think schizo prompts about trans stuff in the negatives are helpful.
they are basically required for naked noob which is where these prompts are from
>>
>>108029407
Illustrious creator himself blamed the e621 dataset for why noob was worse.
For NAI they did train an e621 model later on, which was not that well liked if I remember. There is no way they didn't try what you said; they must have seen the results weren't worth it, which is why they made it separate.
>>
>>108029425
Yeah that makes sense.
I had a long schizo list of negatives for noob too. (Although I switched to a more concise one later on)
>>
>>108029189
find some hotty
it's like the anons endlessly experimenting with epstein or trump, at least have the grace of playing with nice girls instead of this genuinely mind rotting stuff
>>
>>108029430
makes sense, hopes up that cumfy's dataset can match any resemblance of a quality model
>>
>Multiple threads-long meltdown over an anime booru tag prompting model
Was it worth it?
>>
>>108029430
>Illustrious creator himself blamed e621 dataset on why noob was worse.
The illust creator was seething at Noob too? I should've guessed fucking keeeekkkkkkkkkk
>>
>>108029328
>nothing nsfw
I guess they block it all? Unless it's behind login
>>
>>108029446
Don't they have their own civit clone? NSFW will probably be buried like on HF. Only for people in the know.
>>
Anons playing with video models, is ltx2 finally getting rid of its awful sound quality and random unmoving photo gens?
And is nsfw finally working without looking ridiculously bad?
>>
>>108029413
>is 9b better here?
honestly i felt it was pretty good, yeah. just try it yourself. sometimes it introduced wrinkles that were not there before, but you can always reroll
>>
>>108029461
No and not really. It will be some time before it catches up to 2.2, and I expect wan 3 will be a thing by then.
>>
File: juri han anima 2.png (628 KB, 832x1216)
Haven't seen this many watermark/text hallucinations in the bottom for a really long while. And text, watermark are in the negatives too. (Though it is a lot more coherent than when sdxl hallucinates them, you can really feel the qwen vae.)
>>
>>108029461
Sorta kinda, it will honestly need that vae+model update they have planned to really fix everything, but they released a new sampler and some settings you can fiddle around with to improve quality. Model has potential but consider this one a beta and just fuck around with it
>>
>>108029489
Is this with watermarks in negatives?
>>
ltx-2 is so fun (default i2v template in comfy, set frames to 240 or 10s)

https://files.catbox.moe/6fivxj.mp4
>>
>>108029535
"with an american style accent":

https://files.catbox.moe/4h0cyf.mp4
>>
>>108029535
How come nobody has taken this whole thing to refer to him as Dorito pope again?
>>
>>108029529
Yes but I think it misunderstood the prompt altogether. The "You are an AI assistant..." stuff started to appear on other seeds at the top. It's kinda wild how it interprets the prompt sometimes.
>>
>>108029551
>You are an AI assistant
Nigga why are you using this shit. It's a text encoder.
>>
>>108029551
Anima doesn't need that part, only the lumina models do
>>
>flash attention + torch compile + klein 9b
why doesn't it work?
>>
>>108029489
you can have a 2nd pass with klein and something like "Remove watermarks and text"
>>
>>108029549
everyone thinks he is shill man now.

also there is a ltx2 video extend workflow, which can clone voices or movements, pretty neat imo

https://huggingface.co/RuneXX/LTX-2-Workflows/tree/main
>>
>>108029562
did you remember to set the CUDA_ENABLE_SPEEDHACKS=1 environmental variable?
>>
should i directly train on multiple resolutions or start with 512 for a couple thousand steps and then move to something higher? assuming the use of buckets of course
>>
>>108029555
Not out of necessity, but to improve quality:
You are an assistant designed to produce aesthetically pleasing, high quality images based on user prompts. <Prompt Start>
>>108029557
Not needed, no, but from limited testing (admittedly needs more) it seemed to improve quality to me. This is the first problem I have run into with it, after more than a hundred gens. Might still be worth keeping even if it causes problems with prompts very occasionally.
>>108029568
I mean I can also just cut images too.
But good idea if watermark is too big and destructive to remove by cutting.
>>
>>108029569
lmao

it worked well with the geoff clip, video extend workflow: do skip first frames setting to pick a good start spot, then frame load cap 49 (or whatever) to pick a good end point.

https://files.catbox.moe/d5o0y8.mp4
>>
>>108029415
>zomg! faggotry guys!!111
>>
>>108029578
Just realized how many extra its are there lol.
>>
>>108029578
>it seemed to improve quality to me
Because it just crunched the numbers. It's the same as the padding slop for ZiT. If you want system prompts, you'd need to use a direct LLM loader that supports it and then turn the output into conditioning.
>>
File: 9187229.png (1.36 MB, 1024x1024)
>>108029562
I asked Klein and he said picrel
>>
>>108029562
FA doesn't work with imagegen. Torch compile is pure ass and you'll have to recompile anytime you change prompt or add lora or change lora strength, wasting any time you could've saved anyway. Use --fast fp16_accumulation if you want speedhacks.
>>
>>108029580
https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20V2V%20(extend%20any%20video).json

this is a very good workflow, the other one works but this is more refined imo
>>
File: 9731595259.png (1.67 MB, 880x1536)
>>108029581
Really glad you liked it anon
>>
>>108029651
>xhe only posts gens
>xhe doesn't actually sell the figurines that'd sell very well
>>
>>108029647
for example, doritos pope with the workflow (if you use distilled ltx bypass the lora below the model loader)

10s extend of the first part: it also has nodes for smoothing out the audio.

https://files.catbox.moe/kmsa2l.mp4
>>
>>108029578
>>108029591
Actually despite removing it the problem persisted in some other gens.
So it seems to be something else.
>>
File: Výstřižek.png (43 KB, 1058x589)
Tried a ZiB lora (on fp8)
Barely any change in output pic. Do I let it cook for more epochs or bump up the LR even more?
>>
144 seconds to extend/make a 10s clip on a 4080, ltx2 q8 distilled. this model is meme magic.

https://files.catbox.moe/bljr06.mp4
>>
>>108029699
i had to train twice as much compared to my character lora i trained on zit
>>
GEOFF IS BREAKING THE CONDITIONING

https://files.catbox.moe/wu3xj3.mp4
>>
>>108029742
More epochs or LR?
>>
>>108029748
I've doubled the epochs and it was still only able to achieve ~70% likeness, but that also might be my sampling settings. Gonna try tripling it and see what happens
>>
NovaAnimeXL 15 description
>I think this will be the last version before Z-Image version drops
NovaAnimeXL 16 description
>Z-Image Base came out last month so I guess I'll switch the base model into it after someone create Z-Image Illustrious model. Nobody knows whether this is the last version or not but I'm looking forward for newer structures
Why did I expect anything from that retard
>>
>>108029443
Anime booru tag prompting models are literally the only thing of value to come out of AI
>>
>>108029762
skill issue
>>
how does z have more realistic gens than flux 2? is it an architectural difference?
>>
>>108029773
Flux always was shit however Zit is very rigid. Unfortunately Zib is not a replacement so that's the trade off you have. If you want more flexibility you are generally forced to use Flux.
>>
>>108029773
it all boils down to the dataset they used for rlhf https://arxiv.org/abs/2512.11883
>>
>>108029773
It doesn't.
>>
>>108029763
I can see what you're prompting right in this thread buddy and I don't see a lot of skill
>>
>>108029549
>this whole thing
you mean the thing that came out a year and a half ago?
>>
File: ComfyUI_09412.png (3.25 MB, 1440x2160)
>>108029773
The excessive quantization to get it to run on consumer hardware certainly isn't helping.
>>
>>108029651
I didn't one bit. I think you should have your own board.
>>
Is there a way to leverage VRAM from 2 NVIDIA GPUs for image generation in SDXL-like models (Illustrious/Pony for example) on WebUI Forge?

I got a 3060 Ti and a 5070 at hand and I'm wondering if I could use one for something like inference only and the other to load weights separately.
>>
>>108029824
How much quantization? A 24GB or 32GB GPU isn't enough?
>>
>>108029834
absolutely not
>>
>>108029699
DO NOT USE WARMUP STEPS WITH PRODIGY
Prodigy has its own warmup-like logic. It messes with it.
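if you're on the prodigyopt package that means lr stays at 1.0 and scheduler warmup stays at 0; there's a safeguard_warmup flag for the internal D-estimate instead. rough sketch, model is a placeholder:

from prodigyopt import Prodigy

# Prodigy estimates its own step size, so lr is left at 1.0 and the
# scheduler gets zero warmup steps; safeguard_warmup guards the D-estimate
opt = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01,
              safeguard_warmup=True, use_bias_correction=True)
# pair with a constant or cosine schedule, warmup_steps=0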
>>
>>108029839
>>
>>108029834
Nope. Extremely difficult to implement under normal circumstances and basically impossible with your dinky little forge UI.
>>
>>108029839
no you can run full flux2 with a 3090 (if you have a shit ton of ram (like me ;)))
>>
>>108029699
>01/02/2026
>thinks epochs convey any meaningful information
oh well good luck on your trials and errors
>>
>>108029854
Damn. I passed on buying a 3090 to pair with my 96GB of RAM when I had the chance for a reasonable price, so I have been condemned to use a 12GB GPU as punishment.
>>
https://files.catbox.moe/x7u325.mp4
>>
>>108029856
It was 13 steps per epoch. 60 pic dataset.
>>
>>108029842
Good to know.
>>
>>108029863
>the random guy going "yeah"
kek
>>
>>108028791
I don't really know how these text encoder + image diffusion models work, but is there a reason the text part couldn't be done in RAM? 30b LLMs offer usable performance when most of it is in RAM. And here text is only a part of what needs to be done.
>>
>>108029848
12GB cuckbros... we wasted our savings...
>>
>>108029861
>I have been condemned to use a 12GB GPU as punishment.
16GB is available for under 500 on the used market.
>>
I made this thread
>>
File: Anima_00148_.png (1.17 MB, 832x1216)
Not a /u/ fag but kino prompt there https://civitai.com/images/119355904
Ran it on anima instead.
>>108029888
You can do that?
There is some node, MultiGPU I think, that lets you run shit on CPU. It will be slower on system memory though, obviously.
Comfy will typically unload the text encoder from VRAM before running the unet anyway, so there is not too much point in that.
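minimal transformers sketch of the idea, model name just an example:

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
te = AutoModel.from_pretrained("Qwen/Qwen3-0.6B",
                               torch_dtype=torch.float32)  # stays on CPU

with torch.no_grad():
    ids = tok("1girl, absurdres", return_tensors="pt")
    emb = te(**ids).last_hidden_state  # computed entirely in RAM

cond = emb.to("cuda")  # only this small tensor ever touches VRAM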
>>
>>108029849
What about asyncdiff and stuff like that?
>>
>>108029912
make me out
>>
>>108029888
>is there a reason the text part couldn't be done in RAM?
isnt that people do with colossal models? cpu for text embedding and let the unet on vram?
>>
>>108029913
anima is such a slop
>>
>>108029840
Not even model sharing?
>>
>>108029808
but it is
>>
where is the self-refining video comfyui implementation?

get to it nerds

chop chop
>>
>>108029824
flux2 is okay once you really nail down what you want, and describe it in absurdly redundant detail
I think its problem is that it's too general, it has a bajillion parameters, but nobody gives a rats ass about 90% of them
klein is what happens when you strip the fat and focus on the one thing anybody cares about, bitches
>>
>>108029956
i tried it but it has errors with res4yl
>>
File: 00001-4017129897.png (1.24 MB, 896x1152)
>Prompt: a cute bitch in the desert
>>
>>108029969
proof thats a bitch?
>>
>>108029971
Would it lie to me?
>>
>>108029986
>would a diffusion model lie?
tourist retard
>>
klein edit 9b to make fent man

ltx2 to animate it (i2v workflow from here is nice: https://huggingface.co/RuneXX/LTX-2-Workflows/tree/main)

https://files.catbox.moe/kmh6ua.mp4
>>
>>108029986
if you let it
>>
>>108029989
>tourist calling someone else tourist
>>108029992
I didn't
>>
File: 379708.png (1.86 MB, 1024x1024)
>>108029989
the only retarded ones are the ones who stick to this website
picrel is a tourist retard, he looks happy, happier than anyone who seems to frequent here enough to know about the """culture"""
>>
the culture btw is an anonymous schizo blood feud
>>
For me the biggest improvement in achieving likeness was to be organized when collecting the dataset. Have a folder structure that has all angles and shot types and fill it and you will know what you are missing and what you have. Also you should have cropped 1:1 headshots from all angles.

And if you can't be bothered to manage the bucket sizes just use 1:1 ratio.
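something dumb like this makes the gaps obvious before you train (folder layout is just my own convention):

from pathlib import Path

root = Path("dataset")
buckets = ["front", "profile", "three_quarter", "back",
           "headshot", "upper_body", "full_body"]
for b in buckets:
    # count images per angle/shot-type bucket
    n = sum(1 for ext in ("*.png", "*.jpg") for _ in (root / b).glob(ext))
    print(f"{b:>14}: {n}" + ("  <-- MISSING" if n == 0 else ""))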
>>
File: Flux2-Klein_00059_.png (2.18 MB, 1088x944)
time to bake, schizo
>>
>>108029912
Good job
>>
File: 003453456_.jpg (243 KB, 1344x768)
Welp, seems like the character threshold is around 500-600 pics on danbooru at the very least, I tried with characters around 300 and 400 pics but it couldn't shit them out, not even close
>>
>>108029699
>he fell for the fp8 meme
>>
>>108030066
prompt? source is obviously a pepe but is it 1 or 2 images
>>
>>108029996
I made this general mongrel
>>
>>108030090
Presumably it should be because it's not completely trained, else it's total horseshit
>>
File: Flux2-Klein_00060_.png (2.19 MB, 1024x1008)
>>108030093
just 1 image
>Transfer the image into a vibrant graffiti street-mural style.
>>
>>108030104
i dont believe you
>>
File: 1753147770685426.png (590 KB, 589x665)
https://files.catbox.moe/ch5bgx.mp4
>>
>>108030107
jej
>>
Anybody else got this issue with Comfy where it corrupts checkpoints or TEs? Had this happen on Klein and now on Anima; for Klein I had to keep redownloading the TE because it got fucked up after every use, and now for Anima it's the checkpoint. Every time I close Comfy after running those models and then open it again later they're fucked up and I don't know what's causing it. It doesn't do that with WAN or LTX or SDXL.
>>
File: Flux2-Klein_00107_.png (504 KB, 704x768)
>>108030108
>>
A LOT OF LOYALTY FOR AN OPENAI SHILL

https://files.catbox.moe/o1hgfk.mp4
>>
>>108030115
>FUD attempt #24526
>>
Let's say I am into realistic nude dolphins.
Would Wan2.2 still be the most adequate for the task?
>>
>>108030115
I have never heard of that and cannot even fathom how that would happen. Are you saying the weights on your computer have become corrupted?
>>
>>108030108
I cringed
>>
>>108030119
can you make him give birth
>>
>>108030149
it would be extremely painful
>>
File: o_00266_.png (3.76 MB, 2560x1536)
>>
>>108030152
for you
>>
File: o_00268_.png (3.96 MB, 2560x1536)
>>
>>108030166
enough shrooms
>>
If im training a character lora for klein9B with 70 images, should i also include a regularization dataset? I also want to combine it with other loras and use it for editing.
>>
File: o_00269_.jpg (1.43 MB, 2560x1536)
>>
>>108030171
yeah
>>
>>108030175
Any good regularization dataset you can recommend? The ones i found online all look like sdxl slop. Or should i just grab 30 images off google and caption them myself?
>>
>>108030171
On paper, you should always use a regularization dataset, but in practice few people ever do.
>>
>>108030182
I'm checking, gimme a sec
>>
>>108030145
Yeah I have no idea how it's happening either. When I redownload the models it works just fine, but after closing Comfy and reopening it and loading the models again they either produce only patterned noise (Anima), or Comfy gives me an error for the TE (Klein).
AND JUST AS I MAKE THIS POST AND RUN MORE TESTS IT STOPS DOING IT WTF. Guess it's "nvm fixed :)" now.
>>
File: ComfyUI_temp_vvofp_00001_.jpg (740 KB, 1024x1536)
>>108030196
NO THERE IT IS AGAIN AAAAAAAAAAAAA
This is the Anima output after reloading it.
>>
>>108030171
What do you want to regularize for? No one uses regularization images because no one cares about overfitting stray tags since it's a single hot-swappable lora and not a fine-tune
>I also want to combine it with other loras and use it for editing.
Won't work 100% well unless they are trained together, lora weights conflict when they are trained separately
https://arxiv.org/abs/2311.13600
https://arxiv.org/abs/2412.04465
>>
>>108030033
its a good train of thought, but you're never gonna have enough training data to cover all those categories with quality images. your dataset should always be focused on teaching what you want to reproduce. for me it's loose dresses or tops that show the natural breast shape, sucking on things, sticking tongue out, breast shape when lying on her back, etc. of course you should balance all this with close-ups, expressions you like, full body images, different lighting, poses, etc. but if you're too focused on just autistically trying to cover every single angle, you're gonna get a lora that's good at making images of her just standing, and not much else.
>>
File: 1740305203967921.png (111 KB, 700x691)
lmao video extend is gold. cia talking about moot:

https://files.catbox.moe/7zlwbv.mp4
>>
File: 1739568488558801.png (259 KB, 916x779)
>its another catbox video gen episode
>>
>>108030229
where is my fent spam
>>
File: Flux2-Klein_00110_.png (527 KB, 704x768)
>>108030229
>>
>>108030229
Wow
Incredible
10 seconds of shitty ai generated voice
>>
I'll take fentposting over obsessed namefaggots any day
>>
new thread

>>108030237
>>108030237
>>108030237
>>
>>108030220
The base model should take care of the poses. What you are teaching is the shape. And, yes, it's quite easy to fill all those categories most of the time.
>>
>>108030204
Are you using the Flux2-specific latent?
>>
>>108028761
soulkino, catbox?


