/g/ - Technology


Discussion of Free and Open Source Diffusion Models

Prev: >>107914123

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Flux Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>NetaYume
https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0
https://nieta-art.feishu.cn/wiki/RZAawlH2ci74qckRLRPc9tOynrb

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>4 collage images from the same person
>>
blessed thread of frenship
>>
So... Which version of Klein should I use? Mainly for I2I. There are like 4 versions
>>
File: 1739453998251996.mp4 (2.02 MB, 896x896)
>>107915572
>>
File: 1759959785340943.mp4 (3.7 MB, 768x1152)
>>107915649
>>
File: ComfyUI_temp_zpepi_00007_.png (2.77 MB, 1824x1248)
>>107916429
kinda sus since they usually come with more images, but it does stroke my e-peen so I'm conflicted
https://files.catbox.moe/edfqjg.png
>>
>>107916442
the one that doesn't have "base" in it, and you have the choice between 4b and 9b, if you can go for the biggest one obviously
>>
>>107916448
i bet that anon who loves you made it
>>
File: 1766691117876995.mp4 (2.14 MB, 1088x768)
>>107915682
>>
>>107916450
3080ti can run 9b?
>>
>>107916455
>12gb
yeah, if you go for Q8
https://huggingface.co/unsloth/FLUX.2-klein-9B-GGUF/tree/main
https://github.com/city96/ComfyUI-GGUF
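Back-of-envelope for the 12gb claim above. This is a rough sketch, not a measurement: it assumes "9b" means about 9 billion weights and that every tensor is stored as GGUF Q8_0 (34 bytes per block of 32 weights, i.e. 8.5 bits per weight). `weight_size_gib` is a made-up helper name, and the estimate ignores the text encoder, VAE and activations, which need memory too.

```python
# Q8_0 stores 32 int8 weights plus one fp16 scale per block: 34 bytes / 32 weights.
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # = 8.5


def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: "9b" in the model name means roughly 9e9 parameters.
klein_9b_q8 = weight_size_gib(9e9, Q8_0_BITS_PER_WEIGHT)
klein_9b_16bit = weight_size_gib(9e9, 16)

print(f"Q8_0:   {klein_9b_q8:.1f} GiB")
print(f"16-bit: {klein_9b_16bit:.1f} GiB")
```

Under those assumptions the Q8 weights come to roughly 8.9 GiB, versus about 16.8 GiB at 16 bits per weight, which is consistent with Q8 being the option suggested for a 12gb card.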
>>
File: 1756071935865850.mp4 (1.81 MB, 896x896)
>>107915707
>>
File: ComfyUI_temp_zpepi_00055_.png (2.86 MB, 1824x1248)
>>107916451
It's a script that creates the collage, right? So how did they pick those? Unless they rolled the script until this selection came up
https://files.catbox.moe/91ikue.png
>>
>>107916474
awww you are so innocent
>>
>>107916470
That looks real fucking good. congrats
>>
>>107916485
big if true
>>
I feel very mixed about klein 9b. On one hand, it generates more than z-image did, on the other hand, it's worse at details. At least with my prompting style. Also, how are we going backwards with hands? (example not shown)

Also, wtf is this captcha system!?
>>
File: 1761845005535273.mp4 (3.79 MB, 768x1088)
>>107915708
>>
>>107916492
Flux anatomy seems to be hereditary. Unfortunate.
>>
File: ComfyUI_temp_zpepi_00029_.png (3.34 MB, 1824x1248)
>>107916482
sometimes knowing too much can be bad for one's health. I'll elect to maintain my ignorance on their motivation and just keep slopping
https://files.catbox.moe/2uvv50.png
>>
Is the Flux scheduler node better than using the normal scheduler? Karras is broken for Klein so the normal one seems useless
>>
>>107916496
I can tell...
>>
>>107916505
Just use whatever you want. The recommended settings are just as much of a guesswork as anyone else's settings.
>>
>>107916506
imagine the possibilities
>>
>>107916468
The king has returned
>>
File: 1749206957095916.mp4 (3.69 MB, 896x896)
>>107915724
>>
File: 8768.png (1.29 MB, 1248x944)
>>107916429
>noticing
>>
File: 1756597931450942.mp4 (3.84 MB, 768x1152)
>>107915801
>>
>>107916492
it's for sure not worse at details lol, Flux.2 VAE slaps harder than my dads belt
>>
File: bitmap.jpg (248 KB, 1824x1248)
https://files.catbox.moe/hhhv6e.png
>>
>>107916492
nice cleavage, did you try res6s for better outputs?
>>
>>107916533
are statue titties really b&?
>>
>>107916419
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
why is this still in the OP?
>>
>>107916528
yeah, I'm kinda disappointed with that new VAE, it doesn't seem to keep the details better than Flux 1 vae, I'm not seeing any difference with Kontext, it's still shifting the colors and shit during edit
>>
File: 1756575866070624.mp4 (3.72 MB, 768x1088)
>>107915725
>>
>>107916505
Euler + Flux.2 Scheduler (the default) seems to be one of the more stable / correct options, but 4 steps is never enough, you want 6 to 10 always, much like Z Image.
>>
File: ComfyUI_temp_zpepi_00066_.png (3.18 MB, 1824x1248)
https://files.catbox.moe/8fsx14.png
>>
>>107916533
in your experience, does wan work better for t2i than zit and klein
>>
>>107916544
I think you're going blind anon
>>
File: ZI_00015_.png (3.42 MB, 2304x960)
>>107916540
rather not risk it
https://files.catbox.moe/xjjiwx.png
>>
Threadly reminder that Klein sucks at referencing styles.
>>
File: 1741911833112507.mp4 (3.75 MB, 1152x704)
>>107915833
>>
File: ComfyUI_temp_bvdga_00010_.png (3.19 MB, 1824x1248)
>>107916551
WAN works with a very specific style, what I can only describe as "documentary" or "photojournalism" style. I also found a cool vintage anime lora for it as well. So the answer: it depends on what's the best style for the idea. I discovered that ZiT does early CGI really well, for example
https://files.catbox.moe/mcvuey.png
>>
>>107916574
friendly?
>>
File: 1762748839346526.mp4 (3.82 MB, 1088x768)
>>107915835
>>
It's funny how this community wants an unslopped model and we get Klein 4B and 9B with variations in the seeds.
Then they complain that the anatomy isn't as good as Z-Image or Qwen, which are slopified models for benchmaxxing.
For the complainers, just throw your Klein gen into Z-Image on a second pass at low denoise. Ez.
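For anyone unsure what "second pass at low denoise" means in practice: the refiner does not start from pure noise, it only re-runs the tail end of the schedule on the existing image. A minimal sketch, assuming an illustrative linear noise schedule from 1.0 down to 0.0; `refine_schedule` is a hypothetical helper, and real UIs and schedulers compute this differently.

```python
def refine_schedule(total_steps: int, denoise: float) -> list[float]:
    """Noise levels visited by a partial (img2img-style) pass.

    Assumption: a linear schedule from 1.0 (pure noise) to 0.0 (clean image).
    Only the last `denoise` fraction of that schedule is kept.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    full = [1.0 - i / total_steps for i in range(total_steps + 1)]
    keep = max(1, round(total_steps * denoise))
    return full[-(keep + 1):]


# A 10-step schedule at denoise 0.3 only runs the last 3 steps,
# so the first-pass composition is mostly preserved.
low_denoise = refine_schedule(total_steps=10, denoise=0.3)
full_denoise = refine_schedule(total_steps=10, denoise=1.0)
print(len(low_denoise) - 1, "steps, starting at noise level", round(low_denoise[0], 2))
```

The lower the denoise, the less the second model is allowed to change, which is why this works as an anatomy cleanup rather than a full regeneration.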
>>
File: ComfyUI_temp_zpepi_00069_.png (2.95 MB, 1824x1248)
>>107916577
User friendly? WAN takes significantly longer than ZiT: 2m10s versus 30s on a 5090 with my workflow
https://files.catbox.moe/wcv7he.png
>>
File: 1765411838611330.jpg (459 KB, 1250x1566)
>>107916600
>Then they complain that the anatomy isn't as good as Z-Image or Qwen, which are slopified models for benchmaxxing.
wtf, Z-image is not a slopified model at all, it's the best realistic local model and it's great at anatomy, this model (unlike the others) has only been trained on real data
>>
File: 1760591744012559.webm (961 KB, 784x784)
>>107916506
>>
>>107916620
kek
>>
File: 1759601375011325.mp4 (3.74 MB, 1088x768)
>>107915855
>>
File: 1754151737521545.mp4 (2.75 MB, 1088x768)
>>107915880
>>
File: 1746531501518294.png (3.65 MB, 1504x2256)
>>
File: 1753316118278857.mp4 (3.64 MB, 1088x768)
>>107915892
>>
File: ZI_00021_.png (3.39 MB, 2304x960)
>>107916615
I agree; I get pretty good results with ZiT, it's very good at prompt adherence but the lack of composition variance can be a problem
>>107916633
love the fishes swimming around the aircraft
https://files.catbox.moe/yvhehz.png
>>
>>107916546
I conclude with this.
>>
>still no wan updates

if they release a wan model that does 30 seconds and can generate audio, I will never ask for a new video model ever again.
>>
File: 1750265623365886.mp4 (3.69 MB, 1088x768)
>>107915917
>>
File: fk9b_00104.png (1.61 MB, 960x1440)
>>
>>107916615
idk to me it seems all the Z image outputs look the same, and it has zero variation
it's definitely overtrained to the maxx for perfect looking people photorealism, hopefully we get more variety with z-base and finetunes (yes i know, chinese culture)
>>
File: 1766730942818845.mp4 (3.77 MB, 768x1088)
>>107915929
>>
File: fk9b_00105.png (2.45 MB, 960x1440)
>>107916695
>>
File: dramatic1girl.mp4 (3.02 MB, 832x1280)
>>
File: 1737681368098398.webm (2.16 MB, 640x960)
>>107916525
>>
File: 1742157577423166.mp4 (3.81 MB, 1088x768)
>>107915965
>>
File: ComfyUI_temp_zpepi_00072_.png (3.43 MB, 1824x1248)
>>107916700
>this is what the heralds of the antichrist actually believe
https://files.catbox.moe/b29a7a.png
>>
File: 1768181660982255.jpg (955 KB, 1504x2256)
>>
>>107916734
i made this image
>>
File: 1761592497095071.png (149 KB, 417x350)
>>107916525
Klein can do that? time to use this model
>>
>>107916748
Yeah, I saw it somewhere in a montage comparing models and decided to take a crack at it as well
>>
>>107916757
thief
>>
File: 1742006270974134.mp4 (2.44 MB, 1024x768)
>>107915970
>>
File: 1758863762857921.mp4 (3.85 MB, 1088x768)
>>107915994
>>
File: fk9b_00124.png (1.65 MB, 960x1440)
>i do not consent to anyone animating or replicating my intellectual property!
>>
>>107916761
Bad artists copy. Good artists steal.
>>
File: 1741770175325696.mp4 (3.71 MB, 1088x768)
>>107916005
>>
File: edited.jpg (1.53 MB, 1024x2048)
woah, didn't realize that klein can edit! Pretty cool!
>>
>>107916761
please saar do the needful and be giving me your PO box so I can be returning it to you saar
https://files.catbox.moe/2pr93t.png
>>
>>107916784
coma patient
>>
>>107916784
bruh, that's why Klein is so hyped in the first place, it's great at editing shit
>>
>>107916789
can you gen balu giving birth to mogli
>>
>>107916797
sure, I'll put it in the queue
>>
File: 1764648941073619.mp4 (3.69 MB, 1088x768)
>>107916012
>>
File: 1753086557786780.mp4 (3.62 MB, 1088x768)
>>107916062
>>
i thought we would have some discussion and not just spamming worthless videos
>>
File: 1768414186108688.mp4 (3.7 MB, 1280x640)
>>107916109
>>
File: 1757544165215738.png (3.05 MB, 1203x1805)
>>107916741
>>
The random fucking amputees klein does are driving me crazy. Regression compared to old flux.
>>
File: ComfyUI_temp_zpepi_00080_.png (3.18 MB, 1824x1248)
https://files.catbox.moe/kdrre8.png
>>
File: 1766280530553733.mp4 (3.69 MB, 1216x704)
>>107916118
>>
>>107916833
i am trying more steps and it seems to happen less often but not never.
>>
>>107916833
stinky
>>
File: Klein_Edit_00145.jpg (3.12 MB, 1920x1072)
>>107916784
That's literally the best part!
>>
>>107916844
>>107916833
use nag on high
>>
>>107916823
/ldg/ is finally returning to its /sdg/ roots
>>
z base not being released led to this
>>
>>107916833
Changing samplers tends to fix this for me
>>
>>107916865
there's almost no point in checking /ldg/ until base is out.
>>
>>107916865
>>107916873
what no ZiB does to a mofo
https://files.catbox.moe/l7cxxl.png
>>
how do I show an image in a node in comfyui? load image works but it only works if your local output folder has that image
>>
File: 1749016256940592.mp4 (3.75 MB, 1088x768)
>>107916151
>>
File: 1746250041410766.mp4 (3.89 MB, 1088x768)
>>107916125
>>
File: fk9b_00139.png (2.21 MB, 960x1440)
>>107916852
i'm pretty happy just hitting gen again most of the time but it is a little annoying
>>
can klein inpaint?
>>
File: 1742520846545474.mp4 (3.77 MB, 768x1024)
>>107916204
>>
File: 1763978391500747.mp4 (2.68 MB, 1408x576)
>>107916231
>>
I always wanted to ask but never did
why does /g/ have three different diffusion generals? I guess /adt/ is for ilxl and nai and such but what's the difference between/ldg/ and /sdg/?
>>
What is the current status of sound generation? Does anyone here use MMAudio?

Are there any good solutions for generating NSFW sound for an NSFW input video?
And what about generating character voices locally? Can this be done effectively?
>>
>>107916936
ace step 1.5 will release in two weeks
>>
File: 1742346230992533.mp4 (3.03 MB, 960x832)
>>107916262
>>
>>107916932
right now there's basically no difference between /ldg/ and /sdg/. could probably just merge them.
>>
File: Chroma_00001 (1).png (2.78 MB, 1520x1040)
>>107916909
cute vid
>>107916797
I couldn't get it to actually birth him; best I could do was this
https://files.catbox.moe/sji4da.png
>>
>>107916829
not bad!
>>
>>107916948
anon he is a minor
>>
File: 1756245745400571.mp4 (3.77 MB, 1088x768)
>>107916474
>>
File: 00549-981739542.png (608 KB, 512x768)
Was looking through my old gens and found this from 2023, don't believe I ever uploaded it.
>>
>>107916954
I am aware; looks like he's eating her out, which was as far as I could take it by prompting; maybe the birthanon can develop it further from there.
https://files.catbox.moe/fcbd0c.png
>>
>>107916961
you couldn't make something like that in 23, stop lying
>>
>>107916962
>balu
>her
>>
File: ComfyUI_temp_pmzcr_00001_.png (3.68 MB, 1824x1248)
>>107916962
>>107916954
forgot to upload img
>>
File: 1750710291152429.mp4 (3.69 MB, 768x1152)
>>107916468
>>
>>107916966
in this image, it is a female, since the idea was to make balu birth mowgli; i did manage to create birthing scenes in Chroma before, so it was quite frustrating that I can't crack it with this one
>>
File: 00413-207853073.png (482 KB, 768x512)
>>107916965
I can't tell if you're rage baiting me or not, but here's the catbox
https://files.catbox.moe/04r14d.png
>>
File: x_gynvse.png (1.48 MB, 1536x1024)
>>
>>107916986
ai slop
>>
File: 1768771115186727.mp4 (3.87 MB, 896x896)
>>107916492
>>
File: 1742518727789388.mp4 (3.69 MB, 1088x768)
>>107916502
>>
File: 1765909003781940.png (368 KB, 1000x500)
>>107916784
>>
>>107917010
now make the poke birth her
>>
>>107917010
Listen, I'm slow, I know this.
>>
File: 1737361601306009.mp4 (2.3 MB, 1024x768)
>>107916524
>>
File: 1765949670901284.mp4 (3.67 MB, 768x1152)
>>107916525
>>
File: fk9b_00155.png (1.82 MB, 960x1440)
i'm starting to think the anatomy issues are coming from prompt adherence, where z-image just decides that if something is confusing it's going to make it up. if you say something like "his hands are in the air, his right arm is bent" it just adds more arms, even though technically it makes sense.
>>
Is the wait coming to an end?
>https://github.com/Comfy-Org/ComfyUI/pull/11979
>>
File: 1740254254177658.webm (1.3 MB, 960x640)
>>107916986
>>
>>107917044
>not culture
>however, 14 more days to go

lets see what predditor milks this one for updoots
>>
>>107917045
kek, based
>>
File: 1765716539007784.mp4 (3.73 MB, 1088x768)
>>107916533
>>107916549
https://i.4cdn.org/r/1768881279400611.mp4
>>
File: 1757473639704677.mp4 (2.77 MB, 1408x576)
>>107916556
>>
File: many.mp4 (2.83 MB, 1280x864)
>>107916959
kek
>>
File: 1759059180938919.mp4 (3.78 MB, 1088x768)
>>107916574
>>
File: 1767198753739868.mp4 (3.7 MB, 832x1024)
>>107916615
>>
File: fk9b_00172.png (2.32 MB, 960x1440)
>>
Flux seems to be able to understand fix hands but doesn't seem to be able to really copy art styles that well.
>>
File: ComfyUI_temp_pmzcr_00010_.png (3.37 MB, 1824x1248)
>>107917092
>>107917089
>>107917088
>>107917086
Very cool gens. Congrats
https://files.catbox.moe/40t98y.png
>>
I'm very skeptical of the guy who said he trained a Klein lora with only 12gb VRAM. I have a 16gb card and the Ostris trainer just hangs while loading the transformer.
>>
oh wow these are some extremely low quality posts!
obviously intentional!
someone is being naughty naughty and not following the rules!
>>
File: fk9b_00144.png (2.09 MB, 960x1440)
>>107917114
i could see it being possible if you unloaded the te but even then that's really tight
>>
>>107916780
dark
>>
File: ComfyUI_00446_.png (3.92 MB, 1536x2048)
>>
>>107916932
/sdg/ was inhabited by cancerous namefags that formed a circlejerk. /ldg/ was created to get away from those people. it is now mostly an abandoned husk. /adt/ was created by a troll in an attempt kill /ldg/. this same troll made several local diffusion themed generals at the same time but only /adt/ managed to stick. it's just a generic AI anime image dump now. There's no experimentation, sharing of workflows, technique discussion or anything. just mindless anime ai pics spam. /ldg/ is still the main local diff thread
>>
File: ComfyUI_06766_.png (1.2 MB, 1360x768)
>>107916839
>the roaring knight

Is there a working workflow for klein edit + lora loading?
Specifically the 9b version.
>>
File: 1760371145803096.mp4 (3.7 MB, 768x1152)
>>107916703
>>
So heunpp2 definitely gives better results for image editing compared to euler. It usually does a better job at hands and anatomy, and also creates more coherent backgrounds. The biggest downside is that it's really slow and makes brighter images.
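A possible reason for the speed and quality gap: Heun-type samplers evaluate the model twice per step, Euler once. The toy below shows this on a simple ODE (dx/dt = -x), not on a diffusion model; it uses plain Heun rather than heunpp2, whose exact update rule is not described in this thread, and `integrate` is an illustrative helper.

```python
import math


def integrate(f, x0, t0, t1, steps, method):
    """Integrate dx/dt = f(t, x) with a fixed step size.

    Returns the final value and the number of calls to f,
    which stands in for "model evaluations" here.
    """
    h = (t1 - t0) / steps
    x, t, evals = x0, t0, 0
    for _ in range(steps):
        k1 = f(t, x)
        evals += 1
        if method == "euler":
            # First order: follow the slope at the start of the step.
            x = x + h * k1
        elif method == "heun":
            # Second order: average the slopes at the start and the predicted end.
            k2 = f(t + h, x + h * k1)
            evals += 1
            x = x + h * (k1 + k2) / 2
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return x, evals


def decay(t, x):
    return -x


exact = math.exp(-1.0)
euler_x, euler_evals = integrate(decay, 1.0, 0.0, 1.0, steps=4, method="euler")
heun_x, heun_evals = integrate(decay, 1.0, 0.0, 1.0, steps=4, method="heun")

print(f"euler: {euler_x:.4f} ({euler_evals} evals)")
print(f"heun:  {heun_x:.4f} ({heun_evals} evals)")
print(f"exact: {exact:.4f}")
```

With 4 steps, Euler lands at about 0.316 and Heun at about 0.373 against an exact value of about 0.368, at double the number of function evaluations.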
>>
>>107917109
I think they messed up the hands in the training, Klein tends to give male hands to women, I bet Z-Edit will mog the shit out of BFL once again
>>
File: 1744601185638029.mp4 (3.68 MB, 768x1152)
>>107916695
>>
File: 1764509005958170.mp4 (3.73 MB, 1088x768)
>>107916734
>>
File: 1761094441226534.mp4 (3.67 MB, 896x896)
>>107916669
>>
File: 54545645656484.png (77 KB, 1003x331)
Bros local music is so back, I could hardly tell the difference between this gen and what hypothetically Udio would give for same kind of lyrics.

https://files.catbox.moe/zg6sjx.mp3
>>
File: 1741060004578121.mp4 (3.78 MB, 768x1152)
>>107916741
>>
File: fk9b_00186.png (2.53 MB, 960x1440)
>>107917168
>>107917162
thanks they're terrible and also useless. i regret posting my images in the first place.
>>
File: 1759260514334445.mp4 (2.66 MB, 1408x576)
>>107916661
>>
File: ComfyUI_temp_pmzcr_00016_.png (2.92 MB, 1824x1248)
https://files.catbox.moe/6seuza.png
>>
File: 1745751795897819.gif (1.61 MB, 480x270)
>>107917044
>>https://github.com/Comfy-Org/ComfyUI/pull/11979
HOLY SHIT ITS HAPPENING
>>
File: Flux2-Klein9B_00306_.png (446 KB, 1440x1024)
hmm
>>
>>107917063
>>107917044
>14 more days to go
why?
>>
>>107917178
Can it generate only instrumental music without voices?
>>
>>107917211
Isn't this just pulled from the diffusers/transformers or whatever pr?
>>
>>107917044
Hans woke up Chang it seems
>>
>>107917044
don't care. klein is better.
>>
File: ComfyUI_temp_pmzcr_00005_.png (3.54 MB, 1824x1248)
>>107917211
stop this madness. for how long can chinese culture live rent-free in my head? I was waiting for zib to be released before I slopped out, but the wait was too great
https://files.catbox.moe/sfs706.png
>>
>>107917219
he needs the model to test it out though? it means alibaba gave it to him
>>
>>107917228
>he needs the model to test it
Never stopped anyone.
>>
>>107917044
OHHHHHHHHH COMFY BETTER NOT BE FUCKING AROUND

THEY SAW KLEIN AND THEY ARE RESPONDING

ITS HAPPENINGGGGGGGG
>>
File: 1738177470991340.png (94 KB, 224x224)
>>107917044
>>107917221
>Hans woke up Chang it seems
BASED, GIVE IT TO ME NOWWWWWWW
>>
so 2026 actually a good year for local?
ltx, klein, z-base soon, acestep 1.5 soon
and who knows what else, it's only january
>>
why are people so hell bent on shitting this general up
i come back after a busy ass day and this guy is still posting his slop
do you seriously have nothing better to do?
>>
>>107917241
so far. maybe if we're lucky we'll get wan 3 locally.
>>
File: ComfyUI_temp_pmzcr_00019_.png (3.77 MB, 1824x1248)
>>107917242
if we had something better to do, we wouldn't be here on some anonymous basketweaving forum
https://files.catbox.moe/rlpqvy.png
>>
>>107917221
Not awake enough, Chaim (ltx) should further taunt Alibaba with more video comparisons
>>
>>107917254
im specifically talking about the guy that's making meaningless video edits
he's basically spamming
>>
>>107917264
>he's basically spamming
don't hesitate to report
>This post seems to be an automated spambot
>>
>>107917218
>Can it generate only instrumental music without voices?

Of course
https://files.catbox.moe/j8kzu0.mp3
>>
File: 1749391890567071.jpg (36 KB, 612x574)
i'm fucking impressed with ltx i2v. the motion is great
2026 is local
>>
>>107917278
2026 is really starting great, ltx, klein, soon z-image base, and this music model sounds good too
>>107917178
>>107917275
>>
>>107916750
Do what?
>>
>>107917044
>https://github.com/Comfy-Org/ComfyUI/pull/11979
can't believe they really waited for Klein to be released before making their move, as if they knew Klein was a forced to be reconed with, not a single soul thought Klein would be good, is there's some chink employees stealing the goods from BFL or what? lol
>>
File: Flux2-Klein_00366_.png (2.8 MB, 1248x1664)
sneed oils

>>>/wsg/6075874
>>
HELLO???
https://github.com/Comfy-Org/ComfyUI/pull/11979
https://github.com/Comfy-Org/ComfyUI/pull/11979
https://github.com/Comfy-Org/ComfyUI/pull/11979
>>
>>107917275
Going to try this out.
>>
>>107917178
> I could hardly tell the difference between
You must be deaf then.
>>
>>107917282
If it's your first time hearing it, it's insane what it can do, you've heard nothing yet.
https://files.catbox.moe/jc3fgz.mp3

It also knows multiple languages
https://files.catbox.moe/7pqlbx.mp3
>>
>>107917296
Why are they hell bent on stealing BlackForest thunder? Chinese culture, probably.
>>
>>107917317
>HELLO???
BASED DEPARTMENT?? >>107917044
>>
>>107917319
Model ain't out yet, release date set at 27th.
>>
>>107917275
wow this is an awful example
>>
>>107917183
its a bot, its been spamming the thread for the past week
>>
File: 1762471293192508.png (117 KB, 640x366)
>>107917328
>Why are they hell bent on stealing BlackForest thunder? Chinese culture, probably.
poor bfl, Alibaba is ready to dunk on them hard for the second time, feels personal kek
>>
Went back to 2.1 Vace and can't believe it's this good. It's even better using it with the recent models that are coming out.
>>
>>107917367
ltx is bad, I dont understand the hype honestly, at least for anything that requires action wan (even 2.1) is superior. You can IMMEDIATELY tell when something has been done with ltx as the video comes out ultra slopped.
>>
bros i cant believe we're getting base
>>
>>107917372
I put my Wan gens into LTX 2 for vid2vid and audio sync.
The prompt following isn't good like Wan, but the long videos are worth it.
It just sucks cause of all the model swapping and ram. These guys really need to team up so we can have one good model.
>>
>>107917044
this is weird, usually when comfy implements an Alibaba model, the PR and the weight release arrive at the same time, it's the first time the PR merge arrived before, maybe he just copy/pasted the code from diffusers or whatever and this has nothing to do with Alibaba making a move
>>
>>107917317
it would be hilarious if base was """delayed""" all this time purely because they were waiting for BFL to release klein
>>
>>107917351
The same thing happened to /hdg/: at first it were OP/collage wars, then came the spam with not caring mods, and finally the general became unusable and dead. Seems like /hgg/ we'll have to leave the thread to the schizo and his unrelated links in the OP and move to another general.
>>
>>107917367
>>107917372
vace 2.1 is great. works well with the light loras, and particularly some of the later high noise 2.2 light loras. https://github.com/bbaudio-2025/ComfyUI-SuperUltimateVaceTools is good for longer vace vids. careful with the light loras (still haven't found the perfect balance yet), too long a length and it begins to fry
>>
>>107917383
Do you still have hope it will be good, as in, Klein good? Their own chart showed that the model sucks.
>>
>>107917321
Well, it almost sounds like it; it missed some parts of the lyrics in the middle, but for sure the rest was Udio tier voice coherence.
>>
File: 1754258002971220.png (905 KB, 1369x1200)
>Grok got cucked
i- If i only knew how to set up Local models..........
>>
>>107917473
base is not the same as klein. klein is used for editing. we need to wait for z image edit.
>>
i love foooooocus. why the fuck isn't everything this easy
>>
what is the audio model you guys are using
>>
>>107917506

There are a gazillion youtube videos on how to set up ComfyUI and flavor of the month models.
>>
>>107917473
the point of it is not to be good it's to be trainable (and if so desired, re-distilled afterwards)
>>
>>107917528
I know but the models are so BIG and i only have 6MBps download speed
>>
File: f4b.jpg (274 KB, 2048x2048)
>>107917506
What did the SaaS do now? Ban porn again?

Anyhow it's not hard, just spend that hour or w/e and set your local models up.
>>
>>107917506
They're just shooting themselves in the foot. Compared to Claude, Gemini it was never good anyways. Why use them if they're just going to cuck themselves now? That was their only good thing.
>>
File: Untitled.png (22 KB, 635x319)
>>107917317
why'd they remove omni from model zoo though?
>>
>>107917539
probably not worth getting x banned altogether in europe
>>
>>107917537
You could upload Skimpy stuff on Grok and get it animated..... until a few hours ago.
>>
File: r4b.png (2.21 MB, 848x1488)
>>107917534
oh no, it'll take a bunch of minutes of your computer's time. bruh/gurl just get the models.

>>107917551
was it particularly noteworthy? i didn't really even see much of it on the wider internet. the anime shitposters don't seem to have reposted many skimpy animated anime shitposts for example.
>>
>>107917414
>t.
>>
>>107917414
repeat ad infinitum
>>
>>107917566
>bruh/gurl just get the models.
There are too many variations of it. For example a Wan 2.1 model is like 16gb and each variation of it is the same size. My HDD gonna cry.
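For scale, a quick estimate of what a download like that costs in time. Assumptions: "6MBps" from the earlier post is read as 6 megabytes per second and a model is taken as 16 GB (decimal units); `download_minutes` is a hypothetical helper.

```python
def download_minutes(size_gb: float, speed_mb_per_s: float) -> float:
    """Minutes needed to download size_gb gigabytes at speed_mb_per_s megabytes/second."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    seconds = size_gb * 1000 / speed_mb_per_s
    return seconds / 60


one_model = download_minutes(16, 6)
three_variants = download_minutes(3 * 16, 6)
print(f"one 16 GB model: {one_model:.0f} min")
print(f"three variants:  {three_variants / 60:.1f} h")
```

That works out to roughly 44 minutes per 16 GB file; if the 6 was megabits per second, multiply by 8.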
>>
>>107917539
It can still generate porn and is too dumb to censor it if you prompt in roundabout ways. That's all this shit is literally good for anyway.
>>
>>107917580

>2026
>HDD for AI models.

Skill issues. Upgrade your job buddy.
>>
>>107917566
You can make your favorite anime character suck cocks and do anal (vaginal is hard) in grok..... Until few hours ago

>>107917594
I have 2tb SSD and i dont want to use it for Slops
>>
>>107917580
yes well then load just ltx2 and one wan2 in some q5-q8 gguf format or w/e, or just one of them. gguf tends to have a good size-quality tradeoff.

alternatively get more SSD/HDD space when it starts bothering you
>>
File: inpainted_00023_.png (1021 KB, 1280x720)
>>107916231
Dude! Fallen Angels and Chungking Express are two of my favorite movies of all time. ZiT understands Wong Kar Wai style as well.
>>
>>107917598
> You can make your favorite anime character suck cocks and do anal (vaginal is hard) in grok..... Until few hours ago
SaaS does it again, kek. I personally don't see why these services including the ones in the USA always end up anti porn now, but I guess that's how it is.

Gotta go local I suppose. BTW full animated NSFW is mostly still on WAN (usually with LoRA, though "all in one" checkpoints where the LoRA have been somewhat adequately included exist), not LTX2.
>>
Does inpainting work with edit? When I try to prompt it to change a small detail but leave the rest of the image intact it still warps and zooms the image in slightly
>>
>>107917636
How much it distorts can depend a lot on the edit model and what you're doing.

In almost all cases it's going to be more precise if you draw a mask (or use a segmentation model to get a mask or whatever you prefer to define where the edit happens).
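Why a mask makes the edit more precise, as a minimal sketch: the model's output is only used where the mask says so, and every other pixel is copied from the original. Pure-Python toy on nested lists of grayscale values; `composite` is an illustrative name, and real inpainting pipelines may also apply the mask in latent space during sampling.

```python
def composite(original, edited, mask):
    """Blend per pixel: mask 1.0 takes the edited pixel, 0.0 keeps the original."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


original = [[10.0, 10.0], [10.0, 10.0]]
edited = [[99.0, 99.0], [99.0, 99.0]]  # stand-in for the edit model's output
mask = [[0.0, 1.0], [0.0, 0.5]]        # 0.5 = soft (feathered) mask edge

result = composite(original, edited, mask)
print(result)
```

Pixels under a zero mask come back bit-for-bit identical, so any warping or zooming the edit model introduces stays confined to the masked region.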
>>
>>107917598
having models on a HDD isn't bad if you don't switch models/loras often. like if you plan on using wan the entire day it's perfectly fine.
>>
>>107917636
it does, I use crop and stitch nodes
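The idea behind crop-and-stitch, as a rough sketch: find the masked region, cut it out with some padding, edit only that patch, then paste it back so everything outside the box is untouched. Pure-Python toy on nested lists; `mask_bbox`, `crop` and `stitch` are illustrative names, not the API of any ComfyUI node pack.

```python
def mask_bbox(mask, pad=0):
    """Bounding box (top, bottom, left, right) of nonzero mask pixels, padded and clamped."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None  # empty mask: nothing to edit
    height, width = len(mask), len(mask[0])
    top = max(min(rows) - pad, 0)
    bottom = min(max(rows) + pad + 1, height)
    left = max(min(cols) - pad, 0)
    right = min(max(cols) + pad + 1, width)
    return top, bottom, left, right


def crop(image, box):
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]


def stitch(image, patch, box):
    """Return a copy of image with patch pasted back at box."""
    top, bottom, left, right = box
    out = [row[:] for row in image]
    for dy, patch_row in enumerate(patch):
        out[top + dy][left:right] = patch_row
    return out


image = [[0] * 4 for _ in range(4)]
mask = [[0] * 4 for _ in range(4)]
mask[1][2] = 1  # the single pixel we want changed

box = mask_bbox(mask, pad=1)
patch = crop(image, box)
edited_patch = [[7 for _ in row] for row in patch]  # stand-in for the edit model
stitched = stitch(image, edited_patch, box)
```

Because only the cropped patch goes through the model, the rest of the image cannot shift or zoom, and the patch can be processed at a higher effective resolution.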
>>
Bro this NSFW klein Lora on civtai is fucking cooked. I think I'll have to burn my computer.
>>
File: 1754976896594289.png (1.11 MB, 3219x1842)
>>107917506
I can set up a simple gen tool like SwarmUI
man but FUCK ComfyUI. its ANYTHING but COMFY
>>
5090 bros, how much power have you cut? Since the market is fucked, I want to increase the longevity.
>>
>>107917706
>downloading workflows from civitai
looks like a you problem
>>
>>107917372
Any examples? I have never seen good action from wan (Unless you mean sex lol)
>>
>>107917683
>>107917696
Thanks. I just used the inpaint model conditioning node to replace the regular reference conditioning nodes and it worked surprisingly well.
>>
>>107917732
Ok can you share your workflows for simple image edit for Flux 2 Klein ?
>>
What is the best TTS for lewd audio gens?

Also what is the best software/model for generating audio from video?
>>
File: 1751308631665772.png (164 KB, 1486x1090)
>>107917771
from the official templates, you stupid brainlet
>>
File: 1748452698292275.jpg (117 KB, 1590x1025)
>>107917807
Weird since my workflow looks like this
>>
>>107917818
kill yourself shitposter
>>
tried to train a simple celeb character lora for klein. 4b, 9b, 512, 1024, captions, no captions, rank 16, rank 32, timeshift 1, timeshift 3, aitoolkit, onetrainer, 3k steps, 6k steps, ... and they were all completely shit. like absolutely deformed and unusable. i used a set of 20 clean images that worked perfectly well with zit.
so either this model is trash or something is seriously wrong with the code
>>
>>107917859

Ostris AI toolkit seems to work right out the gate with default settings. What are you using?
>>
File: Flux2-Klein_00189_.jpg (285 KB, 1328x912)
>>107917859
OneTrainer comes with such bad default settings it pretty much guarantees your lora will look like shit
>>
what went wrong with comfyui?
>>
>>107917807
now do one of wan 2.2 i2v with lora
>>
>>107917899
it looks even worse. low res and grainy with down syndrome while zit gave really nice results for the same dataset

>>107917874
like i said, tried aitoolkit and onetrainer. maybe some incompatibility between specific python/torch versions and klein?
>>
wassup chat :3
>>
>>107917920
>it looks even worse. low res and grainy with down syndrome while zit gave really nice results for the same dataset
Yeah it can be really terrible. Breaks limbs and fingers like nothing else.
>>
>>107918063
not only limbs and fingers for me, there simply is almost no face likeness at all. 50% at best
>>
>>107917905
that's an incorrect question
comfy is broken from the start, just like the whole poothon ecosystem
whole thing needs to be destroyed
>>
>>107917905
>>107918160
get some sleep ani
>>
>>107917771
https://github.com/BigStationW/ComfyUi-TextEncodeEditAdvanced/blob/main/workflow/workflow_Flux2_Klein_9b.json
>>
>>107917044
https://github.com/Comfy-Org/ComfyUI/commit/8ccc0c94fa0d8e43fffe7190e6a36551a53df54a
>Make omni stuff work on regular z image for easier testing.
doesn't that mean that he doesn't have z-image omni?
>>
Is IndexTTS2 the best voice generation model?
>>
File: 1742130190028415.webm (633 KB, 800x768)
>>
>load all webms in gallery view
>too many requests
>lasts 30+ mins
big angry
>>
>>107918285
Hmm. Not sure which is best, but I think IndexTTS2 is one of the top candidates, yes.
>>
looks like there's a good new 3B music model:
https://heartmula.github.io/

could be something to use instead of LTX's strange music
>>
File: te3.png (1.9 MB, 2048x2560)
>@grok is this true?
>>
>>107918342
>Good
Listen to the voice on those samples, then listen to ACEStep 1.5 anon-kun... Night and day difference. HeartMuLa is just another shitty SongBloom tier model that is good at certain genres and songs, but is not creative with instruments (and hence the musicality heavily suffers compared against SOTA models like Udio or Suno) so you can't do much with it. ACEStep 1.5 is the first good local music model that will bridge the gap.
>>
>>107918231
yeah, maybe they gave inference code to prolong the 2MW status, I just need a bdsqlsz tweet now and I'll calm down
>>
>>107918366
truth nuke, when will ACEStep be released though?
>>
>>107917135
Criminally underrated.
>>
>>107918366
> future unreleased model may be better
some model will eventually also be better than acestep 1.5, what good is that if it's all unreleased?
>>
File: 1751630039159037.png (2.16 MB, 1632x928)
>>
File: 1755771743461685.png (2.37 MB, 1280x1184)
tis time for torture
>>
File: 1762267997457369.png (2.43 MB, 1312x1152)
>>
File: 1757239362860982.png (2.16 MB, 1088x1376)
ponder the aroma
>>
>>107918439
*blows raspberry on teh foot :3*
>>
File: 1755875048997382.png (2.29 MB, 1664x1216)
https://www.reddit.com/r/StableDiffusion/comments/1qhv0g1/flux_klein_gives_me_sd3_vibes/
Uh oh, bretzel sisters, our response?
>>
>>107918416
Any time soon? I doubt it. Everyone else who is fully open is putting out crap in terms of music models. The only ones who actually had a good model were the YuE guys, but they never iterated on it nor gave any updates, so their goal clearly wasn't to catch up to Suno/Udio. For ACEStep that has been the goal from day 1, plus they obviously want to surpass them in their next iteration. Being open to community feedback and interacting with the community is what sets them ahead instantly. Just seeing how massive a leap 1.5 is over 1.0 is all you need to know that they will eventually do exactly what they set out to do (though after 1.5, the powers that be will definitely notice and it might not be the same, e.g. they could sell out or worse because music mobs don't fuck around, but it's nice that we're at least getting 1.5 before they can shut it down).
>>
File: 1761179698725173.png (2.51 MB, 1088x1376)
>>107918448
here 1 more foot, thanks for your (you) patronage.
>>
>>107918490
Plebbitors are funny. The best model by far for all those prompts and much more is Chroma, because it knows anatomy better than anything else. And that was trained on de-distilled Schnell, and now we have Klein which is superior to that.
>>
8bit keygen music? ACEStep 1.5: Fuck yes!
https://files.catbox.moe/567tjd.mp3
>>
File: 55645646546.png (36 KB, 1091x227)
>>107918388
>when will ACEStep be released though

Let's hope it's true and there's no indefinite "2 more weeks" like Z Base.
>>
>>107918409
>>>/wsg/6075913
>>
File: 1744557287304790.png (3.22 MB, 2580x1216)
Klein Edit is pretty based
>>
>>107918494
sounds good? it still needs a release though. remember when people thought stable diffusion was probably committed to local open models?
>>
>>107918652
Yeah it is, especially because it's so fast.
>>
>>107918668
One thing about ACEStep is that their license changed at one point: it recently went from non-commercial to fully open. It still has the potential to change right at release and screw us all over (let's hope that's not the case), because while it's a fun model, a non-commercial license is useless and restricts freedom.
>>
File: Flux2-Klein_00239_.png (2.56 MB, 848x1216)
By Albrecht Duerer. I think zit does woodcut-like images better.
>>
>>107918652
>celeb shit
end your life
>>
>>107918652
Now ask it to transfer the hairstyle
>>
>>107918652
>Klein Edit is pretty based
no lies detected, and I didn't expect BFL to make such an uncensored model. Kontext used to do nothing when asked for this kind of shit; it has censorship layers, this model has none
>>
File: Flux2-Klein_00241_.png (2.08 MB, 848x1216)
By NC Wyeth.
Looks nothing like Wyeth, actually. Rather, like some sort of period artist cluster annotated by a vllm.
>>
File: Flux2-Klein_00242_.png (2.23 MB, 848x1216)
Alphonse Mucha was abstracted, too.
>>
So what's the play now that the video spam is confirmed to be a bot made by Julien?
I say we all pool in some money to hire someone to beat him up
>>
File: 1742034252643473.png (700 KB, 2154x1354)
>>107918766
>like some sort of period artist cluster annotated by a vllm.
a bad vllm, Gemini 3.0 has no issue recognizing such artists
>>
>>107918766
>>107918786
I haven't found a single artist style that works.
>>
File: 1764305295119801.png (2.93 MB, 1364x1040)
>>107918766
you can do that with Klein
https://docs.bfl.ai/guides/prompting_guide_flux2_klein
>A woman drinking coffee in a bar, Use the style from the reference image
>>
File: 1752897409742372.png (1.85 MB, 1216x1728)
how do i inpaint when editing with klein? it keeps changing details i want to remain the same
>>
File: 1761261146589110.png (2.46 MB, 1557x768)
>>107918766
>>107918786
>>107918826
>>
>>107918787
Ran I hope you realize that this is a call to violence and is not very legal. Your meth abuse is finally making you go insane.
>>
>>
Fresh when new

>>107918851
>>107918851
>>107918851
>>
>>107918843
Silly anon, "call to violence" implies the subject is a human being, which Julien has repeatedly shown it isn't
>>
>>107916920
yes just inpaint like you would any other model
>>
>>107918439
maiwaifu
>>
>>107918826
>>107918836
How do you add three reference images?
>>
>>107918971
take my workflow anon https://files.catbox.moe/c6qpph.json
>>
>>107918684
Yeah, let's hope that is the case.


