/g/ - Technology




Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107049284

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 1756576992113833.png (274 KB, 1827x602)
Imagine the kino level of shitpost if we really get suno 4.5 at home
>>
>all chroma shit
>probably own gens too
kys OP frfr
>>
>>107054061
>rainbow
>woman avatar
....... is he cooking?
>>
>>107054106
>woman
I think it's a man avatar, the hair is short
>>
>>107054061
If it were so easy suno would have been as good as udio long ago, but udio was always better, so I have my doubts that local can be as good.
>>
File: dmmg_0087.png (1.19 MB, 832x1216)
>>107054015
gonna depend on the model, lora strength, captions etc. this is generally why you use nonstandard trigger words to invoke the lora: it avoids confusion with concepts the model already knows.
>>
File: file.png (2.21 MB, 1472x1136)
>>
>>107054121
this, suno is overrated as fuck, they always pretend it's at the same level as udio when it's definitely not
>>
>caring about the fagollage
>>
Pay debo no mind he's disabled
>>
>>107054047
https://vocaroo.com/1lhI4LNQojvT
Udio 1.0 of course.
>>
>>107054183
udio is amazing desu
>>
>>107054061
I listened to his samples, they are struggling to even reach YuE quality.
>>
So wan q8 gguf is like 95% as good as fp16 and a little faster?
>>
>>107054183
meh
>>
>>107054221
show some of those samples here anon, I don't want to go to trooncord
>>
>>107054227
with a little optimism and cope
>>
>>107054227
yeah, the quality is really equivalent and it's 2x lighter in terms of size
>>
>>107054227
I get almost the same speed with fp16 vs q8 on a 5090, I just block swap half the model.
>>
>>107054227
Half the precision, half as good
But being able to run it makes you twice as blind
>>
File: ComfyUI_temp_hqdve_00082_.png (1.71 MB, 1328x1328)
>>
File: file.png (182 KB, 760x715)
Contrastive flow matching is tight.
>>
>>107054444
>Contrastive flow matching
what is that?
>>
File: ComfyUI_08092_.png (2.04 MB, 1152x1152)
>>107054206
Yes, I really want to hope that local will catch up, but that seems like a leap from nothing, not even SD-tier, straight to a Dalle 3 tier music model. High quality manually captioned audio data is probably a must for such results, and then a really good DPO process.
>>
open source emu3.5 at 32b, which according to the authors is supposed to be superior to nano banana in every way.
Looking at the sample images, I have my doubts about that.
>>
>>107054448
It's a new version of flow matching that encourages the model to find unique paths, which speeds up convergence, gives sharper results because it's not blending paths, and also encourages diverse results (because paths aren't blended).
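For the curious, a minimal sketch of what such an objective can look like in PyTorch. Assumptions not in the post: velocity-prediction flow matching with a linear noise-to-data path, negatives taken by shuffling target flows within the batch (so batch size must be > 1), and a made-up model(xt, t, cond) signature; `lam` weights the push-away term.

```python
import torch
import torch.nn.functional as F

def contrastive_fm_loss(model, x1, cond, lam=0.05):
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # per-sample timestep
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1                   # point on the path
    target = x1 - x0                               # true flow (velocity)
    pred = model(xt, t, cond)
    pos = F.mse_loss(pred, target)                 # match your own path
    neg = F.mse_loss(pred, target.roll(1, dims=0)) # other samples' paths
    return pos - lam * neg                         # near yours, far from theirs
```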
>>
>>107054492
I see, and I guess you're using that method to make a lora right?
>>
File: file.png (119 KB, 258x265)
>>107054498
I'm using the method to train a 1.5B model from scratch.
>>
>>107054444
test finetune or what
>>
>>107054482
it's chinkslop. if it's not good they lie and say it is. if it is good they make you pay for it. the good thing about china isn't that it's better, it's that it's cheaper
>>
>>107054508
Tell us more. This sounds interesting.
>>
>>107054508
nice, how much faster is it compared to the previous method?
>>
>>107054508
i believe in bigma
>>
>>107054482
even if it's true, 32b is just too big
>>
>>107054518
It's just a revision of the Pixart model I'm working on. 1.5B, Pixart architecture with the HDM MLP, and Ostris's 16 channel VAE.

>>107054524
Insanely fast compared to MSE, dropping like 0.02 loss per day, which means a full from-scratch model on a single 5090 in 50 days.
>>
File: 1748619460298122.png (813 KB, 1920x1080)
>>107054531
>is just too big
>>
>>107054531
32b for a model claiming to do everything they say it can is quite impressive actually. the problem is benchmarks don’t hold up against reality.
>>
>>107054444
Based quads
>>
>>107054482
>according to the authors
And why should I believe authors this time? They lie as often as the common whore.
>>
>>107054564
>And why should I believe authors this time?
you should never believe them. like with everything, you test it out yourself and find that 95% of the time it's a big nothingburger
>>
>>107054542
Pretty cool tinkering. What sort of database are you using?
>>
>>107054564
My post had two parts. Why are you asking me something that I answered in the second sentence?
>>
File: ComfyUI_08094_.png (2.05 MB, 1152x1152)
>>107054482
It's unfortunate that all we get from the Chinese are models on par with Seedream, but they're made out to be something more. I'll give them props for catching up to Seedream by using Seedream-based synthetic slop in conjunction with 4o though.
>>
File: file.png (154 KB, 771x712)
>>107054542
Sexy loss graph. The new training run is switching from a 3D Perlin noise Automagic warmup to AdamW.

>>107054591
I scraped millions of images from duckduckgo but I also have e621, danbooru and gelbooru (200k images). Then a lot of let's plays from YouTube for games. A few movie screencaps from stuff I have / famous movies. It's generally pop culture and art centered.
>>
>>107054636
so like the dark blue was the normal method and the light blue is the contrastive flow thing?
>>
Weird how no local models can break into the top 10 anymore in arenas. Shame they fell so far behind
>>
File: ComfyUI_08093_.png (1.74 MB, 1152x1152)
>>107054629
Arguably, Qwen already did that though. We are stuck getting the same model over and over again, each one slightly fancier than the previous.
>>
>>107054642
No, dark blue was me experimenting with an Automagic idea I had: activating layers and parameters with 3D Perlin noise, much like how wind is simulated in a video game. Light blue is AdamW, which is much faster than the other optimizer, but I think the Automagic warmup was probably helpful, especially for forcing layers and parameters to be activated randomly but usefully.
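Loosely, the gating idea might look like this (my reading of the post, not the anon's actual code; a cheap random field stands in for true 3D Perlin noise, held fixed for `period` steps and then resampled so the active set shifts over training):

```python
import torch

@torch.no_grad()
def noise_gate_grads(params, step, period=100, keep=0.5):
    # same seed within a period -> same mask for `period` steps
    g = torch.Generator().manual_seed(step // period)
    for p in params:
        if p.grad is None:
            continue
        field = torch.rand(p.shape, generator=g)   # stand-in noise field
        mask = (field < keep).to(p.grad)           # keep ~`keep` fraction
        p.grad.mul_(mask)                          # freeze the rest this step

# usage: call after loss.backward() and before optimizer.step():
#   noise_gate_grads(model.parameters(), global_step)
```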
>>
>>107054636
>I scraped millions of images from duckduckgo
Yandex used to be so good for this. Now it barely works as image search.
>>
>>107054706
Unfortunately all search results are fucking garbage now because AI bullshit is everywhere. I scraped early last year before the flood.
>>
>>107054636
retard here, what exactly does loss mean in this context?
>>
>>107054715
I think to be sure you should scrap every images before 2022, it's the last year before the AI flood
>>
>>107054725
the loss is like the error between the image you trained on and the model's recreation of it, the goal is to get the lowest loss possible
>>
TTS with voice cloning capabilities that you can set up using docker compose with the relevant options for your system; it comes with an API and GUI too.

https://github.com/devnen/Chatterbox-TTS-Server
>>
>>107054725
0 loss means the model's prediction perfectly matches the objective, denoising an image or finding a flow. In practice, actually reaching 0 means catastrophic failure and memorization. Normally for diffusion models (e.g. SDXL) the loss is how well the image is denoised. Models like Flux have a loss based on finding paths in latent space to the image. Contrastive flow adds an additional objective that paths must be unique.
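As a reference, generic textbook forms of the two objectives mentioned (not any specific repo's code; `model(xt, t, cond)` is an assumed signature, and `alpha_bar` is the cumulative noise schedule of a DDPM-style model):

```python
import torch
import torch.nn.functional as F

def eps_loss(model, x0, t, cond, alpha_bar):
    # SDXL-style denoising: noise the clean latent, predict the noise.
    # t: LongTensor of integer timesteps indexing into alpha_bar.
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return F.mse_loss(model(xt, t, cond), eps)

def fm_loss(model, x1, t, cond):
    # Flux-style flow matching: predict the straight-line velocity
    # from noise x0 to data x1. t: floats in [0, 1].
    x0 = torch.randn_like(x1)
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1
    return F.mse_loss(model(xt, t, cond), x1 - x0)
```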
>>
>>107054657
you forgot slower every iteration
>>
File: ComfyUI_00001_.png (871 KB, 1024x1024)
Hello people, it's my first time using chroma. How can I get my pics to be higher quality? Higher steps or more negative prompts?
>>
>>107054636
Cool stuff. It's a great learning experience.
>>
File: ComfyUI_08102_.png (1.57 MB, 1152x1152)
>>107054706
Yeah, quite sad what they did to it.
>>
File: SD35Medium_Output_262662.png (3.31 MB, 1280x1536)
>>107053778
3.5 Medium was stronger at the top end of its resolution range than at the lower end TBQH; generating at e.g. 1216x1600 would very often be way more coherent and better looking than 832x1216 on the same seed. So for a high res use case like this it might make sense, especially if they've tuned it at all past the base model. Attached pic is a native 1280x1536 one-shot Medium gen, for example.
>>
>>107054736
>>107054902
i see
interesting and makes me wish i wasn't a retard
>>
>>107054980
It's not that complicated, and all the smart people already did the math for you. I'm half a retard and I just experiment with bleeding edge research other people have already done. Really, to get into any of this you just have to drop in and be willing to sweat; a lot of this shit is just tedious churning, especially captioning. For example I'm working on finetuning Joycaption to be better, which requires handcaptioning thousands of images.
>>
>>107054947
It's an AI-filtered walled garden.
Speculation wise, maybe Cuckflare and others restricted their web crawlers because of the geopolitical incidents. It's okay because Google can leech everything.
>>
>>107054902
>Contrastive flow adds an additional objective that paths must be unique.
that's quite a smart idea when you think about it, it forces the model to not be lazy and to work on every edge case
>>
File: 1730723114519001.jpg (1.49 MB, 2016x1152)
>>
File: SD35Medium_Output_75544.png (3.32 MB, 1600x1216)
>>107054969
And this one's 1600x1216
>>
>>107055046
illustrious is the peak of local diffusion. no other model comes close in terms of character knowledge, style knowledge, concept knowledge, and overall fidelity. Chroma is a disgusting blurry mangled mess, neta knows a fraction of the styles, qwen is a bloated stopgap that can’t even compete with seedream 3. SDXL is an absolute triumph and will likely not be surpassed for years
>>
File: 00011-145286813.jpg (1.11 MB, 2000x2000)
>>
>>107055077
depends on the scope of the finetune dataset. they'll probably manage to make the girls/boys hotter, among some other things. it's probably fixing the biggest popular issue in a bunch of months or so?

idk if anyone will get enough compute to train the boorus, real fashion/nsfw collections, cjk idols and so on.
>>
>>107055061
(You)
>>
>>107055096
i think people actually will start to finetune it with ramtorch or w/e. it'll likely be quite slow.
>>
>>107055061
There are multiple small model projects now that have proven you can train a from-scratch DiT model for a couple thousand dollars. The fact is many people don't want to stick their neck out, especially if they're in North America or Europe.
>>
>>107055103
Not really. Multigpu is mainly used for higher batch sizes, not learning rate. A single gpu would only go ~4 times slower than the typical setup, and you would only rake in more donations by going slow and doing incremental releases. Recent papers have shown that with modern optimizers the results are no worse either.
>>
>>107055110
That's not what I said. There's nothing stopping you from taking HDM and scaling it 4x. The AMD 340m toy model was trained in 1.5 days. Pixart 600m is better than SD 1.5 and on par with SDXL. So you can assume a modern DiT 2B model could be trained in less than 30 days and be SOTA as a booru model.
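As a sanity check on that timeline, a naive back-of-envelope estimate (my own, assuming training compute scales roughly linearly with parameter count at fixed data and resolution):

```latex
t_{2\mathrm{B}} \approx t_{340\mathrm{M}} \times \frac{2 \times 10^9}{3.4 \times 10^8}
\approx 1.5~\text{days} \times 5.9 \approx 9~\text{days}
```

so the quoted sub-30-day budget leaves headroom for a larger dataset and a longer schedule.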
>>
>>107053799
>Udio partnership with UMG

https://desuarchive.org/g/thread/106957370/#106958310

So it has begun. First they will figure out which model is best, give their artists exclusive access to this model, and then give the general public a watered down version of the model, if the public gets a version at all.
>>
Sage attention 3 when.
>>
>>107055167
Remember, Udio is no joke compared to everyone else. Going after them specifically is very strategic.
>>
Finished baking male vocals version of:
>>107053777
Lyrics I posted here already:
>>107053946
https://voca.ro/1muX2AJkOy1P
>>
>>107055277
cringe
im waiting for acestep 1.5
>>
File: musicarena.png (57 KB, 1013x442)
>"Udio is da best!!!11!!1"
Reality is picrel
You can make direct comparisons yourselves in the Music Arena if you want
Udio is only "better" if you are into gacha and generating parts of the song multiple times until it does something good, and Suno lets you do something like that as well
>>
File: 1739936326305832.png (412 KB, 736x650)
>>107055288
>mememarks
>udio v1.5
>>
I haven't been in the game for like a year now and I'm starting to plan a full system upgrade. Why is there no normal middle ground between 16 and 32 GB VRAM cards, nvidia?
I have a 2060 super, 8GB vram. My question is, assuming I just go for a 5080 or something for the 16GB, could I use it together with the 2060 to pool the VRAM and leave the rest to regular RAM? Or should I not bother and just make do with a 5080?
I could afford a 5090, I just can't help but feel like I'm being ripped off, and honestly more than anything else I'm worried about it melting and exploding or something. Is undervolting/underclocking a good idea?
>>
>>107055321
I'm retarded and forgot to mention that I want to get into video generation. the pastebin doesn't go into much detail on multi-card drifting, for what it's worth
>>
>>107055149
I think the best possible way to train an anime model would be to first train your own captioner model that takes tag lists along with an image as input and interleaves the tags into proper sentences (in a way that tries to be grammatically correct, but not necessarily to a fault) based on what it can actually see, while also adding spatial information where it can and where it makes sense. Then you could just run that model on the Danbooru dataset directly, with the original accurate tag lists for each image.

I'd also pick a maximum resolution and proportionally downscale larger images to it if needed, but never *upscale* anything whatsoever, rather just bucketing everything as close to the original upload res as possible. The end result would be a fairly robust mixed-res model that could coherently do a wide range of resolutions rather than just focusing on one range.
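A minimal sketch of that downscale-only bucketing (my own toy version with assumed constants; real trainers also group samples into shared buckets per batch). It never upscales: images are snapped to multiples of 64 at or below their original size, capped at a max pixel budget.

```python
from PIL import Image

MAX_PIXELS = 1536 * 1536   # assumed maximum-resolution cap
STEP = 64                  # latent-friendly multiple

def bucket_size(w, h):
    scale = min(1.0, (MAX_PIXELS / (w * h)) ** 0.5)  # downscale only
    bw = max(STEP, int(w * scale) // STEP * STEP)
    bh = max(STEP, int(h * scale) // STEP * STEP)
    return bw, bh

def load_bucketed(path):
    img = Image.open(path).convert("RGB")
    return img.resize(bucket_size(*img.size), Image.LANCZOS)
```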
>>
>>107055288
Udio 1.0 was best by far. They neutered the model after that. I've never seen a Suno gen on par with Udio 1.0 composition wise.
>>
File: WAN2.2_00406.mp4 (3.62 MB, 624x832)
>>
>>107055321
If you're patient you could wait for the 50 series super refresh as the 5080 super is supposed to have 24gb vram. Can't speak to multi gpu use, but 8gb seems rather abysmal and not worth the hassle especially when you consider you can offload to system memory.
As for the 5090, I have one and I've undervolted, overclocked it and have capped the power at 80% (460W) without any issues. In any case make sure to get at least 64gb of system ram if you're gonna gen videos.
>>
>>107055370
How I'd do it is have the same image with three different captions:
- tags as seen on the booru site
- short description
- long description

Each caption is really a supported way for a user to prompt the model, and the model will naturally learn how to mix the different caption types. The problem we've now seen multiple times is people training on caption blob balls and forcing the model to rely on long captions if you want maximum output quality.
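A sketch of how that duplication could look in a data pipeline: each (image, caption) pair becomes its own training sample, so tag, short, and long captions are all equally supported ways to prompt. Field names here are made up for illustration.

```python
import random

def expand_samples(records):
    # records: [{"image": path, "tags": str, "short": str, "long": str}, ...]
    out = []
    for r in records:
        for key in ("tags", "short", "long"):
            out.append({"image": r["image"], "caption": r[key]})
    random.shuffle(out)  # interleave caption types across the epoch
    return out
```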
>>
>>107055321
There's really no downside to undervolting a 5090, you lose 5% performance and reduce the power 30%.
>>
>>107055277
Bridges and outro are pretty weak https://voca.ro/12sNX01jyU6M
>>
>>107055541
It's still pretty good if it's local.
Eg. in terms of slop.
>>
>>107055412
>>107055288
Suno is very likely trained on royalty-free music libraries like Audiojungle, which is why it sounds worse but also more "polished". I'm guessing Udio was trained on more copyrighted music even before the UMG deal, so it's more random but gives more interesting results.
>>
>>107055321
3090's are pretty cheap
>>
>>107055603
Old Udio was overfit on copyrighted stuff; if you input the same tags and lyrics some tracks had, you'd get nearly identical outputs
>>
File: me and ma girl.png (1.51 MB, 768x1344)
>>107055077
>>
>>>/pol/520205588
How do I do this on my laptop?
>>
>https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
anyone tried the new nvidia edit model?
>>
File: me and ma girl3.png (903 KB, 1280x768)
>>107055697
>>
>>107055524
I think your way might work if you literally swapped out the sets of captions for each image between epochs, probably better than slapping them all in one caption file
>>
File: ComfyUI_00425_.png (2.72 MB, 1088x1344)
>>
>>107055797
That's what I mean, you duplicate the image for each caption type. And if you practice VAE jitter you prevent memorization.
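One common reading of "VAE jitter" (an assumption on my part, the thread doesn't define it): sample the VAE posterior each time instead of caching one fixed latent per image, so the model never sees the exact same latent twice. With a diffusers-style AutoencoderKL:

```python
import torch

@torch.no_grad()
def jittered_latent(vae, pixels):
    posterior = vae.encode(pixels).latent_dist  # diagonal Gaussian
    return posterior.sample()                   # mu + sigma * eps, fresh each call
```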
>>
>>107055826
Nice. Maybe more... try dynamic angle, rim light. Really good.
>>
File: UwU2.jpg (2 MB, 1536x2048)
>>
does infinite talk work with wan2.2?
>>
File: ComfyUI_00431_.png (2.7 MB, 1088x1344)
>>107055826
>>
>>107055951
Love it. What's the model? I should try and gen something related to this.
>>
File: ComfyUI_00434_.png (2.67 MB, 1088x1344)
>>107055951
>>107055982
Qwen
>>
>>107056008
Cinematic Redmond had these vibes. Cool that Qwen can be grainy too. Of course it's probably pretty stiff, but that's what they all are.
>>
>>107054482
>emu3.5
HF links are all dead
I can test it if the models are available somewhere
>>
File: failed.mp4 (276 KB, 832x480)
>>107055752
I can't make her sit on the pig, but it's still funny
>>
Is there a way for Librewolf (flatpak) to remember its last directory? In Linux Mint.
It's somewhat tiring to use ComfyUI and I need to open a file dialog to traverse all the way up from /home/ to my work mount...
>>
>>107055603
I don't think they're prompted the same. Udio has a better understanding of music, that shows because with a good prompt it destroys almost any Suno song I've ever heard

https://www.udio.com/songs/2bXYLKaVDyVwi1GAb6pSkR

This is a very hard song
https://www.udio.com/songs/7zrLreMnwCYrdBqQkGtEXM

The musical depth I've witnessed out of this model truly is insane. Unprecedented connection between lyrics and musical notes. It has mastered vocals and intonation in a way Suno has not.

Using high quality copyrighted music in conjunction with whatever royalty-free music is available for training the model is the way to go.
>>
>>107056111
Yeah, that's just subjective to people who have never played any instrument in their lives.
>>
Do any of you have a recommendation for generating videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. Theme is urban 90's/2000's workers working daily shifts.

I only know of Veo 3 so far, and looking at Runway.
>>
so has the copyrightpocalypse finally started
>>
>>107054444
>>107054508
>>107054542
>>107054636
Are you using TREAD like HDM too? And have you considered going VAE-less using SVG (or at least using EQ-VAE like HDM)? https://arxiv.org/pdf/2510.15301

if not, you are missing out on huge speedups
>>
>>107055921
this is a very nice gen
what model did you use?
>>
>>107056231
TREAD is much harder to implement if you want the real speed up. The 16-channel VAE I'm using has been EQ'd yeah.
>>
>>107056239
NovaOrangleXL_v120
I'm just testing my linux installation, I deleted all my previous models. Don't have noob or anything else.
>>
>>107055149
>>107055103
THIS, we are on the cusp of home baked local SOTA.
>>
>>107056121
Udio literally just got acquired by the largest music label. That should tell you all you need to know.
>>
>>107056281
I don't know, really.
>>
Test
>>
File: 00219-597055894.png (2.74 MB, 1248x1824)
>>
File: Sanic.jpg (2.06 MB, 1536x2048)
>>
>>107056256
thank you anon, hows your linux experience going?
>>
Yume hands do work but you have to describe them in the prompt; treat it more like an LLM
>>
File: ComfyUI_00015_.png (1.13 MB, 1024x1024)
/iemg/ lore, you wouldn't get it
>>
File: image_00105_.jpg (385 KB, 1179x1768)
>>
>>107056349
Yeah, well, I'm an experienced faggot but I wouldn't advertise it for normal people. Even with the most common interfaces, it's been 20 years and they still can't get a file save dialog right.
I have used Irix and it never had these issues.
Like, save a file from Cum and it defaults to some ~/.
Open a file...
It's great if you are a developer, but a normal person should just use Windows.
I feel like Linux environments have gone backwards since I last used them 15 years ago.
>>
>>107056389
The flatpak browser does not remember the save file location from Cum.
This is what I mean.
I need to browse 5+ levels deep just to get to the directory I want.
>>
File: 00224-1489466956.png (2.57 MB, 1824x1248)
>>
remember that guy that was training a model and said his image of a brown splotch for the prompt "a woman" was 80% of the way there?
>>
>>107056375
Okay, then what's the optimal total prompt length in your experience?
>>
Text can be consistent with small phrases from the looks of it, and it is really sensitive to artist tags. The wrong tag will completely fuck everything, which suggests it needs more training.
>>107056415
I never pay attention to that; it doesn't matter from my testing.
>>
>>107056405
>>107056389
im a linux user, what distro r u on?
im on debian and the default file save dialog in brave/mullvad remembers locations (actually not sure about mullvad because it has crazy settings, but firefox did work) for extensions
what environment are u using? i use dwm so less ram/vram is used
>>
>>107056412
how's your dataset going? still 0%?
>>
>>107056432
Yeah, comparing distros is like comparing dicks. I think I'm using Librewolf and it's a flatpak - this explains why it does not remember the directories.
>>
>>107056440
how's your training going 2 years later? still at 80% there?
>>
>>107056421
>hands do work but you have to describe them in the prompt
Sounded like your approach is to really bloat the prompt with specifics, but I guess I misunderstood.
>>
>>107056441
try firefox or brave, or look for settings to disable forgetting save directory in about:config
flatpak is likely the issue because of muh sandboxing
>>
>>107056451
it doesn't take much philosophy to understand you don't do anything and thus won't achieve anything, don't put your insecurities of failure on me, thanks :)
>>
>>107056465
yup, still 80% there confirmed lmao
>>
>>107056421
do you switch the system prompt like how the guide says to help with text or? i have yet to really fuck with text on it
>>
>>107056458
Nah, your advice is just like any of the useless non-tech advice - changing distro or even browser does not accomplish anything. If it works it works, and if it does not there is a way to fix it, but sure as hell it is not by reinstalling my disks.
>>
>>107056476
actually that doesn't mean anything, for all you know I've already released a model, but what we both know is in 2 years you don't have anything except a bitter attitude
it's truly funny I'm living rent free in your brain though
>>
>>107056490
I treat it like chroma I also use my own system prompt I'll look into the guide again but pigeonholing it to anime only doesn't do much for me
>>
>>107056497
for all anyone knows you haven't released anything lmao
>>
File: 00231-1253630154.png (3.4 MB, 1456x2128)
>>
>>107056518
Feel free to explain why anyone would ever attach any of their professional work to 4chan. No one releasing a model with their name attached to it would ever link it on 4chan if they wanted to be taken seriously.
>>
File: SD35Medium_Output_8473732.jpg (2.86 MB, 1280x1536)
>>107055046
One more
>>
>>107056537
how convenient
my locally trained 300B model is going great too
>>
>>107056560
The only thing you're developing is suicidal thoughts.
>>
File: image_00109_.jpg (393 KB, 1326x1768)
>>
>>107056567
let's see a 1girl gen, bro, I'm sure it will be great bro, two years of improvement bro
>>
>>107055713
This is brilliant desu. Now they just need to make it uncensored so that a guy jerking off shows a girl fingering her pussy. Then bye bye thots, any guy can become an OnlyFans whore.
>>
>>107056581
Great stuff.
>>
>>107056375
That's not really true at all for Yume 3.5 IMO; you can absolutely booru prompt it straight up as long as you leave the Gemma boilerplate in properly, in my experience. The more likely issue for some people is that the generally recommended sampling configs are both not that good. DPM++ 2S Ancestral at 4.5 to 5.5 CFG gives massively better results most of the time for me. It is slower though.
>>
>>107056597
Oh I forgot to say, that's with Linear Quadratic.
>>
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/tree/main

new models, anyone test?
>>
>>107056491
did saving files with librewolf remember?
>>
File: ComfyUI_00535_.png (3.2 MB, 1536x2048)
Lustify is pretty good for off-topic gens.
>>
File: image_00110_.jpg (573 KB, 1336x1768)
>>107056596
ty
>>
File: 00062-1028256704.png (2.1 MB, 1288x864)
>>
>>107056597
sadly that sampler is not in neo forge for some reason, but I have good luck with DPM++ 2M
>>
>>107056640
>>107056581
huh? aren't these just film stills?
>>
>>107056620
It does remember the last directory for images, but with cum ui it does not.
>>
File: 00234-3489883587.png (3.11 MB, 1456x2128)
>>
>>107056151
>Any of you has a recommandation to generate videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. Theme is urban 90's/2000's workers working daily shifts.

>I only know of Veo 3 so far, and looking at Runway.
This is the local general so I'll give you advice for a model you can run on a GPU on your home computer

Your only real option for cinematic stuff is Wan 2.2 or 2.1 with the MoviiGen lora. I would recommend trying a 2.2 workflow + that Lora at 720 using a 5090, or FusionX if you choose to use 2.1

If you don't need the lack of censorship of WAN and you have money to spend, I'd just use runway for this. Higgsfield AI may also be interesting to you because they have specific stuff for music videos
>>
File: 1758330248565777.jpg (1.96 MB, 2016x1152)
>>
>>107056640
I am going to give you a tip:
watch Bram Stoker's Dracula (90's) and take a couple of screenshots; there's Lucy and all that. Then img2img them. That'll be great.
>>
File: 00024-3326562438.jpg (1.01 MB, 1560x1944)
>>
>>107056614
>more i2v
I sleep
>>
>>107056674
i wish this were me right now
>>
Kek, I was playing the Suno side and thought local already caught up somehow

https://levo-demo.github.io/

Very disingenuous demo
>>
>>107056674
You should put him in a van. And make him go.
>>
>>107056670
Cool thanks
>>
>>107056679
>Bram Stoker's Dracula
The best shots are a couple of still frames from inside the film, not these tiktok screenshots etc.
>>
>>107056690
Kek, who trained this model? It spits out Adele unprompted?

https://levo-demo.github.io/static/audio_sample/overview/04_en.mp3

It might be good. How come I've never heard of it.
>>
File: spyro.png (1.92 MB, 1024x1024)
>>
>>107056726
Is there a reason why you feel the need to talk about an unrelated subject in the thread when it can exist in its own thread, with actual documentation we can grab from the OP and all use?
Just seems odd you can't do that instead
>>
>>107056732
bigger
>>
File: 00065-2876646701.png (1.96 MB, 1288x864)
>>
>>107056742
There's no comfy workflow, and it seems like some experimental half-trained model. What is there to talk about?
>>
File: 00247-667236782.png (2.59 MB, 2128x1456)
>>
File: image_00111_.jpg (522 KB, 1337x1768)
>>107056650
Remember this scene?

>>107056679
It's a decent movie but I would unironically remake all Keanu Reeves dialogue with AI
>>
File: ComfyUI_00068_.png (1.1 MB, 1024x1088)
>>107056819
bro dont upload that scary ass shit here
>>
File: 00077-3534728429.jpg (917 KB, 2048x2688)
I'm getting closer
>>
File: 8845748478.png (373 KB, 639x412)
>>107056726
Tencent is actually training their own music model.

https://huggingface.co/tencent/SongGeneration

>TODOs
>Release SongGeneration-v1.5 (trained on a larger multilingual dataset, supports more languages, and integrates a Reward Model with Reinforcement Learning to enhance musicality and lyric alignment)

And the data is so copyrighted it spits out Adele unprompted as you can see on their demo. That is wild, with Qwen doing the same, my faith in China has been restored.
>>
>>107056890
What does this have to do with image diffusion?
>>
>>107056867
to killing urself? never been happier for u
>>
>>107056904
There is no music thread. It's either here or /lmg/, the only two places we can discuss local models.
>>
File: image_00112_.jpg (576 KB, 1326x1768)
>>
>>107056904
this is local diffusion general, we accept video and audio related content here.
>>
i for one welcome our music gen brothers
>>
>>107056935
>Discussion of Free and Open Source Text-to-Image/Video Models
>>107056915
>>107056926
You revealed yourself; go back to your containment thread
>>
File: 00260-408042399.png (2.53 MB, 1152x2016)
>>
>>107056867
closer to approaching the quality of a quantized 2gb illustrious model? maybe
>>
>>107056946
Comfy has audio models. We should be allowed to discuss anything comfy adopts as long as it is local.
>>
>>107056983
>>107056984
You're so fucking pathetic dude
>>
>>107056984
Besides, good audio models are pivotal for video. Since Sora 2 it's not the muted-audio era anymore; the SOTA has changed, so all discussion of audio research is welcome.
>>
I'll give the NetaYume shill this, the model requires a whole lot of gacha but at least it has some actual variation in its outputs.
>>
I'm running ComfyUI and following the guide; I've been playing around with the hand and face detailer. Is there an equivalent for feet/toes? I'd like to be able to fix those too.
>>
File: 00268-426404236.png (2.29 MB, 1824x1248)
>>
>>107056597
Some are better than others, clearly, but IMHO much of sampler/scheduler choice is subjective. The latter moreso than the former in my estimation.
>>
File: image_00115_.jpg (316 KB, 1520x1040)
>>
>>107057066
You haven't taken any steps to learn the model and it shows, Why not explore something before going on multi day complaints?
>>
>>107057119
I barely post in this thread, you're tilting at the wrong windmill friend. And I'm saying I like the model, I get better results out of it for the particular thing I'm prompting than I get out of the other boomerprompt models.
>>
>>107057137
Anything to show?
There's been this constant wave of anons that complain about this model but don't post anything. I know you're just wasting time, but take your low skill ass to one of the other threads
>>
File: 00277-1703380660.png (2.65 MB, 1248x1824)
>>
>>107057066
>the model requires a whole lot of gacha
Describe the poses/gestures better
>>
>>107057157
"Face and proportions that don't look weird"
>>107057145
>take your low skill ass to one of the other threads
OK
>>
>>107057165
Fuck off now thanks!
>>
File: 00177-1107305360.png (3.13 MB, 1280x1920)
>>
>>107057175
illustrious 2gb?
>>
File: illunoob vs netayume 2.png (2.44 MB, 2560x1256)
why is netayume so sloppy bros??
>>
*yawn*
>>
>Mindbroken because he never made anything good in his life
>>
>>107055177
it's already there
>>
File: dmmg_0165.png (1.35 MB, 832x1216)
>>107054935
both
>>
>>107057227
damn you melting so hard you cant even spell yumebro
>>
File: wong_01.png (3.43 MB, 2048x1536)
https://www.youtube.com/watch?v=xboXFT46XSo
>>
>>107057111
DPM++ 2S Ancestral is pretty objectively better than Res Multistep, at least for details like hands and text, using Linear Quadratic for both, I'd say
>>
File: 1745324974262853.jpg (1.51 MB, 2016x1152)
>>107054935
You typically don't need more than 25 steps. Most of my 50 step outputs have been either a sidegrade or even a downgrade in terms of quality.
Don't forget that chroma can gen pics above 1024 dimensions.
>>
>>107057172
It's clearly the same fairly bad troll as yesterday, he's blatantly ragebaiting
>>
>>107057334
yeah I agree fellow yumebro, there's totally not a vast majority of people that find this model trash
>>
>>107057334
It's the same retard from the rentry, he spends his entire life doing this for years and is just reduced to a bitter faggot.
>>
>>107057206
kek yeah that poster was an idiot
>>107057327
nice
>>
>>107057352
You're right, there's in fact not a vast majority of such people
>>
>>107057327
NTA. Your pic is neat af. This is also 25 steps
>>
>>107057457
Oh, this is neat too, how's radiance compared to DC-2K?
>>
File: image_00127_.jpg (669 KB, 1336x1768)
>>
>>107057474
>How's radiance compared to DC-2K?
Couldn't tell you, but I loved the 2k debug ones. There's still a lack of blending in the macro pixels, but it's mostly good
>>
File: 1750563092507017.jpg (1.62 MB, 1248x1824)
>>
File: ComfyUI_00559_.png (3.69 MB, 2048x1536)
>>
>>107057521
Cinematic Redmond is great.
>>
>>
>>
File: image_00128_.jpg (558 KB, 1336x1768)
>>107057497
"The lighting is even with no strong shadows." compared to "Cinematic lighting, dark background, deep shadows, detailed skin. Sharp HDR."

>>107057515
>>107057521
very cool
>>
File: 1360212403.png (1.5 MB, 832x1216)
>>
File: Ted.png (3.6 MB, 2048x1536)
>>
>>107057589
https://www.youtube.com/watch?v=ZEWGyyLiqY4
>>
>>107054482
>32b
Mostly useless for local. Viable for use with quantization, especially nunchaku, but LoRA training will be a nightmare, and a model without low cost LoRA training is pointless beyond ten minutes of novelty use.
>>
>>107054248
We haven't seen any samples from the final pretrained model, but these are from while it was training

J-pop song
https://vocaroo.com/19CHG4V410OP

Some pop song
https://vocaroo.com/1i7OjKcLbmnO

Some opera song
https://vocaroo.com/1f64Fkmpn9Ax

Idk, maybe with the SFT phase it'll catch up to where it needs to be, but those outputs are very underwhelming. Just a bit concerning, but I don't know jack shit about these models.
>>
Slowly getting it together; still need to learn composition better
>>
What was that feature of comfyui that was being advertised a while ago where you bundle a bunch of nodes together and then you can re-use that as one node?

did this ever actually happen?
>>
>>107057657
subgraphs? didn't really change anything and was kind of a letdown. the node implementation in general is lacking too much and everything done to the front end has been lipstick on a pig
>>
>>107057657
subgraphs?
they're pretty great to clean up wf and only see what you actually need to see
>>
>https://huggingface.co/meituan-longcat/LongCat-Video
>We introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step toward world models.
Anyone tried it? Works with KJ wanvideowrapper
>>
>>107057589
I had an antiquated gpu. But jesus, the boost even SDXL has gotten in terms of noise... Sounds like faggotry.
>>
>>107057615
>LoRA training will be a nightmare
ostrisai's trainer has supported 3bit quants for a while now.. wouldn't that be sub-16gb? https://xcancel.com/ostrisai/status/1953933728948121838
>>
anyone know if infinite talk works with wan2.2, or is it just for 2.1?
>>
>>107057589
what model is this?
>>
>>107057677
some onions have been trying it out. doesn't look that much different from context window jerkiness after every 5 seconds
>>
>>107057718
>onions
anon gets filtered to onions sometimes or something? the more you know I guess
>>
File: image_00132_.jpg (754 KB, 1336x1768)
>>107057713
Chroma-DC-2K-T2-SL4-Q8_0
>>
File: ComfyUI_00569_.png (3.22 MB, 2048x1536)
>>
File: 31925346.jpg (13 KB, 460x460)
>python?
>no, that shit is gay
>>
>>107057788
based chink
>>
File: 00097-493591130.jpg (1.13 MB, 2688x2688)
>>107054044
slowly but surely; mistakes were made, just need to adjust values
>>
>>107057813
Thank you Ran. You wanted some attention.
>>
>>107057648
It's impressive to see the vocals don't sound anywhere near as robotic as original ACE-Step though. If they catch up to Suno 4.5 maybe there's a chance of getting Udio tier kino now and then.
>>
>>107057813
you can't adjust values if you are worthless
>>
>>107057655
>>107057813
the painted nails are nice
>>
File: image_00135_.jpg (916 KB, 1336x1768)
>>
File: 1755210653857745.jpg (606 KB, 1769x1111)
So far I've been using the Wan 2.1 workflow from the rentry but wanted to try out 2.2 from here: https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper (2.2 I2V)
Why isn't it recognizing the vae? Everything looks correct to me, straight dragging the vae output from the loader to the decoder doesn't do anything either
>>
File: 1761859945112321.png (321 KB, 500x654)
>>
>>107057861
it's not connected to the decode node, pull the string from the vae loader to the decode node to connect them
>>
File: jager.jpg (1.92 MB, 2048x1536)
https://www.youtube.com/watch?v=Gu3TAuw3ZJ8
>>
>>107057861
If it's not bait, as it probably is, this can still be useful for newfags: use the example workflow instead and just load the correct models:
https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_14B_i2v.json
>>
File: 001.jpg (855 KB, 2048x2048)
1girl
>>
>>107057823
*MrCatJak
>>
>>107056890
need them to train a speech model with emotion prompting so we can be freed from the dead end known as vibevoice
>>
>>107055149
That's what I want to hear.
Start a group, delegate simpler tasks to me, such as some manual captioning, and I'll contribute $250 toward training.
The only catch is that you share the training process and I get to ask a few technical questions.
We can find 20 others; there are plenty of interested people out there.
I don't care if it's a failure.
>>
>>107057917
I wonder if krea has a buttchin obsession too
>>
File: image_00137_.jpg (686 KB, 1336x1768)
>>
>>107057922
You want to suck off people.
>>
>>107057922
What took you so long, please offer your asshole.
>>
>>107057655
hot
>>
>>107057949
is that...
>>
File: 002.jpg (885 KB, 2048x2048)
elongated 1girl
>>
File: 00101-147928662.jpg (1.12 MB, 2688x2688)
>>107057838
Thanks, I think I'm getting the hang of this model now. The hardest part is finding the right blend of tags for a presentable image, followed by adjustments; starting to feel like 60 steps is the magic number with this model. I wish neoforge had all the samplers, I don't know why he took some away.
One thing I like with this model is I can game due to how little vram it uses compared to chroma.
Sorry, but I have a dedicated sperg that hates me and has been holding a grudge for years; just ignore him
>>
>>107058031
?
>>
>>107058031
I respected you for years but not any longer. Seems like you are just spiteful.
>>
>disabled retard noises
>>
>>107057671
>>107057672
Thanks, I've started using subgraphs but I can't figure out how to make all subgraphs reflect each other's changes when I edit one of them. Any ideas? I would expect them to work like Scenes in Godot.
>>
>>107058084
Why do you refer to him in the 3rd person?
>>
>>107058031
>ran wanted to come out
He manages to spit out a narcissistic rant.
>>
File: file.png (82 KB, 859x979)
>>107058085
clone it
>>
>>107058085
>I would expect them to work like Scenes in Godot.
there are a lot of expectations from modern nodegraphs and comfyui ducks up 90% of what's standard
>>
>>107058130
ah, I re-cloned and the clone is working now! I guess I must've cloned too early before, or there was a bug, which caused my clones to become unique (and no longer clones).
>>
File: 1743883744534508.png (2.93 MB, 1416x1888)
>>
>>107055297
>pleeeeeease novel ai, i need the model files, my local model is kinda noisy
>>
>>107058193
if you duplicate you get separate entities, and if you clone you get linked ones
>>
>>107058024
Buffy x slenderman
>>
File: 00110-1547100464.jpg (1.19 MB, 2688x2688)
Yeah, I need to make loras for this model, it's the boost I needed. It should also be pretty fast compared to training chroma.
>>
>>107057893
doing that just crashed comfyui
>>107057915
not bait, I'm just a bit of a brainlet when it comes to this but your workflow works fine, thanks
>>
>>107058274
Netayume is fucking garbage, holy shit
>>
File: ComfyUI_00079_.mp4 (640 KB, 480x832)
>>107057744
>Chroma-DC-2K-T2-SL4-Q8_0
nta, nice gens, with lora?
>>
Netayume is fucking trash, and just having it write some text that looks like it's done in paint doesn't make it redeemable

Chroma for complex stuff and illustrious for hentai is the way to go
>>
uh oh meltie
>>
>>107057457
Shit I didnt notice you replied to me

>25 steps is enough
Thanks for the heads up boss, chroma fp16 with the fp16 text encoder doesn't run all that slow on my 5060ti 16gb if I keep it under 30 steps
>>
>>107058315
Yeah, uploading to civitai right now
>>
does using this node lead to loss in quality?
>>
>>107058370
not anything visible
>>
You can still download Udio songs on the fly as 320kbps btw, just downloaded a couple of bangers. No need to record or anything like that.
>>
File: 1748063353105752.png (3.31 MB, 1416x1888)
can a pitfag rate this pit for me
>>
>>107058408
from what I read they're limited to 192kbps mp3
I'm getting everything in bulk I saved there
>>
>>107058432
pit's fine but smaller boobs would be more harmonious
>>
>>107058432
6/10
I prefer mine like this
>>
>>107058432
It's nuts that I immediately spot a netayume pic every time since it looks so off
>>
Fresh

>>107058480
>>107058480
>>107058480

Fresh
>>
>>107058441
Yeah dunno, it's quite strange.

Was able to download a few of them at 320kbps with fetchv, like https://www.udio.com/songs/hoCg4BmayTYXcJfjo4jvbT

But other ones are only 192kbps. Maybe for some reason some of them stream at 320kbps, while other ones don't?
>>
I see people making AI images of trump and stuff like that.
But some of the stuff is definitely better than others.
Is there any way to set it up such that whatever prompt I give, the character is strictly that one character?
I mean, not just simply typing in the name of the character but making it more realistic?
Like, in a way such that even when I make anime or caricature images, it seems like some professional artist drew that based on the likeness of the person?
I don't know much about the loras and such, that's why I ask.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.