/g/ - Technology




Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107049284

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 1756576992113833.png (274 KB, 1827x602)
Imagine the kino level of shitpost if we really get suno 4.5 at home
>>
>all chroma shit
>probably own gens too
kys OP frfr
>>
>>107054061
>rainbow
>woman avatar
....... is he cooking?
>>
>>107054106
>woman
I think it's a man avatar, the hair is short
>>
>>107054061
If it were so easy suno would have been as good as udio long ago, but udio was always better, so I have my doubts that local can be as good.
>>
File: dmmg_0087.png (1.19 MB, 832x1216)
>>107054015
gonna depend on the model, lora strength, captions etc. this is generally why you use nonstandard trigger words to invoke the lora: it avoids confusion with concepts the model already knows.
>>
File: file.png (2.21 MB, 1472x1136)
>>
>>107054121
this, suno is overrated as fuck, they always pretend it's at the same level as udio when it's definitely not
>>
>caring about the fagollage
>>
Pay debo no mind he's disabled
>>
>>107054047
https://vocaroo.com/1lhI4LNQojvT
Udio 1.0 of course.
>>
>>107054183
udio is amazing desu
>>
>>107054061
I listened to his samples, they are struggling to even reach YuE quality.
>>
So wan q8 gguf is like 95% as good as fp16 and a little faster?
>>
>>107054183
meh
>>
>>107054221
show some of those samples here anon, I don't want to go to trooncord
>>
>>107054227
with a little optimism and cope
>>
>>107054227
yeah, the quality is really equivalent and it's 2x lighter in terms of size
>>
>>107054227
I get almost the same speed with fp16 vs q8 on a 5090, I just block swap half the model.
>>
>>107054227
Half the precision, half as good
But being able to run it makes you twice as blind
>>
File: ComfyUI_temp_hqdve_00082_.png (1.71 MB, 1328x1328)
>>
File: file.png (182 KB, 760x715)
Contrastive flow matching is tight.
>>
>>107054444
>Contrastive flow matching
what is that?
>>
File: ComfyUI_08092_.png (2.04 MB, 1152x1152)
>>107054206
Yes, I really want to hope that local will catch up, but that seems like a leap from nothing, not even SD-tier, straight to a Dalle 3 tier music model. High quality manually captioned audio data is probably a must for such results, and then a really good DPO process.
>>
open source emu3.5 at 32b, which according to the authors is supposed to be superior to nano banana in every way.
Looking at the sample images, I have my doubts about that.
>>
>>107054448
It's a new version of flow matching that encourages the model to find unique paths, which speeds up convergence, gives sharper results because it's not blending paths, and also encourages diverse results (because paths aren't blended).
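For the curious, a minimal sketch of what such an objective can look like in PyTorch. Assumptions not in the post: velocity-prediction flow matching with a linear noise-to-data path, negatives taken by shuffling target flows within the batch (so batch size must be > 1), and a made-up model(xt, t, cond) signature; `lam` weights the push-away term.

```python
import torch
import torch.nn.functional as F

def contrastive_fm_loss(model, x1, cond, lam=0.05):
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # per-sample timestep
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1                   # point on the path
    target = x1 - x0                               # true flow (velocity)
    pred = model(xt, t, cond)
    pos = F.mse_loss(pred, target)                 # match your own path
    neg = F.mse_loss(pred, target.roll(1, dims=0)) # other samples' paths
    return pos - lam * neg                         # near yours, far from theirs
```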
>>
>>107054492
I see, and I guess you're using that method to make a lora right?
>>
File: file.png (119 KB, 258x265)
>>107054498
I'm using the method to train a 1.5B model from scratch.
>>
>>107054444
test finetune or what
>>
>>107054482
it's chinkslop. if it's not good they lie and say it is. if it is good they make you pay for it. the good thing about china isn't that it's better, it's that it's cheaper
>>
>>107054508
Tell us more. This sounds interesting.
>>
>>107054508
nice, how much faster is it compared to the previous method?
>>
>>107054508
i believe in bigma
>>
>>107054482
even if it's true, 32b is just too big
>>
>>107054518
It's just a revision of the Pixart model I'm working on. 1.5B, Pixart architecture with the HDM MLP, and Ostris's 16 channel VAE.

>>107054524
Insanely fast compared to MSE, dropping like 0.02 loss per day, which means a full from-scratch model on a single 5090 in 50 days.
>>
File: 1748619460298122.png (813 KB, 1920x1080)
>>107054531
>is just too big
>>
>>107054531
32b for a model claiming to do everything they say it can is quite impressive actually. the problem is benchmarks don’t hold up against reality.
>>
>>107054444
Based quads
>>
>>107054482
>according to the authors
And why should I believe authors this time? They lie as often as the common whore.
>>
>>107054564
>And why should I believe authors this time?
you should never believe them. like with everything, you test it out yourself and find that 95% of the time it's a big nothingburger
>>
>>107054542
Pretty cool tinkering. What sort of database are you using?
>>
>>107054564
My post had two parts. Why are you asking me something that I answered in the second sentence?
>>
File: ComfyUI_08094_.png (2.05 MB, 1152x1152)
>>107054482
It's unfortunate that all we get from the Chinese are models on par with Seedream, but they're made out to be something more. I'll give them props for catching up to Seedream by using Seedream-based synthetic slop in conjunction with 4o though.
>>
File: file.png (154 KB, 771x712)
>>107054542
Sexy loss graph. The new training run is switching from a 3D Perlin noise Automagic warmup to AdamW.

>>107054591
I scraped millions of images from duckduckgo but I also have e621, danbooru and gelbooru (200k images). Then a lot of let's plays from YouTube for games. A few movie screencaps from stuff I have / famous movies. It's generally pop culture and art centered.
>>
>>107054636
so like the dark blue was the normal method and the light blue is the contrastive flow thing?
>>
Weird how no local models can break into the top 10 anymore in arenas. Shame they fell so far behind
>>
File: ComfyUI_08093_.png (1.74 MB, 1152x1152)
>>107054629
Arguably, Qwen already did that though. We are stuck getting the same model over and over again, each one slightly fancier than the previous.
>>
>>107054642
No, dark blue was me experimenting with an Automagic idea I had: activating layers and parameters with 3D Perlin noise, much like how wind is simulated in a video game. Light blue is AdamW, which is much faster than the other optimizer, but I think the Automagic warmup was probably helpful, especially for forcing layers and parameters to be activated randomly but usefully.
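Loosely, the gating idea might look like this (my reading of the post, not the anon's actual code; a cheap random field stands in for true 3D Perlin noise, held fixed for `period` steps and then resampled so the active set shifts over training):

```python
import torch

@torch.no_grad()
def noise_gate_grads(params, step, period=100, keep=0.5):
    # same seed within a period -> same mask for `period` steps
    g = torch.Generator().manual_seed(step // period)
    for p in params:
        if p.grad is None:
            continue
        field = torch.rand(p.shape, generator=g)   # stand-in noise field
        mask = (field < keep).to(p.grad)           # keep ~`keep` fraction
        p.grad.mul_(mask)                          # freeze the rest this step

# usage: call after loss.backward() and before optimizer.step():
#   noise_gate_grads(model.parameters(), global_step)
```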
>>
>>107054636
>I scraped millions of images from duckduckgo
Yandex used to be so good for this. Now it barely works as image search.
>>
>>107054706
Unfortunately all search results are fucking garbage now because AI bullshit is everywhere. I scraped early last year before the flood.
>>
>>107054636
retard here, what exactly does loss mean in this context?
>>
>>107054715
I think to be sure you should scrap every images before 2022, it's the last year before the AI flood
>>
>>107054725
the loss is like the error between the image you trained on and the model's recreation of it, the goal is to get the lowest loss possible
>>
TTS with voice cloning capabilities that you can set up using docker compose with the relevant options for your system; it comes with an API and GUI too.

https://github.com/devnen/Chatterbox-TTS-Server
>>
>>107054725
0 loss means the model's prediction perfectly matches the objective, denoising an image or finding a flow. In practice, actually reaching 0 means catastrophic failure and memorization. Normally for diffusion models (e.g. SDXL) the loss is how well the image is denoised. Models like Flux have a loss based on finding paths in latent space to the image. Contrastive flow adds an additional objective that paths must be unique.
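As a reference, generic textbook forms of the two objectives mentioned (not any specific repo's code; `model(xt, t, cond)` is an assumed signature, and `alpha_bar` is the cumulative noise schedule of a DDPM-style model):

```python
import torch
import torch.nn.functional as F

def eps_loss(model, x0, t, cond, alpha_bar):
    # SDXL-style denoising: noise the clean latent, predict the noise.
    # t: LongTensor of integer timesteps indexing into alpha_bar.
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return F.mse_loss(model(xt, t, cond), eps)

def fm_loss(model, x1, t, cond):
    # Flux-style flow matching: predict the straight-line velocity
    # from noise x0 to data x1. t: floats in [0, 1].
    x0 = torch.randn_like(x1)
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1
    return F.mse_loss(model(xt, t, cond), x1 - x0)
```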
>>
>>107054657
you forgot slower every iteration
>>
File: ComfyUI_00001_.png (871 KB, 1024x1024)
Hello people, it's my first time using chroma. How can I get my pics to be higher quality? Higher steps or more negative prompts?
>>
>>107054636
Cool stuff. It's a great learning experience.
>>
File: ComfyUI_08102_.png (1.57 MB, 1152x1152)
>>107054706
Yeah, quite sad what they did to it.
>>
File: SD35Medium_Output_262662.png (3.31 MB, 1280x1536)
>>107053778
3.5 Medium was stronger at the top end of its resolution range than at the lower end TBQH; generating at e.g. 1216x1600 would very often be way more coherent and better looking than 832x1216 on the same seed. So for a high res use case like this it might make sense, especially if they've tuned it at all past the base model. Attached pic is a native 1280x1536 one-shot Medium gen, for example.
>>
>>107054736
>>107054902
i see
interesting and makes me wish i wasn't a retard
>>
>>107054980
It's not that complicated, and all the smart people already did the math for you. I'm half a retard and I just experiment with bleeding edge research other people have already done. Really, to get into any of this you just have to drop in and be willing to sweat; a lot of this shit is just tedious churning, especially captioning. For example I'm working on finetuning Joycaption to be better, which requires handcaptioning thousands of images.
>>
>>107054947
It's an AI-filtered walled garden.
Speculation wise, maybe Cuckflare and others restricted their web crawlers because of the geopolitical incidents. It's okay because Google can leech everything.
>>
>>107054902
>Contrastive flow adds an additional objective that paths must be unique.
that's quite a smart idea when you think about it, it forces the model to not be lazy and to work on every edge case
>>
File: 1730723114519001.jpg (1.49 MB, 2016x1152)
>>
File: SD35Medium_Output_75544.png (3.32 MB, 1600x1216)
>>107054969
And this one's 1600x1216
>>
>>107055046
illustrious is the peak of local diffusion. no other model comes close in terms of character knowledge, style knowledge, concept knowledge, and overall fidelity. Chroma is a disgusting blurry mangled mess, neta knows a fraction of the styles, qwen is a bloated stopgap that can’t even compete with seedream 3. SDXL is an absolute triumph and will likely not be surpassed for years
>>
File: 00011-145286813.jpg (1.11 MB, 2000x2000)
>>
>>107055077
depends on the scope of the finetune dataset. they'll probably manage to make the girls/boys hotter, among some other things. it's probably fixing the biggest popular issue in a bunch of months or so?

idk if anyone will get enough compute to train the boorus, real fashion/nsfw collections, cjk idols and so on.
>>
>>107055061
(You)
>>
>>107055096
i think people actually will start to finetune it with ramtorch or w/e. it'll likely be quite slow.
>>
>>107055061
There are multiple small model projects now that have proven you can train a from-scratch DiT model for a couple thousand dollars. The fact is many people don't want to stick their neck out, especially if they're in North America or Europe.
>>
>>107055103
Not really. Multigpu is mainly used for higher batch sizes, not learning rate. A single gpu would only go ~4 times slower than the typical setup, and you would only rake in more donations by going slow and doing incremental releases. Recent papers have shown that with modern optimizers the results are no worse either.
>>
>>107055110
That's not what I said. There's nothing stopping you from taking HDM and scaling it 4x. The AMD 340m toy model was trained in 1.5 days. Pixart 600m is better than SD 1.5 and on par with SDXL. So you can assume a modern DiT 2B model could be trained in less than 30 days and be SOTA as a booru model.
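As a sanity check on that timeline, a naive back-of-envelope estimate (my own, assuming training compute scales roughly linearly with parameter count at fixed data and resolution):

```latex
t_{2\mathrm{B}} \approx t_{340\mathrm{M}} \times \frac{2 \times 10^9}{3.4 \times 10^8}
\approx 1.5~\text{days} \times 5.9 \approx 9~\text{days}
```

so the quoted sub-30-day budget leaves headroom for a larger dataset and a longer schedule.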
>>
>>107053799
>Udio partnership with UMG

https://desuarchive.org/g/thread/106957370/#106958310

So it has begun. First they will figure out which model is best, give their artists exclusive access to this model, and then give the general public a watered down version of the model, if the public gets a version at all.
>>
Sage attention 3 when.
>>
>>107055167
Remember, Udio is no joke compared to everyone else. Going after them specifically is very strategic.
>>
Finished baking male vocals version of:
>>107053777
Lyrics I posted here already:
>>107053946
https://voca.ro/1muX2AJkOy1P
>>
>>107055277
cringe
im waiting for acestep 1.5
>>
File: musicarena.png (57 KB, 1013x442)
>"Udio is da best!!!11!!1"
Reality is picrel
You can make direct comparisons yourselves in the Music Arena if you want
Udio is only "better" if you are into gacha and generating parts of the song multiple times until it does something good, and Suno lets you do something like that as well
>>
File: 1739936326305832.png (412 KB, 736x650)
>>107055288
>mememarks
>udio v1.5
>>
I haven't been in the game for like a year now and I'm starting to plan a full system upgrade. Why is there no normal middle ground between 16 and 32 GB VRAM cards, nvidia?
I have a 2060 super, 8GB vram. My question is, assuming I just go for a 5080 or something for the 16GB, could I use it together with the 2060 to pool the VRAM and leave the rest to regular RAM? Or should I not bother and just make do with a 5080?
I could afford a 5090, I just can't help but feel like I'm being ripped off, and honestly more than anything else I'm worried about it melting and exploding or something. Is undervolting/underclocking a good idea?
>>
>>107055321
I'm retarded and forgot to mention that I want to get into video generation. the pastebin doesn't go into much detail on multi-card drifting, for what it's worth
>>
>>107055149
I think the best possible way to train an anime model would be to first train your own captioner model that takes tag lists along with an image as input and interleaves the tags into proper sentences (in a way that tries to be grammatically correct, but not necessarily to a fault) based on what it can actually see, while also adding spatial information where it can and where it makes sense. Then you could just run that model on the Danbooru dataset directly, with the original accurate tag lists for each image.

I'd also pick a maximum resolution and proportionally downscale larger images to it if needed, but never *upscale* anything whatsoever, rather just bucketing everything as close to the original upload res as possible. The end result would be a fairly robust mixed-res model that could coherently do a wide range of resolutions rather than just focusing on one range.
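A minimal sketch of that downscale-only bucketing (my own toy version with assumed constants; real trainers also group samples into shared buckets per batch). It never upscales: images are snapped to multiples of 64 at or below their original size, capped at a max pixel budget.

```python
from PIL import Image

MAX_PIXELS = 1536 * 1536   # assumed maximum-resolution cap
STEP = 64                  # latent-friendly multiple

def bucket_size(w, h):
    scale = min(1.0, (MAX_PIXELS / (w * h)) ** 0.5)  # downscale only
    bw = max(STEP, int(w * scale) // STEP * STEP)
    bh = max(STEP, int(h * scale) // STEP * STEP)
    return bw, bh

def load_bucketed(path):
    img = Image.open(path).convert("RGB")
    return img.resize(bucket_size(*img.size), Image.LANCZOS)
```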
>>
>>107055288
Udio 1.0 was best by far. They neutered the model after that. I've never seen a Suno gen on par with Udio 1.0 composition wise.
>>
File: WAN2.2_00406.mp4 (3.62 MB, 624x832)
>>
>>107055321
If you're patient you could wait for the 50 series super refresh as the 5080 super is supposed to have 24gb vram. Can't speak to multi gpu use, but 8gb seems rather abysmal and not worth the hassle especially when you consider you can offload to system memory.
As for the 5090, I have one and I've undervolted, overclocked it and have capped the power at 80% (460W) without any issues. In any case make sure to get at least 64gb of system ram if you're gonna gen videos.
>>
>>107055370
How I'd do it is have the same image with three different captions:
- tags as seen on the booru site
- short description
- long description

Each caption is really a supported way for a user to prompt the model, and the model will naturally learn how to mix the different caption types. The problem we've now seen multiple times is people training on caption blob balls and forcing the model to rely on long captions if you want maximum output quality.
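A sketch of how that duplication could look in a data pipeline: each (image, caption) pair becomes its own training sample, so tag, short, and long captions are all equally supported ways to prompt. Field names here are made up for illustration.

```python
import random

def expand_samples(records):
    # records: [{"image": path, "tags": str, "short": str, "long": str}, ...]
    out = []
    for r in records:
        for key in ("tags", "short", "long"):
            out.append({"image": r["image"], "caption": r[key]})
    random.shuffle(out)  # interleave caption types across the epoch
    return out
```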
>>
>>107055321
There's really no downside to undervolting a 5090, you lose 5% performance and reduce the power 30%.
>>
>>107055277
Bridges and outro are pretty weak https://voca.ro/12sNX01jyU6M
>>
>>107055541
It's still pretty good if it's local.
Eg. in terms of slop.
>>
>>107055412
>>107055288
Suno is very likely trained on royalty-free music libraries like Audiojungle, which is why it sounds worse but also more "polished". I'm guessing Udio was trained on more copyrighted music even before the UMG deal, so it's more random but gives more interesting results.
>>
>>107055321
3090's are pretty cheap
>>
>>107055603
Old Udio was overfit on copyrighted stuff; if you input the same tags and lyrics some tracks had, you'd get nearly identical outputs
>>
File: me and ma girl.png (1.51 MB, 768x1344)
>>107055077
>>
>>>/pol/520205588
How do I do this on my laptop?
>>
>https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
anyone tried the new nvidia edit model?
>>
File: me and ma girl3.png (903 KB, 1280x768)
>>107055697
>>
>>107055524
I think your way might work if you literally swapped out the sets of captions for each image between epochs, probably better than slapping them all in one caption file
>>
File: ComfyUI_00425_.png (2.72 MB, 1088x1344)
>>
>>107055797
That's what I mean, you duplicate the image for each caption type. And if you practice VAE jitter you prevent memorization.
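One common reading of "VAE jitter" (an assumption on my part, the thread doesn't define it): sample the VAE posterior each time instead of caching one fixed latent per image, so the model never sees the exact same latent twice. With a diffusers-style AutoencoderKL:

```python
import torch

@torch.no_grad()
def jittered_latent(vae, pixels):
    posterior = vae.encode(pixels).latent_dist  # diagonal Gaussian
    return posterior.sample()                   # mu + sigma * eps, fresh each call
```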
>>
>>107055826
Nice. Maybe more... try dynamic angle, rim light. Really good.
>>
File: UwU2.jpg (2 MB, 1536x2048)
>>
does infinite talk work with wan2.2?
>>
File: ComfyUI_00431_.png (2.7 MB, 1088x1344)
>>107055826
>>
>>107055951
Love it. What's the model? I should try and gen something related to this.
>>
File: ComfyUI_00434_.png (2.67 MB, 1088x1344)
>>107055951
>>107055982
Qwen
>>
>>107056008
Cinematic Redmond had these vibes. Cool that Qwen can be grainy too. Of course it's probably pretty stiff, but that's what they all are.
>>
>>107054482
>emu3.5
HF links are all dead
I can test it if the models are available somewhere
>>
File: failed.mp4 (276 KB, 832x480)
>>107055752
I can't make her sit on the pig, but it's still funny
>>
Is there a way for Librewolf (flatpak) to remember its last directory? In Linux Mint.
It's somewhat tiring to use ComfyUI and I need to open a file dialog to traverse all the way up from /home/ to my work mount...
>>
>>107055603
I don't think they're prompted the same. Udio has a better understanding of music, that shows because with a good prompt it destroys almost any Suno song I've ever heard

https://www.udio.com/songs/2bXYLKaVDyVwi1GAb6pSkR

This is a very hard song
https://www.udio.com/songs/7zrLreMnwCYrdBqQkGtEXM

The musical depth I've witnessed out of this model truly is insane. Unprecedented connection between lyrics and musical notes. It has mastered vocals and intonation in a way Suno has not.

Using high quality copyrighted music in conjunction with whatever royalty-free music is available for training the model is the way to go.
>>
>>107056111
Yeah, that's just subjective to people who have never played any instrument in their lives.
>>
Do any of you have a recommendation for generating videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. Theme is urban 90's/2000's workers working daily shifts.

I only know of Veo 3 so far, and looking at Runway.
>>
so has the copyrightpocalypse finally started
>>
>>107054444
>>107054508
>>107054542
>>107054636
Are you using TREAD like HDM too? And have you considered going VAE-less using SVG (or at least using EQ-VAE like HDM)? https://arxiv.org/pdf/2510.15301

if not, you are missing out on huge speedups
>>
>>107055921
this is a very nice gen
what model did you use?
>>
>>107056231
TREAD is much harder to implement if you want the real speed up. The 16-channel VAE I'm using has been EQ'd yeah.
>>
>>107056239
NovaOrangleXL_v120
I'm just testing my linux installation, I deleted all my previous models. Don't have noob or anything else.
>>
>>107055149
>>107055103
THIS, we are on the cusp of home baked local SOTA.
>>
>>107056121
Udio literally just got acquired by the largest music label. That should tell you all you need to know.
>>
>>107056281
I don't know, really.
>>
Test
>>
File: 00219-597055894.png (2.74 MB, 1248x1824)
>>
File: Sanic.jpg (2.06 MB, 1536x2048)
>>
>>107056256
thank you anon, hows your linux experience going?
>>
Yume hands do work but you have to describe them in the prompt; treat it more like an LLM
>>
File: ComfyUI_00015_.png (1.13 MB, 1024x1024)
/iemg/ lore, you wouldn't get it
>>
File: image_00105_.jpg (385 KB, 1179x1768)
>>
>>107056349
Yeah, well, I'm an experienced faggot but I wouldn't advertise it for normal people. Even with the most common interfaces, it's been 20 years and they still can't get a file save dialog right.
I have used Irix and it never had these issues.
Like, save a file from Cum and it defaults to some ~/.
Open a file...
It's great if you are a developer, but a normal person should just use Windows.
I feel like Linux environments have gone backwards since I last used them 15 years ago.
>>
>>107056389
The flatpak browser does not remember the save file location from Cum.
This is what I mean.
I need to browse 5+ levels deep just to get to the directory I want.
>>
File: 00224-1489466956.png (2.57 MB, 1824x1248)
>>
remember that guy that was training a model and said his image of a brown splotch for the prompt "a woman" was 80% of the way there?
>>
>>107056375
Okay, then what's the optimal total prompt length in your experience?
>>
Text can be consistent with small phrases from the looks of it, and it is really sensitive to artist tags. The wrong tag will completely fuck everything, which suggests it needs more training.
>>107056415
I never pay attention to that; it doesn't matter from my testing.
>>
>>107056405
>>107056389
im a linux user, what distro r u on?
im on debian and the default file save dialog in brave/mullvad remembers locations (actually not sure about mullvad because it has crazy settings, but firefox did work) for extensions
what environment are u using? i use dwm so less ram/vram is used
>>
>>107056412
how's your dataset going? still 0%?
>>
>>107056432
Yeah, comparing distros is like comparing dicks. I think I'm using Librewolf and it's a flatpak - this explains why it does not remember the directories.
>>
>>107056440
how's your training going 2 years later? still at 80% there?
>>
>>107056421
>hands do work but you have to describe them in the prompt
Sounded like your approach is to really bloat the prompt with specifics, but I guess I misunderstood.
>>
>>107056441
try firefox or brave, or look for settings to disable forgetting save directory in about:config
flatpak is likely the issue because of muh sandboxing
>>
>>107056451
it doesn't take much philosophy to understand you don't do anything and thus won't achieve anything, don't put your insecurities of failure on me, thanks :)
>>
>>107056465
yup, still 80% there confirmed lmao
>>
>>107056421
do you switch the system prompt like how the guide says to help with text or? i have yet to really fuck with text on it
>>
>>107056458
Nah, your advice is just like any of the useless non-tech advice - changing distro or even browser does not accomplish anything. If it works it works, and if it does not there is a way to fix it, but sure as hell it is not by reinstalling my disks.
>>
>>107056476
actually that doesn't mean anything, for all you know I've already released a model, but what we both know is in 2 years you don't have anything except a bitter attitude
it's truly funny I'm living rent free in your brain though
>>
>>107056490
I treat it like chroma I also use my own system prompt I'll look into the guide again but pigeonholing it to anime only doesn't do much for me
>>
>>107056497
for all anyone knows you haven't released anything lmao
>>
File: 00231-1253630154.png (3.4 MB, 1456x2128)
>>
>>107056518
Feel free to explain why anyone would ever attach any of their professional work to 4chan. No one releasing a model with their name attached to it would ever link it on 4chan if they wanted to be taken seriously.
>>
File: SD35Medium_Output_8473732.jpg (2.86 MB, 1280x1536)
>>107055046
One more
>>
>>107056537
how convenient
my locally trained 300B model is going great too
>>
>>107056560
The only thing you're developing is suicidal thoughts.
>>
File: image_00109_.jpg (393 KB, 1326x1768)
>>
>>107056567
let's see a 1girl gen, bro, I'm sure it will be great bro, two years of improvement bro
>>
>>107055713
This is brilliant desu. Now they just need to make it uncensored so that a guy jerking off shows a girl fingering her pussy. Then bye bye thots, any guy can become an OnlyFans whore.
>>
>>107056581
Great stuff.
>>
>>107056375
That's not really true at all for Yume 3.5 IMO; you can absolutely booru prompt it straight up as long as you leave the Gemma boilerplate in properly, in my experience. The more likely issue for some people is that the generally recommended sampling configs are both not that good. DPM++ 2S Ancestral at 4.5 to 5.5 CFG gives massively better results most of the time for me. It is slower though.
>>
>>107056597
Oh I forgot to say, that's with Linear Quadratic.
>>
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/tree/main

new models, anyone test?
>>
>>107056491
did saving files with librewolf remember?
>>
File: ComfyUI_00535_.png (3.2 MB, 1536x2048)
Lustify is pretty good for off-topic gens.
>>
File: image_00110_.jpg (573 KB, 1336x1768)
>>107056596
ty
>>
File: 00062-1028256704.png (2.1 MB, 1288x864)
>>
>>107056597
sadly that sampler is not in neo forge for some reason, but I have good luck with DPM++ 2M
>>
>>107056640
>>107056581
huh? aren't these just film stills?
>>
>>107056620
It does remember the last directory for images, but with cum ui it does not.
>>
File: 00234-3489883587.png (3.11 MB, 1456x2128)
>>
>>107056151
>Any of you has a recommandation to generate videos for a music video? I'm looking for 16:9, some kind of 35mm grain/look, mostly still shots but with some travelling too. Theme is urban 90's/2000's workers working daily shifts.

>I only know of Veo 3 so far, and looking at Runway.
This is the local general so I'll give you advice for a model you can run on a GPU on your home computer

Your only real option for cinematic stuff is Wan 2.2 or 2.1 with the MoviiGen lora. I would recommend trying a 2.2 workflow + that Lora at 720 using a 5090, or FusionX if you choose to use 2.1

If you don't need the lack of censorship of WAN and you have money to spend, I'd just use runway for this. Higgsfield AI may also be interesting to you because they have specific stuff for music videos
>>
File: 1758330248565777.jpg (1.96 MB, 2016x1152)
>>
>>107056640
I am going to give you a tip:
watch Bram Stoker's Dracula (90's) and take a couple of screenshots; there's Lucy and all that. Then img2img them. That'll be great.
>>
File: 00024-3326562438.jpg (1.01 MB, 1560x1944)
>>
>>107056614
>more i2v
I sleep
>>
>>107056674
i wish this were me right now
>>
Kek, I was playing the Suno side and thought local already caught up somehow

https://levo-demo.github.io/

Very disingenuous demo
>>
>>107056674
You should put him in a van. And make him go.
>>
>>107056670
Cool thanks
>>
>>107056679
>Bram Stoker's Dracula
The best shots are a couple of still frames from inside the film, not these tiktok screenshots etc.
>>
>>107056690
Kek, who trained this model? It spits out Adele unprompted?

https://levo-demo.github.io/static/audio_sample/overview/04_en.mp3

It might be good. How come I've never heard of it.
>>
File: spyro.png (1.92 MB, 1024x1024)
>>
>>107056726
Is there a reason why you feel the need to talk about an unrelated subject in the thread when it can exist in its own thread, with actual documentation we can grab from the OP and all use?
Just seems odd you can't do that instead
>>
>>107056732
bigger
>>
File: 00065-2876646701.png (1.96 MB, 1288x864)
>>
>>107056742
There's no comfy workflow, and it seems like some experimental half-trained model. What is there to talk about?
>>
File: 00247-667236782.png (2.59 MB, 2128x1456)
>>
File: image_00111_.jpg (522 KB, 1337x1768)
>>107056650
Remember this scene?

>>107056679
It's a decent movie but I would unironically remake all Keanu Reeves dialogue with AI
>>
File: ComfyUI_00068_.png (1.1 MB, 1024x1088)
>>107056819
bro dont upload that scary ass shit here
>>
File: 00077-3534728429.jpg (917 KB, 2048x2688)
I'm getting closer
>>
File: 8845748478.png (373 KB, 639x412)
>>107056726
Tencent is actually training their own music model.

https://huggingface.co/tencent/SongGeneration

>TODOs
>Release SongGeneration-v1.5 (trained on a larger multilingual dataset, supports more languages, and integrates a Reward Model with Reinforcement Learning to enhance musicality and lyric alignment)

And the data is so copyrighted it spits out Adele unprompted as you can see on their demo. That is wild, with Qwen doing the same, my faith in China has been restored.
>>
>>107056890
What does this have to do with image diffusion?
>>
>>107056867
to killing urself? never been happier for u
>>
>>107056904
There is no music thread. It's either here or /lmg/, the only two places we can discuss local models.
>>
File: image_00112_.jpg (576 KB, 1326x1768)
>>
>>107056904
this is local diffusion general, we accept video and audio related content here.
>>
i for one welcome our music gen brothers
>>
>>107056935
>Discussion of Free and Open Source Text-to-Image/Video Models
>>107056915
>>107056926
You revealed yourself; go back to your containment thread
>>
File: 00260-408042399.png (2.53 MB, 1152x2016)
>>
>>107056867
closer to approaching the quality of a quantized 2gb illustrious model? maybe
>>
>>107056946
Comfy has audio models. We should be allowed to discuss anything comfy adopts as long as it is local.
>>
>>107056983
>>107056984
You're so fucking pathetic dude
>>
>>107056984
Besides, good audio models are pivotal for video. Since Sora 2 it's not the muted-audio era anymore; the SOTA has changed, so all discussion of audio research is welcome.
>>
I'll give the NetaYume shill this, the model requires a whole lot of gacha but at least it has some actual variation in its outputs.
>>
I'm running ComfyUI and following the guide; I've been playing around with the hand and face detailer. Is there an equivalent for feet/toes? I'd like to be able to fix those too.
>>
File: 00268-426404236.png (2.29 MB, 1824x1248)
>>
>>107056597
Some are better than others, clearly, but IMHO much of sampler/scheduler choice is subjective. The latter moreso than the former in my estimation.
>>
File: image_00115_.jpg (316 KB, 1520x1040)
>>
>>107057066
You haven't taken any steps to learn the model and it shows, Why not explore something before going on multi day complaints?
>>
>>107057119
I barely post in this thread, you're tilting at the wrong windmill friend. And I'm saying I like the model, I get better results out of it for the particular thing I'm prompting than I get out of the other boomerprompt models.
>>
>>107057137
Anything to show?
There's been this constant wave of anons that complain about this model but don't post anything. I know you're just wasting time, but take your low skill ass to one of the other threads
>>
File: 00277-1703380660.png (2.65 MB, 1248x1824)
>>
>>107057066
>the model requires a whole lot of gacha
Describe the poses/gestures better
>>
>>107057157
"Face and proportions that don't look weird"
>>107057145
>take your low skill ass to one of the other threads
OK
>>
>>107057165
Fuck off now thanks!
>>
File: 00177-1107305360.png (3.13 MB, 1280x1920)
>>
>>107057175
illustrious 2gb?
>>
File: illunoob vs netayume 2.png (2.44 MB, 2560x1256)
why is netayume so sloppy bros??
>>
*yawn*
>>
>Mindbroken because he never made anything good in his life
>>
>>107055177
it's already there
>>
File: dmmg_0165.png (1.35 MB, 832x1216)
>>107054935
both
>>
>>107057227
damn you melting so hard you cant even spell yumebro
>>
File: wong_01.png (3.43 MB, 2048x1536)
https://www.youtube.com/watch?v=xboXFT46XSo
>>
>>107057111
DPM++ 2S Ancestral is pretty objectively better than Res Multistep, at least for details like hands and text, using Linear Quadratic for both, I'd say
>>
File: 1745324974262853.jpg (1.51 MB, 2016x1152)
>>107054935
You typically don't need more than 25 steps. Most of my 50 step outputs have been either a sidegrade or even a downgrade in terms of quality.
Don't forget that chroma can gen pics above 1024 dimensions.
>>
>>107057172
It's clearly the same fairly bad troll as yesterday, he's blatantly ragebaiting
>>
>>107057334
yeah I agree fellow yumebro, there's totally not a vast majority of people that find this model trash
>>
>>107057334
It's the same retard from the rentry, he spends his entire life doing this for years and is just reduced to a bitter faggot.
>>
>>107057206
kek yeah that poster was an idiot
>>107057327
nice
>>
>>107057352
You're right, there's in fact not a vast majority of such people
>>
>>107057327
NTA. Your pic is neat af. This is also 25 steps
>>
>>107057457
Oh, this is neat too, how's radiance compared to DC-2K?
>>
File: image_00127_.jpg (669 KB, 1336x1768)
>>
>>107057474
>How's radiance compared to DC-2K?
Couldn't tell you, but I loved the 2k debug ones. There's still a lack of blending in the macro pixels, but it's mostly good
>>
File: 1750563092507017.jpg (1.62 MB, 1248x1824)
>>
File: ComfyUI_00559_.png (3.69 MB, 2048x1536)
>>
>>107057521
Cinematic Redmond is great.
>>
>>
>>
File: image_00128_.jpg (558 KB, 1336x1768)
>>107057497
"The lighting is even with no strong shadows." compared to "Cinematic lighting, dark background, deep shadows, detailed skin. Sharp HDR."

>>107057515
>>107057521
very cool
>>
File: 1360212403.png (1.5 MB, 832x1216)
>>
File: Ted.png (3.6 MB, 2048x1536)
>>
>>107057589
https://www.youtube.com/watch?v=ZEWGyyLiqY4
>>
>>107054482
>32b
Mostly useless for local. Viable for use with quantization, especially nunchaku, but LoRA training will be a nightmare, and a model without low cost LoRA training is pointless beyond ten minutes of novelty use.
>>
>>107054248
We haven't seen any samples from the final pretrained model, but these are from while it was training

J-pop song
https://vocaroo.com/19CHG4V410OP

Some pop song
https://vocaroo.com/1i7OjKcLbmnO

Some opera song
https://vocaroo.com/1f64Fkmpn9Ax

Idk, maybe with the SFT phase it'll catch up to where it needs to be, but those outputs are very underwhelming. Just a bit concerning, but I don't know jack shit about these models.
>>
Slowly getting it together; still need to learn composition better
>>
What was that feature of comfyui that was being advertised a while ago where you bundle a bunch of nodes together and then you can re-use that as one node?

did this ever actually happen?
>>
>>107057657
subgraphs? didn't really change anything and was kind of a letdown. the node implementation in general is lacking too much and everything done to the front end has been lipstick on a pig
>>
>>107057657
subgraphs?
they're pretty great to clean up wf and only see what you actually need to see
>>
>https://huggingface.co/meituan-longcat/LongCat-Video
>We introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step toward world models.
Anyone tried it? Works with KJ wanvideowrapper
>>
>>107057589
I had an antiquated gpu. But jesus, the boost even SDXL has gotten in terms of noise... Sounds like faggotry.
>>
>>107057615
>LoRA training will be a nightmare
ostrisai's trainer has supported 3bit quants for a while now.. wouldn't that be sub-16gb? https://xcancel.com/ostrisai/status/1953933728948121838
>>
anyone know if infinite talk works with wan2.2, or is it just for 2.1?
>>
>>107057589
what model is this?
>>
>>107057677
some onions have been trying it out. doesn't look that much different from context window jerkiness after every 5 seconds
>>
>>107057718
>onions
anon gets filtered to onions sometimes or something? the more you know I guess
>>
File: image_00132_.jpg (754 KB, 1336x1768)
>>107057713
Chroma-DC-2K-T2-SL4-Q8_0
>>
File: ComfyUI_00569_.png (3.22 MB, 2048x1536)
>>
File: 31925346.jpg (13 KB, 460x460)
>python?
>no, that shit is gay
>>
>>107057788
based chink
>>
File: 00097-493591130.jpg (1.13 MB, 2688x2688)
>>107054044
slowly but surely; mistakes were made, just need to adjust values
>>
>>107057813
Thank you Ran. You wanted some attention.
>>
>>107057648
It's impressive to see the vocals don't sound anywhere near as robotic as original ACE-Step though. If they catch up to Suno 4.5 maybe there's a chance of getting Udio tier kino now and then.
>>
>>107057813
you can't adjust values if you are worthless
>>
>>107057655
>>107057813
the painted nails are nice
>>
File: image_00135_.jpg (916 KB, 1336x1768)
>>
File: 1755210653857745.jpg (606 KB, 1769x1111)
So far I've been using the Wan 2.1 workflow from the rentry but wanted to try out 2.2 from here: https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper (2.2 I2V)
Why isn't it recognizing the vae? Everything looks correct to me, straight dragging the vae output from the loader to the decoder doesn't do anything either
>>
File: 1761859945112321.png (321 KB, 500x654)
>>
>>107057861
it's not connected to the decode node, pull the string from the vae loader to the decode node to connect them
>>
File: jager.jpg (1.92 MB, 2048x1536)
https://www.youtube.com/watch?v=Gu3TAuw3ZJ8
>>
>>107057861
If it's not bait, as it probably is, this can still be useful for newfags: use the example workflow instead and just load the correct models:
https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_14B_i2v.json
>>
File: 001.jpg (855 KB, 2048x2048)
1girl
>>
>>107057823
*MrCatJak
>>
>>107056890
need them to train a speech model with emotion prompting so we can be freed from the dead end known as vibevoice
>>
>>107055149
That's what I want to hear.
Start a group, delegate simpler tasks to me, such as some manual captioning, and I'll contribute $250 toward training.
The only catch is that you share the training process and I get to ask a few technical questions.
We can find 20 others; there are plenty of interested people out there.
I don't care if it's a failure.
>>
>>107057917
I wonder if krea has a buttchin obsession too
>>
File: image_00137_.jpg (686 KB, 1336x1768)
>>
>>107057922
You want to suck off people.
>>
>>107057922
What took you so long, please offer your asshole.
>>
>>107057655
hot
>>
>>107057949
is that...
>>
File: 002.jpg (885 KB, 2048x2048)
elongated 1girl
>>
File: 00101-147928662.jpg (1.12 MB, 2688x2688)
>>107057838
Thanks, I think I'm getting the hang of this model now. The hardest part is finding the right blend of tags for a presentable image, followed by adjustments; starting to feel like 60 steps is the magic number with this model. I wish neoforge had all the samplers, I don't know why he took some away.
One thing I like with this model is I can game due to how little vram it uses compared to chroma.
Sorry, but I have a dedicated sperg that hates me and has been holding a grudge for years; just ignore him
>>
>>107058031
?
>>
>>107058031
I respected you for years but not any longer. Seems like you are just spiteful.
>>
>disabled retard noises
>>
>>107057671
>>107057672
Thanks, I've started using subgraphs but I can't figure out how to make all subgraphs reflect each other's changes when I edit one of them. Any ideas? I would expect them to work like Scenes in Godot.
>>
>>107058084
Why do you refer to him in the 3rd person?
>>
>>107058031
>ran wanted to come out
He manages to spit out a narcissistic rant.
>>
File: file.png (82 KB, 859x979)
>>107058085
clone it
>>
>>107058085
>I would expect them to work like Scenes in Godot.
there are a lot of expectations from modern nodegraphs and comfyui ducks up 90% of what's standard
>>
>>107058130
ah, I re-cloned and the clone is working now! I guess I must've cloned too early before, or there was a bug, which caused my clones to become unique (and no longer clones).
>>
File: 1743883744534508.png (2.93 MB, 1416x1888)
>>
>>107055297
>pleeeeeease novel ai, i need the model files, my local model is kinda noisy
>>
>>107058193
if you duplicate you get separate entities, and if you clone you get linked ones
>>
>>107058024
Buffy x slenderman
>>
File: 00110-1547100464.jpg (1.19 MB, 2688x2688)
Yeah, I need to make loras for this model, it's the boost I needed. It should also be pretty fast compared to training chroma.
>>
>>107057893
doing that just crashed comfyui
>>107057915
not bait, I'm just a bit of a brainlet when it comes to this but your workflow works fine, thanks
>>
>>107058274
Netayume is fucking garbage, holy shit
>>
File: ComfyUI_00079_.mp4 (640 KB, 480x832)
>>107057744
>Chroma-DC-2K-T2-SL4-Q8_0
nta, nice gens, with lora?
>>
Netayume is fucking trash, and just having it write some text that looks like it's done in paint doesn't make it redeemable

Chroma for complex stuff and illustrious for hentai is the way to go
>>
uh oh meltie
>>
>>107057457
Shit I didnt notice you replied to me

>25 steps is enough
Thanks for the heads up boss, chroma fp16 with the fp16 text encoder doesn't run all that slow on my 5060ti 16gb if I keep it under 30 steps
>>
>>107058315
Yeah, uploading to civitai right now
>>
does using this node lead to loss in quality?
>>
>>107058370
not anything visible
>>
You can still download Udio songs on the fly as 320kbps btw, just downloaded a couple of bangers. No need to record or anything like that.
>>
File: 1748063353105752.png (3.31 MB, 1416x1888)
can a pitfag rate this pit for me
>>
>>107058408
from what I read they're limited to 192kbps mp3
I'm getting everything in bulk I saved there
>>
>>107058432
pit's fine but smaller boobs would be more harmonious
>>
>>107058432
6/10
I prefer mine like this
>>
>>107058432
It's nuts that I immediately spot a netayume pic every time since it looks so off
>>
Fresh

>>107058480
>>107058480
>>107058480

Fresh
>>
>>107058441
Yeah dunno, it's quite strange.

Was able to download a few of them at 320kbps with fetchv, like https://www.udio.com/songs/hoCg4BmayTYXcJfjo4jvbT

But other ones are only 192kbps. Maybe for some reason some of them stream at 320kbps, while other ones don't?
>>
I see people making AI images of trump and stuff like that.
But some of the stuff is definitely better than others.
Is there any way to set it up such that whatever prompt I give, the character is strictly that one character?
I mean, not just simply typing in the name of the character but making it more realistic?
Like, in a way such that even when I make anime or caricature images, it seems like some professional artist drew that based on the likeness of the person?
I don't know much about the loras and such, that's why I ask.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.