/g/ - Technology

3 x 80 Edition

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106719267

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
>>106722132
I'd like to speak to the manager.
>>
Blessed thread of frenship
>>
>>106722132
cringe coomer collage
>>
>80
>billion
>>
File: 1739047480245991.png (1.43 MB, 1024x1024)
a massive billboard with this character is visible on a building in Akihabara, Tokyo during the day.

I like the wrapping effect.
>>
File: ComfyUI_07130_.png (1.96 MB, 1152x1152)
Imagine some kind of finetuning breakthrough and then 80B parameter
HunyuanImage Chroma. Haha, that would be insane.
>>
>>106722204
Are you running it locally?
>>
What is CoT in the instruct model?
>>
>>106722132
My gen is the Chroma one, the one in the middle. I feel honored, thank you for supporting my bitten nipple fetish.
>>
File: Wanimate_00042.mp4 (990 KB, 960x544)
>>
Absolute retards thinking a moe model works like dense.
Active is like 13b or 14b, you'll be able to run that in 24gb easily. Even on lower shitty cards with low quants.
You can easily do a dynamic brain damage quant and fit everything else in ram and it won't make much difference in quality or speed.
lmg CHADS rise up.
>>
File: ComfyUI_07145_.png (2.48 MB, 1152x1152)
>>106722211
>3x80GB locally
I don't think anyone can. China has to hurry up with their GPUs. If they're soon looking to shit on Nvidia, I think that might be the appeal of releasing such a large model.
>>
goofs wen
>>
File: ComfyUI_07146_.png (2.28 MB, 1152x1152)
>>
>Before release
>Looks shit, I don't even want it
>released
>OMG GOOFS WHERE I NEEEEED IT.

Fuck off.
inb4 goomba.
>>
Can someone explain to me why https://github.com/FlyMyAI/flymyai-lora-trainer managed to train just fine within my 24GB of VRAM while https://github.com/ostris/ai-toolkit goes over by ~6GB?
Same lora rank, same quant settings (8bit), unloading the text encoder in AIT, same image size (1024). What the fuck is going on there?
Also would I be losing anything by using the former as opposed to the latter? First time training a lora for any model so not quite sure how to go about it. 3090 if that matters.
>>
>>106722244
>If they're soon looking to shit on Nvidia,
lmao dude the stuff they make now is so ass it's not even close.
>>
>>106722256
You are referencing two different kinds of posters
>>
>>106722259
Could be a few things: the rank, batch size, gradient accumulation, or whatever optimizer flymyai uses. That's something you can tweak in the settings.
That being said, I wouldn't recommend diffusion pipe over anything else unless you have more than one GPU, in which case you should absolutely use it.
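If you want to eyeball where the VRAM actually goes, here's some napkin math. Every number is a placeholder assumption (model size, rank, optimizer), so plug in whatever you're actually training:

# rough LoRA-training VRAM breakdown; illustrative guesses, not
# measurements of either trainer
GB = 1024**3

base_params = 12e9      # assumed base model size, change to match yours
lora_params = 50e6      # depends on rank and which layers get adapters

frozen_8bit  = base_params * 1      # frozen weights quantized to 8-bit
frozen_bf16  = base_params * 2      # same weights if a trainer keeps bf16 copies around
lora_weights = lora_params * 2      # bf16 trainable weights
lora_grads   = lora_params * 2      # bf16 gradients
adamw_states = lora_params * 8      # two fp32 moments per trained param

print(f"frozen base (8-bit): {frozen_8bit / GB:.1f} GiB")
print(f"frozen base (bf16) : {frozen_bf16 / GB:.1f} GiB")
print(f"lora + grads + opt : {(lora_weights + lora_grads + adamw_states) / GB:.2f} GiB")
# activations are the wildcard on top of this: batch size, resolution and
# whether gradient checkpointing is on can easily swing several GB

Point being, the trainable part itself is tiny; a ~6GB gap between two trainers with "the same settings" almost always comes from how the frozen weights are held and what activation/checkpointing defaults each trainer uses.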
>>
>>106722256
The only way to test the model is to run it yourself nigga. I don't believe benchmark images one bit
>>
File: ComfyUI_07147_.png (2.03 MB, 1152x1152)
>Absolute retards thinking a moe model works like dense.
Is it a moe model?
Either way, from my understanding you still need enough VRAM to load the entire model even if there's only a few active parameters.
>>
File: ComfyUI_temp_dknmz_00020_.png (2.19 MB, 1152x1728)
>>
>>106722274
I said inb4 goomba.
>>
>>106722281
https://x.com/tencenthunyuan/status/1972130405160833334
READ NIGGER READ
>>
File: ComfyUI_07153_.png (1.94 MB, 1152x1152)
>>106722238
>>106722281
Forgot to quote
>>
>>106722281
In its defense, and I am playing devil's advocate here because I really doubt it will be the case: you can usually offload a MoE with less of a speed hit compared to a dense model.
>>
File: ComfyUI_07159_.png (1.89 MB, 1152x1152)
>>106722299
Yeah but that doesn't really change what you need to inference it anon
>>
>>106722259
AI Toolkit is notoriously bad at optimizing vram use, I don't even know if it supports any offloading at all

Best trainer for speed if you need any offloading (model doesn't fully fit in your vram) is OneTrainer
>>
does nag not work with chroma flash or are my settings just wrong? any time i use a flash model, or the delta weights, or a flash lora, the entire image looks like it's been jpegified
>>
>>106722319
It does, you seem more retarded each time you open your dirty mouth.
With moe you just need to fit the actual active part in vram, the rest can be offloaded and even quant to a different size.
You don't know shit about anything and talk without even reading the sources, the fuck are you even trying to engage here?
>>
File: ComfyUI_07163_.png (2.11 MB, 1152x1152)
>>106722339
Example catbox? Something might be off in it, or maybe it really doesn't work.
>>
tfw found a new artist to train on
>>
File: 2_banner_all.jpg (3.92 MB, 3847x5000)
So this is the power of 80b. A model that looks worse than flux for realism, worse than illustrious for anime and uses gpt generated synthetic data to train its text outputs. Amazing.
>>
File: vibevoice.jpg (50 KB, 1099x153)
>>106721831
i was able to use 1.5B to gen something, but when i tried the large (7B) i'm guessing i ran out of memory, even though i have 24gb vram. the 1.5B doesn't sound that great, but it's able to preserve accents from the sample which is awesome
>>
>>106722191
did they steal sora's content or something? i hope they at least added some nsfw fun
>>
>>106722353
nta, but I think you're being overly optimistic about how feasible this model will be to run at any speed that won't make you want to kill yourself.
>>
>>106722360
>A model that looks worse than flux for realism
No, it clearly doesn't

That said for such a huge model it does look unimpressive
>>
File: ComfyUI_07164_.png (1.84 MB, 1152x1152)
>>106722360
I agree, they would never give us for free what is strictly from a hand curated dataset. Though it does appear to have some slight hints of using real photos as reference, at least for cinematic stuff (so it's comparable to Seedream 4). The one area that is clearly superior is prompt following and text, given it's autoregressive. An image edit model based on this would probably be pretty good, and there's a good chance they'd open source it.
>>
File: Wanimate_00043.mp4 (827 KB, 960x544)
>>
>>106722394
Nah, he's right, it's good for realism, but it's not as good as Krea or Chroma.
>>
>>106722290
You didn't say it you typed it
>>
>>106722362
7B works fine on 24GB, I use it all the time. Make sure the sample voice file is 3 minutes max, a bit over 3 minutes is when it'll OOM on 24GB.
>https://pastebin.com/raw/f2ibMSGf
Simple single speaker wf.
Use https://github.com/diodiogod/TTS-Audio-Suite, the other extensions for VibeVoice are no good.
>>
>>106722414
looks like shit
>>
>>106722417
>3 minutes max
Is there a benefit to using samples that long?
>>
>>106722419
But enough about you.
>>
Comfy give me smea dy ++ plox
>>
File: 1748599270941883.jpg (1.44 MB, 2000x2599)
>>106722360
imagine paying 10000 dollars for a gpu to get this shit lmao
>>
>>106722441
Yes, the cloned voice will better resemble the original and the cadence of the speech will be more accurate. Make a 30 second sample and a 2 minute sample, then compare the outputs. The 2 minute will usually (if not always) sound better.
It's also better to split up your samples by emotion. Don't mix whispers with neutral speaking, or angry/yelling with upbeat/happy. Have characterX_angry.wav and characterX_neutral.wav for example, then use them accordingly for best results.
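If you want to bulk-trim raw recordings down to that ~3 minute ceiling, here's a stdlib-only sketch (the *_raw.wav filenames are just placeholders, and it only handles plain PCM wavs):

import wave

MAX_SECONDS = 3 * 60  # the ~3 minute ceiling before 24GB OOMs, per above

def trim_sample(src, dst, max_seconds=MAX_SECONDS):
    # copy src to dst, truncated to max_seconds
    with wave.open(src, "rb") as r:
        params = r.getparams()
        keep = min(r.getnframes(), int(r.getframerate() * max_seconds))
        frames = r.readframes(keep)
    with wave.open(dst, "wb") as w:
        w.setparams(params)       # frame count gets corrected on close
        w.writeframes(frames)

# one clip per emotion, picked per-gen as described above
trim_sample("characterX_neutral_raw.wav", "characterX_neutral.wav")
trim_sample("characterX_angry_raw.wav", "characterX_angry.wav")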
>>
uh oh localkeks are seething! meanwhile the rest of us can enjoy hunyuan 3.0 uncensored through API nodes
>>
>>106722456
Just take another mortgage on your house, totally worth it
>>
>>106722463
>uncensored
post oneeshota futa pegging or gtfo
>>
>>106722115
They've had well over a year to make it better though
>>
>>106722463
>uncensored
API
kek
>>
>Silveroxides removed all his speed loras that were better than Flash
Why
>>
File: 1748285656514546.png (64 KB, 336x150)
>>106722191
80 billions parameters!
>>
>>106722484
moved into a new folder
>>
>>
>>106722489
Where? His Chroma Loras repo only has the flash loras.
>>
>>106722496
https://huggingface.co/silveroxides/Chroma-LoRAs/tree/main/flash-heun
>>
I think people expecting this model to be trimmed down to a manageable size and still be worth using are huffing obscene amounts of copium.
>>
>>106722504
Are you retarded or just illiterate?
>>
>>106722487
Not even a gorillion
>>
What's the best sampler/scheduler/steps combo for Chroma HD?
>>
>>106722512
lcm/karras 75 steps
>>
>>106722512
res_multistep, beta, 50
>>
File: 1741184122881217.png (97 KB, 1437x778)
>>106722191
>240 gb of vram
kek, you literally need 10x3090 cards to run this shit
>>
>>106722506
i heard chodestone will de-distill it and retrain it at 256x256
>>
>>106722524
That's the 16bit model, right?
>>
>>106722528
and his version will take twice the time as the original to gen 1 image
>>
>>106722529
yes, bf16, that means you'll need 120gb of vram to run Q8
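Weights-only napkin math, before activations, the text encoder and everything else pile on top:

# weight memory for an 80B-param model at different precisions
params = 80e9
for name, bits in (("bf16", 16), ("Q8", 8), ("Q4", 4)):
    print(name, round(params * bits / 8 / 1024**3), "GiB")
# bf16 ~149 GiB, Q8 ~75 GiB, Q4 ~37 GiB; real usage lands well above
# the weights line once the rest of the pipeline is loaded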
>>
>>106722516
Blurry as fuck
>>106722517
That actually looks great
>>
>>106722417
thanks ill try this out, did an install and its saying some things are missing ill figure it out tomorrow
>>
https://xcancel.com/TencentHunyuan/status/1972130405160833334#m
>noo you don't get it anon, we need 80b parameters to get decent images
those engineers are living on another planet or something?
>>
I'm glad every new model is getting larger and larger while also getting worse and worse
>>
localbrowns seething at hunyuan for pushing the tech forward instead of seething at nvidia for keeping hardware behind. there is nothing wrong with 80b, an H200 can run this fine. SOTA models like Seedream and GPT are equally as big. what do you expect, endless 12b flux clones that all look like shit? they tried that with hunyuan 2.1 and nobody cared.
>>
File: 1732763641637071.png (48 KB, 622x115)
>>106722550
they trained this shit on 5 billion images, damnnnnnn
>>
>>106722548
Needs sage installed, since it speeds up gens. You can set attention_mode to auto and it'll still gen, only slower. Only extensions needed are tts suite and rgthree.
>>
>>106722524
Can any richfags test the pretrained version just to check if that version is unslopped?
>>
>>106722560
>5 billion openai generated synthetic image-text pairs
Yeah, it shows.
>>
>>106722550
>those engineers are living on another planet or something?
For the last four years anyone working in AI has basically been able to say "We need X H100s to do Y" and someone will write them a blank cheque to achieve it. So yeah, they do live on another planet. A planet where someone will just provide the compute for them no matter the cost. As long as that is the case, no further optimizations will ever be made to make smaller models more effective, because the people who make the models have no upper limit to their resources.

This is why I hope for an AI crash soon. So engineers are forced to work with limited resources and provide solutions that aren't just "more compute"
>>
gonna smoke some weed brb
>>
File: 00205-4204754562.png (2.73 MB, 1248x1848)
>>
>>106722568
OpenAI got some nice money with those chinese fuck spamming their server lmao
>>
cozy bread
>>
>>106722573
Yeah, half if not more of their traffic has to be the Chinese slopping out synthsets
>>
Is there a way to make comfyui stop capping floats to the first digit?
>>
>>106722204
desu I'm sure you can remove some of the fat and end up with a 20b model that is like 95% as good
>>
File: RA_NBCM_00016.jpg (879 KB, 2736x1872)
>>
Where's the tool that lets you convert Flux loras to Chroma loras?
>>
>>106722569
yeah, they haven't improved shit on the architecture or on the training process, it's just "stack moar layers bro" and they call it a day, fucking lazy fucks
>>
>>106722557
But you don't understand, there is a miracle tune method out there that will make a model smaller than XL and beat the best of the best!
What "not even trillion dollar companies have figured such a method out yet?" Fuck off with that shit, they just dumb, I obviously know best. #JustOneMoreEpoch!!
>>
1000 more years of sdxl btw
>>
File: Wanimate_00045.mp4 (1.82 MB, 960x544)
>>
>>106722602
It's called retraining.
>>
>>106722607
this
>>
>>106722557
>nvidia for keeping hardware behind
this meme must die, why do you insist on Nvidia having to be the only gpu provider for humanity? it's not Nvidia's fault that its rivals suck ass, I'm more angry at AMD not even trying to be competitive
>>
>>106722609
No, there was a specific script that was posted here that you could use to convert shit like the hyper-flux loras into Chroma loras.
>>
>>106722601
y repoast
>>
File: 1729312036509967.png (1.11 MB, 1006x953)
>>106722524
Tencent be like:
>>
>>106722608
Gandalf was truly a master of disguise
>>
>>106722557
>what do you expect, endless 12b flux clones that all look like shit? they tried that with hunyuan 2.1 and nobody cared.
nobody cared because they train on synthetic slop, 12b or 80b it doesn't matter, the result will end up slopped as fuck, no one will care until they finally make some effort on having a decent dataset (like they did on Midjourney or Seedream)
>>
>>106722608
>tits accurately swinging with the motion of the cart
What a time to be alive
>>
>>106722631
>Midjourney or Seedream
Fuck no, those model gens suck ass
>>
Downloading the Hunyuan 3 now
I don't have 3x H100, but I have an RTX 6000 and 192GB system memory so hopefully it offloads easily and I can at least do some basic tests to see if it's worth pursuing any further.
>>
why couldn't that furry faggot just release a normal goddamn speed lora instead of one that fucks everything up? and why did the other furry faggot remove his collection of speed loras that worked better than the official one? furries are a fucking blight and need to be banned from contributing to ai development
>>
>>106722577
Tencent weren't doing that as much back in HunyuanDiT days. Had they continued iterating on that, we might have had a good contender to Flux.
>>
File: 1731796850524359.png (311 KB, 1356x940)
he wasn't joking when he called it "the super parameter" 2 weeks ago, this baby is huge
>>
>>106722642
>furries are a fucking blight and need to be banned from contributing to ai development
They're better than trannies, and also unlike trannies they have money to fund training
>>
>>106722607
I'm ok with this. It's tried and true.
>>
>>106722649
furries and trannies are often one and the same
>>
>>106722649
>They're better than trannies
a lot of furries are trannies though
>>
>>106722607
all you need
>>
>>106722647
>anon says something *crickets*
>this random fuck on twitter who was wrong about Wan 2.5 says something *50 different screenshots including news we all know by now*

Posting him should be a bannable offense.
>>
>>106722660
>wrong about Wan 2.5
>she doesn't know
>>
>>106722660
>he was wrong on one prediction out of 10 gozillions, BAN HIM
what kind of mental illness is this?
>>
i'm going to make my own ui and it'll just be photoshop plus sdxl. i'm going to leave this general and i'm never going to return. no more noodles, no more fluxchromaqwenhunyuanwan shit. everything after sdxl has been disappointment after disappointment
>>
>>106722668
But what about wan support.
>>
see you 2morrow
>>
>>106722672
okay so maybe wan is fine. but nothing else.
>>
>>106722666
Because he's just the marketing arm of Chinese SaaS companies now.
>>
File: ComfyUI__00270.png (3.34 MB, 1280x1920)
yes 1girl is very nice, but have you considered ... 2girl?
>>
File: output.webm (3.85 MB, 832x1248)
>>106722494

>>106722674
I shoulda gone to bed 2 hours ago.
>>
>>106722668
trvke: r/StableDiffusion is a better source of local gen information than this general
>>
>>106722668
we already have this technology it's called the krita ai diffusion plugin
>>
>>106720291
>Troon posting is no different from shitting up the board with scat. It should be a bannable offense.
>>106722660
>Posting him should be a bannable offense.
this troon thinks he's on reddit or something? lmao
>>
>>106722698
>krita
sucks fucking cock
>krita ai diffusion plugin
sucks mega fucking cock
>>
I don't want to sound mean (Because I have shit gens myself) but why are all video poster gens always shit or interchangeable?
>>
File: cube.mp4 (1 MB, 512x512)
>>106722706
I've only ever seen one soulful video gen from this place and it was posted when 2.1 first came out
>>
>>106722695
Lmao
this is great
>>
>>106722685
>have you considered ... 2girl?
as a Tencent employee, I always prefer my AI images with 80 billions girls
https://youtu.be/VU2d_Pld3w8?t=60
>>
>>106722706
Because most video posters are just fucking around and video gens take a long time to make. So they're more of an expression of "hey, look at this" rather than "hey, I made this for your entertainment."
>>
>>106722707
Fair. I like some, but we got walkinganon, rocketanon, and one other. I don't have the tech to do something, or else I'd do something fun besides Makoto pics.
>>
>>106722695
on the other hand once you notice how locked in space her hand and phone are it's kind of distracting
>>
>>106722700
>sucks mega fucking cock
my cock
>>
>>106722716
>I don't have the tech to do something or else
Then maybe you should just shut the fuck up and consume what you are given like a good little vramlet instead of complaining.
>>
File: 1751538243755537.jpg (328 KB, 1248x1824)
>>106722685
2girl is illegal in some regions
>>
>>106722719
Idiot retard double dipshit. That's how phones in mirrors work. You wouldn't know because you're too ugly to be worth taking photos of yourself.
>>
>>106722721
this guy owns a 5090 and slops
>>
File: buzz lol.png (4 KB, 151x58)
>>106722715
Why don't they use online gens at this point? With CivitAI they could rack up buzz and do video gens.
>>
>>106722724
you know im right and you're coping
>>
>>106722685
I consider 2girl an unhealthy amount
>>
What's stopping me from training my own speed loras
>>
>>106722725
I doubt it lol
>>
>>106722726
People with disposable income do not care about buzz.
>>
>>106722721
Like I said: I don't want to be mean. I enjoy them, but they get old after a while. If I had some god shit, I'd be fucking with everything lol
>>
>>106722517
Can be okay for the initial gen, bad for upscaling. It gives the edges of pixels this ragged/aliased look. Okay aesthetic for certain gens, maybe.
res_2s | beta57 | 20-35 steps for initial gen, 15 for upscaling passes @ x2 scale (max 4 tiles) looks best on chroma hd imo, but you'd want a decent GPU given how slow 2s is.
>>
>>106722722
>teenage mutant ninja turtle hands
illustrious does this a lot more than i remember
>>
>>106722719
kek, people like you always make me laugh, always looking out for those little details in ai gens
>>
>>106722724
Bait aside, believe it or not, turn the camera slightly in any direction and it won't be in the same spot in the mirror anymore. On the other hand, if you started with a larger frame, stabilization can be applied by cropping.
>>
>>106722736
That doesn't answer what I asked. I got all that from liking gens and claiming daily buzz.
>>
API NODES SAVE US ONEGAI
>>
>>106722755
six fingers
>>
>>106722755
details matter
>>
>>106722695
thats fucking awesome, did you prompt the devil guy or did you use some reference image?
>>
>>106722661
They never fully confirmed the weights for Wan 2.5 would be released. They implied they would "consider" releasing it after the API preview phase, but this is not a confirmation and it sounds like it's not up to the devs but the higher ups.

But I personally believe they will ultimately release it (Kling and even some other API models mogs it)
>>
>>106722792
>They implied they would "consider"
Anyone familiar with Chinese or asian culture for that matter knows this means no.
>>
>>106722782
only turbo autists get fixated on looking out for small details like that
>>
File: 1758906075660251.jpg (359 KB, 2048x2048)
>>106722752
It's not every day you'll come across someone as bad at configuring this stuff as me, so there's that too
>>
>>106722792
>I personally believe they will ultimately release it
same, because this model isn't close to veo 3, they'll probably reach google's level with wan 3.0 and wan 2.5 will be useless, so might as well share this scrap with the localkeks
>>
>>106722746
This is the same "If I was a billionaire I'd..." cope, but you wouldn't; you're just as fallible and predictable as everyone else and would end up producing the same slop.

If you were a billionaire, you would be a tight ass.
If you were living in Nazi Germany, you would have been a hard core Nazi.
If you had a 5090, you'd gen slop.
>>
>>106722808
where is the 'make 2girl kiss' guy when you need him?
>>
>>106722801
Their model is not even SOTA. I think the higher ups got too confident/cocky they would be on top this time around and decided to become API-only, but from the side by side comparisons I've been seeing on X, if they lock this behind API it will be just another video model not even normies will bother using
I do think they would permanently stop flirting with the idea of releasing weights the second they become at least top2 or when they realize they have something truly useful industry-wise
>>
File: Untitled.jpg (3.99 MB, 7034x9142)
>>106722191
who the fuck even is their target audience that they really thought this collection of sample pics was reasonable justification for the 80B params? Like NOTHING seen there is any different from the cherry picked shit every recently released Chinese model has done, it's not impressive in any way.
>>
>>106722847
can you post this image 12 more times please
>>
>>106722649
>money to fund training
It would be more effective to pour gasoline on that money and just burn it than the experimental bullshit training the furries do. At least the former would be entertaining for a couple of minutes
>>
>>106722820
I only have a 4060. If I had some good shit, I'd train LoRAs 'n shit with help from you guys.
>>
why has the hunyuan collage been posted three times in this thread
>>
>>106722847
isnt it a multimodal llm?
>>
localbrowns continue seething at being unable to run the top open weight model in the world. absolute sour grapes
>>
>>106722857
The dude has been mindbroken by chinese dick. sad really.
>>
>>106722846
>I think the higher ups got too confident/cocky
I think they did too. But it gives you a glimpse into their mindset. They want to start getting their money back for this investment. That kind of action shifts your entire mindset. Releasing 2.5 might deter paying customers from forking out for 3.0 when it drops, etc. It's flawed reasoning, but it's hard not to see them tightening the noose.
>>
>>106722847
Those latest chinese models (except the Bytedance ones) are proof that these guys, while being smart and technical, have no real sense of aesthetics or taste to realize their models produce slop.
The fact that a fucking 80b model from them still produces slop is the final proof of that.
>>
>>106722856
You can train chroma loras on 8gb
>>
>>106722889
>You can train chroma loras on 8gb
This has the same energy as a homeless person telling you to stay out of school.
>>
>>106722889
I know, I'm just sayin; I would if I could lol
>>
>>106722855
anti-Chroma schizo, get a life
>>
>>106722572
Very nice. Catbox?
>>
>>106722901
Different guy, newly converted into hating chroma. Hate seeing someone with so much potential waste his effort on worthless bullshit
>>
>>106722900
>I would if I could lol
Wait... how much vram do you actually have?
>>
>>106722913
lel, sure Jan
>>
i don't even hate chroma i just want a better speed lora
>>
>>106722916
8
>>
File: 16539[1].jpg (21 KB, 640x432)
>>106722922
vramlet
>>
QWEN was too good to be true..
>changes image composition
>zooms in
>doesn't follow instructions
>doesn't remove objects
>>
>>106722934
don't use lightning
>>
>>106722484
https://huggingface.co/clover-supply/Chroma-loras/blob/main/chroma-unlocked-v4x-hyper-turbo-flash-r64-fp32.safetensors
https://huggingface.co/clover-supply/Chroma-loras/blob/main/Hyper-Chroma-low-step-LoRA.safetensors
>>
>>106722929
You're right lol
I wish I had more. I have 48GB RAM so I think I'm okay.
>>
Spending 4k on a new sff pc build with a 5090 dedicated to AI gooning. Is it worth it if i save up over the next year? Will that 32gb vram take me far for what i need? 100% offline & local.

What would you guys suggest i add to my build. I was thinking
64gb ram
4tb nvme or even 8 if there's any reputable brands out there
9800x3d
>>
>>106722955
>64gb ram
128+
>>
>32gb vram
vramlet
>>
>>106722947
i love you forever
>>
>>
>>106722956
I've seen some people suggest that, any reason why? Vram would be getting hit for the most part. I have 32gb ram right now and there are times where i get OOM errors, but 128?
>>106722961
I've toyed with getting a A6000, maybe i will
>>
If I want to know more about how these models are trained should I start reading papers and textbooks?
>>
>>106722970
>any reason why
Futureproofing, training.
>>
File: 12.jpg (3.83 MB, 4614x2000)
>>106722854
yw
>>
File: WanVideo2_2_I2V_00455.webm (340 KB, 1248x704)
>>106722414
The quality of animate is kind of shit, so I had the schizo idea of feeding the first frame through i2v after the generation was done to see if it could clean up some of the shittiness. idk.
>>
>>106722994
>use 180p source video
>it matches that quality when inserting your custom character
>durrrr da quawity of animate is shiiiiiiiiiit
>>
>>106723000
kys faggot. I'm talking about the warping and blurring of the objects in the image.
>>
>>106722955
You can usually have more than 1 nvme.
>>
>>106722847
We're fully in the Deepseek era, but for image/video diffusion. Remember when open source LLMs were at max 70B while proprietary was playing with 100-200B, and then suddenly we had 671B parameters with Deepseek? Same thing is happening with image models here: we went from 12B parameters, with API models probably in the 20-30B range, and now we have 80B open sourced from Tencent.
>>
>>106722994
her fingers longer than most anon penis
>>
>>106723007
This is their game. They can play the open-source-on-paper card while releasing models nobody could feasibly hope to run, so people end up forking out for their API anyway.
>>
>>106723004
Use a higher resolution input video, you fucking retard.
>>
Is there some place or maybe a youtube channel where i can go to learn how AI works. I'm barely understanding what 12/20/80B means but i'd like to know some of the nerdier stuff
>>
File: 00000-277433315.png (2.71 MB, 1248x1848)
man I love alcohol and not caring about the quality of my posts in a thread that will disappear. What a nice feeling.
>>
>>106722955
get 192 or 256gb of ram. ditch the sff so you can have 4 dimms, even at a slightly lower speed.

system ram can make a huge difference for models with different subcomponents that are sequentially swapped out of vram. there are also a lot of badly programmed python inference scripts that rely on system ram cache that are easier to throw the ram at than to spend time fixing. and of course, system ram helps run moe llms, even if it's slow dual channel.
>>
>>106723017
The resolution of the input is the resolution I genned at for wan animate. I can only upscale that image and video and hope it works. If you're talking about the actual source of the video itself, I'm genning it now.
>>
>>106722979
>>106723019
Yeah lurk here and read papers
>>
>>106723011
>projection
>>
>>106723007
deepseek is actually good though, better than the closed competition at the time in many areas.
hunyuan 3 tbd but not looking good based on >>106722991
>>
>>106723024
What are you using for interpolation? If you say rife, I'm going to punch you in the face.
>>
>>106723011
not here in long dick general
>>
>>106722979
What is even the point of pursuing that when you don't have access to the resources required to train them?

I'd bet any midwit here would be able to train top tier stuff if they had tons of free compute thrown at them. People would just replicate architecture and training pipelines from existing papers and use properly curated data and still get great results. Anyone with a good eye determining what is "slop" and what is not would still produce better models than any 130 IQ ML engineer has been putting out.
>>
File: 1735755570014342.png (1.42 MB, 1248x832)
>>106723020
i like them
>>
>>106722847
The only thing that actually looks impressive is that it knows how to write and position correctly. The Chinese characters don't look positioned correctly and none of them look like they are impossible characters or hallucinated from my limited Japanese knowledge. I am guessing it will probably be good at maybe making manga or comics oneshot without doing any editing and manual placement. But other than that, I really don't know.
>>
>>106723036
I'm not looking to make top-tier SOTA GPT-killer-tier stuff, I just want to know enough to be able to fuck around to the extent that I want.
>>
>>106723052
Unless you have some H100s laying around that extent is effectively 0.
>>
>>106723061
That sucks. I hate that I just hallucinated this entire hobby and the websites full of models and their associated tooling, which all do not exist because the people involved don't have H100s laying around.
>>
>>106723052
I have IRL friends that like training small ml models for "fun and experience", but I never really saw the point of that considering they never had any ambition to work at FAGMAN or join an AI-related startup with VC money, where they would truly put their knowledge into practice.
It's a big waste of time, and I wouldn't even spin it as an "intellectual" activity since it doesn't even display true intelligence
>>
>>106723061
Let a nigga be curious ffs
>>
>>106723051
Being able to output coherent text isn't impressive at all. It's an autoregressive model, coherent text is a stock standard feature of the architecture (modeling sequential/long range dependencies). Literally every other aspect of it looks like complete ass, doubly so considering the size.
>>
File: 00438-2801304206.png (2.54 MB, 1280x1920)
>>106723038
now that is a really nice image
>>
>>106722847
Tbf, in China, where 4o is probably banned, this is probably ground breaking stuff. Which is why they distill in the first place.
>>
>>106722695
Can you post this again but without the text.
>>
>>106723051
Think you mean they do look positioned correctly.
>>106723074
English text no. Chinese text though, even ChatGPT messes up at times, the 4 and 12 characters are wrong. But I dunno if you need a model possibly 4x what ChatGPT is running to get perfect Chinese text.
>>
>>106723111
I mean, you have to figure a Chinese model is going to be better at Chinese than a Westoid model, right? OpenAI's market is mostly English speaking regions, and their product is banned in China.
>>
Attempting to generate the Hunyuan 3.0 reference image
>>
>>106723133
Oof. What hardware are you using and how much are you offloading?
>>
File: 1751845357020963.jpg (692 KB, 2120x1416)
>>106723086
second passes look shit but pretty much what the real thing looks like kek https://x.com/MorinagaJunko/media
>>
>>106722877
>have no real sense of aesthetics or taste to realize their models produce slop.

Some of their researchers don't seem to. Same thing happened at ClosedAI. First they had Dalle, which showed some competent artistic direction, but then they tossed all of that out the window with 4o, with a widely suspected fingerprint/censorship step causing the yellow tint. Chinese companies have simply gotten lazy, no more effort into their models, which is why we're now getting low effort models with trashy aesthetics too. Bytedance has offset some of that crap a bit with Seedream, but it's still not quite there yet. As much as I hate it, if I were to choose an API model based on aesthetics alone, MJ is still the winner, closely followed by NAI.
>>
>>106723142
damn that shit is raw, nice find
>>
File: 4545484421845.png (59 KB, 885x574)
>>106723145
>MJ
And that's the sad part of all this. There was a time when Tencent legit cared and thought MJ was aesthetic and SOTA. By testing their model against it, they were doing honest research back then which is how we got HunyuanDiT. It's so sad that we're talking about the past, and what is essentially now just an archived and forgotten about repo.
>>
File: RA_NBCM_00025.jpg (1.48 MB, 1872x2736)
>>
>>106723140
RTX 6000 BBW with 192GB of RAM
About half of the system RAM is being used along with all the VRAM
>>
>>106722608
https://files.catbox.moe/3h88fr.webm

Here's another attempt and i2ving the animate output to clean it up. Unfortunately her nipple kind of popped out so I gotta catbox it.
>>
>>106723145
dall-e 3 was the peak of image models. insane amounts of characters and styles, and did a really nice job at creative gens. too bad it was nerfed to shit and never released. 4o is a complete downgrade
>>
sooo hunyuan v3 is a moe. an 80b moe with A13B.
is it unironically over for us vramlets?
>>
>>106723145
>>106723180

True, dalle3 was magical. To this day it's the only base model that can produce good niche aesthetics like polaroid-like or vhs-like pics, was really good at some painting styles like impressionist art, could make some "trippy" retro anime illustrations that I haven't seen any other model pull off, and overall produced some really authentic 2D artworks. The "normal" outputs were slop, but with some prompting it produced some very interesting results. Whoever worked on post-training it likely doesn't work at openai anymore, lol, the 4o outputs look pretty generic in comparison
>>
>>106723193
No, because SDXL is still the best for anime and I say that without a hint of irony. This model looks slopped.
>>
I never thought De3 looked aesthetic desu the humans especially were slopped
>>
>>106723193
no it means you have a chance of running it with enough ram... if lmao.cpp ever implements it (never ever)
>>
File: image.png (1.53 MB, 1216x832)
>>106723175
The first output of Hunyuan 3.0 has arrived.
The total generation time was 13:20 (50 steps, all default settings, sample inference code).
The default prompt is
>prompt = "A brown and white dog is running on the grass"
Output resolution is chosen automatically by default.
>>
>>106723210
I always thought they looked impressive at first, but once you know and get used to what they look like they become overbearingly slopped.
>>
>>106723219
Agreed
>>
>>106723180
They're really tight lipped about the architecture too. DALLE 3 was the only model to this day that truly understood my prompts. I think they used a smaller version of GPT4V as the text encoder or something which would explain a lot.
>>
>>106723216
Jesus. It looks like liquid shit spat out through a straw.
>>
>>106723210
People liked how it defaulted to making women look like hot bimbos, which is quite something considering the company
>>
>>106723216
I guess it's a nice looking dog. Not the best dog I've ever seen though. Certainly not for 13 minutes.
>>
>>106723216
isn't the new hunyuan a multimodal llm? can you talk to it?
>>
>>106723216
Reminds me of Chroma during its early epochs, ie the lack of resolution on the grass, the way it frazzles fine details on hair, etc
>>
>>106723216
>The total generation time was 13:20
It's like they want people to keep using sdxl
>>
>>106723235
Yeah, I think it's supposed to be. The basic inference script simply accepts a text prompt and outputs an image. I'll try to use the chatbot functionality after I generate a couple more test pictures.
>>
>>106723197
Its art was very good. Dalle 3 threads from back then were pure sovl. Realism was also very good, just not as good as what we have now (and Chroma has essentially closed the gap in prompt following). Still, Dalle 3 was probably single handedly the most coherent realism even if it had to be jailbroken to unlock its capabilities. I still get iffy multiple subject gens on Chroma (even Chroma HD Flash) that I know Dalle would nail in one shot.
>>
>>106723251
>It's like they want people to keep using sdxl
If all you people prompt is "1girl" and you criticize "boomer prompting" (aka, prompting like a sane non-autistic member of society would), why even bother with other models at all?
>>
>>106723145
We get it you like MJ, like holy shit man. I think it looks like ass so hey.
>NAI
Lol their current model looks like absolute ass, V3 was great, I'll give it that.
>>
>qwen 3 vl multimodal
"Ooh, I want to try and make it translate and colorize douji-"
>235b
>>
>>106723213
>if lmao.cpp ever implements it
im still waiting for them to implement qwen3max, so hopes are low
>>
>>106723275
The qwen vl series only accepts image input anyway, so it won't do any coloring for you even with a B200 farm.
>>
>>106723264
I don't mind boomer prompting but in 13 minutes I can gen enough with seed gacha to get the same result on other models.
>>
>>106723216
Not bad, maybe not 80B good, but it's something.
>>
File: 1748287166931509.mp4 (1.36 MB, 720x720)
>>106722808
>>106722823
>>
File: 1745076622417154.jpg (56 KB, 897x882)
>>106723307
>>
>>106722353
No, krugertard, with MoE, for every step basically all of the experts are going to actually be used, it's just that at any given time only a set amount is active, but you are still reading the entire model every time. So you don't "just place the active parts in vram", since they swap all the time are are all used for each step, which would make you hit the pcie bottleneck quickly.

The point of MoE is that you get faster speed but need to train a bigger model to get the same IQ as the dense model, which is a fine tradeoff: given the faster speed and RAM cheapness, you can load it into RAM.

There are also some caveats here, like some architectures having experts that are more commonly used and thus always locked into vram, but that's beside the point of how MoE models generally work right now.
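Napkin math on why the bus ends up being the problem; the numbers below are round assumptions (Q8 experts parked in system RAM, ~25 GB/s effective over a PCIe 4.0 x16 link), just to show the shape of it:

GB = 1024**3
active_params   = 13e9   # claimed active parameter count per forward pass
bytes_per_param = 1      # assume the offloaded experts are Q8
pcie_gbps       = 25     # rough effective PCIe 4.0 x16 throughput

traffic = active_params * bytes_per_param   # worst case: every routed expert fetched
print(f"{traffic / GB:.1f} GiB over the bus per forward pass")
print(f"~{traffic / (pcie_gbps * GB):.2f} s of pure transfer time per pass")
# ~12 GiB and ~0.5 s per pass before any compute happens, and an image is
# a lot of passes -- that's the wall the "just keep the active part in
# vram" crowd runs into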
>>
>>106723349
>are are
and are
>>
>>106723262
Chroma with loras is the closest thing that exists that can replicate the "dalle3 aesthetics" in an open model, but even then it's very disappointing how mangled Chroma is overall with anatomy and simple things like characters holding swords
>>
>>106723349
we don't even know if this uses shared experts (it probably does, like all moes), but still, even with the shared ones on vram and the rest in ram, it would be painfully slow for imagegen.
I'm not even sure if comfy has an implementation that lets you select how many layers you want on cpu/gpu for moes, or select layers by name (like using -ot with llama), or do tensor splitting across multiple gpus, for that matter.
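Outside comfy, the transformers/accelerate route already lets you do llama.cpp-style "-ot" placement via a device_map dict. The module names below are guesses, not the actual Hunyuan layout, so check model.named_modules() first:

import re

def build_device_map(module_names, gpu=0):
    # pin everything to the GPU except modules that look like expert FFNs,
    # which get pushed to CPU RAM; the ".experts." pattern is an assumption
    expert_pat = re.compile(r"\.experts\.")
    return {name: ("cpu" if expert_pat.search(name) else gpu)
            for name in module_names}

# hypothetical names, just to show the shape of the dict you'd pass as
# device_map= when loading the model with transformers/accelerate
names = [
    "model.layers.0.self_attn",
    "model.layers.0.mlp.shared_expert",
    "model.layers.0.mlp.experts.7",
]
print(build_device_map(names))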
>>
File: makotoburrito.jpg (2.93 MB, 2304x2304)
We're just hangin' out lol
>>
Well the noob guys got their GPT equivalent that they wanted, you think they will come and finetune this? kek
>>
File: WanVideo2_2_I2V_00458.webm (1.72 MB, 1248x704)
Here's another wan animate cleaned up through i2v. This time I used seedvr2 to upscale video before running it.
>>
>>106723307
awoo
>>
>>106723390
gave him anime eyes
>>
>>106723414
You don't understand, Gandalf is a type of gollum you see.
>>
File: image2.png (2.05 MB, 1024x1024)
After 29:32, the second output of Hunyuan 3.0 is ready. It's the mandatory 1girl test. Yes, this 1024x1024 image took half an hour.

>>106723213
You don't need to wait. The stock inference code offloads properly. I would aim for over 220GB total memory though.
>>
>>106723426
It's detailed
>>
>>106723426
nice earth. looks like dall-e
>>
>>106723426
The earth looks pretty good desu.
>>
>>106723426
are you running it with FA and flashinfer?
>>
>>106723426
interesting part of the globe it chose to display
>>
>>106723426
If you could, can you test the model's world knowledge? Things that current local models need a lora for? I tried the official website but there is a rewrite engine behind the scenes, so it keeps modifying my prompts for slopped outputs basically.
>>
i kneel
>>
>>106723445
it's showing the only part of the world that matters, china and friends
>>
>>106723307
very nice
>>
>>106723458
brown
>>
File: 1752098074223273.png (1.02 MB, 1248x832)
>>106723426
>>
>>106723469
friend*
>>
>>106723458
And India. I think we'll see Indian AI being the future of open source in the coming weeks.
>>
>>106723441
No, it didn't work despite everything nominally being configured, so I gave up for now to try actually generating some things.

>>106723447
What do you want to test specifically? The model itself is an LLM hybrid so I'm not sure if "creative interpretation" is built in.
>>
File: 1753933781942115.jpg (991 KB, 1248x1824)
>handcel seeth
wake me up when it can do hands
>>
>>106723474
Model?
>>
>just realized I can split up elements of a scene each into their own clip encoder to avoid shitting up the prompt
wew
>>
>>106723496
>VRAMlet detected
you caught me...
app.reve.com
>>
>>106723508
buy an ad
>>
>>106723517
>I did it
>I said the line!
buy me a GPU, faggot
>>
>>106723525
this is the LOCAL diffusion general, go shit up /sdg/ saastard.
>>
>https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-250928

Looks like we've got fixed lightning loras for Wan2.2 releasing.
>>
>>106723426
half an hour for that is brutal
>>
>>106723540
>T2V
wake me up I2V drops.
>>
>>106723540
>files are not named properly

These fuckers need to hang.

"Lets release this brand new car! But sir, what do we name it? That's the thing, we don't!"
>>
File: 1733539143304046.png (1.04 MB, 832x1248)
>>
>>106723497
How?
>>
>>106723133
>seedvr2 to upscale video before running it.
Why? The movie is available in 4k, there is no reason to do an upscale on a clip of said movie, you can downscale from that resolution. And you introduced artifacts in said upscale that caused the end result to be worse.
>>
>>106723589
Attention retard. ATTENTION RETARD.
The clip needs to be lowered in resolution in order to run wan animate. I can't throw a 4k clip into wan animate and not oom. Nobody can.
>>
>>106723587
you just make a bunch of them and then use conditioning (combine) nodes to chain them together by 2s until you have a final positive conditioning output
I actually don't know if this works, I'm still waiting to find out lol
>>
File: 1739542274361803.png (3.56 MB, 1416x2120)
>>
File: makotodrunk4.jpg (2.71 MB, 2304x2304)
I gotta put her to bed, but I don't think she will.
>>
>>106723426
YOU ARE THE RETARD. I explicitly said
>you can downscale from that resolution
Why you'd even have a clip at a lower resolution than wan animate needs, forcing an upscale in the first place, was my question. If you don't have the actual raw movie, then it's your fault for needing to upscale from your shit source data.
>>
>>106723607
Mean to quote >>106723596
>>
>>106723602
ok it doesn't really work great
>>
>>106723607
Hey shit dick. Not everyone just has a 4k rip of the lord of the rings on their computer. It's an unreasonable expectation.
>>
>>106723604
>an actual sexy gen for once

nice
>>
>>106723624
>>106723624
>>106723624
>>106723624
>>106723624
>>
>>106723370
Yeah, Chroma messes up if a prompt with multiple subjects is too complicated, and I guess Qwen is pretty good at that stuff nowadays too (though it's censored).

>>106723426
This is pretty good. Looks like the model knows some rough manga lines, similar to HunyuanDiT. I wonder if it knows mangaka too. If only it were smaller, local would be saved. God damnit.
>>
>>106723622
>not hoarding media in the year of our lord 2025
Your loss.
>>
>>106723621
>>106723602
actually it kinda does, I just need to figure out how to break it up best
>>
>>106723216
imagine spending 10000 dollars on a gpu and waiting 15 minutes for this lmaooo
>>
>>106722602
https://github.com/EnragedAntelope/Flux-ChromaLoraConversion
>>
Sage Attention 3 out, thoughts?


