/g/ - Technology



Specialized Models Edition

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106758695

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
localcope hour
>>
Cursed thread of Mental Illness
>>
Blessed thread of frenship
>>
File: rfhnrfdxrfhrf.png (294 KB, 1453x791)
trying to figure out why this upscaling method is causing the output to become desaturated. pls help.
https://files.catbox.moe/m0fwcu.json
>>
>instant seethe
How does the existence of localchads piss them off so much?
>>
>>106760374
>localchads
I wish I was a chad, but compared to API our models are fucking toys
>>
>>106760374
because they can only generate sam altman. no other real people.
>>
File: 00138-1668900267.png (2.51 MB, 1240x1240)
>>106760374
Has nothing to do with local; he does this whenever an API model gets released. He can't afford a new GPU and is stuck with a 3000 series card, which is still solid but can't play with the bleeding edge
>>
>>106760397
>because they can only generate sam altman. no other real people.
Wan can only do Trump
>but muhh i2v
Sora can do that too
>>
File: 00002-2571889917.png (1.54 MB, 1024x1280)
>>
>instant cope
How does the existence of superior saas models trigger the localpoors so easily?
>>
>>106760409
>Sora can do that too
prove it with an image of any celeb, i'll wait :]
>>
so much cope already
>>
The brain isn’t a monolith model, it’s a federation of specialized modules. Visual cortex, auditory cortex, motor cortex, language centers, etc. all evolved for domain-specific processing. Coordination doesn’t erase specialization; it depends on it. And by the way, the brain has trillions of synapses: orders of magnitude beyond any AI model. If you think that comparison justifies wasting parameters on unfocused multimodal models, you’re proving my point: specialization is what actually makes the system efficient. The irony is our brain is more akin to a MoE model with specialized domains all of which filter and prepare inputs for the "generalist" model. Do you think your brain processes the raw auditory data? Do you think what you see in your brain is what your eyes actually see?
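if the MoE comparison sounds hand-wavy btw, the routing mechanism really is this simple; a toy sketch (illustrative PyTorch, not any real model's code):

import torch
from torch import nn

class TinyMoE(nn.Module):
    # a router picks k specialist experts per token, so only a fraction
    # of the total parameters ever runs for a given input
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):  # x: [tokens, dim]
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():  # route each token only through its chosen experts
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])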
>>
>>106760414
>How does the existence of superior saas models trigger the localpoors so easily?
jealousy, that's all, and instead of wanting to reach that level, they prefer to pretend that their localkek models are good enough, those guys have 0 ambition it's sad
>>
>>106760414
Sam Killed local
https://files.catbox.moe/q5yjbn.mp4
>>
File: 1737378911147787.png (1.27 MB, 992x1048)
view the scene from the side 90 degrees

neat, works
>>
>no u
Woah, is this a rerun?
>>
File: FluxKrea_Output_2726927.png (2.45 MB, 1024x1496)
>>
>>106760442
Could you do this, but instead of Sam Altman it is a 1girl with gigantic jiggling boobies?
>>
File: 1741458254746239.jpg (204 KB, 1440x1517)
>>106760455
original:
>>
>>106760462
too unsafe saar please do not
>>
>>106760458
So far, only /ldg/ is seething about Sora 2, never seen /DE3/ seething about Wan 2.2 lol
>>
>>106760419
Assuming you are responding to me, yes, all that is correct. I am not advocating for the "throw more parameters and compute at the problem" procedure, I am saying that there is no one solution to the problem because the problem is ill-defined and depends on context, thus, you need all sorts of different tools, some general and some specialized.
>>
still waiting for that celeb i2v btw
>>
File: reeeee.png (59 KB, 259x194)
>>106760492
>I am not advocating for the "throw more parameters and compute at the problem" procedure
Tenslop: "HOW DARE YOU"
>>
>every few gens all models unload completely

Anyone else having this issue? They take forever to load back up
>>
>>106760483
I bet if DE3 got 50 or so links to videos posted in a row they'd be annoyed as well
>>
>>106760483
>seething
yes i'm so jealous you are paying hundreds a month to prompt censored slop. money you could use to buy a gpu or cpu or build a computer!
>>
>>106760528
/ldg/ started to seethe immediately after the first video, try not to rewrite history, they are completely allergic to API news, they don't want to know the world is advancing without them, it makes them sad you know
>>
File: 00142-3000492623.png (2.62 MB, 1240x1240)
>>
File: 1738420508849752.png (1.28 MB, 992x1048)
give the woman black lingerie and anime cat ears.
>>
what's the obsession with celebrities anyways?
>>
File: 1747645830781770.png (217 KB, 1060x805)
>>106760540
>hundreds a month
how about 20 a month, and you had to pay thousands to daddy Nvidia to run your localkek slop you know
>>
>>106760557
>what's the obsession with celebrities anyways?
Ikr, before Sora 2 appeared, they were pretending they didn't need celebrities and that it was fine that local models can only render Miku and Trump, and now they care, curious
>>
>>106760560
>$200 a month
>for slightly higher res and no watermarks
the utter state of SAAS fags, you probably pay $1000 a year for games you don't own too.
>>
>>106760557
The part of our brains that used to care about polytheistic deities/fairies etc. got hijacked by the propaganda industry during the 1920s and advent of mass visual media, thus, when some people think "Venus" they instead think "Christina Hendricks" or (Allah forbid) cosplayers.
>>
>>106760568
>for slightly higher res
that's why it's a better deal to go for 20 a month, the difference between 720p and 1080p isn't big
>>
Come up with something new lilbro. I know you're restricted with your API calls but that doesn't mean you're restricted with your trolling.
>>
File: 1746575212901509.png (1.14 MB, 992x1048)
>>106760550
give the woman a black business suit, with a white bra and cleavage. she is smoking a cigar.
>>
have local models learned what blitzball is yet?
>>
ask openAI to generate a video of Sam Altman being beheaded for defending his Israel masters.

see what happens. Oh? Censorship? That's sure worth paying for.
>>
I don't care what anyone says, this is the most kino AI video I've ever seen in my life.
https://files.catbox.moe/uj9981.mp4
>>
>>106760541
Do you think the newfags bought this reply?
>>
>>106760599
yeah why look at miku hatsune when you could see sam altman's ugly face in every video.
>>
>>106760596
can you do beheading on local? they don't know how to do violence, Sora 2 is better at this
https://files.catbox.moe/nr3fk0.mp4
>>
>>106760599
yeah low iq gooners just don't understand progress
>>
File: 1739640448275619.png (1.3 MB, 992x1048)
show me a sora2 video with an asian girl with big tits like this. go ahead. surely the paid service has an i2v option. Use this image!
>>
>>106760409
on behalf of anon i humbly accept your concession btw
>>
File: ComfyUI_temp_yjmco_00028_.jpg (849 KB, 1536x1152)
stop replying to it, fucking retards. what happened to "don't feed the trolls"
>>
>>106760492
For neural networks trained with gradient descent, there really is only one path to efficiency: clean datasets with minimal noise and specialized modalities. Everything else just increases variance, slows convergence, and wastes compute. Until machine learning models diverge from being specialist autocomplete models, this will not change. Current ML models (transformers, diffusion, etc.) are fundamentally specialist autocomplete machines. They learn distributions and predict the next most likely output.
>>
>>106760621
In anon's defence, the troll will simply reply to himself if he gets no (you)s. That being said you're right.
>>
>>106760621
they will never learn
>>106753726
>>
File: 1757377837801470.jpg (11 KB, 225x225)
qwen edit 2509 is more impressive than generic slop videos with sam altman.
>>
>>106760609
>yeah why look at miku hatsune
Sora 2 can do miku though, it can do a shit ton of characters
https://files.catbox.moe/1c3h2s.mp4
https://files.catbox.moe/6tmyun.mp4
https://files.catbox.moe/1seqwp.mp4
>>
>>106760621
>don't feed the trolls
a near impossible ask when most of the thread consists of troll posts
it'll die down soon thoever
>>
>>106760644
okay now do a japanese gravure model with big tits.
>>
>>106760657
>but muuhh coom
that's all /ldg/ cares about huh?
>>
File: 1757944334874321.png (1.09 MB, 888x1176)
stop arguing and play Starfield(tm)
>>
>>106760669
Now do anything that is remotely PG-13.
>>
File: 00149-240615931.png (2.59 MB, 1240x1240)
>>
File: 1756173605584358.png (1.06 MB, 888x1176)
>>106760677
>>
File: ComfyUI_05856_.png (1.45 MB, 992x1048)
>>106760583
Give the woman a severe case of leprosy
>>
>>106760689
>Now do anything that is remotely PG-13.
an Epstein Joke is good to you?
https://files.catbox.moe/p8zyu7.mp4
>>
>>106760621
You vastly underestimate the dedication this guy puts into sliding this long dick general
>>
>>106760720
>this guy
we all know it's debo though, everything bad that happens to earth is the fault of debo after all
>>
>>106760707
Nothing in that besides the meta knowledge makes it PG-13.
>>
File: 1756794538403652.png (992 KB, 888x1176)
THE FANS DEMANDED IT
>>
>>106760725
i'm debo#3 myself
>>
>proving my point for me
kek
>>
localkeks are NOT ready for Midjourney V8
>>
>>106760727
how about Hitler having his head blown out by a gun? is that also for all ages? >>106760610
>>
>>106760743
>brain blown out
>blood splatter
Okay, that might get PG-13. But I hope the cuts aren't a sign of a quality model.
>>
>>106760060
People who have never really used 4o or sora for still images wouldn't know, but from my use (and I am a local enjoyer, don't get me wrong, saas won't make my big titty 1girls taking dicks in their ass), OpenAI's models seem to have the best prompt recognition and understanding. That's not to say they have the best output, but I think anons in the thread are conflating output quality with prompt recognition. Which, fair, if the quality is too poor who cares if it adheres well, but I don't think it's that bad.
>>
File: 00010-2405399559.png (1.42 MB, 1024x1280)
>>
https://files.catbox.moe/obg1y4.mp4
>>
>>106760775
kek, this thread is fun with those meme Sora models, it's a good way to pass the time before talking about some somethingburger happening in the local space
>>
>>106760775
Absolutely schizo cinematography and framing.
>>
>>106760707
wan2.2 will throw cum-covered ass all over a horsecock. sora2 is cool but different jobs require different tools
>>
File: 621037~01.jpg (23 KB, 303x362)
What's the default CFG/steps for Wan without the light loras?
>>
>>106760790
>Absolutely schizo cinematography and framing.
I agree, their model is impressive but there are just too many cuts
>>
>>
>>106760816
it's not the cuts, the last cut is unusable and a shot no one would ever do, why would you frame his back and the empty back wall?
>>
File: 00488-950026451.png (2.66 MB, 1248x1848)
>goin fishin huh?
>make sure to get some good bait
>>
>>106760849
yeah, that too. since there are so many cuts, it has more chances of making mistakes; it's way easier for a model to do just one continuous shot. what they did is ambitious, but it's not accurate enough yet
>>
>>106760845
>tylenol
Kekd
>>
>>106760864
But maybe the next $1 pull will make a better video :^)
>>
File: 00163-2666148618.jpg (1.1 MB, 2480x2480)
>>
File: 00013-3042618728.png (1.39 MB, 1280x1024)
>>
>>106760644
I have a single question, aren't they scared of being copyright raped? Why are they allowing so many characters to be rendered lol
>>
why is he so easy to bait? he'll be melting down for hours
>>
so forge neo seems pretty good, all you have to do is git clone it then move your forge/reforge stuff to the same folders.

faster for loading/switching models too.
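the "move your stuff" step is basically just this; a rough sketch (paths are made up, and symlinking saves you from duplicating checkpoints):

import os

old = "/path/to/forge/models"      # existing forge/reforge install (illustrative path)
new = "/path/to/forge-neo/models"  # fresh git clone (illustrative path)

for sub in ("Stable-diffusion", "Lora", "VAE", "ESRGAN"):
    src, dst = os.path.join(old, sub), os.path.join(new, sub)
    if not os.path.isdir(src) or os.path.islink(dst):
        continue  # nothing to link, or already linked
    if os.path.isdir(dst) and not os.listdir(dst):
        os.rmdir(dst)  # drop the empty placeholder folder from the clone
    if not os.path.exists(dst):
        os.symlink(src, dst)  # point neo at the old model folders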
>>
>OpenAI's servers right now
https://files.catbox.moe/zu9sl2.mp4
>>
>>106760908
>git clone
this alone filters 70% of users
>>
>>106760908
are you trolling? it's completely unusable for chroma, qwen and wan.

and who cares about sdxl? if you insist on using sdxl reforge is so far ahead of neo forge it's not even funny
>>
File: 00167-3122639363.jpg (585 KB, 2480x2480)
>>106760930
I'm using chroma just fine
>>
chroma is dogshit trained at 1/4 the resolution of SDXL yet takes 5x as long to generate. any post that seriously suggests it can be discarded as tasteless coomerboomer babble
>>
>>106760948
He's out of line but he's right!
>>
File: file.png (910 KB, 832x1280)
>>106760940
do you have 64gb ram or a 5090?
neo forge insists on reloading chroma every single gen, so you're either lying (for whatever reason) or just have enough ram where the memory management doesn't shit itself like a retard

>>106760948
haha yeah i know right? your gens look so much better! oh wait.
>>
>>106760948
>tasteless coomerboomer babble
So SDXL?
>>
>>106760958
>So SDXL?
everything but Seedream and Midjourney
>>
File: 00188-1162602840.png (3.34 MB, 3072x768)
>>106760908
does it gen faster
>>
>>106760499
>Nobody giving a fuck about hunyuan image 3.0
>>106760633
>nobody gives a shit about it because its a piece of trash, if it could simulate reality and take 32845698 b200 everyone would still scramble to be the first api piggie to try it
chat is it true?
>>
>>106760957
I have both and nothing is taxed, I'm using fedora
>>
This is how Sam scraped Studio Ghibli's images btw
https://files.catbox.moe/qb7d5t.mp4
>>
>>106760990
He's right that no goofs really put a damper on things. But even the one anon who busted out his 96GB VRAM build thought it was ass.
>>
>>106760990
yes obviously. jeetykeks jump through hoops to run wan 2.2 on 8gb because it's good, but nobody cares about hunyuan because it's 1024x shit (still better than chroma btw)
>>
>>106760930
I use forge for my illustrious/noobai stuff. adetailer also works with a checkbox and is easy to configure and is not very comfy in comfyui.

I moved stuff over cause reforge isn't getting updated any more.
>>
>>106761015
>jeetykeks jump through hoops to run wan 2.2 on 8gb because it's good
this, if HunyuanImage 3.0 was an unslopped model that could do a shit ton of characters and celebrities, you bet I would find a way to stuff this shit on my poor 3090
>>
>>106761022
>adetailer also works with a checkbox and is easy to configure and is not very comfy in comfyui.
im UI agnostic but check out https://github.com/chrisgoringe/cg-controller
>>
>>106760930
reforge still better overall? okay, i'll have to compare both
>>
>>106760990
Yes, people would be renting clusters if Hunyuan was actually the best model ever made, but it doesn't seem even 10% better than QIE.
>>
File: 1750781475977569.png (2.46 MB, 1536x1536)
yeah reforge seems a bit better plus the img2img and inpaint UI is cleaner
>>
local image model ranking best to worst (post-sdxl):
qwen
wan single-frame
flux pixelwave
chroma
flux dev
hunyuan 3 80b
hidream
sd3.5m
sana
>>
>>106761132
for tranime or realism?
>>
https://xcancel.com/Adyseku/status/1973352752714752430#m
wtf lol
>>
File: 1752881996931518.png (1.23 MB, 1024x1024)
>>106761127
made that monitor gen so I can test qwen edit:

remove the text on the screen of the CRT monitor. add White ascii text that says "LDG general" on the monitor.
>>
https://files.catbox.moe/xipsho.mp4
Is this the first celebrity Sora can do that isn't Sam?
>>
Invalid workflow against zod schema:
Validation error: Required at "definitions.subgraphs[1].nodes[8].inputs[0].type"; Required at "definitions.subgraphs[1].nodes[8].inputs[1].type"; Required at "definitions.subgraphs[1].nodes[8].outputs[0].type"; Required at "definitions.subgraphs[1].nodes[9].inputs[0].type"; Required at "definitions.subgraphs[1].nodes[9].inputs[1].type"; Required at "definitions.subgraphs[1].nodes[9].outputs[0].type"


How do I use this information to debug the workflow? Obviously it has something to do with subgraphs, but what else?
>>
>>106761201
it's clear openai models are trained on absolutely everything, the most powerful models in the world. but they get censored and censored to the point where everyone forgets what they're capable of as they're neutered until they produce nothing more than sanitized slop
>>
Wow, it truly is over isn't it?
https://files.catbox.moe/nyn13v.mp4
>>
>>106761260
I'm not being ironic when I say this is as slopped as SDXL booru gacha. If you stop clapping like a retard at the flashing colors and start actually considering what you're seeing you'd see the details are nonsense. We're still in slop meme gacha territory, just like 4o before it.
>>
>>106761260
You can literally use this thing to make HD versions/continue manga adaptations where it left off etc...
Crazy
>>
File: remember SD3? lol.png (1.34 MB, 2047x524)
>>106761260
>APIchads are having their fun playing around with cool anime characters while localkeks can only boast that they managed to put a red sphere on top of a blue square... yay...
>>
File: 00060-2659787539.png (1.71 MB, 896x1152)
>>
File: 1728304467251327.png (1.17 MB, 992x1048)
>>106761281
>censored
>PG at most
yawn.
>>
>>106761278
k, good luck with that
>>
>>106761275
>We're still in slop meme gacha territory, just like 4o before it.
no one said it's perfect, but it's way better than what local can produce, and yet you're a fan of local right? curiously you're less harsh when it comes to judging the capabilities of Wan, and god knows this model can make some horror shit
>>
>>106760700
Unironically hotter

t. ghoulish enjoyer

>>106760815
Think it's like 5 CFG and 20-25 steps? I forgor, been using light and other slopped models for so long
>>
>>106761293
It's not just "not perfect", it's actually useless beyond generating short meme slop.
>>
File: 1748694875762694.gif (2.28 MB, 450x360)
>>106761303
>short meme
a.k.a the sovl of the golden age of the internet
>>
>>106761290
If it were local it'd just need controlnet and some other simple scene control. Add to that the possibility of a video edit model... It's 95%+ of the way there. Fan animations would actually be feasible for the first time.
>>
>>106761319
Yes, which is the value of $0. I'm not saying Sora isn't interesting, but it's so overhyped and you people don't realize they're going to rugpull you on the memes again just like they did with 4o.
>>
>>106761260
that looks like some YTP scenes of animes, that's funni
>>
File: 1756037045255772.png (1.23 MB, 1024x1024)
it doesn't matter if sora could do 4k HD videos, it is censored by default which is bad. and it's not free.

any fun prompt will be rejected by the openAI overlords.
>>
>>106761337
>Yes, which is the value of $0.
That's true, because having a good time and laughing out loud at these videos is priceless.
>>
https://files.catbox.moe/bvrz47.mp4
wan 2.5...
>>
>>106761356
Yeah just like those 4o memes you people share today right?
It's funny because the only real video model that has lasted is Wan because it can do uncensored.
>>
>>106761363
already better than sora
>>
>>106761363
>furry coom
lodestone is that you?
>>
>>106761363
Way too unsafe to be OpenAI for sure.
>>
>>106761366
to be fair this is a local thread, from time to time we talk about the news of an API model, but that's pretty much it, it shouldn't overstay its welcome, it should remain a place to talk about local in the majority of the time
>>
>>106761363
Is this legit wan 2.5?, cause gross furry shit aside that's legit good.
>>
>>106761363
you guys are pathetic, all you care about is coom, that's why everyone makes fun of localkeks btw
>>
>>106761390
You don't see Sora fucking anywhere. And 4o only gets posted in the OpenAI general which literally looks like only two people post in.
>>
>>106761363
This has no reason to go so hard
>>
File: hunyuan 3 examples.png (1.65 MB, 1109x974)
>>106761281
the problem with local labs is they fail to understand that these party tricks were meant to be a demonstration, not an end-goal. now we have models like hunyuan 3 emadmaxxing textslop while being incapable of producing anything pleasing. instead of going
>being able to produce text on signs indicates the model was trained on a wide range of well-captioned data, including text
they go
>we need to generate longer text than everyone else, quick generate some flux images of a man holding a blank sign and then paste 3 paragraphs of arial font gptslop into it! now automate this for 10000 more samples
>>
>>106761366
Yeah this model looks fun in short bursts but the schizo editing will get annoying quick. Already getting to me ngl
>>
>>106761363
Damn, that's actually repulsive.
>>
>>106761399
Well no shit, you have two options:
- low quality meme slop worth $0
- specialized slop for titillation
>>
>>106761407
>You don't see Sora fucking anywhere.
me when I lie, I couldn't escape Sora today and yesterday, whether it's tiktok, youtube, reddit, twitter...
>>
>>106761394
https://www.reddit.com/r/aivideo/comments/1nv72c4/
i'm assuming it is but it just says wan in the title

>>106761399
https://www.reddit.com/r/aivideo/comments/1nv8bh1/
i was looking for sora vids, this one has some good blood going at 3:10
>>
>wan waited to release 2.5 to make sora 2 look like shit
I kneel China
>>
>>106761424
>- specialized slop for titillation
also worth $0
>>
>>106761412
What gets to me is the fact that these text showcases always look like absolute shit.
Worse than a 2 minute photoshop job. It literally looks like someone just used MS Paint to plop some text on an image. No blending, no natural strokes. Just machine text on another layer from the background.
It fucking sucks ass.
>>
File: 00061-1303723015.png (2.22 MB, 896x1152)
>>
>>106761428
Sora 2 is being spammed just like 4o on release. Sora 1 is posted nowhere. And when they crack down on the memes just like they did with 4o, it'll die too, because the only things you're going to see spammed with Sora 2 are copyrighted characters. But have fun with your FotW fun model, not like we've not done this dance the last four major API video model releases.
>>
>>106761363
Is the song AI generated too? That's the most impressive part of the video.
>>
>>106761464
>Sora 1 is posted no where.
well duh, it was a bad model, Sora 2 is giving good shit if you test it out by yourself
>>
>>106761446
No it's definitely worth more than $0 because people spend money and time making those LoRAs. Anon, it's okay to admit there's no functional value to Sora 2.
>>
Needs more disabo
>>
File: whol.jpg (2.65 MB, 3456x1440)
>>
>>106761474
>Is the song AI generated too?
not a chance, it sounds better than Udio and Suno, it's probably a real song
>>
>>106761452
If I were to guess that's exactly how they assembled their dataset. Take blank_page.jpg and then dynamically generate text on it and then have the prompt "A white paper on a table with text that reads: generated_text".
>>
>>106761484
>people spend money and time making those LoRAs
and people spend money and time making anime character loras, your point?
>Anon, it's okay to admit there's no functional value to Sora 2.
memes have value, sorry if your only goal in life is to coom and not to laugh
>>
>>106761497
Yes, but there's ElevenLabs Music now which I think could be higher in quality.
>>
>>106761505
Post your last best gen.
>>
>>106761502
That sounds dumb enough to be entirely true, seeing how they really love their synthetic training data.
>>
ComfyUI custom node question:

Does anyone know the correct way to create a node that feeds an image filename into a Load Image node? It doesn't accept a string, even though that's what's being actually put out by the primitive node that feeds into it.

The input info looks like this:

@classmethod
def INPUT_TYPES(s):
. . input_dir = folder_paths.get_input_directory()
. . files = [f for f in os.listdir(input_dir) if os.path.isfile(os.path.join(input_dir, f))]
. . return {"required":
. . . . {
. . . . . . "image": (sorted(files), {"image_upload": True}),
etc.

So it's a list of filenames in the input directory.

Listing the return type as [ ] on a node allows you connect it to this input, so it's doing some kind of check to make sure it's a list, but when you actually try to run the workflow it finds the lists don't match. I'm not sure why that matters because the only value actually being returned by e.g. a primitive node connected to this input is a single string filename. I've tried to have the exact list recreated as a return type (lol) but it still finds they don't match.

deleted and reposted because I fucked up copying the code section lol
>>
after 2 days of running comfyui i came to the conclusion that you only need the epic realism checkpoint. prove me wrong
>>
File: 1744540823349298.png (222 KB, 1407x861)
>>106761517
It's sad that the chinks are so obsessed with mememarks, when will they understand that this is not what we care about??? and when we try to make them understand, they cry and piss themselves
>>
File: comf0.jpg (1.51 MB, 1536x2560)
>>
>>106761517
It actually would work quite well if you took care to make sure it didn't look computer generated, but since they're Chinese, much like Jeet, they don't go the extra mile.

Synthetic ultimately is the only way to scale to millions of dataset items with 100% accuracy. But realistically you'd do things like procedural generation with a game engine to create different images in different styles.
>>
>>106761522
Isn't there already a custom node like that? Also try asking Claude
>>
>>106761434
Man I hope those chinks keep to their promise and actually release it.
>sora
Pretty cool, motion is pretty good but man those cuts are cancer...
>>
>>106761522
ah shit, I'm retarded. The error was because I forgot to sort the list.
>>
>custom nodes
meh
>API nodes
now we're talking
>>
https://files.catbox.moe/hs4dbx.mp4
To give some credit to Chroma, it can do Animal Crossing so...
>>
>>106761559
Ok I'm still getting errors, just different errors. Something about how I'm doing it is wrong.
>>
File: ChromaRobocop_00010_.jpg (721 KB, 1912x1648)
>>
>>106761434
>>106761581
every time it cuts, there's some weird pixel blurring for a few frames, it quickly becomes a blurry mess when the cuts are frequent, like on that boxing fight
>>
>>106760364
either your initial output is deepfried as fuck and the upscale is simply fixing that, or you're using a bad vae
>>
Soracucks immediately go silent when wan 2.5 chads anally destroy them
>>
>>106761616
>you defeat them with furry coom
not everyone is a degenrate like you lodestone
>>
File: 1746210888878086.png (1.24 MB, 1024x1024)
>>106761354
remove the crt monitor from the image. the anime girl has both her hands on the desk.
>>
>>106761631
can sora have Sam anally raped by a fur god? I didn't think so
>>
>>106761522
You probably need to rewrite it to be a string for the image input so that the string primitive can hook to it. The LoadImage node is so shit.
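something like this is the idea, a minimal sketch (untested, the class and input names are invented) where the input is declared as a plain STRING instead of the combo-box file list that LoadImage validates against:

import numpy as np
import torch
from PIL import Image, ImageOps

class LoadImageFromPath:
    @classmethod
    def INPUT_TYPES(s):
        # plain STRING input, so a string primitive can connect without list validation
        return {"required": {"path": ("STRING", {"default": ""})}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "load"
    CATEGORY = "image"

    def load(self, path):
        img = ImageOps.exif_transpose(Image.open(path)).convert("RGB")
        arr = np.array(img).astype(np.float32) / 255.0
        # ComfyUI IMAGE tensors are batched [B, H, W, C] floats in 0-1
        return (torch.from_numpy(arr)[None, ...],)

register it in NODE_CLASS_MAPPINGS like any other custom node and it sidesteps the filename-list check entirely.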
>>
https://files.catbox.moe/6p1rbn.mp4
SAAR Altman
>>106761661
you can have Sam being a furry though >>106761181
>>
File: 1754190377498679.mp4 (3.17 MB, 1280x720)
>>106761631
How about a wan meme
>>
>>106761399
Outside of generating my own hentai and porno, what is the genuine use case for image and video generation? I mean really? I do manual labour and my hobbies don’t involve media. You look down your nose at it but would anyone even be using this shit nearly as much otherwise?
>>
Don't fret anon, open source will always be here for you.
>>
>>106761605
It almost feels like their magic sauce is that it generates multiple 3-second clips but uses the previous frames as input; it's a similar artifact to Wan with first frame insertion, where the first frames of the gen are funky.
>>
File: Fun not allowed.png (172 KB, 460x460)
>>106761680
>use case for memes?
>>
>>106761680
>I
>I
>my
that's your problem, people might have hobbies you don't like, and you don't seem to understand that simple concept, narcissistic behavior
>>
checkpoints folder, or diffusion models folder?
call it friendo.
>>
>>106761711
Why does other people’s use case matter for mine?
>muh narcissism
Yeah sure but how about an actual answer.
>>
File: that's right.png (89 KB, 618x640)
>>106761679
kek, that's cool, I love memes, whether it's made by a local model or an API model I don't give a shit, if it makes me laugh I welcome it
>>
So what version of chroma should I be using? And does anyone have a workflow for it?
How does it compare to WAN txt2img workflow?
>>
>>106761679
heh, not bad! something with focus lampooning current pol in this quality would get some reposts.
>>
>>106761723
And why does your use case matter for mine?
>>
>>106761697
I guess I somehow just assumed those were implicit, fair call out.
>>
>>106761738
It doesn’t, but I don’t sit here shitting on you for it like you (or whoever the REEEEE COOOMERS guy was). It’s as silly as the dudes who mald about anime on here, an anime website lol
>>
>>106761672
no thanks, it has to have acts of sexual deviancy
>>
Remember hypernetworks?
>>
>>106761724
That's the thing with memes, they get old fast. Though if you're having fun now, you do you, but I can't see this model lasting more than a month before the magic wears off
>>
Reminder to flush your vram with an empty prompt to stop buildup of loose bits or you get ramrot.
>>
>>106761733
i recommend chroma 1 base,

the current radiance snapshot is not for everyone (arguably you could try it too, since that's the one we can currently help improve in some ways)
>>
>>106761555
>Man I hope those chinks keep to their promise and actually release it.
same here, there's a few more wan 2.5 vids i saw on there but they were a lot more slopped than the furry vid

https://www.reddit.com/r/aivideo/comments/1nlucn1/
i usually hate these types of vids but some of the lyrics were p funny, makes me want to try suno
>>
>>106761755
remember embeddings? doras?
>>
File: 1752412229041615.mp4 (3.66 MB, 736x736)
>>106761763
>Though if you are having fun now, you do you but I can't see this model lasting more than a month before the magic wears off
lol, I'm still having fun with Wan + I2V, it's a great combo for infinite memes
>>
>>106761780
I still use an embedding just to save me the few seconds of typing out "masterpiece best quality" and bla bla for my booru models.
>>
>>106761834
>magic prompt keyword embedding
Yikes
>>
quick question, is hunyuan image 3 our monkey paw wish for gpt 4o at home? or are we still not there yet
>>
https://files.catbox.moe/pt8abc.mp4
it's impressive how well it's able to reproduce 80's styles videos
>>
File: 1735072024342886.png (8 KB, 405x71)
I've only been using these 2 upscalers in 2025, is there something newer or better I should look into? I kind of like upscalers that add some grain.
>>
>>106761260
>2:08
SOVL
>>
>>106761856
>is hunyuan image 3 our monkey paw wish for gpt 4o at home?
https://www.youtube.com/watch?v=H47ow4_Cmk0
https://www.reddit.com/r/StableDiffusion/comments/1nt22sm/hunyuanimage_30_t2i_example
>>
>>106761669
I don't understand why it doesn't work for me. I can make a node that the load image node primitive can hook into and the data it receives is just a string. I can make a node that hooks into the load image input without error and gives it a string but now I get
>Prompt outputs failed validation:
>LoadImage:
>- Exception when validating inner node: 'NoneType' object has no attribute 'endswith'
>>
>>106761733
workflow-wise, you could just load the models (chroma 1 base, flux ae vae, t5xxl) on your preferred sdxl workflow

i can share a workflow otherwise but IDK if you want the same custom nodes, they're not "what's needed to get chroma working", i just habitually like some nodes
>>
>>106761856
Hunyuan Image 3 is a retarded meme, as if we need 80B parameters. It's like building a 30,000 sq/ft single family home; they've completely missed the plot when they could've done something more practical, like a 24B image model and maybe an external, optional vlm guidance module.
>>
>>106761856
Sora 2 is the new benchmark. Until we get something like this opened locally, we will never be there. BFL is capable, but they bend the knee to safety.
>>
>>106761856
it's just generic chinese crap, there are no amazing gens locked behind the steep requirements. it's available for free on hunyuan's api and the outputs there are equally garbage. the output quality absolutely does not match the parameter count, and in almost all cases it somehow winds up looking worse than qwen/wan
>>
File: 1749580249303490.png (2.5 MB, 1536x1536)
wai v15 is pretty good, no lora

masterpiece, best quality, amazing quality, hatsune miku, takamaki anne cosplay, school uniform, classroom, smile, persona 5, sitting, desk

the first 2 are default, then you just add prompts. this extension is amazing for booru based models btw:

https://github.com/DominikDoom/a1111-sd-webui-tagcomplete
>>
>>106761755
kek, yeah those were fun for sd1.5, they were super strong. wonder if there's a way to do chroma out there somewhere
>>
it's crazy someone could post a bare pussy right now and get nuked out of orbit within a minute but all the spamming wouldn't even be touched
>>
>>106761893
well that's kind of a relief to hear then
>>
>>106761901
based mods
>>
>>106761890
>Sora 2 is the new benchmark.
you can tell OpenAI could destroy the competition on image models as well, I wonder why they prefer to keep the piss filter instead
>>
I wish anon trained more LoRAs.
>>
PEAK KINO
https://files.catbox.moe/55wi6g.mp4
>>
>>106761866
damn that is p good
>>
File: Screenshot.jpg (26 KB, 505x234)
>>106760815
>>
>>106761848
Shitty slop models still say they need 'em so who am I to argue. Whatever it takes for my 1girls.
>>
File: 1750413587300693.mp4 (1.48 MB, 640x640)
>>106761895
the teal hair anime girl Miku Hatsune shakes hands with the blonde girl on the right, in the classroom.

welcome to class anon!
>>
File: ChromaRobocop_00017_.jpg (571 KB, 992x1456)
>>106761919
Hear hear! This goes back to the oven for 4k more steps
>>
>>106761895
Have you used v14 to compare? I’d upgrade but these rando mixes on Civitai have shown multiple times that whoever is cooking them is just throwing shit at a wall hoping it sticks and newer versions aren’t always better.
>>
>>106761866
I find it hard to believe that this isn't like a giant 200b model, it's capable of grasping all the subtleties of these styles. It's probably the first model that manages to really understand the world around it.
>>
>>106761866
They definitely trained on the 80s video marathons they have on Youtube.
>>
>>106761890
>Sora 2 is the new benchmark
Aren't they talking about an image model? What does Sora 2 have to do with it? While it's fun as video, I wouldn't call Sora vids high image quality. Can you even gen higher res on it?
>>
>>106761961
They're all sameslop anyway just use base.
>>
>>106761876
Where do you even find upscaling models? I'm a functional retard and have basically only found everything through civitai and there aren't many upscalers on there. I use one called "remacri", but is there something noticeably better out there?
>>
>>106761895
If you want plastic just use qwen.
>>
>>106761957
>4k steps
Holy fuck are you for real? How many steps does it usually take with Chroma?
>>
File: 1729247725345413.mp4 (1.26 MB, 640x640)
sora has no upskirts

wan does, without prompts. eat SHIT, saas users.
>>
>>106761966
>t. Hunyuan's marketing team
Why do you think you need 15 times the parameters of Wan? It's not even twice as good as Wan, and Wan can approximate most styles out of the box. Also 200B is infeasible as a business model; it'd cost $2 in electricity to generate a video.
>>
>>106761966
>I find it hard to believe that this isn't like a giant 200b model
don't cope anon, this model is too big for your 3060 anyways lol
>>
>>106761961
yeah, it's not radically different but it has more data and characters, the aesthetic isn't too different but it seems good so far to me. CFG 7, 20-30 steps works well with Euler a.
>>
>>106761997
>Also 200B is infeasible as a business model, it'd cost $2 in electricity to generate a video.
look at deepseek, it's cheap as fuck and it's a 671b model, they can make big models as long as it's MoE
>>
>>106761991
Not that guy, but I use the default setting on 100 epochs and it ends up being around 1500-2000 steps
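(for anyone wondering where numbers like that come from, steps are just dataset size times epochs over effective batch; the settings below are made up purely to show the arithmetic, they're not my actual config:)

images, epochs = 385, 100  # 385 borrowed from the dataset size mentioned further down
batch, grad_accum = 2, 10  # invented settings, just for illustration
steps = images * epochs // (batch * grad_accum)
print(steps)  # 1925, i.e. in that 1500-2000 range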
>>
every. one of my problems with wan 2.2 lately. have been because of bad FP8 quants across the board.
im gonna fucking. not even throw a fit. find my zen and just fap to some massive titties.
so, at this point now i can just tweak the lightning loras a bit to bring back some quality, lowering their strengths should do that right? tits and motion are looking a bit crusty and stiff.
>>
>>106761997
>Wan can approximate most styles out of the box
absolute bullshit, you're tripping my nigga
>>
Is there already technology that can take a photo of a person as a parameter and create a new image with that person in a different position, expression, clothes, or situation, without that person having been used to train an algorithm?

Example: photo of my neighbor
Prompt: girl dressed as a bunny in a cabaret
Result: the face generated on the "character" is that of my neighbor
>>
>>106761991
This is different because I've got a large dataset, so I use gradient accumulation to speedrun epochs
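the trick, roughly (an illustrative sketch, not any trainer's actual code): gradients pile up over N micro-batches and the optimizer steps once, emulating batch_size * N without the extra VRAM:

import torch
from torch import nn

model = nn.Linear(8, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 2
data = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]  # dummy micro-batches

opt.zero_grad()
for i, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so grads average
    loss.backward()   # .grad buffers accumulate across micro-batches
    if (i + 1) % accum_steps == 0:
        opt.step()    # one real update per accum_steps micro-batches
        opt.zero_grad()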
>>
>>106762010
Wan is 16B. There is nothing that Sora 2 does that wouldn't even be feasible in 24B or 32B. Deepseek is also MoE, so it's only ~37B active during generation.
>>
>be me
>doing wan2.2 t2v
>they told me to use lightx lora cause it's fun
>getting drifting movements at 1280x720
>just fine at 720x480

what do? What is the recommended resolution?
>>
>>106762005
>more data and characters
That’s good a lot of the mixes seem to be stuck in the same dataset the original noobai used from 2024. Well disk space is cheap may as well try it thanks for the heads up
>>
>>106762026
>photo of my neighbor
rethink your life choices
>>
>>106762032
>large dataset
How large?
>gradient accumulation
I'd ask for a TLDR but I think that's something I can manage to read about on my own desu
>>
>>106762034
>Wan is 16B
it's 14b
>There is nothing that Sora 2 does that wouldn't even be feasible in 24B or 32B.
I hope you're right anon, deep down I wish it were true as well
>>
File: 1754712177309000.jpg (1.73 MB, 1664x2432)
>>106761980
This website here. It's great.
https://openmodeldb.info/
>>
>>106762026
Obvious Qwen Edit 2509 is obvious

never tried myself though
>>
>>106762035
take a peek at the wan guide in the lazy getting started guide in OP
>>
>>106762041
I used my neighbor as an example because she didn't have her image trained in an algorithm like celebrities do.
She is 70 years old.
>>
>>106762051
385 images
>>
File: 1746532314756956.mp4 (1.85 MB, 640x640)
make Miku bend over with Sora 2. I will wait, SAAS anon.
>>
>>106762056
she ain't got no nose, jim
>>
>>106762067
Do you also run just adam + 2 batch size or do you have some unholy mixture of settings?
>>
https://files.catbox.moe/8hsubg.mp4
desu if it kills sloptubers I'm all for it
>>
>>106762086
michael got her nose hee hee
>>
>>106761895
This one's good enough to go in my slop folder
>>
>>106762052
It is true, I don't know why people think parameters are everything and the reality is the only people who want more parameters are the people who want to make sure you don't run anything locally. It's funny because you can literally see the benefits and the diminishing returns of parameters as you compare models.

From HDM 300m to Pixart 600m to Lumina 2.6B to Flux 12B, you can see that performance diminishes per parameter, even pruned Flux (8B) still maintains 99% of base Flux's text legibility and prompt adherence.

So realistically you're adding parameters from Wan's 14B to add knowledge capacity, not necessarily for video realism.
>>
>>106762087
adamw8bit 2 batch + 2 gradient + snakeoil settings I'm testing. Still wrangling concept bleed
>>
File: 1738360631795970.jpg (570 KB, 832x1216)
>>106762086
Yes, I had to do (no nose:1.1) just to make sure of that.
>>
File: 1729528762452747.mp4 (1.32 MB, 640x640)
Miku laughing at Scam Altman:

also, getting better wan 2.2 movement with a combo lora setup another anon suggested:

Wan 2.2 lora setup for ideal movement: on the high-noise pass, the 2.2 lora at strength 1 chained into the 2.1 lora at strength 3; on the low-noise pass, the 2.2 lora at strength 1 chained into the 2.1 lora at strength 0.25.
>>
>so desperate for attention he will spend most of his time in a thread that doesn't welcome him
>only post in his containment zone as a means to cope with the humiliation from his defeat
>still can't make a good gen after all these years
Grim
>>
>>106762114
>I don't know why people think parameters are everything
look at wan 2.2 5b and wan 2.2 14b, one is a useless piece of shit and the other one is kinda cool, there's some threshold you need to surpass if you want to get closer to perfection, of course at some point you have diminishing returns, but I don't think it starts at 14b, it has to be higher
>>
>>106762135
20b? who knows.
>>
File: 1740589033337976.png (118 KB, 1457x547)
>>106762132
*there are two separate lora paths in the workflow, it's like this:
>>
>>106762135
Dumb comparison, you don't know how much they even trained the 5B model when it's clear the flagship model was always the 14B.

But way to ignore CogX which is also 5B and the one people were proudly showing Tom and Jerry cartoons from last thread.
>>
>>106762132
Miku left you for Sora 2, sorry anon
https://files.catbox.moe/1c3h2s.mp4
>>
how detailed are you supposed to be when describing things for wan? can i just say "she nervously looks around", or do you have to say "she has an expression of fear and panic on her face, she quickly looks to the left then quickly looks to the right. she bends her knees slightly" etc
>>
>>106762151
>But way to ignore CogX which is also 5B
this is even worse than Wan 2.2 5b, what are you smoking anon?
>>
>>106762173
But even with Wan you can curve between 1.3B -> 5B and 14B and realize there's clearly a diminishing returns curve. So anyone thinking even going to 24B is going to result in anything fantastic is crazy.
>>
>>106762190
Idk, is 14b enough to have Sora 2's knowledge? in the llm space, if it's under 100b it doesn't remember some less mainstream details and trivia
>>
>>106762200
I'm suggesting that even going from 14B to say 18B focusing on knowledge capacity would likely be more than enough.
>>
Dataset dataset dataset. It's all about the dataset.
>>
>>106762214
And yes, to even have Wan do 80s late night talk shows someone has to put that in the dataset and properly caption it.
>>
>>106762214
This, OpenAI likely spent millions of dollars and years manually annotating a huge dataset, which is difficult to replicate.
>>
File: 1741583119201902.mp4 (1.93 MB, 640x640)
guys I think the monitor is defective...
>>
>>106762228
It's more likely a SOTA caption model. We can only dream of a hand captioned model.
>>
There is no way Sora 2 fits within anything under 50b. Especially not with audio. The fact that it can copy a variety of video sources (spongebob, bob ross, english anime dubs, 90s commercials) while maintaining proper voice/audio theme is quite impressive. You might get something from China that can do audio/video to the same level, but it won't have the wide range of knowledge that Sora 2 has just like how Flux had better comprehension than Dall-E 3 but only a fraction of the knowledge.
>>
>>106762214
...which is all about the VLM
>>
>>106762241
Crazy idea:
- 2B audio generator
- 20B video generator
>>
>>106762235
>>106762245
but you need to train a caption model to be able to recognize all those characters and styles too, so... at some point you need manually annotated data to do that
>>
>>106762241
>There is no way Sora 2 fits within anything under 50b. Especially not with audio.
I'd say it's a 100b MoE model with 15b active parameters, it's fast enough so it's cheap, but big enough to memorize all the concepts
>>
Is there an equivalent of image2video but for image2image? Not the kind that generates an image similar to the input, but one that takes, for example, a picture of my teacher and returns my teacher naked smoking a cigarette, according to the prompt.
Thank you.
>>
File: Michelle T 4576.mp4 (3.46 MB, 1056x768)
Whoever created this masterpiece is a true god. I encourage you to continue with this most righteous endeavor. :D
Godspeed >
>>
>>106762264
Obviously posted by an >>>/b/AI+parody chad
>>
>>106762264
imagine
>>
the tiddimigu scared me
>>
>>106762271
No, it was one of your fellow gurus here in /g/ldg, I saved it from here about two weeks ago.
>>
>check other ai generals from different boards
>almost all of them are eerily civil

goddamn, get it together /ldg/
>>
>>106762285
it's civil at the moment, and we're talking about that API model, I like it, it should be like /lmg/, it's all right to speculate about SOTA models and see what makes them so special
>>
It's annoying that you have to scrape danbooru if you want a proper dataset, as everything on HF has no tags/metadata.
>>
>>106759949
>>106760002
>>106760064
>>106760098
>>106760126
>>106760148
>>106760237
The results are quite amazing, are you using the default comfy workflow?
Also how does it handle LoRAs with concepts it doesn't understand?
>>
>>106762299
open source always wins. qwen edit v2 > nano banana. also because no censorship, but it's arguably even more versatile.
>>
>>106762308
>qwen edit v2 > nano banana
I want to know what you're smoking m8
>>
>>106762309
>she is nude
>>
Needless to say but i'll say it anyway since we're all probably on this page by now

parameters are a moot point, it's the fucking giganigger fuckhuge captioned dataset that gives Sora its power level.
Question at this point should be, are the chinese bold enough to play dataset chicken with wan 3.0 or other potentially new models?
>>
>>106762303
yes, default workflow with Qwen-Image-Lightning-8steps-V2.0.safetensors at 8 steps. it actually works better than qwen edit v1, but you can try both.

it works 100% fine with loras and also with multiple image inputs, just reference them as "image2" or "image3".

ie: the anime girl is wearing the outfit in image2 (image2 being an outfit, on someone or cropped).
>>
New
>>106762321
>>
>>106762318
>Question at this point should be, are the chinese bold enough to play dataset chicken with wan 3.0 or other potentially new models?
it'll be an API model like wan 2.5 so I don't really care about what they will be doing
>>
>>106762318
I think the Chinese are incapable of paying that much attention to detail. They're all about being flashy as a culture and putting corncobs in the concrete. I don't think you can expect them to be that serious about the dataset outside of raw volume unless one of their researchers gets personally invested in accuracy.
>>
any good noob/Illustrious checkpoint/lora for classic fantasy characters and creatures? I wanted to see if I could do paper minis for dnd but my setup makes the goblins too uguu kawaii.
>>
>>106762302
>>106762341
>>
>>106760610
>completely unfazed by a member of the normandy appearing via portal
what a chad


