/g/ - Technology



Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106772671

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
So anons are basically keeping the /sdg/ up to pacify this schizo that is currently seething at us because of something he did to himself?
>>
>>106774530
>everyone I don't like is debo
>>
why did anons give up on seedream? it's available locally through comfyui api you know
>>
>>106774533
Why do you think this is going to work when you follow the same exact pattern of seething for over 3 years, where it always stops whenever you're sleeping, and there's a rentry in the OP showing you doing exactly this and getting caught?
At this point I think you're only parroting this for your own sake.
>>
ace step is terrible, it cant seem to output a continuous beat, its all over the place
>>
>>106774530
I don't get your complaint, the previous thread was cozy and we were talking about what's the right ingredients to put to have a good model
>>
good start desu
>>
>>106774542
>the same exact pattern
you think you are a great noticer, but you are overestimating yourself lol
>>
>>106774541
>available locally through comfyui api
kek
fastest way to make 'em seethe
>>
Blessed thread of frenship
>>
>>106774556
So the crew is you Illuminati drunk and two other shit stirrers at most?
Your drunken friend shat the bed last night too and we called him out for it. You're not clever or smart you're a bunch of broken losers that have nothing better to do than cry after pushing everyone away.
>>
>>106774515
>I wonder how many rupees Sam pays the indians to shill on 4chan.
this nigga thinks Sam has to shill his model on a thread composed of 30 autistic guys to be succesful, come on anon, we're not that important
>>
>>106774545
such is the nature of local models. if you dont want tablescraps might i suggest Udio?
>>
This thread must be important if anon feels the desire to troll here so hard.
>>
>>106774567
i will probably download fruity loops
>>
>>106774565
that's right anon, we are all debo, and we conspired with drunk schizos to make you seethe, you, and only you, because you are the main character in a dome simulating the world, you are Truman anon, now you know the truth!
https://www.youtube.com/watch?v=dlnmQbPGuls
>>
I am only here for the Sora gens, can someone generate some Mikus or something?
>>
>>106774569
3 years and counting, can you imagine spending every waking moment of your life trolling being obvious and then saying "it's not me" even though there is documented proof at the start of the thread.
He really has nothing else in life and it shows and the small group of retards that follow him can't even get respect in this thread. It's kind of sad when you think about it and when you call them by name they recoil.
>>
>>106774582
>"he made me miserable for 3 years, 7 months, 24 days, 7 hours and 10 seconds!!"
damn, debo is so good at mindfucking some autists desu
>>
>>106774578
>I am only here for the Sora gens
desu I wouldn't mind a unified general that lets people talk about local and API, at least there wouldn't be a war every time someone wants to talk about the advancement of the diffusion ecosystem (which always happens on the API side let's face it)
>>
>>106774519
It's hypocritical to use 2 gens made with Anistudio in your collage while ignoring the dev and not crediting him in the OP.
>>
>>106774582
>>106774591
>>106774575
>>106774565
all me (debo) btw
>>
File: 1754659074626132.jpg (130 KB, 704x1280)
>>106774578
https://litter.catbox.moe/sbmjndnoehs9y3az.mp4
>>
File: 1736651112743255.mp4 (936 KB, 480x672)
>>
>>106774594
kill yourself
>>
>same exact pattern
>why can anons point me out
We know you retards sit on discord seething because you couldn't be popular avatarfags just cope with it already and move on!
>>
File: 1751985422927124.png (628 KB, 800x600)
>>106774607
>I see the patterns anon I see them, why can't everyone see it with me??!!
>>
>>106774519
I made those Anistudio images. Either credit Ani or take them out of the collage. You posted my AI gens without asking me. Remove them.
>>
>>106774578
>I am only here for the Sora gens
same, they are way more funny since they have sound
>>
>>106774621
>they are way more funny since they have sound
now imagine if alibaba didn't cuck us
>>
>>106774578
>can someone generate some Mikus or something?
still the best migu gen from Sora 2 imo, I like the song
https://files.catbox.moe/1c3h2s.mp4
>>
*yawn*
>>
This just solidifies /ldg/ won, they have to sperg out multiple times a week all day long.
Feels good man, I can see seething while working, I can see seething after the gym. I can see seething after a meal with my lady.
While they have no actual skill or improvement and only post here out of spite.
>>
>>106774633
That style is kino poast moar
>>
>>106774626
>now imagine if alibaba didn't cuck us
they went to the API battle too soon imo, I don't know how they're gonna be relevant at all, Veo 3 and Sora 2 are still miles ahead of Wan 2.5
>>
yes please post more to show those saaskeks who really owns this place. perhaps stay awake for the next 12 hours, just in case they try to raid!
>>
I have genned tons of stuff since it doesn't take long, do people ACTUALLY pay money for AI gen content?
>>
>>106774603
With pleasure. But remove my images from the collage if you’re not going to credit my friend Ani.
I posted the images to show that the UI was working and so the OP would add it.
You guys didn’t do that.
Remove my images, you’re taking credit for other people’s work.
>>
>>106774649
We call them "cucks" not people
>>
>>106774655
I find it funny this only starts when the 3000 series is showing its age with newer models. It's really telling who doesn't have a job and who does.
>>
>>106774649
>do people ACTUALLY pay money for AI gen content?
you have to pay for a good GPU to make some local render, how is that different?
>>
>>106774618
>>106774650

>>106772757
>>
>>106774662
I gen cause my Nvidia gpu can do it, the primary purpose is games. otherwise i'd use an integrated GPU.

wan 2.2, qwen edit, qwen/flux/SDXL, noob/illustrious, even audio gen, all of it is free/open source.
>>
>>106774677
let's not pretend you don't need money to run wan 2.2, if you can't afford a 3090 minimum you're gonna have a miserable time
>>
>he can't afford a decent card
I can't believe poorfags post here
>>
>>106774519
Please, bake another /ldg/, but without Anistudio's images. You’re not going to profit off someone you actively push away.
>>
>>106774578
https://files.catbox.moe/lr0v8v.mp4
At least with this technology we will be able to save some series and prevent them from remaining mid.
>>
>>106774691
>moving the goalpost
we went from "omg they're paying money to do AI gen that's crazy, fucking API paypigs ohohoh" to "whaaat? you don't have thousands of dollars to buy that gpu so that you can run that local model, what are you poor??"
>>
>>106774628
this
>>
>>106774683
4080 works perfectly fine with wan 2.2 Q8. even if the model is 20GB some just loads into ram. doesn't take long at all. same with qwen edit.
>>
Sora is free, what's with the local meltdown over it being some pay2gen scam?
>>
>>106774712
ok now use wan without the lightning loras
>>
>>106774716
this, imagine paying a thousand dollar GPU to run mid models (local)
>>
File: 00129-2747509829.png (2.37 MB, 1240x1240)
>>106774691
He's been doing this cope for days, I think his parents won't buy him a new card so he's taking it out on us. It's almost as if we don't use these cards for multiple purposes.
>>
geez what got under anons skin? youd think a local model killed his family or something
>>
>>106774725
The existence of this thread triggers him and a small group of vain retards that wanted to make the entire thread about them.
>>
>>
File: please.png (11 KB, 185x298)
>>106774725
>>106774738
>exactly one minute apart
anon, if you want to talk to yourself and samefag, at least try to be discreet about it
>>
ok lets re-rail the thread localbros. we have a lot of good stuff to talk about like wan2.5--- ermmmm hmmm... well, hunyuan just released a new model, hunyuan 3 80b available in comfyu- oh.... well how about chroma......
>>
sora 2 is uncensored on comfy api btw
>>
File: (You).png (897 KB, 840x1232)
>>106774662
>getting an uber ride in some jeets curry wagon is the same as owning a car
just say you're poor
you would have a good gpu even if ai didn't exist, if you weren't poor
>>
>>106774744
anistudio, bitch
>>
>>106774725
I don't have the computes to run anything so I just seethe
>>
>>106774745
jokes aside, I don't think Comfy has integrated Sora 2 yet right?
>>
>>106774718
why? they work well
>>
>>106774627
more coherent than anything I got
https://litter.catbox.moe/owv1fvrj0xv74uu3.mp4
it's pretty shit desu
>>
>>106774749
KEEEEEEEEKKKKK its so bad i forgot about it ahahahahaha
>>
>>106774748
>just say you're poor
>>106774649
>do people ACTUALLY pay money for AI gen content?
>>
>>106774757
poverty is paying for subscription services instead of owning the superior product outright
>>
>>106774754
maybe only the 1080p version is the good one?
>>
>>106774753
>they work well
spoken like someone who has never genned a raw gen on wan
>>
>>106774757
I dont see why you'd spend $200 on prompts instead of $200 in PC hardware.
>>
localkeks:
>YOU'RE POOR imagine paying money for ai gens!
also localkeks:
>OMG when is the new 6090 gonna come out so i can spend $5k to wait 2 minutes per gen on a 4bit quanted wan!
>>
>>106774759
>the superior product
holy cope, do you seriously believe Wan 2.2 is superior to Sora 2?
>>
>>106774769
>$200 on prompts
youre exaggerating, he can only afford a couple rupees worth
>>
>>106774775
why aren't you comparing to the latest 2.5 wan via comfy api nodes?
>>
>>106774769
except that it's 20 dollars per month if you want to use sora 2
>>
File: 00138-1668900267.png (2.51 MB, 1240x1240)
You have to feel bad for them, they realized they can't move anything and never could so they can't even post gens because they will get laughed at. They always had that problem which is why they got jealous of other anons and used to harass them. Now they can't keep a thread afloat and they lash out like children.
I tried to warn them but they didn't listen ~
>>
File: 1655131543668.png (225 KB, 600x600)
>>106774696
>southpark
!!!
>>
>>106774775
I don't use video gen personally, but if I could only generate what these jewish demons allowed then this would all be completely worthless off the bat for free, forget about paying for it lmao.
Your poverty permeates every thought you have, it's truly a mindset.
>>
File: 00167-2216998198.png (2.37 MB, 1080x1920)
>>
>>106774748
>>getting an uber ride in some jeets curry wagon is the same as owning a car
I can't think of a counter argument to this. It's really that simple.
>>
File: kek.png (280 KB, 960x730)
>>106774725
>how dare you call local models mid??? they are perfect! leave this cult right now!
imagine protecting billion dollars companies (Alibaba and Tencent)
>>
>>106774519
This is your real collage.
You’re not going to use my art to profit and brag about thread quality while ignoring the person who helped make this art possible.
>>
>>106774582
Seems like you are butthurt.
>>
>>106774784
paying for nothing is bad
>>
>>106774519
>Discussion of Free and Open Source Text-to-Image/Video Models and UI
crazy how much seethe and cope this simple phrase spurs
>>
File: this is sora 2.png (1.1 MB, 1920x1080)
>>106774748
>>106774790
>renting a Pagani Huayra is inferior to buying a Toyota Corolla
meh, that's debatable
>>
>>106774507
If only Chroma wasn't mangled to hell, it would be further within normie graces since it's the most unslopped T2I model in existence (even including SaaS, which is crazy)
>>
>>106774808
you're renting a pagani huayra with a 10mph speed limiter
>>
File: 00143-3000492624.png (2.7 MB, 1240x1240)
>>106774804
It's because they have nothing left, if you were around before the split they got this desperate multiple times whenever anon got tired of their shit. Now they are alone and only kept afloat by anons giving them pity bumps to prevent this meltdown from happening. They used to have these meltdowns every couple of months now it's every couple of days.
Can't really do much until the mods step in and ban them again. They will come back but at least it opens up more post.
>>
>>106774815
>a 10mph speed limiter
local models also have a limiter, don't tell me that having a base local model not knowing nudity, characters, celebrities, styles... is not limiting
>>
File: 1729477802005178.png (109 KB, 1083x456)
unfortunately we can't locally diffuse Sora 2 yet but it's in the works
>>
I don't get the "local models don't know celebs and nudes" thing when the celeb deepfake threads are exclusively local gen
>>
>>106774823
>we can't locally diffuse Sora 2 yet but it's in the works
at no point your image suggests that lol
>>
We seriously need to start calling the thread "Offline Diffusion General" so API trolls "jokes" stop making sense and they fuck off
>>
>>106774822
>don't tell me that having a base local model not knowing nudity
You are not generating nudity with anything you're paying $20 a month for. I am generating it for free continuously.
>celebrities
Go grab a lora from the /b/ thread. For free. Which you can actually use to make what you want. For free.
>>
>>106774853
Seedance 1.0 PRO does nudity just fine doe?
>>
>>106774795
Anon, they’ve got someone going thread to thread telling anons to “go back” to /ldg/ >>106773171
What do you expect? Maid posters are board universal and top right pic also post in /sdg/

Most of the good art in their collage isn’t even from real /ldg/. Real /ldg/ all they do is spam Mikus and Deus Ex gens between slop models.
>>
>>106774848
It's never about having a discussion it's about wasting your time followed by spamming once he doesn't get engagement. I guess this will keep our thread bumped but he won't stop until his caretaker tells him it's bedtime.
>>
>>106774844
I like the "Unified Diffusion General" name desu, talk about everything (API or local) as long as it's diffusion
>>
>>106774835
just ignore him, he'll start arguing about semantics even though you have to have less than one braincell to think it's local and open source
>>
>>106774830
Because people are training celebrity loras, but the base models don't know jack shit while some of the SaaS ones seem to know almost every mainstream celebrity out of the box
>>
>>106774853
>Go grab a lora from the /b/ thread. For free.
shut the fuck up with your loras, they'll never replace the model knowing the concept by itself, you can't stack loras so good luck having more than 3 characters in there, Sora 2 can do fucking FF7 x 5 MLP characters in one go, no need for lora cope, you guys are fucking annoying, stop excusing companies not putting concepts on their base models, it should be mandatory ffs
https://files.catbox.moe/1seqwp.mp4
>>
>>106774867
But they won't let you do nudity so local wins that one
>>106774870
But you can't generate nude celebs and localchads can so I think you lose that one
>>
>>106774870
Why are you spamming the local diffusion thread about ChatGPT
>>
>you guys are fucking annoying
weird how youre still here you dont have to get so upset
>>
>>106774879
curious that you're only sending the message to the one that won the argument, and not him (or (You)) >>106774853
>>
>>106774879
See
>>106774857
It's that simple, we have seen him do this nearly daily at this point. What else can we do but hope the mod gets off his ass and clean this up which they always do once the thread already goes to shit.
>>
>>106774887
>please, you should listen to my retarded arguments and cope
no, you should be dogpilled if you say retarded shit, where the fuck do you think you are? this is 4chan motherfucker, if you say something dumb, you goddam right we're gonna remind you about it
>>
>>106774858
We need an AI board asap. I really don't get why jannies won't do it, it would literally please everyone, both AIfags and the /g/ users that hate the AI thread spam
>>
Go listen to some music and cool off, Igor
>>
>>106774870
>stop excusing companies not putting concepts on their base models, it should be mandatory ffs
this
>>
>>106774899
>We need an AI board asap. I really don't get why jannies won't do it
me neither, AI is like the most important technology of this century, and god knows there's a lot of useless boards on 4chan
>>
>>106774870
one of the more impressive sora vids I've seen
>>
File: 00141-3000492622.png (2.46 MB, 1240x1240)
>>106774899
I'm for this because that would cause the people that hate this thread to suffer the most. All of us can discuss and enjoy the space and go anon, they want to be known and are holding multi year grudges because they couldn't be popular on a 4chan thread.
>>
>>106774739
spooky
>>
File: 1738434378258539.png (1.87 MB, 1024x1024)
>>106774835
once they release the API we can defuse locally with ComfyUI
>>
File: ComfyUI_01636_.png (1.28 MB, 1216x832)
ASCENDING TO JOIN THE WAN CHADS

testing some of my old difficult prompts on this model (wan 2.2 low noise) for t2i. it struggles with this prompt much more than chroma or qwen. pic is without the lightx lora.
https://files.catbox.moe/b2bze5.png

please crit my WF if you think wan can do significantly better than this
>>
>>106774923
yeah, it seems like this model has solved concept bleeding, they probably have some secret sauce in there, having a good quality dataset is one thing, but I suspect a new architecture as well
>>
>>106774938
sweet gen dude
>>
>>106774938
>it struggles with this prompt much more than chroma or qwen
Not surprising. Wan t2i is very cool but not really better than any modern strictly t2i model.
>>
>>106774811
how is it unslopped, it's a finetune of a super distilled model and it shows a lot of the time
>>
>>106774953
a good way of dealing with shitty data is likely the secret sauce. Everyone is grabbing the same shit. Hopefully we'll get a paper soon.
>>
>>106774892
Endlessly complaining about a schizo bogeyman should be a bannable offence btw.
>>
>https://aaxwaz.github.io/Ovi/
>https://github.com/character-ai/Ovi
Breadcrumbs for local that want a Sora 2 at home.
>>
>>106774982
>https://aaxwaz.github.io/Ovi/
wait, it's actually not bad at all
https://files.catbox.moe/900269.mp4
>>
File: 1744401034045120.png (2.14 MB, 1440x1120)
>>106774955
I have seen wan gens that btfo chroma and qwen, especially crowds engaging in actions.

>>106774954
thanks, here's chroma for comparison (it fucks this up a lot too tho)
>>
>>106774982
>>106774994
WAN 2.5 BTFO
>>
>>106774982
weights tomorrow, apparently
>>
File: 1742492374889462.png (94 KB, 1392x600)
>>106774982
I thought it was another Wan finetune but it's actually a new model, that one is 11b
>>
>>106774935
Thanks
>>
File: 1745674982897328.png (1.82 MB, 1440x1120)
>>106775000
>>106774954
and qwen (most accurate to prompt tho lacking in style)

I'm trying wan t2i because the wanchads ITT have posted gens that chroma and qwen simply can't do. It seems it definitely does have its own weaknesses, but I'll continue to experiment.
>>
>>106775000
>I have seen wan gens that btfo chroma and qwen, especially crowds engaging in actions.
Absolutely. I'm only trying to point out (and I think it's a logical statement to say) that models designed for t2i are better at it overall than ones who only happen to be able to do it.
>please crit my WF if you think wan can do significantly better than this
I admit I probably can't help you much but I know there exists a single packaged wan sft that can do t2i. I thought it was linked in the Comfy github examples page but I can't find it.
>>
>>106774982
>>106774994
I have a single question, why is 4chan not allowing audio on embedded videos in the year 2025 of our lord? it's gonna be annoying spamming catboxes just to showcase that new model
>>
>>106775025
/gif/ and blue version exist, go make a thread there
>>
>>106775030
>go make a thread there
why? this is a local model that can do sound
>>
meant to say >>106775030 about the sora catboxes posted in these threads
>>
>>106774982
>Character AI
>Yale University
was about time murica would step a foot on the local space
>>
File: lets gooo.png (51 KB, 202x249)
>>106774982
>Alibaba betrayed us
>2 weeks later, another company takes its place
kek, it's like we can't lose or something, we'll never run out of companies willing to save us, we're so back!
>>
Does qwen edit not do NSFW / nudity?
>>
>>106775057
>it's like we can't lose or something
>implying characterAI doesn't have something much better that they're keeping for themselves
>>
>>106775057
fucking holy this
>>
>>106775067
if the local one is better than wan 2.2 I'll take it
>>
>>106774982
now surely it's uncensored too, right?
>>
>>106775074
if you define "uncensored" like a faggot than no, but obviously it is
>>
>>106775074
we'll use loras like we used on Wan lol
>>
>>106775084
>>106775082
i hope they pull a microsoft and rugpull the training code
>>
>>106774982
>We use open-sourced checkpoints from Wan and MMAudio, and thus we will need to download them from huggingface
lol
>>
>>106775096
>We use open-sourced checkpoints from Wan
Wan? but that's an 11b model, did they prune Wan or something?
>>
>>106775101
it would be funny if 11b is just MMAudio + Wan 5b
>>
>>106775096
Nowhere in the paper do they say this
>>
>>106775106
unlikely, MMAudio is like 1b lol
>>
>>106775110
>https://github.com/character-ai/Ovi
>Download Weights
>We use open-sourced checkpoints from Wan and MMAudio, and thus we will need to download them from huggingface
>>
>>106774982
>THEY ARE OUT OF CONTROOOL OOWWW
kek
>>
>>106775116
>Wan2.2: Our video branch is initialized from the Wan2.2 repository
>MMAudio: Our audio encoder and decoder components are borrowed from the MMAudio project. Some ideas are also inspired from them.
Read.
>>
File: 1730349715607834.mp4 (1.16 MB, 640x640)
>update comfy
>new template for wan 2.2 i2v
seems decent? just added interpolation node and it works fine with 2.2 loras.
>>
>>106775116
>>106775124
So... they're just using the code of Wan 2.2 but they made their own 11b model with it?
>>
>>106775131
probably just wan 2.2 5b finetuned + custom audio arch
>>
>>106774982
>T5 conditioning
>>
>>106774982
there's already a model released, that's probably the 11b version
https://huggingface.co/chetwinlow1/Ovi/blob/main/model.safetensors
>>
>>106775147
oof, 6b just for the audio, that's brutal
>>
File: 1757678670863830.mp4 (688 KB, 640x640)
>>
>>106774994
eh...
https://huggingface.co/chetwinlow1/Ovi/resolve/main/assets/ovi_trailer.mp4
>>
>>106775168
if that's Wan 5b it doesn't look bad, they probably made it less shit with additional finetuning lol
>>
File: Screenshot_218.png (8 KB, 302x266)
>>106774982
uh oh it loses to wan 5b
>>
>>106775179
ahahah, DOA...
https://www.youtube.com/watch?v=LHZIc3B4kfE
>>
File: 1758627684196955.png (166 KB, 1778x825)
>>106774982
>>106775179
https://arxiv.org/pdf/2510.01284
So they finetuned Wan 2.2 5b and they made it worse? Great job guys!
>>
File: 1755165157613370.mp4 (659 KB, 640x640)
>>106775167
>>
File: LOOOOOOL.png (415 KB, 857x1200)
>>106775179
>>106775197
>We note, however, a slight degradation in video quality relative to the Wan2.2 base model, which is expected given that our joint training relies on a narrower audio-video dataset compared to the large-scale pretraining corpus used for Wan2.2. Importantly, this trade-off is marginal and does not diminish the overall superiority of OVI in joint audio-video generation.
OH NONONONONO
>>
File: ComfyUI_10028_.png (2.58 MB, 1152x1152)
>>106774959
you may not like it, but it's the only T2I model that consistently produces images that actually look like real photos
>>
File: 1756524360364735.png (1.22 MB, 1080x906)
>>106775207
it was more realistic before, you know it, I know it, everyone knows it
>>
>the more a model is trained, the sloppier it gets
hmmmm perhaps this is where the real research needs to go
>>
File: ComfyUI_00010_.png (2.95 MB, 1336x1952)
>>
>>106774982
the sound quality is really decent, but goddamn, having to add a 6b audio model on top of your video model is a tough pill to swallow, fucking Nvidia limiting the local space with their low VRAM
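For a rough sense of what that costs in memory, here is a back-of-the-envelope sketch. It only counts the weights themselves. The 11b total and the 5b/6b split are the figures quoted in this thread; the function name and the choice of 16-bit and 8-bit precision are assumptions made for illustration. Text encoder, VAE, and activations are not included, so real usage would be higher.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of model weights in GiB (weights only, no activations)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Parameter counts as quoted in the thread (assumed, not verified here).
parts = {"video branch (5b)": 5.0, "audio branch (6b)": 6.0, "combined (11b)": 11.0}

for name, billions in parts.items():
    # 16 bits ~ fp16/bf16 weights, 8 bits ~ an 8-bit quantization.
    print(f"{name}: {weight_gib(billions, 16):.1f} GiB at 16-bit, "
          f"{weight_gib(billions, 8):.1f} GiB at 8-bit")
```

The point of the comparison is only that the audio half roughly doubles the weight footprint relative to the 5b video model alone.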
>>
>>106775224
wym nvidia gave us the RTX PRO 6000
>>
>>106775220
no, he changed the way it has been trained at v30, he aimed for a model that would work at low steps, and the tradeoff was more slop, he killed the realism to please his vramlet tranny discord retarded friends, let that sink in
>>
>>106775228
for the low price of 10000 dollars!
>>
>>106774937
that is clearly unreal
>>
>>106775179
>finetune a model
>the model gets worse
lmao
>>
>>106775179
Isnt wan 5b kinda shit already? This is a terrible sign. No wonder it's open sourced.
>>
>>106775230
oh so just like that sdxl lightning/turbo crap? more fast = more slopped, always. why do this shit while the model is still 'locking in'? it just destroys future epochs
>>
>>106775258
>Isnt wan 5b kinda shit already?
yep it's absolute garbage
>>106775262
>why do this shit
he's a retarded furry, that's pretty much it
>>
>>106775238
if you had 10k to burn you wouldn't buy one?
>>
>>106775270
I would, but I'm too poor for that :(
>>
File: 1754731921818271.png (108 KB, 1800x741)
>>106774982
you know your model is dogshit when you have no choice but to compare to the worst models in existence kek
>>
So which startup is going to train a video model on all Youtube videos, movie scenes, music videos, TV shows, TikToks, internet videos in general, and Anime, then release the model as open weights?

What is the answer, anons? I don't want local to be cooked but Sora 2 is too good, give me hope.
>>
>>106775295
it'll never happen anon lol
>>
>>106775295
>which startup
A French one no doubt
>>
>>106775308
trust the Mistral
>>
>>106775295
do you have the resources to
1. store a dataset that big
2. efficiently clean it and caption it
3. do several test runs before a big training run
>>
>>106775295
literally wan. you think models are just rip off data and hit 'go'? Shit is hard bro.
>>
>>106775295
Why train on youtube videos when you could just train on Sora 2 videos? OpenAI already did the hard work for us
>t. china
>>
>>106775315
I was about to have a meltdown then I saw the 2nd line, kek
>>
>>106775295
anon, if a company can do the miracle move of having a sora 2 equivalent, you know damn well they're gonna keep it for themselves, a good AI model is like an infinite money printer, a shit ton of people will buy the subscription if it's good
>>
>>106775312
>1. store a dataset that big
SSD's aren't that expensive bro
>>
>>106775342
sir please lend me your 4PB SSD cluster
>>
>>106775348
relative to the cost of training the model, it's peanuts
>>
>>106775314
>Shit is hard bro.
exactly, that's why only one company in the world reached that level, and it's not like OpenAI will release a paper revealing their secret sauce, they know they have something special in their hands, and they intend to keep it for as long as they can
>>
>>106775295
>So which startup is going to train a video model on all Youtube videos, movie scenes, music videos, TV shows, TikToks, internet videos in general, and Anime
they pretty much all do that, the difference with Sora 2 is that they properly and manually annotate their clips, you have no idea how much work this shit is, we're talking about billions of clips there
>>
>>106775295
local hasn't even caught up to dall-e 3 which was released over 2 years ago. nobody releasing local models understands datasets, and perhaps none of us understands them either. we have a long long way to go
>>
File: 1758511556427836.mp4 (843 KB, 480x640)
euler/beta scheduler

latest 2.2 wan i2v template workflow in comfy, with 2.1 lora for high noise at 3 strength, 2.1 lora for low at 1 strength.

seems to fix the slow mo issue that can occasionally show up?
>>
Ah... I remember my first cool cloud model release
>>
>>106775295
no one will replicate that, the only cope would be OpenAI releasing Sora 3, and this shit is like 99% the reality, and the chinks will train their models on Sora 3's outputs and call it a day, and then maybe it'll reach Sora 2's level kek
>>
>>106775397
>seems to fix slow mo issu
tested with some more dynamic motion?
>>
>>106775362
they literally have released papers in the past talking about cutting edge techniques retard. You think those fuckers don't have ego?
>>
>>106775409
>they literally have released papers in the past talking about cutting edge techniques
Still waiting for Sora 1's paper ahahah
>>
>>
File: 1753626525442859.mp4 (1.11 MB, 480x640)
>>106775408
miku (my test case) seems to be more mobile: prompt is, the anime girl is dancing

also im trying the comfy template workflow which also has diff shift values than my old one, it seems better? (esp with the 3 high 1 low lora strength, but with the 2.1 lora)
>>
>>106775295
desu I still don't understand how they managed to reach that level, sora 1 was dogshit and then boom, they made a SOTA model that beats Veo 3, OpenAI is sure a surprising company, they are capable of the best and the worst at the same time
>>
why is stable-diffusion.cpp slower than comfyui with amd? but also it can do cpu, which is (slow) very cool.
>>
File: 1740034469891157.png (1.38 MB, 1216x832)
wan t2i has a level of dynamism and consistency that no other local t2i can match. it's just not as good at text or complex visual concepts.
>>
>>106775442
looks like some random slop you'd see on Qwen Image
>>
>>106775386
>local hasn't even caught up to dall-e 3 which was released over 2 years ago.
I don't get that glazing over dalle 3, this shit is ass at humans it's obvious it looks AI generated
>>
>>106775459
if your only benchmark is blurry analog realism, you wouldn't understand
>>
File: 1745871721235351.mp4 (1.05 MB, 640x640)
>>106775421
the men sitting down stand up and clap.

maybe the shift value in my old workflow was bad, also the ksampler was slightly diff

this works though.
>>
>>106775459
nostalgia, heavy nostalgia
>>
https://files.catbox.moe/wd7ecw.mp4
hee hee
>>
>>106775421
workflow catbox? curious
>>
File: 1739191701600679.png (2.03 MB, 1440x1120)
>>106775444
nah this is qwen on the same prompt.
>>
>>106775409
>Chinese need a paper to tell them to stop slopping datasets
c'mon anon, like OAI would do that
>>
File: 1755267496590007.mp4 (881 KB, 640x640)
>>106775468
>>
>>106775485
kek, this
>>
>>106775474
this would take a 200 node workflow and 4 hours to do locally btw. you would need to properly segment and combine two unique character loras because lora tech is still incapable of blending naturally into a model and requires multiple layers of copewrangling just to get two lora characters to stand next to each other
>>
File: 1749583364686509.png (1.6 MB, 1216x832)
wan t2i... very basic workflow, I bet I can improve this wf a lot.
>>
I give it until the end of next week max.
>>
>>106775475
it's just the comfy i2v template one, with the 2.1 lightx2v lora, 6 steps total (3/3).
>>
File: 1728198750157284.mp4 (967 KB, 640x640)
the men put on chinese rice hats and hold a bowl of white rice.

yep, this works pretty well. latest comfy 2.2 i2v template workflow, with the 2.1 lightx2v lora, 3 strength for high noise, 1 strength for low noise, 6 steps total.
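
The settings reported in this post can be written down as a small sketch. This is plain Python, not ComfyUI's API: the `WanI2VSettings` class and its field names are hypothetical, the euler/beta pairing and the 3/3 step split are taken from the same poster's earlier replies, and nothing here loads a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WanI2VSettings:
    """Hypothetical container for the settings reported in the thread."""
    sampler: str = "euler"
    scheduler: str = "beta"
    total_steps: int = 6
    high_noise_steps: int = 3               # steps run on the high-noise model
    high_noise_lora_strength: float = 3.0   # 2.1 lightx2v LoRA, high-noise pass
    low_noise_lora_strength: float = 1.0    # 2.1 lightx2v LoRA, low-noise pass

    def step_ranges(self):
        """Return (start, end) step ranges for the high- and low-noise passes."""
        if not 0 < self.high_noise_steps < self.total_steps:
            raise ValueError("high_noise_steps must be between 0 and total_steps")
        high = (0, self.high_noise_steps)
        low = (self.high_noise_steps, self.total_steps)
        return high, low


settings = WanI2VSettings()
high_range, low_range = settings.step_ranges()
print(high_range, low_range)
```

Keeping the split in one place makes it easy to try other ratios (say 2/4) without touching the LoRA strengths.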
>>
>>106775502
>cryptic message no one understands
debo wrote this, my pattern sensors are tingling!
>>
>>106775498
>this would take a 200 node workflow and 4 hours to do locally btw.
and it will never look as realistic as what Sora 2 did, it's tough to be a localkek these days...
>>
File: 1733942725506944.mp4 (1.67 MB, 640x640)
the pink inflated anime girl stands up and runs very fast across the water to the left.

kek
>>
>>106775543
nice
>>
https://files.catbox.moe/7tfsgy.mp4
BAYONETTAAAAAA
>>
>>106775493
>ar15 floats in
>>
>>106774649
I'm subbed to some niche ai content creators on patreon. Just because I "could" gen it myself doesn't mean I "know how" to do it as well as the op of said gens I'm interested in viewing.
>>
>>106774982
> same 5secs restriction
>>
File: 1736057996292954.mp4 (1.47 MB, 640x640)
the pink inflated anime girl stands up and runs extremely fast across the water.

quantifying the speed to see if there was a difference, it works.
>>
>>106775631
of course, since it's just a finetune of Wan 2.2 5b
>>
>>106775603
How did you get this way?
>>
File: Tencent employee.png (194 KB, 626x417)
>>106774953
>yeah, it seems like this model has solved concept bleeding, they probably have some secret sauce in there
they stacked more layers?
>>
File: 1741774291269007.mp4 (1.42 MB, 640x640)
the white car drifts around a corner very quickly, on a street in Tokyo at night.

from an old miku gen:
>>
File: 1748972527276859.mp4 (1.63 MB, 640x640)
>>106775685
>>
>>106775700
>>106775685
butiful lightx2v slop
>>
>>106775543 >>106775634
very cool gen
>>
File: 1740070662134237.mp4 (640 KB, 640x640)
>>
File: 1737640801229798.mp4 (1.4 MB, 640x640)
the man drinks the white can of energy drink.
>>
>>106775646
The people that I'm subbed to make interesting stuff that justifies me giving them $5-$10 a month. The idea of supporting ai artists and content creators isn't far fetched or chucked in any way.
>>
File: 1732040135777354.mp4 (914 KB, 704x400)
can your sora 2 do this?

no lora. just "the woman grabs her breasts and pushes them together."
>>
>launch local ui
>it bitches about not being able to connect somewhere
local huh..
>>
>>106775764
XIR this violates the EROTIC POLICY
>>
File: 1754201733987283.mp4 (1.25 MB, 640x640)
"the game carts fall off the shelf"

aww I was hoping they would all fall, need to promptmax.
>>
>>106775764
>>106775779
>but muhh cooom
>>
File: 1737399915251170.mp4 (1.54 MB, 640x640)
the shelf of game cartridges on the left falls forward, onto the floor. as it falls, game cartridges fall off the shelf.

closer!
>>
https://files.catbox.moe/d10mci.mp4
>israel this, israel that
oy vey!
>>
File: 1754540887157922.mp4 (1.37 MB, 640x640)
>>106775800
>>
File: 1751642324630533.mp4 (1.64 MB, 640x640)
the anime girl spins 360 degrees and waves hello.
>>
>>106775857
omg it migu
>>
Is that Miku from Sora 2?????
>>
>>106775857
did you try the new scheduler at all, or was it not worth the time?
>>
>>106775880
wait, Sora 2 can do miku? NO WAY
https://files.catbox.moe/1c3h2s.mp4
>>
File: 1755444912428113.png (3.49 MB, 2560x1280)
the pink hair anime character with white skin and stubby white arms and legs is wearing a White t-shirt with the text "LDG" on it. change the background to white, and change the text in the top left to the color black.

qwen edit is so neat.
>>
File: 1754742373489558.png (692 KB, 1024x1024)
>>106775882
euler + beta seems to work best so far.
>>106775892
the character is standing up.
>>
File: 1752973755670936.png (781 KB, 1024x1024)
>>106775895
the character is sitting at a desk at a computer, with a white monitor.
>>
what's going on with civitai
>>
File: blobfish.png (617 KB, 900x599)
>>106775887
kek, I'm curious what the prompt was to produce such an abomination of a pepe. It looks like those blobfish whose tissue decompresses into a pulp when yanked out of the deep water.
>please just kill me
>>
>>106775927
It's running like shit so nothing out of the ordinary
>>
>>106775929
Ikr, that makes the video 10x funnier desu
>>
File: 1741391624720018.png (894 KB, 1176x880)
the white anime dog with pink hair anime character is dressed as Miku Hatsune.

actually pretty good.
>>
File: WAN2.2-Upscale_00002.mp4 (3.56 MB, 936x1248)
>>
>>106775895
>euler + beta
yeah that and euler/simple are what i always default to
>>
File: 1733677458281144.png (1.31 MB, 1176x880)
the white anime dog with pink hair anime character is standing on a large pile of money in a royal castle, and is wearing a crown.
>>
File: 1728595649303983.png (1.03 MB, 1176x880)
the white anime dog with pink hair is standing on top of a white AE-86 Trueno car drifting around a corner in Tokyo, at night.

based qwen edit, it knows the car.
>>
File: 1746403598407364.png (1.04 MB, 1176x880)
>>106775979
>>
>>106775985
GAS GAS GAS
>>
>>106775979
>>106775985
Qwen Edit Driftoo??!!
>>
>>106775993
it's such a neat model, also because of how it works, essentially every image is a LORA. you can manipulate in that style, or transform it, or apply it to other images. Or remove elements, copy font styles, etc. And the best part is it works with your illustrious/noobai/qwen/flux/etc gens, and with wan i2v.
>>
>>106776002
is it as slow as qwen image itself?
>>
>>106776002
but since it uses the VAE, you cant really iterate on the same image for continuous edits, or you end up killing all the detail.
SAD.
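
A toy sketch of why repeated round trips lose detail. The 3-tap blur below is only a stand-in for one lossy encode/decode pass (an assumption for illustration, not the actual Qwen VAE), and `lossy_round_trip` is a made-up name. Each "edit" starts from the previous output, so the loss compounds.

```python
import random


def lossy_round_trip(signal):
    """Toy stand-in for one encode/decode pass: a circular 3-tap blur."""
    n = len(signal)
    return [
        0.25 * signal[(i - 1) % n] + 0.5 * signal[i] + 0.25 * signal[(i + 1) % n]
        for i in range(n)
    ]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
original = [rng.random() for _ in range(256)]

current = original
errors = []
for _ in range(5):  # five consecutive "edits", each re-encoding the last output
    current = lossy_round_trip(current)
    errors.append(mean_squared_error(original, current))

print(errors)  # error against the original after each pass
```

In this toy model the error grows with every pass, which is the same reason iterating edits on an already-decoded image tends to wash out fine detail.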
>>
>>106776052
this, at some point those edit models have to be VAE less
>>
>>106776093
give lodestone a call, tell him to drop this chroma nonsense and to go make QWEN EDIT RADIANCE
>>
>>106774578
>looking for cloud gens in Local Diffusion General
Retard
>>
>>106776044
no, 8 steps with the qwen image lightning v2 8 step lora, it's fast. same lora is good for the non edit model.
>>
>>106774578
>looking for cloud gens in Local Diffusion General
Based!
>>
>>106776171
What if you gen it on the cloud, put it through Wan at 0.01 denoise for 1 step, then upload it here? That counts as local.
>>
>>106776201
kek, he has a point though
>>
>>106775436
>naive toy project kernels
vs
>optimized pytorch kernels
I wonder
>>
i still don't really know what a seed does desu. should i keep it when inpainting?
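
A minimal sketch of what a seed does, assuming the usual setup where the seed initializes the random generator that draws the starting noise. Real UIs draw a latent-shaped tensor; the stdlib `random` module and the `initial_noise` helper here are stand-ins for illustration.

```python
import random


def initial_noise(seed, size=4):
    """Draw the starting noise for a generation from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

print(same_a == same_b)  # same seed, same starting noise
print(same_a == other)   # different seed, different starting noise
```

So with everything else unchanged, keeping the seed should reproduce the same starting point, and changing it gives a different variation.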
>>
sora really would be incredible without the shackles and chains
>>
>>106776332
even if it was local I doubt we could run it, that's probably another one of those giant models, sigh... when will we be free of Nvidia's stranglehold??
>>
>>106776002
qwen edit doesn't really understand style transfer
>>
>>106776332
thats practically every saas model ever. i said a while ago but we're at least 2-3 years behind where we could be if everything was open.
>>
>>106776332
>sora really would be incredible without the shackles and chains
yep, I really wonder when local will catch up to that monster, if that happens ever lol
>>
>>106776354
People (mostly gamers) used to bitch at them all the time for developing/supporting CUDA for as long as they have. That used to be a "waste" of die space and engineering back in the ancient times. Nobody could foresee it would help them totally dominate the world of compute one day.
>>
>>106776420
if it gives you solace, we caught up to sora v1 in 6 months. i doubt thatll be the case here tho
>>
>>106776433
I have a hard time believing that Jensen foresaw the revolution in AI 20 years before everyone, maybe he was just fond of this shit and saw that as a hobby lol
>>
>>106776453
>i doubt thatll be the case here tho
the best case scenario would be getting something on sora 2's level but without the character/celebrity concepts integrated in it, that would be so fucking lame, but I don't see any company giving us base models that can render anything other than Miku
>>
Ovi is absolute utter fucking horseshit wtf, the audio component is worse than VibeVoice
>>
>>106776493
>Ovi is absolute utter fucking horseshit wtf
what did you expect? it's a Wan 2.2 5b finetune kek
>>
>>106776493
show some results you got with it, I wanna laugh a bit
>>
>>106776457
even before meme learning was the big thing CUDA was dominating professional visualization applications
>>
can confirm. Ovi is censored. garbage
>>
>>106776542
can you put NSFW wan 5b loras in it?
>>
>>106776550
idk i just briefly used the official inference code. no lora support there. i wanted lewd audio but it cant do that
>>
>>106776542
it's just slop to cash in on the latest saas breakthrough. we saw similar days after 4o with "lumina mgpt" autoregressive model. turns out it was shit.
>>
File: 00001-2687024042.jpg (1.57 MB, 4032x1728)
>>
New

>>106776633
>>106776633
>>106776633
>>
limit
>>
>>106774618
Fking tell him bro! Maid power!
>>
File: 1759160886046359.png (57 KB, 1384x354)
>>106775012
it's based on wan ti2v 5b


