/g/ - Technology






NB4 Spam Edition

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106755435

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
>Discussion of Free and Open Source Text-to-Image/Video Models and UI
https://files.catbox.moe/ccd0qs.mp4
>>
Focus on the maintain thread quality part.
>Maintain Thread Quality
https://rentry.org/debo
He's having an episode, so just do what's needed: don't engage, and do what you need to do.
>>
File: 1753078401259449.png (293 KB, 549x617)
When will local reach OpenAI Sora 2 quality?
>>
blessed thread of saas
>>
>>
>>106758722
then why do you keep engaging?
>>
>>106758722
you are truly obsessed, ranjeet
>>
thx 4 da phree bumps cloudcucks :D
>>
>>106758722
and remember guys, every time you see a post you don't like, that's definitely debo. climate change? that's also debo! World War 2? that was debo as well!
>>
File: file.png (140 KB, 247x247)
Wish I didn't waste so much time on the 1.2B model, something is seriously wrong with the architecture, probably the mlp. Using the HDM mlp setup the 600m model test is learning so much quicker and works with much higher learning rates.
>>
oh he big mad with those replies kek go back to spamming
>>
>>106758769
fuck off debo
>>
>>106758770
In his mind if he doesn't look at it in OP it doesn't exist
>>
>>106758769
>Using the HDM mlp setup the 600m model test is learning so much quicker and works with much higher learning rates
Based.
>HDM mlp
Wat is?
>>
>>106758722
ngl we hear more about you complaining about debo than debo posting, you are more annoying, fag
>>
>debo
>>
>>106758769
would you humor me and upload the 1.2b? if anything, to admire the file since it's one of a kind
>>
Newfag here, who is debo?
>>
>>106758784
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUTorch(nn.Module):
    """
    SwiGLU MLP: y = W3( SiLU(W1 x) ⊙ (W2 x) )
    - Supports packed weights via a single Linear projecting to 2*hidden_features.
    - For compatibility with callers that pass extra kwargs (e.g., HW=...), forward accepts **kwargs.
    """
    def __init__(self, in_features, hidden_features=None, out_features=None, bias=True, _pack_weights=True):
        super().__init__()
        self.in_features = in_features
        self.hidden_features = hidden_features or in_features
        self.out_features = out_features or in_features
        self._pack_weights = _pack_weights

        if _pack_weights:
            # Packed: one projection produces both the gate and value halves.
            self.w12 = nn.Linear(in_features, 2 * self.hidden_features, bias=bias)
            self.w1 = None
            self.w2 = None
        else:
            self.w12 = None
            self.w1 = nn.Linear(in_features, self.hidden_features, bias=bias)
            self.w2 = nn.Linear(in_features, self.hidden_features, bias=bias)

        self.w3 = nn.Linear(self.hidden_features, self.out_features, bias=bias)

    def forward(self, x, *args, **kwargs):
        if self.w12 is not None:
            # Split the packed projection into gate (x1) and value (x2) halves.
            x1, x2 = self.w12(x).chunk(2, dim=-1)
        else:
            x1 = self.w1(x)
            x2 = self.w2(x)
        return self.w3(F.silu(x1) * x2)


It's how the layers and parameters are glued together, and it controls how the data flows.
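For anyone who wants to poke at it, here's a minimal smoke test of the block above (the dimensions are made up for illustration, not the actual 600M config; HW= is just there to show the ignored extra kwargs):

# Hypothetical usage sketch; shapes are illustrative only.
mlp = SwiGLUTorch(in_features=1024, hidden_features=4096)
x = torch.randn(2, 256, 1024)   # (batch, tokens, dim)
y = mlp(x, HW=(16, 16))         # extra kwargs like HW=... are accepted and ignored
assert y.shape == x.shape       # out_features defaults to in_features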
>>
lol https://files.catbox.moe/203axg.mp4
>>
File: 1730864918478479.mp4 (1.78 MB, 640x768)
>>
>anistudio again in the op
>>
>>106758815
lmaooo
>>
>>106758795
You enjoyed the best of it with the images I posted. And to use it you'd need all my code which I won't release. Anything I actually release would be a full open source drop including the modified Pixart training/inference code.
>>
>>106758805
https://desuarchive.org/g/thread/102628478/#q102629904
>>
>>106758823
>And to use it you'd need all my code which I won't release
ah, understandable. i enjoy reading your posts about it regardless. ty anon.
>>
File: 1730350897814088.png (261 KB, 1354x1142)
>>106758815
I never thought sound would make memes that much funnier, but now I realize we're still at the Charlie Chaplin stage with Wan 2.2. Can't wait to get a local model that produces sound as well; maybe it'll be sooner than you expect
https://xcancel.com/bdsqlsz/status/1973262731311755334#m
>>
>>106758722
we should rename /ldg/ to /cadg/ -> complain about debo general
>>
Too bad it can only output "man impersonating celeb" instead of the actual celeb. But hey, Sama memes can be funny I guess.
>>
>>106758836
remember to get on your knees and beg them for it nicely
>>
>>106758850
man it got his voice perfect and his face almost perfect, what more do you want
>>
>>106758851
And remember, be grateful for every release. Even if you think it's slopped, repeat after me: FINETUNES WILL FIX IT!
>>
>>106758858
>what more do you want
For it to actually look like him and not someone cosplaying as him. Pretty simple request desu.
>>
File: 00005-3184870178.png (1.37 MB, 1024x1240)
>>
>>106758863
like are we talking perfect facial bone / skin alignment? cause I dont think that is gonna happen any time soon
>>
>>106758862
I swear to god, if this general didn't cope so hard and was harsher on those slop chinks, we would've gotten better models by now. Ultimately we get what we tolerate; if we stopped hyping slopped models, maybe it would make them understand that they should stop doing that
>>
>>106758829
But like I said, this updated setup learns much, much faster and is learning more like how I'd expect, where the model goes from "abstract" patch blocks and slowly gets more and more detailed with each epoch. So it's going to take a fraction of the time. The other model really felt like it had no noticeable improvement epoch over epoch (much like this video clip).
>>
>>106758863
>For it to actually look like him and not someone cosplaying as him. Pretty simple request desu.
no model can do that, not Sora, Veo, or any local model
>>
>>106758874
Damn, local wins that one I guess.
>>
>>106758885
im sure anon is just imagining all those celeb deepfakes made with local models kek cope harder cloudcuck go back
>>
why does debo defend local so much?
>>
>>106758836
Translation
>New intel: We will be sourcing Sora2 videos from here onwards! We will be benchmaxxing and claiming we have caught up with our slopped datasets and models! Worry not peasants!
>>
>>106758888
show me lol
>>
>>106758894
>im sure anon is just imagining all those celeb deepfakes made with local models kek
must be easy as fuck to find one example then, go on anon, I'm waiting for the proof (plot twist, he won't provide the proof)
>>
>>106758902
>>106758903
go on /gif/, /b/ or /r/ right now. surprised you haven't heard of those threads yet... are you new?
>>
>>106758910
just one example, just one post, that's all we're asking, the burden of proof is on you, remember that
>>
>pretending that local models cant into deepfakes
>pretending most of /r/ isnt wizard requests
>pretending theres not a link to the celeb fake thread in OP
Just go back to spamming bro
>>
>>106758910
I just looked, none of them look close to the real person in the one thread I found
>>
>>106758923
maybe you should learn how to argue your point debo
>>
>>106758903
>(plot twist, he won't provide the proof)
>>106758923
>narrator: he didn't provide the proof
like clockwork, damn I'm so good at this
>>
>>106758895
>>106758929
>spamming didn't work
>moves to muddying the waters
How many newfags are lurking right now do you think? KEK
>>
>sora can't do cele- acckk!
https://files.catbox.moe/l68j9b.mp4
>>
File: Chrowan_00026_.jpg (1.07 MB, 2016x3072)
Anon, cousin, stop fighting.

>>106758609
Happy halloween!
>>
>>106758944
it doesn't load, catbox is down or something?
>>
>>106758941
now this is classic debo cope after losing an argument
>>
>>106758902
>>106758924
>>>/b/940546596
>>
>>106758914
>>>/b/940555975
bruh I’m not the guy you’re in a pissing contest with but like what are you even trying to argue. Deepfakes are one of the things people have been doing from the start
>>
>>106758952
it's a big one, might take a min
>>106758959
>video
>>
>>106758959
>we asked for a video model capable of doing deepfakes since we're talking about the capabilities of Sora 2
>he provides an image instead
damn... must be tough living in this world while being this retarded, I have some pity not gonna lie
>>
>moving the goalposts
but keep spamming anime videos or whatever lol
>>
>>106758944
ok it works now
>>
>>106758963
>Deepfakes are one of the things people have been doing from the start
doesn't mean that they're perfect, remember, this is the criteria
>>106758874
>like are we talking perfect facial bone / skin alignment?
>>
>>106758974
>>106758944
>anime
>>
File: 00024-1558125614.png (1.41 MB, 1024x1240)
>>
>>106758982
Qwen Edit is capable of perfect deep fakes. Wan is capable of animating those perfect deep fakes. And there's no content cop saying you're unsafe.
>>
File: 1756220638628612.png (1.73 MB, 1360x768)
>>106758987
>Qwen Edit is capable of perfect deep fakes.
me when I lie
>>
I log off and go to bed and there's been three threads in the intervening twelve hours wtf happened
>>
Seems like you are butthurt.
>>
>>106758944
Unironically who is that supposed to be
>>
File: 00025-1558125615.png (1.44 MB, 1024x1240)
Don't argue just post gens
>>
>>106758944
is this sora 2? the watermark logo is the sora 1 logo
>>
>>106758993
You don't know because you're poor and no one is going to post anything because deepfakes are illegal so right now the internet is just poorfags living in dirt hovels who aren't afraid of the law.
>>
>>106759000
Any time a new cloud model drops it's used as a wedge. Some discuss it in earnest but the guy who spammed the last two threads is just trolling.
>>
>>106759008
sora 1, music is Suno
>>
>>106759010
I literally posted a Qwen Image Edit render of Sam Altman with Will Smith, does that look like "perfect deep fakes" to you?
>>
>>106759014
So... to prove that Sora 2 can do celebrities you post a Sora 1 video?? lmao
>>
>>106759016
This is no different than you building a shitty birdhouse and saying it's impossible to make a good birdhouse with home tools. You don't prove me wrong, you only show you're stupid.
>>
>>106758983
>generic ai face woman
whats that supposed to prove exactly
>>
File: 00013-2348413427.jpg (1.66 MB, 2450x2450)
>>
>>106759028
>Qwen Image Edit can definitely do it, I just won't show it!
All right I'm completly convinced right now, great argument anon!
>>
>>106759025
He's not very smart. Just laugh at him getting all riled up.
>>
>>106759039
Your premise is completely flawed. I don't *need* you to use Qwen Edit and in fact the more you favela poorfags avoid the good local models the better.
>>
this is higher res with pro account
https://files.catbox.moe/4zf45q.mp4
>>
>other AI threads haven't been bumped in 40 minutes
Wooooaaaa I'm noooooootocing!
>>
File: 00035-2324060362.png (1.44 MB, 1024x1240)
>>
>>106759025
isn't that even worse for local?
>>106759033
holy brown
>>
>>106759035
>muscular woman
*barfs*
>>
>>106759055
So tell me who it is so we can compare unless you're scared of being proven wrong again kek
Also it's more brown to think a random british guy dressed up as austin powers is mike myers, but that's beside the point
>>
https://files.catbox.moe/r25a0j.mp4
>>
File: 00036-2324060363.png (1.93 MB, 1024x1240)
He really is going all out today. Sadly he can't make anything locally that's passable, so we have another week of the same non-local trolling.
>>
>>106759094
Broke the 180 rule. Animators remain safe.
>>
>>106759003
>>106759053
>>106759100
Chroma?
>>
File: 00039-2324060366.png (1.47 MB, 1024x1240)
>>106759128
Yes it's really good with more irl stuff, but I could probably do this with flux if I had it on my computer
>>
>>106759050
>one minute after this post they were bumped
this ability to notice is a burden some anon must carry
>>
File: 00040-2324060367.png (1.86 MB, 1024x1240)
>>106759146
99% of anons notice; you can tell by the engagement in the other thread. Nobody save for a few mentally ill anons and the ones that want to quell him posts there now. He has been losing this game since he lost the thread split.
>>
File: 00048-1749936220.png (1.91 MB, 1280x1024)
>>
>only posts are rans
looking grim for local
>>
>>106758881
That gif reminds me of the one where it shows progressive steps of a prompt that's like "woman on the beach" with euler a. I thought that one illustrated convergence well and it's a shame I can never find it when I need it.
>So it's going to take a fraction of the time.
Nice. I can't wait to see the progress gens.
>>
>>106759182
He's melting down over scarecrows and spamming the thread harder than the other schizo he's whining about.
>>
*yawn*
>>
>>106759182
>>106759191
yeah, at some point the complaining is even worse than the schizo himself, let's hope that one day he'll understand that he should stop feeding trolls
>>
Once again I wonder how many newfags you think are here right now
>>
>>106759120
>Animators remain safe.
remember, this is the worst we'll ever get
>>
File: 00051-3342282911.png (2.51 MB, 1240x1240)
>>
>>106759217
I think you completely misunderstand that there's a ton of compute between right now and a model smart enough to not break the 180 rule and if the future is 100B+ models, yes, animators are safe.
>>
>>106759207
>>106759191
to be fair it is fun to shitpost at him since it's really easy to set him off
my favorite is when he goes to /sdg/ afterwards to melty in there too, thinking it was debo
>>
File: 1732511172659797.png (205 KB, 2096x883)
https://arxiv.org/abs/2509.22935
Apple saved local, we'll be able to go for very low quant models and have good accuracy
>>
>>106759120
>flips the clip horizontally in editing
>>
>maybe if i samefag hard enough anon will believe me
>>
>>106759229
>there's a ton of compute between right now and a model smart enough to not break the 180 rule and if the future is 100B+ models
OpenAI has that compute, and we're far from done optimizing neural network architectures; if you believe Transformers are the end of the road, you're not gonna make it
>>
>>106759214
Surely this will work after it not working for 3 years, doing the same exact style and bit.
>>
>superior sora saas starts sdgtard seething
love to see it
>>
>>106759254
They call that the definition of insanity, don't they.
>>
*brap*
>>
>>106759246
kek, and that's it, animators are dead now
>>
>>106759217
We are getting to the point of incredibly incremental upgrades. This has better editing for the overall video, but the movement and consistency are still wack. I love AI, and it's fun AF, but until hallucinations are fixed this tech really has no future.
>>
>>106759253
Actually they don't, and realistically it's not going to be 100B to be "perfect", it's going to be something like 10T. And that's not even talking about context problems; 10s is cute but last time I checked movies are 90+ minutes.
>>
>>106759238
id rather chop off my left nut and give it to chang than put trust into crapple. considering their record id be surprised if its not a nothing burger.
>>
>>106759278
>10s is cute but last time I checked movies are 90+ minutes.
the average length of a scene before a cut from a movie is 12 seconds, all they have to do is to make multiple cuts, like you would do on a real movie
>>
File: xyz_grid-0001-1970925430.jpg (1.21 MB, 4000x1111)
>>
>>106759293
Neat
>>
>>106759288
>average length of a scene is 12 seconds
Okay you're totally rotted by TikTok. Have you ever watched a movie in your life? Yes, there are quick cuts that bring down the average, but scenes are not a series of 12 second cuts. Also that still doesn't solve the context problem, because you might realize, you need to have your 12 second clips work within a plausible, realistic, "physical" environment. So even if you managed to keep your characters consistent, you still need the model to not hallucinate physical differences in the environment, which means you NEED a full 3D global model.
>>
>>106759315
I don't know why providing that fact made you so upset but it is what it is, you can't just ignore reality
https://gointothestory.blcklst.com/the-shortening-of-movies-43dc906852f9
>The average shot length of English language films has declined from about 12 seconds in 1930 to about 2.5 seconds today
>>
>>106759315
You can use a 3d model of the environment as an input to ground the model when necessary.

Honestly I find it funny how you guys think everything has to be pure text to video. In a realistic production there would be a much more advanced pipeline with a lot of human talent in the loop. Live action and 2D/3D animated productions use an immense amount of human labor and talent for every minute of video.
>>
File: 00049-188999635.png (1.22 MB, 1024x1280)
sloprangers
>>
>>106759338
You have no understanding of visual language. With AI videos every cut looks like it has a different cinematographer. It is one of the biggest glaring problems with making longer AI videos
>>
>>106759338
Wow you just completely ignored what I said.
- movies aren't just 2.5 second or 12 second clips stitched together
- even if they are, those clips are taken with 2 or more cameras inside of a physical scene, which means even if you had 12.5 second clips, you still need the previous clip for context
>>
>>106759359
You can use an edit model to provide plausible changes of angles, you add characters and you do some I2V shit, and boom
>>
File: 00062-3008569877.png (2.44 MB, 1240x1240)
>>
>>106759357
Yeah anon, where's your 60 second clip. You're so confident it can be done I'm sure you can slap something together in 20 minutes. Really give us the SOTA state of video gen.
>>
>>106759388
>Yeah anon, where's your 60 second clip.
https://www.youtube.com/shorts/2njx4yIONSU
>>
>>106759375
lol, I should have realized you would have no idea what I am even saying before I replied. Good bait anon.
>>
>>106759404
It's all right, I accept your concession.
>>
>>106759403
>not 60 seconds
>literally hallucinates details clip to clip
>>
>>106759357
>In a realistic production there would be a much more advanced pipeline with a lot of human talent in the loop. Live action and 2D/3D animated productions use an immense amount of human labor and talent for every minute of video.
>>106759388
>I'm sure you can slap something together in 20 minutes
Kek, talk about missing the point.
>>
>>106759228
Its ability to capture TV fuzz and the general look of a tube display always amazes me. Thank god we have 16ch VAEs now.
>>
>>106759415
>I accept your concession.
classic debo phrase
>>
File: 1754151454735528.png (9 KB, 269x95)
>>106759421
>>not 60 seconds
oh yeah my b, it's 62 seconds, the horror
>>
File: 00052-845814631.png (2.46 MB, 1536x1920)
>>
File: 1753774423789611.png (846 KB, 936x1112)
the man is holding a taco and wearing a sombrero. keep his expression the same.

my food is augmented.
>>
File: proof.png (96 KB, 382x491)
>>106759423
What point? You said it's been done, it's easy. I'm merely asking you to substantiate your claims.
>>
>>106759437
>only debo use this sentence in this world
completely brainbroken
>>
>>106759438
Please be honest, are you intentionally bad faith or are you retarded and thought I didn't mean a 60 second clip of a single scene. But it's still funny because you showed a series of 5 second clips and they don't even have continuity.
>>
>>106759445
>I'm merely asking you to substantiate your claims.
we already did but you decided to pretend it didn't exist, we can't do much more when dealing with someone in pure denial >>106759403
>>
>>106759000
hes one of those guys who as a kid would deflate the ball because no one wanted to play with him
>>
>>106759445
>you said it's easy
No, I said the opposite of that. Realistic film productions don't shit out a minute of finished film in 20 minutes. You're just incapable of arguing the point and decided to create that strawman to argue against instead
>>
>artists right now
https://files.catbox.moe/oii2yi.mp4
>>
>>106759462
Please be honest, are you intentionally bad faith or are you retarded and thought I didn't mean a 60 second clip of a single scene. But it's still funny because you showed a series of 5 second clips and they don't even have continuity.
>>
>>106759448
>brainbroken
not helping yourself debo, at least try to change up your lingo
>>
>>106759459
>I didn't mean a 60 second clip of a single scene.
good thing it's not a 60 second clip of a single scene, it's a 60-second video with multiple cuts, all with a single prompt
>>
File: 00054-3320588119.png (3.23 MB, 1240x1240)
>>106759426
Had to pull out the base model for these
>>
>>106759483
You mean like how actual cartoons are? Oh, the horror!
>>
SaaS is honestly insanely powerful, these threads move so fast when amazing new SaaS models drop. It's clear that OpenAI continues to dominate the conversation in the AI space. What must local do to remain relevant?
>>
>>106759483
> it's a 60-second video with multiple cuts, all with a single prompt
Okay should be simple for you to recreate this while screen recording.
>>
>>106759482
>only debo says "brainbroken"
damn, debo is such a unique guy, only him has the right to say words you don't like it seems
>>
Still can't into real people though unfortunately
>>
>>106759493
the paper and the project are here, feel free to read it, but I think you're too deep in denial and will pretend it never existed, I guess
https://test-time-training.github.io/video-dit/
>>
https://files.catbox.moe/cawxp1.mp4
>>
>>106759493
>but bro I made a Tom and Jerry LoRA for Wan and I just stitched together clips, video is solved!
>>
>>106759494
>debo is such a unique guy
classic debo talking in the third person
>>
>>106759511
You're doing everything except provide proof YOU can do this right now. I'll accept "I can't do this myself anon because I'm poor" as an answer btw.
>>
File: 00071-3008569886.png (2.66 MB, 1240x1240)
>>
>>106759514
>I just stitched together clips
it's literally a 1 minute video made with a single prompt, no stitching or anything, the model did the whole minute by itself >>106759511
>>
>>106758944
>cele-
and it's an animation of AniStudio
>>
File: 1740296506594498.png (909 KB, 936x1112)
the man is holding a bowl of Chinese white rice and wearing a Chinese rice hat. Give him a moustache and goatee. keep his expression the same.

you should check out the lucky money club, anons.
>>
>>106759525
>well yeah some people did it and showed how it can be done but YOU didn't, therefore it doesn't exist
(You)
>>
>>106759529
You're doing everything except provide proof YOU can do this right now. I'll accept "I can't do this myself anon because I'm poor" as an answer btw.

Okay it's all one prompt, where's your 60 second video using only one prompt.

The prompt
"A cartoon airplane in an intense war scene from World War 2, the pilot is a nigger."
>>
>>106759522
oh hi debo, still talking with yourself?
>>
>>106759537
>t. I can't do this myself anon because I'm poor
It's okay anon, I know your pride won't let you say it.
>>
>>106759540
>it doesn't count unless you put 0 effort in
>>
>>106759551
i'm debo#2 you're debo#1
>>
>>106759557
You're doing everything except provide proof YOU can do this right now. I'll accept "I can't do this myself anon because I'm poor" as an answer btw.
>>
>>106759554
the model is here, feel free to test it out (you won't because you're too poor)
https://github.com/test-time-training/ttt-video-dit
>>
>>106759035
Exquisite. 2d > 3d every fucking time
>>
>>106759565
>moving the goalpost
we went from "a model cannot do a 1 minute video" to "w-well yeah it can do a 1 minute video but can YOU do it?"
>>
alive thread of stagnant (local) tech
it's so over
>>
File: 00056-2960080141.png (3.07 MB, 1536x1920)
fml
>>
>>106759565
I could do it using the model anon posted, but I would never waste time on a low-effort workflow when I could build out a professional pipeline that uses 5% of the effort of traditional filmmaking and surpasses it in results
>>
Guys I made a Tom and Jerry LoRA for Wan. I generate a still frame, input it for starting frame to video for 5 seconds, then I take the last frame as input for Qwen Edit, ask it to change the camera position, use a VLM to write a prompt, and repeat until I have 60 seconds of video. Video is solved!
>>
>>106759584
at least your gens are cool desu
>>
>>106759592
What's your point? Just saying something with a sarcastic undertone doesn't automatically invalidate it, you know.
>>
File: 1736625547512232.png (666 KB, 936x1112)
view from the back perspective

neat
>>
>>106759621
Can it do an extreme angle from below? Like from his feet looking up, at an extreme angle
>>
>>106759592
>still lying about the true process
why is it so hard to admit they managed to make a model that can do a 1 minute video all by itself?
>>
File: ComfyUI_temp_qofnp_00059_.jpg (360 KB, 1024x1280)
>>
>>106759633
probably low iq
>>
File: 1749737344170142.png (778 KB, 1098x790)
>>106759621
with an anri pic

like magic.
>>
File: 1751388724538094.png (645 KB, 1162x717)
>>106759649
diff girl to test

setup: qwen edit v2 (2509), 8 steps with Qwen-Image-Lightning-8steps-V2.0.safetensors (works better than the edit v1 lora)

pretty cool.
>>
>>106759649
she's a granny now bro time to move on already
>>
>wheelchairs are posted
>goes on the fritz
>>
Peacefully observing the localmonkeys in their natural habitat, don’t mind me
>>
File: 1736570015130821.png (1.61 MB, 912x1144)
>>106759672
what's neat is you can't take a base image and just inpaint them in reverse. you'd need a new pose and to render it in proper detail. edit models can do that. it even works with large buildings or houses. this is just a fun way to test functionality.
>>
>>106759633
>multi-scene videos up to one minute in length directly from storyboards.
Actually it's funny that my sarcastic reply is actually their process.
>>
File: 1750255481991735.png (1.32 MB, 856x1216)
DAMN. AI is pretty cool after all.
>>
>>106759708
not at all, it's literally one single prompt and you have the full 1 minute video, are you baiting or something?
>>
File: file.png (87 KB, 1323x836)
>>106759708
You really should read your own papers.
>>
>>106759708
>if I say it sarcastically that means it's a bad thing!
You need to go back.
>>
>>106759649
That's not a view from her back though.
>>
>>106759727
>but you can just look at the JAV
the idea is to test functionality of the model.
>>
>>106759732
>>106759731
woo that aged poorly

>>106759736
No, it's funny because it doesn't make the video all by itself. You can literally do their paper at home with a Tom and Jerry LoRA for Wan.
- use a LLM and Qwen create a series of storyboards along with a video prompt
- animate the storyboard with Wan with the storyboard image and video prompt
- stitch together
>>
>>106759492
>SaaS is honestly insanely powerful

Only ClosedAI is. Not a single Chink model has ever held a candle to Dalle, nor 4o, and now this... They never will hold a candle to this.
>>
>>106759757
>woo that aged poorly
?? what are you talking about, it's still one single prompt with multiple descriptions, can you read? >>106759732
>>
>>106759775
>Only ClosedAI is.
Google has veo 3 remember
>Not a single Chink model has ever held a candle to Dalle, nor 4o, and now this...
Seedream is great though, still the best image model for realistic scenes
>>
Smells like cloudcucks in here.
>>
>>106759757
Let's say that's how it works, why is that a bad thing? Explain in plain english, don't just sarcastically describe the process like it's obvious why we should be predisposed against it
>>
File: 1729181907036598.png (981 KB, 1024x1024)
one more, a test of a random anime pic (Haruhi):

even got the back of the uniform right.
>>
>>106759778
>one single prompt is actually a JSON input with multiple prompts
Why do you insist on being this obtuse? It's okay to admit you're wrong. I get it, AI is like magic to you so you made fantastical assumptions about the state of the tech and you're upset I'm bringing you back to Earth. You wanted to imply they just typed in "Tom and Jerry in the office" and not actually a full series of prompts and storyboards fed into Cog.
>>
File: 1757992628634973.png (1.06 MB, 1024x1024)
>>106759805
the character is sitting at one of the desks. her hands are on the desk.

and it works. 20-30 seconds, and more effective than spending more time with openpose to get this type of repose.
>>
>>106759806
>nooo, why can't it make a movie if I provide 3 words max
a prompt is a prompt anon, you just don't like it when it's long, but you're moving the goalpost. you said that it couldn't make a 1 minute video with a single prompt, now your argument is "I want that prompt to be short". it's all right, just say you were wrong and you have trouble admitting it because your ego is too fragile to do that, and we're good, deal?
>>
>>106759803
This entire conversation is about people overexaggerating the capabilities of video AI and grossly misrepresenting the current state of AI and the requirements to achieve a human-free AI pipeline. There is NOTHING wrong with what I just said for making a 60 second AI video; my contention is that the human effort to make that clip is non-zero. Yes anon, you can make a video right now only using AI with existing models and tools. But not without significant care and effort.
>>
>>106759790
Looking at their raw capabilities, Seedream is not even 1% as good as 4o. And kek, no, that slopped crap is not realistic.
>>
local tooling
you just cant beat it
>>
>>106759840
>Seedream is not even 1% as good as 4o.
you're talking about the piss filter model? ahah good bait, 7/10
>>
File: file.png (27 KB, 635x460)
>>106759828
>You wanted to imply they just typed in "Tom and Jerry in the office" and not actually a full series of prompts and storyboards fed into Cog.

When you have two text prompts in an array, it's called "prompts".
>>
>>106759851
>You wanted to imply they just typed in "Tom and Jerry in the office"
I never implied that, you're talking out of your ass again
>>
>>106759851
it's still a 1 minute video from just pure text; I don't get what you're complaining about. Now you don't want to write text to get that video? You want the model to guess what's in your mind or something?
>>
>>106759848
The piss filter is them not even trying though. Probably some really bad fingerprint/censorship. Their video model is what happens when they try. What do you think they will give us next for text to image? It will be over. Also, 4o is still the best thing for text and overall concept knowledge. It's not even close.
>>
where did all the competent anons go?
>>
I've been thinking about how in the future movie studios and game studios will just make their own model and train it on a per movie/game basis
I know it's not a very complex thought but I felt like in the context of people expecting generalist AIs to make competent movie scenes it's kind of relevant. Like this wouldn't be what the studios use anyway.
>>
ComfyUI is such a wonderful program. If only it worked.
>>
>>106759877
>What do you think they will give us next for text to image? It will be over.
I don't talk in the wishful thinking language, they still haven't provided anything good, so it doesn't exist
>>
>>106759511
>last updated
>4 months ago

sigh, only tencent and nvidia can save us now
>>
>>106759894
>I don't talk in the wishful thinking language, they still haven't provided anything good, so it doesn't exist

>Sora 2 exists
>Dalle 3 still exists
>4o is just a teaser for what they will give us next...

Anon don't be so naive..
>>
>>106759903
>>Dalle 3 still exists
this shit is ultraslopped, can't believe you decided to bring this shit to the table
>>
File: 00086-534935041.png (2.8 MB, 1240x1240)
>Eat lunch with GF
>Still doing it
>>
>>106759885
Realistically, in video production AI is most useful for inbetweening and neural rendering. Maybe some storyboarding and animatics help. Just like AI is useful today in programming and writing. Ultimately AI is a solver of the blank canvas problem, but the heavy lifting is the human judging, iterating and controlling the end result.
>>
>>106759913
The model is still them not trying and yet it's SOTA in concept knowledge.
>>
File: 1744788858507866.png (1.03 MB, 1024x1024)
the girl is holding a baseball and is wearing a Dodgers baseball uniform.

hmm qwen knows the doyers.
>>
>>106759947
>t-they didn't try on Dalle3 and 4o, but next time they'll try! Trust the Sama
I won't
>>
>>106759885
They will train LoRA vWhatever, sure, but they will not be training new base models on a per-project basis. The economics make no sense.
>>
>>106759959
When they released Sora 1, they also were clearly not trying and noticeably holding back. Did you really think Sora 2 just came as a natural upgrade to Sora 1? It's several generations ahead of Sora 1, which itself was behind every Chink model that was being put out at the time (even Veo 3)...
>>
>>106759971
I just think that Sam Altman is a weird motherfucker. For his image model he added a piss filter because he was terrified of deepfakes, and now he made Sora 2 and the cameo shit that lets every random make a video selfie of themselves and share it for everyone to see. He's so inconsistent lol
>>
File: 00095-2394186116.png (2.02 MB, 1240x1240)
I wish I was more interested in irl style gens but I'm not, really liking these outputs
>>
>>106759971
Sora 2 is just a natural progression of the audio-to-video tech, I don't think it's that far ahead of the curve. Isn't Wan 2.5 basically on par with Sora 2? The only thing OpenAI has going for it is that they obviously have the best dataset, but that's ultimately beaten by LoRAs, as specialization beats generalization.
>>
Qwen Edit seems worse at t2i than plain old Qwen Image. Not significantly, just seems jankier with slightly worse prompt adherence, but maybe that's my imagination.
Are they going to update Qwen Image or are they going to focus on regular updates to Qwen Edit from now on?
>>
>>106759979
Or Sam is well aware that open source is catching up, and his time is almost up. He either puts up a good model or gets replaced.
>>
>>106759979
The man marketed his AI assistant with a movie about falling in love with your AI assistant, and then proceeded to kneecap the human model as much as possible as often as possible and virtue signal about how moral he is for not creating an Ani-equivalent. He has no plan or vision, he just does whatever and assumes he is entitled to AGI by divine right of kings.
>>
File: 1756656700051655.png (1.03 MB, 1024x1024)
>>106759949
the girl is holding a baseball bat and is wearing a New York Yankees baseball uniform.

anyways you get the idea. it's an amazing tool for edits that can be used with either noob/illustrious gens or realistic gens/photos.
>>
>>106759991
>Isn't Wan 2.5 basically on par with Sora 2
not even close, the sound is terrible, the images are kinda slopped, the movements have a lot of glitches and it's far from having all the characters/styles concepts of Sora
>>
>>106759993
>Qwen Edit seems worse at t2i than plain old Qwen Image. Not significantly, just seems jankier with slightly worse prompt adherence, but maybe that's my imagination.
it is, because it's not the goal of QIE to be a t2i model; they sacrificed that aspect so that it can be good only at editing
>>
>>106760011
Sora 2 sounds like they're chewing gravel.

https://www.youtube.com/watch?v=yvD8TxNsR4Q
It sounds like you're full of shit honestly.
>>
>>106759971
yeah, I never believed OpenAI would reach Veo 3's level this soon, they surprised me on that one, I should stop underestimating them lol
>>
>>106760028
>sounds like they're chewing gravel.
you're literally describing Wan 2.5, what the fuck are you talking about?
https://youtu.be/yvD8TxNsR4Q?t=126
>>
>>106760011
It doesn't have any real people other than Sam Altman though so
>>
>>106760018
I vastly prefer QIE, and if you do a LoRA you can enhance its text-to-image capabilities, making QI redundant or not worth switching between them. I personally think QIE is better because the model has a better 3D understanding of the generations, which is hard to put into words.
>>
>>106760038
>It doesn't have any real people other than Sam Altman though so
Wan can only do Trump accurately lol
>>
>>106760037
There are Sora clips in this fucking thread that sound like a wav file from 2001. But it's okay, you're the resident OpenAI shill.
>>
>>106760050
>A Wan 2.5 (API only) shill complaining about other API shills
kek, this is funni ngl
>>
>>106760011
ClosedAI understands that in order for a model to be good, it needs to understand as much as possible and naturally communicate with the world. Chinks don't understand this, they just want to benchmax, fit in as many generic concepts as they can, and have zero vision.
>>
File: 1746823423338828.png (981 KB, 1024x1024)
>>106760002
remove the girls outfit and replace it with a two piece bikini with the Dodgers baseball logo on it.

qwen image edit is so smart (for detecting stuff and edits)
>>
>>106760047
I think we all know the real use case of Wan and it's nothing Sora will ever do without getting your account banned.

But please, give me a woman in a bikini in a hot tub doing ASMR for her 18+ Twitch stream.
>>
>>106760047
With i2v it can do any real person. Can't say the same for Sora unfortunately.
>>
>>106760058
that's a chinese alibaba shill
>>
>>106760028
Can Wan 2.5 do songs?
https://files.catbox.moe/1c3h2s.mp4
>>
>>106760070
> do
do what
>>
>>106760077
>guys there's no gravel I swear in this 12kbps audio clip
>>
the localcope is real
>>
File: 1759022990171238.png (1000 KB, 1024x1024)
>>106760064
remove the girls outfit and replace it with a taliban terrorist outfit.

kek
>>
calling it now

>wan 3.0 - 2026, slaps sora 2 veo 3 and the rest
>local - 12 secs, 18B, 720p, 30fps
>ayy pee eye - 30 secs, 39B, 1080p, 60fps
>>
>>106760086
I never said the sound is perfect, but somehow you can't stop ignoring context: that sound quality is on par with Veo 3 and better than Wan 2.5. That's the best sound quality we get at the moment
>>
>>106760058
With Sora 2 it's inevitable Wan is going to release locally lmao because the economic damage to OpenAI is too juicy.
>>
saar you wish to make video? please only sama it is not safe for others. please saar.
>>
>>106760111
that's only if localkucks beg hard enough
>>
>>106760077
>https://files.catbox.moe/1c3h2s.mp4
THIS IS SO CUTE WTF
>>
File: 1735810645415272.png (1.13 MB, 1024x1024)
>>106760098
remove the girls outfit and replace it with a Japanese samurai outfit.
>>
>>106760113
> is going to release locally
> 228B
>>
>>106760112
I don't care to verify capabilities because it misses the point. Given audio2video exists, why would I want a really shitty sounding song when I can use a better AI to make a good song and then use that to generate a video?
>>
https://files.catbox.moe/nxirwk.mp4
that's a weird demon souls mod lol
>>
File: 1736751045400578.png (971 KB, 1024x1024)
>>106760126
the girl is wearing a large teddy bear costume.

cute!
>>
>>106760127
Wan 2.5 is not going to have exponentially more parameters than Wan 2.2. It's just like QIE, it's just a model that has longer context and has better input options (e.g. an audio stream).
>>
>>106760133
>why would I want a really shitty sounding song when I can use a better AI to make a good song and then use that to generate a video?
that's too much of a pain, I just want to write text and get a video with sound in the output. It's way more fun that way; having to provide audio you already know removes the surprise magic
>>
>>106760152
>too much of a pain
It's a literal pain to listen to that clip, it's so low quality. And we're talking about programming here; it's literally why things like ComfyUI exist, that's the whole fucking point of using node workflows.
>>
i dont want audio in any local video model unless there is an option to completely disable it to increase generation speed. we already have 5b wasted parameters in every local model dedicated solely to generating text on signs thanks to emad.
>>
>>106760077
ngl the song is quite catchy
>>
>>106760113
>the economic damage to OpenAI is too juicy
Oh, fuck. The entire western economy is currently hanging almost exclusively on delusional valuations of nvidia. All China has to do is keep undercutting OpenAI's releases, or produce cheaper chips with some CUDA equivalent, and the entire thing will come crashing down. And OpenAI is easy to undercut due to the wastefulness of their models (4o vs DeepSeek) and safetyism, and the nvidia embargo means new chips are a matter of time.

This is all going to go really, really bad, isn't it?
>>
With a local Sora 2, both porn and anime would be solved. A shame we will never have that.
>>
>>106760163
like I said, I don't want to provide my own audio, I want the model to make everything by itself. I agree that at the moment the sound quality isn't great, but let's not pretend they won't improve on that, they will
>>
>>106760179
Too unsafe, goy.
>>
>>106760183
>like I said, I don't want to provide my own audio
okay you're too stupid to understand what I said
>>
>>106760187
nigga that's gay
>>
>>106760179
Jesus all you people do is complain
>>
>>106760187
Oh no. The providers aren't going to like this one.
>>
https://files.catbox.moe/p8zyu7.mp4
>>
>>106760192
I accept your concession.
>>
File: 00114-2792768966.png (3.2 MB, 1240x1240)
I'm going to aim for the VHS recording look next
>>
>>106760167
Shouldn't they be separate, dedicated models that communicate anyway? Like, I want to be able to add arbitrary foley synched to motions, not produce whatever sound the video model thinks is realistic. Like, I should be able to gen an image of a big tiddy asian girl bouncing, and then subsequently prompt "sound of milk jugs being swished around" as a separate process.
>>
>>106760205
>Sora 2 won't allow you to make edgy jok-ACK
>>
>>106760213
1) specialized models are better than general models
2) a specialized audio model will always beat a shitty video model wasting parameters on a shitty audio generator
3) functionally using a separate AI model to generate an audio clip and then feeding it to a video model is exactly the same to a retard like you
>>
>>106760167
>i dont want audio in any local video model
I do, it's way funnier with sound
>unless there is an option to completely disable it to increase generation speed.
fair, at least everyone is happy if there's an option to disable it
>>
File: 1750178826663197.png (1.03 MB, 1024x1024)
>>106760148
the girl is dressed in a business suit with a red tie and is holding a black rifle and firing it towards the camera. she is smoking a cigar.
>>
>>106760233
>specialized models are better than general models
This incorrect assumption set back AI by 10 years btw
>>
>>106760233
>they can't do it, it's too hard!!
that's why they're engineers at OpenAI and you aren't, they have ambition and you don't
>>
btw normies dont know about sora 2 theyre all distracted by zucks thing
>>
File: 00126-2373027009.png (2.79 MB, 1240x1240)
>>
>>106760251
>theyre all distracted by zucks thing
wait what? did something happen recently with zucc?
>>
>>106760228
Who are you quoting?
>>
>>106760281
debo
>>
>>106760241
No, it’s still correct. AI is fundamentally a statistical solver over data distributions. Diluting that distribution across modalities increases variance and slows convergence. The fix is exponentially more parameters, but that brings diminishing returns: more compute, longer training per step, and far more steps to converge. That’s why specialized models outperform general ones: drastically smaller, yet higher quality in their domain. But I understand why people who own the datacenters want you to believe that's not true, they want every model impossible to run locally even if it's grossly inefficient. Literacy is dangerous in the hands of the peasants after all.
>>
File: 00000-1390880363.png (728 KB, 1024x1280)
>>
The brain is a generalist that coordinates between specialists. The purpose of specialised neural anatomy is to solve specific problems. You need both, and you need both to be in communication.
>>
>>106760281
>Who are you quoting?
the guys who said you can't do edgy jokes, you need to lurk the previous thread to see that
>>106751481
>anything exciting would have not passed the censorship anyway
>>
>>106760319
oh my bad i must have lost it in all the spam and totally reasonable criticisms
>>
>>106760293
that's it guys, we pack it up, that random anon said it can't be done so it must be true
>>
>>106760327
it's all right, that's why I'm here to point it out. I know how to navigate the noise; it's a skill not a lot of people have
>>
>Exciting = Edgy
They need to find smarter jeets
>>
File: 00001-1296422259.png (1.55 MB, 1024x1280)
>>
>>106760334
>>Exciting = Edgy
we're on 4chan, we only get excitement through edgy jokes, DUH!
>>
File: 00136-1668900265.png (2.99 MB, 1240x1240)
>>
>>106760347
>>
>>106760334
And I'm still waiting for Sora 2 to do that one though
>>106751628
>maybe if the Jew dog was dancing while the twin towers were falling I'd chuckle but we both know it's going to be censored
>>
>>106760315
The brain isn’t a single generalist blob, it’s a federation of specialized modules. Visual cortex, auditory cortex, motor cortex, language centers, etc. all evolved for domain-specific processing. Coordination doesn’t erase specialization; it depends on it. And by the way, the brain has trillions of synapses: orders of magnitude beyond any AI model. If you think that comparison justifies wasting parameters on unfocused multimodal models, you’re proving my point: specialization is what actually makes the system efficient. The irony is our brain is more akin to a MoE model with specialized domains all of which filter and prepare inputs for the "generalist" model. Do you think your brain processes the raw auditory data?
>>
>>106759949
>>106760002
>>106760064
>>106760098
>>106760126
>>106760148
>>106760237
The results are quite amazing, are you using the default comfy workflow?
Also how does it handle LoRAs with concepts it doesn't understand?
>>
>>106762189
default comfy with the 8 step qwen edit lightning v2.0 lora (not edit v1), 8 steps, with qwen edit 2509, Q8 version.

it works with loras too, just chain it to the lightning lora (or use them by themselves).
>>
>>106762211
Noted, time to download it and play with it.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.