/g/ - Technology


File: wolfuwinku.webm (896 KB, 352x352)
Discussion of Free and Open-Source Diffusion models.

Last bread: >>103475488

>Local (Hunyuan) Video
Windows: https://rentry.org/crhcqq54

>UI
Metastable: https://metastable.studio
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI

>Models, LoRAs, & Upscalers
https://civitai.com
https://tensor.art/
https://openmodeldb.info

>Cooking
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
Forge Guide: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050
ComfyUI Guide: https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Guides & Tools
Share the Sauce: https://catbox.moe
Perishable Sauce: https://litterbox.catbox.moe/
Generate Prompt from Image: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two
Artifact resources: https://rentry.org/sdg-link
Samplers: https://stable-diffusion-art.com/samplers/
Open-Source Digital Art Software: https://krita.org/en/
Txt2Img Plugin: https://kritaaidiffusion.com/
Collagebaker: https://www.befunky.com/create/collage/
Video Collagebaker: https://kdenlive.org/en/

>Neighbo(u)rs
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai

>Texting Neighbo(u)r
>>>/g/lmg
>>
>>103482892
based moving op
>>
File: 00003-4099772872.png (1.34 MB, 1024x1024)
its so over that we're so back
>>
>>103482892
So it's not about collages anymore? Just one single video now?
>>
Blessed thread of frenship
>>
You make next collage
>>
>>103482916
We should prepare an ffmpeg script that takes a few files as input and makes the collage
I don't expect people to bother with video editing software just for this
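A minimal sketch of what that script could be, in Python since that's what everyone here has installed anyway; the 2x2 layout, the fixed 512x512 cells, and the filenames are placeholder assumptions, and it just squashes aspect ratios instead of cropping (ffmpeg must be on PATH):

import subprocess

clips = ["a.webm", "b.webm", "c.webm", "d.webm"]  # placeholder inputs
cell_w, cell_h = 512, 512  # every clip gets squashed to this cell size

# Scale each input stream to the cell size, then arrange the four
# scaled streams into a 2x2 grid with ffmpeg's xstack filter.
scaled = "".join(f"[{i}:v]scale={cell_w}:{cell_h}[v{i}];" for i in range(len(clips)))
layout = f"0_0|{cell_w}_0|0_{cell_h}|{cell_w}_{cell_h}"  # x_y position per cell
filt = scaled + "".join(f"[v{i}]" for i in range(len(clips)))
filt += f"xstack=inputs={len(clips)}:layout={layout}[out]"

cmd = ["ffmpeg"]
for c in clips:
    cmd += ["-i", c]
cmd += ["-filter_complex", filt, "-map", "[out]", "-an", "collage.webm"]
subprocess.run(cmd, check=True)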
>>
>>103482916
The collage is ass because it's just the baker's fetish / masturbation.
>>
>>103482916
just like in real life there is only one winner, 2nd and 3rd and 4th, etc. are 2nd loser, 3rd loser, etc.
>>
>>103482958
Also I couldn't help but notice that they've been selecting images/videos that could get the thread nuked, so it's at best low key trolling.
>>
>>103482930
that is assuming all videos are the same aspect ratio
>>
>>103482916
>it's only been exactly 1 week since videogen began taking over the general
>it took less time than that for the OP's to start getting ((weird))
jesus this week really flew by i didn't even notice
going to fire up sd in a moment and catch up cause i missed this shit so much
>>
>>103482977
get chatgpt to do a fancy recursive algorithm and resize everything before combining it all
>>
>>103482971
For example...?
>>
>>103483086
I trust your adult brain
>>
File: 1707068556792753.jpg (1.68 MB, 1824x1248)
>surely noobai won't understand my schizo prompt
>it does
>>
>>103483211

Hellsing fan i see, managed to get a good render of Zorin or The Captain?
>>
File: vae decoding oom.png (10 KB, 972x68)
what causes reforge to do this? im on a new install, and it has done this in the past too (that's half the reason i kept uninstalling and reinstalling later)
>>
>month 23 of LDG
>thread falls off catalog multiple times
>anon still complains about the collages
>anon still has yet to bake his own collage
>>
>>103483281
>month 23 of LDG
ldg isn't that old lol
>>
>>103483281
Big tears buddy, it will get easier.
>>
File: 1703848497276949.jpg (1.4 MB, 1824x1248)
>>103483243
Funnily enough, i had absolutely no idea to connect this with Hellsing in any way, it's my attempt to do dumb OCs again.
>>
Congrats moralfags. We get one video of a dog now. This is the OP you wanted.
>>
>>103483322
Yes it's better than the ones that deserve you getting a perma.
>>
>>103483298
itd be funny if he started crying like that other anon
>>
>>103483064
Resize based on the smallest aspect ratio? How do you mean?
>>
if you want godlike hunyuan quality put this in front of your prompts. Thank me later:

shot on a Canon EOS R7, 85mm lens, f/2, sharp focus, movie scene.
>>
>>103483379
>shot on a Canon EOS R7, 85mm lens, f/2, sharp focus, movie scene
thanks anon, I guess that only works for realistic stuff, if you have the same magic prompt for anime it would be cool
>>
>>103483379
>shot on a Canon EOS R7, 85mm lens, f/2
placebo. models don't genuinely interpret those kinds of prompts. post a comparison.
>>
When will the jannies just learn to sit in this thread and ban the pedo for ban evasion?
>>
>>103483388
>two asians kissing being one of the videos in a collage will get the thread nuked
Those anons must be trolling
>>
>>103483427
>gore
>nudity
>pedo videos
The guy you've been replying to has been perma'd multiple times. He's the one who gets literally all his posts nuked daily.
>>
>>103483211
>>103483303
god damn these are good, what noob model are you using?
>>
>>103483447
Huh, you're right. How about that kek
>>
>>103483422
Wait trani is here? Ew!
>>
File: 00018-2896496352.png (1.63 MB, 1080x1384)
>>103483447
that's a shame, pedanon seemed based enough but if he's getting nuked and some of those posts are obviously trolling then i can't see him in the same light anymore. rip.
>>
>anon is still mad his slop never made it into the collage
>>
>>103483483
You can thank him when they make it mandatory to verify an email before posting.
>>
>>103482917
Blessed thread
>>
>>103483495
This. The guy has zero opsec—I wouldn't be surprised if he was actually a fed.

Funny how flagrant posting of underage girls always goes hand in hand with abusing high cfg. Must both be low IQ traits
>>
>anon grows bored of current SOTAs video outputs
>infighting begins
It's like a cycle
>>
>>103483530
We know the goal is reached when we don't even have to post anymore
>>
>bro you must be bored if you're annoyed about the guy shitting in the corner
>>
File: 00019-351538351.png (1.56 MB, 1080x1384)
>>103483495
>>103483520
I mean it's kinda obvious most anons who try to fuck up threads are feds, inside job and all that. Already been exposed too many times that they intentionally hire people on for moderation/janitorial service with the intent to follow scripted ops and shit intended to push for more pass ownership.
the idea that anything is done for the sake of "advertisers" is hilariously ignorant especially after the fuckups this year alone.
Daily reminder for anyone who forgot or didn't know that BBCposting in certain boards always seems to stop at the same time as certain geopolitical events and at one point a fed forgot to practice bare level opsec and got exposed on /pol/ for stoking infighting and false flagging.
>>
File: HunyuanVideo_00013.webm (1.02 MB, 544x960)
i never posted gore and any nudity was always in a catbox you have me mixed up with another genner

>>103483520
yes i work for the chinese police or CP for short
im really liking guidance 5
>>
>>103483422
why do you even care? he is literally the only one contributing to the thread. maybe if there was two more people contributing you could argue against him, until then shut the fuck up.
>>
I don't mind dancing girl videos but people who've got to ruin it for everyone else can get the fuck out tbfh. Things like exposed cleavage and butt cheeks hanging out of shorts will only get the thread nuked. I wouldn't be surprised if it's done intentionally to destroy the general, or so the media can have a field day condemning AI generated art...
>>
>>103483555
I just think they're mentally ill, even as a job he would only be here for work shifts. He's in here like 16 hours a day.
>>
can u do a couple redheads pls
>>
File: 1707700625728398.jpg (1.36 MB, 1824x1248)
>>103483468
That's just vpred 0.75. I don't use merges or style loras of any kind because they deeply fuck with image creativity from my experience. And image creativity on par with wild 1.5 models is the best thing about illustrious.
>>
>>103483576
>he is literally the only one contributing to the thread. maybe if there was two more people contributing
contributions come in many forms other than just posting videos anonie...
>>
>>103483576

That's how I feel desu, if he's posting videos that aren't absolutely obscene let the man cook.


Everyone was starting a conversation about the whole RF inversion thing last thread, does anyone have any examples of it? Or at least a breakdown of what may be possible with it?
>>
>>103483576
Well I care because what he's doing is permabannable and will cause collateral damage. That's of course ignoring that 1) ban evasion and constant rule breaking are pushing toward full verification on 4chan and 2) the content is designed to troll and disrupt, that's why they do it.
>>
>>103483578
>He's in here like 16 hours a day.
Mh
Sounds familiar somehow
>>
>>103483576
Why do you type like those avatarfags, it's like you all draw from the same well
>>
>>103483579
>redheads
ok sure

>>103483577
>Things like exposed cleavage and butt cheeks hanging out of shorts will only get the thread nuked
are you aware of how retarded you sound

>>103483576
>he is literally the only one contributing to the thread
it is kind of awkward that my gens are the only ones desu
if there were a way to run the fp8 on gradio i would host something on an A100 again
>>
https://github.com/tdrussell/diffusion-pipe
>>
File: 00021-709201737.png (1.51 MB, 1080x1384)
>>103483586
>vpred stock
wow, nice. I don't know what the deal is with vpred but i can never tardwrangle it well, even with the right settings, mostly it is just loras and the way merges are done i think.
i stick with epsilon since it plays nicely with loras. still never got a definitive idea on how training loras for vpred works.
>>103483608
yeah it's one thing to be a little chaos causer but it's kinda obvious by now how they play, it's subtle thread ruination until they start getting more people to join in without realizing they're helping make things worse.
i mean case in point now this thread is 99% arguing even after we just, yknow, weren't arguing kek
>>
>>103483520
retard still has no idea how hunyuan video works, hasn't even tried it, yet he thinks he is superior to someone that does

>>103483588
he helped me many times in the past week. he is obviously one of the best anons here, contributing in many forms.

>>103483608
>muh rules
faggot narc. it's the likes of you that get people b& for saying the nword. he is not doing anything disruptive, and if he were then the rules would be applied.

>>103483617
why don't you have a single argument?


I RESPECT AND APPRECIATE PEDANON FOR ALL THE HELP AND CONTRIBUTIONS HE PROVIDED FOR /ldg/
>>
>>103483586
>don't use merges or style loras of any kind because they deeply fuck with
this. noob is perfect on it's own (other than the characters it doesn't know)
>>103483629
>Pipeline parallelism, for training models larger than can fit on a single GPU
woah. parralel inference when?
>>
this is professional trolling
>start complaining for no reason about x
>people respond to you and say you're retarded
>omg look how everyone is complaining now because of x!
>>
File: 1706626793896478.png (370 KB, 440x420)
>>103483629
>HunyuanVideo supports fp8 transformer. The example config file will train a HunyuanVideo LoRA, on images only, in well under 24GB of VRAM. You can probably bump the resolution to 1024x1024 or higher.
>on images only
>>
>>103483647
That's not what I said so feel free to reread what I wrote and address it specifically. He has been perma'd multiple times for his pedo videos and images, so it's not just "oh maybe it's okay", it's very clearly NOT okay and he knows this. He also very much knows what he's doing is disruptive and he does it anyways, so no, I don't think I'm just going to watch him diddle around in civilized circles idly.

Anyways, I'm all for full ID verification to post on 4chan. This site has been ruined by the proliferation of spam evasion.
>>
>>103483670
read the next sentence dumb monkey
>>
>>103483665
phree bumps desu
>>
>>103483686
I read it you dumb nigger, it says that you won't be able to train videos with a 24gb card, or if you do it'll be slow ass resolutions without a lot of frames, DOA
>>
>>103483619
>are you aware of how retarded you sound
not when they are underage, but whatever brainless cunt, i'm not getting into an argument about it.
>>
>>103483699
>it says that you won't be able to train videos with a 24gb card
so stop being poor and buy a second 4090
it's obvious that's the future of local going into 2027
>>
File: 1716078138681394.jpg (1.08 MB, 1248x1824)
>>103483636
I generally prefer Epsilon, but vpred is getting more consistent lately. Higher contrast across the board though.
>>
>>103483718
>so stop being poor and buy a second 4090
are you retarded or something, the 5090 will be released in a month
>>
>>103483718
>its obvious that's the future of local going into 2027
how so? with the way it advances, maybe we'll be able to get the quality of hunyuan with a 2b model, who knows
>>
>>103483718
The future is A6000 Adas, Titan AI RTX and 5090s.
>>
>>103483729
if you're actually planning on getting it in a month start preparing scripts for buying it unironically

>>103483740
sure but there will always be a slightly better model with higher parameters
>>
>>103483740
VRAM is more or less tied to context length. A long video uses a shit ton of VRAM because every extra frame adds tokens to the context, so the VRAM requirement grows at least linearly with the frame count.
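Rough numbers as a sketch, assuming the commonly cited HunyuanVideo compression factors (8x spatial VAE, 4x temporal VAE, 2x2 patchify; treat all three as assumptions):

# Back-of-envelope transformer sequence length per video.
def video_tokens(width: int, height: int, frames: int) -> int:
    latent_frames = (frames - 1) // 4 + 1              # assumed 4x temporal compression
    tokens_per_frame = (height // 16) * (width // 16)  # assumed 8x VAE * 2x2 patchify
    return latent_frames * tokens_per_frame

for f in (33, 65, 97, 129):
    print(f, video_tokens(960, 544, f))
# 97 frames at 960x544 -> 25 * 2040 = 51000 tokens: context length, and
# with it activation memory, grows roughly linearly with frame count.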
>>
>>103483745
>A6000 Adas
already essentially obsolete
>Titan AI RTX
no evidence of its existence
>>
>>103483396
It doesn't, but those tags are correlated with higher quality images, so the vectors within HunYuan's model related to quality get activated.
>>
>>103483579
>can u do a couple redheads pls
your slop, sir
https://files.catbox.moe/vhyjjx.webm
>>
>>103483752
>start preparing scripts
nta but I live in east europoor so I don't expect them to be immediately sold out
>>
>>103483724
This one is great
>>
>>103483752
4090s didn't sell out for like 3 hours after launch, I think the 5090s will last longer because the economy is worse and they'll cost $200-300 more.

>>103483766
A6000 Adas aren't obsolete, they're 48 GB of VRAM at 4090 speed and 300w. 5090s are housefires at 600w and of course 16 GB of VRAM less which is significant especially for training something like videos.
>>
>>103483792
>You still see the human ear under her hair
Shit gen
>>
>>103483724
I just gave base noobai vpred 0.75s a download and honestly you're right, it's a lot more consistent than the last time i tried it.
One thing: it seems it's very sensitive to resolution ratios, widescreen sure seems to force very specific character stylings. Will have to fuck around with this today.

>>103483782
ghaayytt daaammn booiii

>>103483794
that really does put into perspective how much of a worthless paypiggie you have to be to buy a 5000 series in any capacity, from there it's painfully obvious how hard nvidia is just laughing all the way to the bank.
>>
>>103483685
how is it not ok? how is he disruptive?

he is just goofing off. he is not shilling, he is not ruining the thread, he has in fact helped me, and no doubt others, many times over the last week or two

you're just a rules obsessed cuck that doesn't understand why the rules exist in the first place. you have no idea what civilization is

>so no, I don't think I'm just going to watch him diddle around in civilized circles idly
look at what you are saying, fucking narc faggot.
>I'm all for full ID verification to post on 4chan
of course you do you just bend to whatever authority for any reason

tattle tale narc teachers pet you'll get yours in recess
>>
>>103483820
pedo content is not allowed, period
he's not "goofing" around
>>
>>103483820
The text content of his posts is perfectly fine but I would also prefer he stopped avatarfagging with pedo slop.
>>
>>103483817
>that really does put into perspective how much of a worthless paypiggie you have to be to buy a 5000 series in any capacity, from there it's painfully obvious how hard nvidia is just laughing all the way to the bank.
Yes and no. 5090s will be the best in slot GPU for hybrid gaming and AI. It will likely be the king of training 1B AI models, especially for raw speed.
>>
>>103483831
who cares, tattle tale? it doesn't harm the thread in any way shape or form. our little discussion here is way more disruptive than what he is doing. should we be b&, i'd say we deserve it far more than he does

>>103483851
that i can concede. he should pretend he is two different pedos at the same time so it's less of avatarfagging
>>
we'll need to see how important including videos is for LoRA training video models vs only using images
>>
File: 1702898729059841.jpg (221 KB, 1248x1824)
>>103483792
I have better ones, they are just not entirely blue board friendly.
The amount of weird shit you can do with noob is insane.
>>
>>103483777
>It doesn't
if this is correct then
>those tags are correlated with higher quality images so the vectors within HunYuan's model related to quality gets activated.
cannot be true. that just sounds like voodoo speak. not trying to bust your balls about it but also
>no comparison vids
>>
>>103483877
enjoy id verifications, you obviously want it too
>>
File: 1731850925680002.png (227 KB, 1796x1621)
Hosting the PromptRewrite model for an hour, the password is miku:
UI:
https://nil-intimate-madness-educational.trycloudflare.com/

Pic rel is how to prefill the answer. Clicking Generate continues from there. It might need to be told that NSFW is allowed or something like that.

API:
https://nil-intimate-madness-educational.trycloudflare.com/v1/chat/completions
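If anyone wants to script against the API instead of using the UI, a minimal sketch assuming the endpoint speaks the usual OpenAI-style chat completions schema (where exactly the miku password goes is a guess; the Authorization header is the usual spot):

import requests

url = "https://nil-intimate-madness-educational.trycloudflare.com/v1/chat/completions"
payload = {
    "messages": [
        {"role": "user",
         "content": "Rewrite this prompt for HunyuanVideo: a dog running on a beach"},
    ],
    "max_tokens": 256,
}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer miku"}, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])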
>>
>>103483892
nta but the Chinese do unironically like their tags, its insane how many quality tags they would use for one image
>>
>>103483910
...what do I do with this and how does it help me?
>>
So... It's been months. Where are the non shit Flux porn finetunes?
You guys were so adamant all the tunes were shit back then because "it just got out, just you wait". So, where are they?
>DeDistilled quants
This sounds interesting. What is it? An attempt to approximate pro? How does it differ from regular dev? It's not clear from the examples.
>>
>>103483924
It's the model that rewrites the prompts for HunyuanVideo:
https://huggingface.co/tencent/HunyuanVideo-PromptRewrite
>To address the variability in linguistic style and length of user-provided prompts, we fine-tune the Hunyuan-Large model as our prompt rewrite model to adapt the original user prompt to model-preferred prompt.
>>
>>103483543
Merely pointing out the fact that this dumbass infighting only ramps up after anon gets bored of whatever new thing just came out. We saw it with Flux, too. Thank god we still talk shop desu
>>
>>103483933
>An attempt to approximate pro?
no, it's just a way to remove the distillation of flux dev so that it can be finetuned and work at cfg > 1 (negative prompting)
>>
File: file.png (114 KB, 1082x1194)
>>103483910
I don't understand what this is, but it completely ignored the last instruction. So I assume it will be as shit for its main goal, too.
>>103483957
I see. Thanks.
>>
>>103483852
how hard is it to just slap a couple more vram chips on the 5090? it's insane it will only be 32gb. it's unacceptable. it's like they want a competitor to rise up and destroy them. that sort of abuse never bodes well
>>
File: 1724251803212782.png (32 KB, 987x464)
>>103483965
It's supposed to look like this, more or less. I think for some reason it didn't send the system prompt there.
>>
>>103483965
I think this model understands chinese better than english, I should try that actually...
>>
>>103483670
Hey let's not knock it until we see the results, maybe the knowledge from the video data will carry over to single images well enough to get some degree of kino.
>>
>>103484005
I have a hunch that the reason sex never has any movement is that it wasn't trained on porn videos, only images.
>>
>>103483933
>So, where are they?
Nowhere. No one's trained off dedistilled or even that weird OpenFlux model
>>
>>103484016
If you can't get movement you are a dumb promptlet
>>
>>103484034
what's the prompt to get some action, oh dear prompt god??
>>
>>103483979
It's not hard at all, NVIDIA is already doing it.
The L40 is basically just a 4090 with 48 GB VRAM and some datacenter-exclusive driver features (like being allowed to use them at all) and it's like five times the price.
>>
>>103484034
Catbox me a gen that contains a penis entering any orifice at least twice during the video
>>
>>103484051
how do they get away with this scam?

how long we will be left holding out for a hero?

https://www.youtube.com/watch?v=bWcASV2sey0
>>
>>103483811
Shit didn't notice that, inpainting artifact from when I zoomed in too much
>>
>The MLLM text encoder will be based on HunyuanLarge and will need 1TB of RAM
>>
>>103484166
kek, that's my fear though, that it's a giant ass text encoder...
>>
>>103483979
It's not the difficulty. They want this to be the market segmentation without splitting up the software ecosystem. Pay for HGX, make nvidia richer!
>>
File: HunyuanVideo_00021.webm (1.09 MB, 544x960)
>>103483910
If you use an LLM to make your prompt it'll be 95% the same anyways and the rewriter will just add "Realistic, Natural lighting, Casual" to the end
>>
>>103484166
Source?
>>
>>103483647
>retard still has no idea how hunyuan video works haven't even tried it, yet he thinks he is superior to someone that does
I have tried it, retard. That's how I know embedded_guidance as low as 2.0 and probably lower does, in fact, work.
>>
File: 154.webm (1.53 MB, 1280x720)
>>103483379
I tried this using the Hunyuan example prompt with the woman jogging. I will post results for three seeds with and without the suggested prefix. So one prompt will be:

>In the gym, a woman in workout clothes runs on a treadmill. Side angle, realistic, indoor lighting, professional.

And the other prompt will be:

>Shot on a Canon EOS R7, 85mm lens, f/2, sharp focus, movie scene. In the gym, a woman in workout clothes runs on a treadmill. Side angle, realistic, indoor lighting, professional.

I am using 1280x720 resolution, 129 frames, 50 steps, embedded guidance scale 6.0, and flow shift 7.0. The seed will be in the filename, the order of the prompts will be randomized.
>>
File: 154.webm (1.43 MB, 1280x720)
>>103483379
>>103484226
>>
>>103482892
wolfu!
>>103481817
64 ram
>>103481825
barely got sage attention in with guide
>>103481829
8 vram using nvidia driver cuda sysmem fallback policy to prefer sysmem fallback
>>
File: 254.webm (1.45 MB, 1280x720)
>>103483379
>>103484231
>>
>>103484226
she runs like a character in a shitty Unity game
>>
File: 254.webm (1.31 MB, 1280x720)
>>103483379
>>103484241
>>
>>103484212
no it doesn't. it doesn't follow the prompt if it's that low, but you as a retard don't notice that the prompt is different from the output, so you say stupid shit like that. honestly embed guidance 6 is too low, you need like 9 to make it actually follow, and even then...
>>
File: 354.webm (1.32 MB, 1280x720)
>>103483379
>>103484249
>>
>>103484235
if you can't load the text encoder (15.5gb), how do you expect to load the video model, it also asks for a shit ton of vram
>>
File: 354.webm (1.67 MB, 1280x720)
>>103483379
>>103484255
My personal opinion is that there is no discernible difference.
>>
>>103484226
>>103484231
>>103484241
>>103484249
>>103484255
i love how from day one this thread proved the model could nail titty jiggle when the tits are clothed but had trouble nude
what i wanna know is how they sourced enough of a gargantuan "girl running on treadmill" dataset to accomplish this in the first place
>>
>>103484250
>it doesn't follow the prompt if its that low
Did you start doing this last week? That has ALWAYS been the trade-off with lowering cfg, and is why the value is available to us for adjusting. You think lowering cfg didn't mean weaker prompt adherence in SD1.5?
>>
>>103484269
Yeah, those are all functionally equivalent. Thanks for testing.
>>
File: HunyuanVideo_00036.webm (120 KB, 352x640)
I had no electricity or internet for the past 24 hours
Did any major breakthroughs happen that make Hunyuang more usable for VRAMlets?
Will I finally be able to gen videos longer than 2 seconds?
Only 12 GB of VRAM here
>>
>>103484269
Perhaps it's just due to the low sample size but to me it looks like there is a tendency to have simulated diffuse light like there's someone with a diffuse studio lamp outside the scene adding some light.
>>
>>103484262
you misunderstand with my hardware and the comfyui setup it is working fine only answering questions people had on workflow
>>
>>103484291
Okay, try to make a guess as to which videos were made using which prompt.
I used a random number generator to determine which video I post first for each seed.
>>
>>103484250
>but you as a retard doesn't notice
Post your skin color please.
>>
>>103484289
IDK about 12GB exactly but the blockswap node *is* saving VRAM, and compared to the initial release there are more toggles that offload stuff to system RAM as well as more quantization options in the comfyui nodes.
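The swap itself is conceptually simple; a toy illustration of the idea (not the actual node's code), trading PCIe transfer time per block for peak VRAM:

import torch

def forward_with_blockswap(blocks, x, device="cuda"):
    # Keep only the currently executing block in VRAM and park the rest
    # in system RAM. Slower per step, but peak VRAM is roughly one block.
    for block in blocks:
        block.to(device)
        x = block(x.to(device))
        block.to("cpu")
    return x

# Tiny stand-in "model": 4 blocks that would normally all sit in VRAM.
blocks = [torch.nn.Linear(64, 64) for _ in range(4)]
out = forward_with_blockswap(blocks, torch.randn(1, 64))  # needs a CUDA device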
>>
A: >>103484226
B: >>103484231 >>103484249 >>103484255
>>
>>103484274
i started it with flux, the main difference being that hunyuanvideo looks good on 6 or even 9, better than 2 (someone did a comparison, it completely breaks the video). with flux values that high only look good with certain loras, not even dynamic thresholding can save it.

>>103484308
full white italian. get through your tiny polish brain. hunyuanvideo is different than flux
>>
>>103482892
Back when Stable Diffusion first came out there was a feature where you could emphasize parts of the prompt with brackets.
Does ComfyUI have that feature with Hunyuan?
>>
>zoomers need millions in R&D and thousands in equipment to turn a 3 word prompt into a 10 word prompt because thinking for 20 seconds without receiving a reward deprives them too much of dopamine
Jesus Fucking Christ you're fucked in the head.
>>
>>103484335
>italian "white"
>started with Flux

I'm Anglo-Saxon and Icelander and I've been doing this for over two years across every major model. Some of the genning techniques you've learned were invented by me. Sit down.

"Someone did a comparison" lmfao. You don't know anything.
>>
>>103484353
I'd say 6/10 ragebait, it's a bit funny but that's too long, gotta work on that anon
>>
File: 00036-1235016708.png (2.21 MB, 1168x1504)
what did gen z do to make that guy so angry?

>>103484370
generous, i give it a 3, just enough credit for me to reply without quoting.
>>
>>103484370
"Ragebait"? He's simply correct
>>
>>103484353
That'd be rational. How much time do you spend typing on a smartphone for 150 images if you do that?

Granted it's much worse if the training wasn't done specifically for what the model actually reacts reasonably to.

And it could just as well be wildcards.
>>
>>103484376
see, that one deserves a 7/10, it's short and simple, that's how we do it
>>
>>103483782
lovely
>>
File: 1716822481176749.mp4 (342 KB, 720x730)
>>103484363
>I'm Anglo-Saxon and Icelander
and you're proud of that?

here faggot homo get absolutely btfo'd

feel free to try your lower cfg scales. i'm not wasting my precious gen time
>>
>>103484291
>there is a tendency to have simulated diffuse light like there's someone with a diffuse studio lamp outside the scene adding some light.
Which has nothing to do with the original post claiming that prompts with camera specs "make the videos look better"
It's just placebo, anon
>>
>>103484433
>he can't tell that the 0.0 obviously follows the prompt just fine and evidently has a more natural texture to it
>he says things like "absolutely btfo'd" and continues to be Italian
Please, keep going.
>>
>>103484454
>>he can't tell that the 0.0 obviously follows the prompt just fine and evidently has a more natural texture to it
this has to be bait. NIGGA, THE WORLD IS WARPING AROUND HER, NIGGA, WTF
>>
File: dog egg displeased.jpg (56 KB, 510x680)
what the fuck is happening to the thread today
>>
File: HunyuanVideo_00023.webm (833 KB, 544x960)
low cfg anon has gaslit me into lowering my embedded guidance to 5 and i kinda like it but i will go no lower
>>
>>103484477
trust the plan TM
>>
>>103484463
You're the first one to ever notice that high cfg values exhibit more coherence and 'perfection' in their execution of the prompt. Please, tell me more that I don't know.

I'm going to cut this off because it's clear the Italian mind struggles to parse the meaning of English statements like "x has a more natural texture".
>>
>>103484080
Because CUDA started in 2007 and it takes time for anyone else to catch up with the insane ecosystem NVIDIA cultivated.
2007, imagine that... I don't know how he does it, but the CEO seems to always smell the next big thing.
>>
I have noticed that lowering the guidance on the same seed helps with the 2.5d look when it happens.
>>
>>103484270
stock videos + youtube/social media is full of these
>>
stop worrying and love the slop
>>
>>103484507
I think that tends to be the case when the prompt is ambiguous (as horny prompts often are). High guidance means that ambiguity in the prompt has to be respected, and should also be present in the resulting image/video—low guidance allows the model to resolve any developing ambiguity in one or another direction that, prompt aside, produces a more plausible real image/video.

Roughly speaking ofc
>>
>>103484161
Last one of these, too high effort
>>
File: HunyuanVideo_00026.webm (1.31 MB, 544x960)
>>
>>103484546
I think it happens because horny prompts are horny, not because they are ambiguous. A mere mention of tentacles will shift it into 2.5d no matter how many words for "realistic" you can come up with.
>>
>>103484565
>mere mention of tentacles will shift it into 2.5d no matter how
dataset issue, how many live action tentacle videos vs 2d ones
>>
>>103484565
If your prompt says "realistic" (photo-esque digital art) and "tentacles" (anime) you have created an ambiguity which the model doesn't know how to resolve.

I guess it's not clear what I meant, but I meant that the prompt is ambiguous to the model, not ambiguous to a human reader. The styles with which each prompt word are laden are in competition for control of the output. This is especially true with horny prompts, because they strongly indicate both 3dpd porn and hentai/etc.
>>
File: 00043-3443217174.png (2.32 MB, 1504x1168)
>four fucking minutes to this one gen
>dunno why it gave stocking down syndrome
reforge is fucked, too tired to re learn all of this from only a week of being out
>>
>>103483910
Thank you for your service.
>>
>>103484488
>anglo can't into reality
ancient anglo problem. observable in bacon already. that's why you lost not only one, but two countries you created.

low cfg looks bad. it's all warped, it's all ghosts. yes, lower cfg for flux does look better, 3.5 is already too damn high, however even then it destroys human anatomy, it's basically sd1.5 at >2.1 cfg.

also italians have a visual mind. they're masters of aesthetics. feel free to point out any relevant english painter, you can, but it's
>dante gabriel rossetti
that's not a common anglo name is it? that's because HE'S ITALIAN
>>
>>103483948
any way to get it working on comfy?
enter whatever you want
let it rewrite
use that automatically as text input
>>
>>103484226
Solution:

>In the gym, a woman in workout clothes runs on a treadmill. Side angle, realistic, indoor lighting, professional.

>>103484231 >>103484249 >>103484255

>Shot on a Canon EOS R7, 85mm lens, f/2, sharp focus, movie scene. In the gym, a woman in workout clothes runs on a treadmill. Side angle, realistic, indoor lighting, professional.

>>103484226 >>103484241 >>103484269

So >>103484334 was 3/3 correct if he meant to only separate the samples into two groups but 0/3 correct if he meant that A is without and B is with the prefix.
>>
File: 1730664899868486.png (64 KB, 859x529)
>>103484309
>blockswap node
How much VRAM do you save with this? Enough to get a few more frames in a gen?
You just activate everything?
>>
>>103484477
We call it "diffusion"
>>
>"ldg bakery anon" deleted the rentry once again
fucking kekd :(
https://rentry.org/ldg-bakery
>>
>>103484549
>>
>>103483576
>maybe if there was two more people contributing
there's none because everyone was chased away by the cringe
>>
>>103483887
Very nice
>>
>>103484488
i'm cooking something up with embed cfg 2. wait to be destroyed
>>
>>103484600
>reforge is fucked, too tired to re learn all of this from only a week of being out
Isn't reForge supposed to be a drop in replacement for Auto/Forge?
>>
>>103484631
>swarthoid modern Italian claims brotherhood with Renaissance era northern Italians
Nice try. You are not Fra Angelico, you are a 100IQ retard who makes porn on your computer.

>low cfg looks bad. it's all warped
You really did arrive here yesterday. I'm not retreading arguments that we already did to death two years ago. You offer the most basic diffusion101 observations as arguments against my position. Has it occurred to you that I might be aware of the trade-offs involved with choosing a lower cfg? Maybe after I have acknowledged again and again that I am aware of them?

You somehow can't see the overcooking that is visually obvious in 6.0+, maybe because your eyes aren't trained enough yet, but that's what you're going to have in your gens forever if you can't accept more chaos and more botched gens which are the price you pay for lower cfg.

I am aware of the arguments against low cfg. You don't know who you're talking to and you're too new to be acting this arrogant in arguments.

>>103484788
Low CFG gens are more volatile, bad 95%+ of the time. What could you possibly prove? The low CFG genner has to dig through many failed results to find the gems.
>>
File: kekd.png (66 KB, 360x346)
>>103484488
>it's clear the Italian mind struggles to parse the meaning of English statements like "x has a more natural texture".
>>
>>103484208
>>103484478
You've shown /b/ yes?
>>
>>103484827
way to shift the goal posts. yes, very practical to do 20 15-20 minute video gens to get 1 (one) good one. that's 5+ hours for 1 (one) good gen. what you're not understanding is that >6 cfg for hunyuan is absolutely fine, totally unlike other image models, and it's not me saying that, it's the chinese. if you had tried the model you would understand.

a few more minutes and you're going to feel ridiculous

>>103484916
not funny, and untrue
>>
>trust me just a few more minutes!
>>
>try to buy credits for hunyuan video
>you need a fucking "coupon code"
Do they just hate money?
>>
>>103484969
>what you're not understanding is that >6 cfg for hunyuan is absolutely fine
I can see the gens. I can see that it isn't. I've also been genning videos myself, no matter how much you bizarrely insist I haven't.

I understand that you think the cfg values being different between Flux and Hunyuan is something new which I'm failing to take into account, but if you'd been here before Flux you'd know that cfg scales have been different on every model ever released. 4 cfg was very low on SD1.4, but relatively high on SDXL, very high on FLUX, extremely low on WDXL, etc. The numbers are almost arbitrary. My judgments about what the cfg should be were based initially on what I was seeing from results at different guidance values, and later based on my own experience genning. Hunyuan is not uniquely different, the effects of going higher or lower are more or less the same.

>>103485016
He's genning a video. It takes time.
>>
>>103485016
look, i'm doing another gen with cfg 6, then i can post and we can compare. if i'm proven wrong i will apologize, if i'm not i think you should. and even then, there is still a very valid reason for high cfg, especially for video. we'll talk about it when the other gen finishes.
>>
>>103485035
the ali express way lol
>>
Might pull, dunno
>>
Sar...
>>
>>103485047
You're wasting your time because you can't prove anything by making a comparison of two videos. What could such a comparison demonstrate that would contradict what I said a few threads ago here: >>103452901 ?
>>
>>103485066
>it gets bigger when i pull it
>>
will she come back if i pull?
>>
File: 1708398947757821.png (319 KB, 537x469)
>>103485074
hmmmm...
>>
>>103484543
never
>>
File: HunyuanVideo_00002l.mp4 (642 KB, 1088x960)
>>
>>103485144
Hey, that's my big titted elf streamer
>>
>>103485144
It's strange how much more mesmerizingly real her face on the right looks. Shame about the glasses. I wonder if you can shuffle some variables around and get rid of them.
>>
NOW I might pull
>>
File: HunyuanVideo_00005.mp4 (638 KB, 1088x960)
>>
>>103485144
>>103485168
I am unable to refrain from ogling at those tits
>>
>>103485144
The one on the right isn't wet enough. She needs to be wetter.
>>
>>103485186
Anon is this close to inventing a virtual youtuber.
Just invent a generative neural network that works on old crypto asics. There must be a shit ton of them, useless because of the algorithms getting more complex.
>>
>>103484646
> was 3/3 correct if he meant to only separate the samples into two groups
That was me, and I was the anon who contradicted >>103484269 about there not being a discernible difference, for this >>103484291 reason. So there apparently is one, and it's not pure placebo?

>>103484672
I really don't get the exact behavior between all the model load/unload/moves, caches and various parameters... and with how slow this is to try, I don't want to experimentally find out.

But it seems to allow a lot more frames.
>>
File: HunyuanVideo_00007l.mp4 (747 KB, 1088x960)
>>103485154
thanks for the inspiration

>>103485220
when we have working loras for hunyuan then it's over, we will be able to add any character to any video
>>
>>103482892
I tried copy-pasting random TikTok descriptions into Hunyuan and it seems to kind of work.
Though only afterwards did I realize that I made the critical mistake of generating a horizontal video.
>>
File: 1725766970695599.webm (368 KB, 960x544)
EMBEDDED CFG 0
>>
>>103485236
>when we have working loras for hunyuan
You can already try:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/pull/72
https://github.com/tdrussell/diffusion-pipe
>>
>>103485291
Neat
>>
File: HunyuanVideo_00238.webm (256 KB, 960x544)
EMBEDDED CFG 6


>>103485291
actually it's embedded cfg 2
>>
I guess I must apologize. I'm sorry, I was wrong.
>>
>>103485233
>But it seems to allow a lot more frames.
what did you use and how many frames do you use now?
>>
>>103485236
>when we have working loras for hunyuan then it's over, we will be able to add any character to any video
I'd rather have longer outputs (10-15s) and img2video
>>
>>103484824
Right, but it puts back the older Gradio frontend. Forge originally took A1111's frontend, replaced the backend wholesale with ComfyUI's backend mashed together with some other code changes, and evolved from there, focusing mostly on the frontend while merging in ComfyUI stuff on the backend. That's why there are regressions in a few niche areas like Intel Arc support: the A1111 backend hacked in operator override support for Intel's Extension for PyTorch and ComfyUI doesn't.
>>
>>103485321
However...
>>
>>103485233
Once again, the original post is nonsensical.
>>103483379
>if you want godlike hunyuan quality put this in front of your prompts. Thank me later:
>
>shot on a Canon EOS R7, 85mm lens, f/2, sharp focus, movie scene.

It failed at both making outputs "godlike" AND making outputs resemble anything like the camera specs prompted. The fact that the original poster had no examples should have been enough for anon to quickly dismiss. Have you ever seen a widely used dataset which specifies camera specs? NO!

These plebbit-tier placebo prompts ALWAYS pop up with every new model. The only time anything similar to "please make this picture pretty" works is with Booru models because they are explicitly trained with "highres" etc.

>it's not pure placebo?
It is. It always is. And anon just showed you.
>>
>>103485291
Unusually coherent for 2.
>>
File: HunyuanVideo_00011.mp4 (725 KB, 1088x960)
>>103485295
nice, hopefully we will see some tests soon. it intrigues me how you can train loras for a video model only using images, I think the lora will latch onto anything in the video, meaning your character will be imposed onto the video gen

>>103485291
nice
>>
If we get 97 frames @ 544x960 with 24GB VRAM, how much more could we wish for using 32GB (5090) or 48GB (2x3090/4090, if someone figures out how)?
Is it linear, so 129 with 32GB and 194 with 48GB?
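Probably not purely linear, since the weights take a fixed chunk of VRAM before any frames do. A toy extrapolation, with the overhead figure being a pure guess:

overhead_gb = 12.0                        # assumed weights + workspace (guess)
per_frame_gb = (24.0 - overhead_gb) / 97  # calibrated on the 24GB / 97-frame data point
for vram in (24, 32, 48):
    print(vram, "GB ->", int((vram - overhead_gb) / per_frame_gb), "frames")
# Zero overhead gives exactly the linear 129/194 numbers; with a fixed
# overhead, extra VRAM buys proportionally more frames than that.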
>>
>>103485358
I love the reddit-tier placebo prompts. Love getting ideas for injecting nonsense into my prompt, no matter where from. Sometimes it makes for good or fun results, sometimes not, but always good to change things up.
>>
anyone tried using Chinese to see if the result is any better?
>>
>>103485333
I still think high cfg is better for video because, let's face it, videogen is only good for pornos or memes, and you require closer prompt adherence for those.

low cfg pros
better composition
better colors

low cfg cons
phantasms likely require higher step count to fix
possibly (probably) lower prompt adherence
possibly higher likelihood of anatomy deformations
>>
File: HunyuanVideo_00013.mp4 (809 KB, 1088x960)
>>
>>103485402
try with cool sunglasses
>>
>>103485381
That's perfectly fine, but anon shouldn't go around claiming they "make outputs look godlike" or that they even push the model to output anything resembling the camera specs prompted. Because that's simply inaccurate.
>>
>>103485358
I'm pointing out it probably does something.
It might have shifted more towards a film shot with more lighting provided by studio lights or assistants perhaps.

Whether it's an accurate or interesting representation of some Canon camera? No clue whatsoever.
>>
>>103485391
>videogen is only good for pornos or memes
If you have an insane amount of compute/vram, it's probably usable for games or movies.
Vid2vid would be like mocapping, and img2vid a cheap way to make small animations in videogames, for example.
>>
File: HunyuanVideo_00457.mp4 (120 KB, 320x240)
12gb hell. This is all I've got

>>103485391
>and you require closer prompt adherence for those.
Memes certainly. Pornos—it depends. I have always preferred to just ogle a certain kind of woman, and just to see her idly shift her weight or glance from one side to the other makes it more real to me. So for 'pornographic' purposes I can be content with very low cfg.

But another kind of anon specifically wants to see Ana De Armas in a princess peach costume being split in half by Johnny Sins with a cock the size of an arm, and if that's what you need then you need big cfg
>>
>>103485422
>It might have shifted more towards a film shot with more lighting provided by studio lights or assistants perhaps.
Wouldn't explicitly prompting for this, instead of some illusory arbitrary camera specs that the model doesn't actually "understand", achieve that better?
I understand your sentiment, but there's simply a smarter way to go about it.
>>
File: 1661558256755.png (43 KB, 446x181)
>>103485449
8gb hell. This is all I've got.
>>
>>103485322
Don't remember exactly but I just tried again and 150 frames worked. I know it crashed with some number between 50 and 100 frames before.

Granted I also did comfyui updates and switched to sageattention instead of sdpa at some point. But I recall blockswap did help very noticeably.
>>
>>103485471
tell me a prompt, aspect ratio, and cfg setting you want and I'll gen five tiny videos and post the best one
>>
What determines how long the prompt can be for HunyuanVideo?
>>
>>103485474
>I just tried again and 150 frames worked
Oh, very nice, I'll try that then, that's 2s more.
>>
>>103485449
>>103485471
You can generate 960 x 544 x 77 on 8gb with all offloading possible and using CUDA SysMem Fallback. Needs more system ram for all that offloading.
>>
File: 1724271444242499.png (1.57 MB, 1280x1024)
>>103485444
>Vidtovid would be like mocaping and imgtovid a cheap way to small animations in videogames for example.
I was playing around with an FMV lora for flux, made me think i could make my own FMV game: just filming myself in my room acting out the scenes and the script and then running them through a vid2vid to make me a princess or a lizard, etc. but i guess that's not possible yet = (
>>
>>103485466
I did NOT propose the prompt.

This may be crazy gibberish but arguably if anon aesthetically likes it: What more precise alternative keyword variants activate better? Pretty sure we don't have a tool to do that?
>>
>>103485497
The way I have it is 20 40 and both offloads enabled. Not as a result of extensive experimentation to find the optimal choices, mind you.
>>
File: 1716318691951938.png (1.74 MB, 1280x1024)
>>103485449
the problem with low cfg is that it's going to increase anatomical deformities, especially with movement, and create all these phantasms and artifacts. it's probably better to leave the prompts somewhat vague and the cfg higher for variety.
>>
>>103485508
>, just filming myself in my room acting out the scenes and the script and then run them through a vid2vid to make me be a princess or a lizard, etc. but i guess that's not possible yet = (
we are so close anon
>>
File: ComfyUI_01528_.png (1.6 MB, 1280x1024)
>>103485531
so hopeful for the future
>>
I heard there were vramlets in here jealous of videogen
>>
>>103485545
Many such
>>
File: 1733096131220196.png (1.55 MB, 1280x1024)
>>
File: 10.jpg (55 KB, 825x446)
>>103485524
so picrel
thanks anon
>>
File: HunyuanVideo_00019l.mp4 (814 KB, 1088x960)
>>
>>103485481
>Marge Simpson from The Simpsons, wearing a vibrant two-piece bikini in a slow-motion jump, with the camera angle focused on her upper body from a slight overhead view. In the middle of the jump, arms extended upwards, torso arched, with a smile on her face. Subtle motion blur to enhance the slow-motion effect. The lighting is bright and airy, an outdoor setting.
9:16
cfg 4.5

>>103485507
Gonna try it then when it gets even more streamlined, but for now I'm content living through anonymouses.
>>
>>103485508
I need to learn to sway my hips when walking, and then slap a vidtovid of a hot woman.
>>
>>103485569
NTA but I wouldn't advise this unless you have a ton of system ram
>>
>>103485545
videogen killed the vramlet stars
>>
>>103485402
How is you doing these? Is it some kind of vid2vid? Or just normal Hunyuan workflow?
>>
>>103485587
How much? I have 64GB.
>>
>>103485596
How are* is what I mean to write, what the fuck is wrong with me.
>>
>>103485569
Yes. Maybe these are the maximum (most VRAM saving) settings possible on this node, but I really haven't experimented much.
>>
>>103485587
Works on 64GB for me. Some people probably have 128 or 256GB already.
>>
>>103485600
Probably enough.
I have 32GB and blue screened by using similar settings
>>
File: 1724680699173477.png (1.39 MB, 1280x1024)
>>103485582
yep. luckily we probably have hundreds, if not thousands of videos about how to walk like a bio woman from our transfolk friends.
>>
>>103485490
>"--text-len", type=int, default=256, help="Maximum length of the text input."
>"--text-len-2", type=int, default=77, help="Maximum length of the second text input.",
Can I just change the first one and get infinite prompt following?
>>
>>103485582
>>103485640
it's not hard just follow a straight line when walking, women have natural hip swaying but they can exaggerate it using that trick
and men can have it a little when doing the same
>>
>>103485573
working on it. Should take 10 minutes for 5.
>>
>>103485627
I see, well should be good then.
>>
>>103485668
Thanks.
>>
File: HunyuanVideo_000212.mp4 (754 KB, 1088x960)
>>103485596
rf-inversion workflow in new commit
>>
File: 1717049953739119.png (1.57 MB, 1280x960)
>>103485661
yeah, men walk with their feet on each side and women walk with their feet straight pointing forward. top trooning advice for our transfolx
>>
File: HunyuanVideo_00040.webm (509 KB, 544x960)
>>103484943
/b/ is too poor to gen video
>>
File: HunyuanVideo_00476.mp4 (230 KB, 208x368)
>>103485678
It's not looking good so far. Might be pushing unacceptably low on image dimensions.
>>
File: HunyuanVideo_00023.mp4 (750 KB, 1088x960)
oh no, kek

time to hoard some videos then
>>
File: HunyuanVideo_00041.webm (868 KB, 544x960)
time to start my fake thot tiktok
>>
File: HunyuanVideo_00477.mp4 (124 KB, 208x368)
2/5
>>
>>103485731
Like something from a le lost HAUNTED Simpsons episode zoomerbait video
>>
>>103485699
What does it do? Is it just another kind of vid2vid?
>>
>>103485713
>we will have a VR version of this at some point for maximum acceleration
>https://www.youtube.com/watch?v=WxmICiOXw2c
>>
File: HunyuanVideo_00478.mp4 (95 KB, 208x368)
3/5
>>
File: HunyuanVideo_00479.mp4 (287 KB, 208x368)
another total failure
>>
>>103485770
comfy
>>
File: HunyuanVideo_00480.mp4 (166 KB, 208x368)
sorry anon.
>>
>>103485766
It's like V2V but better because it uses the input to generate the noise instead of adding random noise to every frame. Like inverting the generation back into noise.
>>
So... What the fuck is flow_shift
>>
>>103485748
catbox for that one? did you change some values compared to kijai's workflow?
>>
Did anyone try using a different text encoder?
>>
File: aaa.gif (1.52 MB, 640x640)
>>103485748
>>
File: HunyuanVideo_00028.mp4 (1.69 MB, 1088x960)
>>103485766
pretty much
>>
File: 1718867851196307.png (157 KB, 1500x870)
>>103485935
to change the strength of this v2v you change the gamma or the eta thing?
>>
>>103485755
wow batiful baby
>>
>>103485935
does it take more time or vram etc? or pretty much the same?
>>
>>103485821
Oh god these are so cursed. But thank you anon, at least it knows the character and #2 was really close to the modern art style. There is hope.
>>
>>103485987
it's twice as long because now the model has to add the noise + denoise
>>
File: HunyuanVideo_00030l.mp4 (1.44 MB, 1088x960)
>>103485953
the resampler, start_step, end_step and eta_base, also eta_trend
>>
>>103486010
Can you test if you can turn her into an animal? like a walking Tiger?
>>
>>103482892
Is there any general for weird uncanny valley porn made by Hunyuan yet?
>>
>>103486010
what are the best values for you?
>>
File: HunyuanVideo_00033l.mp4 (1.2 MB, 1088x960)
>>103486017
yeah, im trying but i'm not having success so far, i can only do "surface" changes. it may work for enhancing videos but nothing crazy, it needs the option to multiply sigmas in order to preserve the structure at least
>>
>>103486029
there's an ai porn thread, I wish it was more active though so that we could discuss our findings, like what hunyuan is able to do or not
>>>/aco/8628498
>>
>>103486084
The silence is probably everyone genning stuff too hardcore for 4chan, when a prompt fails they post the result so it isn't completely wasted.
>>
File: 1719294809172594.mp4 (140 KB, 720x900)
https://xcancel.com/huwhitememes/status/1866701055570088241#m
kek, didn't know Hunyuan could render Elon Musk
>>
>>103486063
ok thanks anyway
>>
>>103485755
nice bobs lik lik
>>
>>103485880
there is the one that just got released, I wonder if it works
>>
local hunyuan doesn't have any censors, just like stable diffusion right?
>>
>>103486127
nope, nothing >>>/aco/8639690
>>
>>103486123
>there is the one that just got released
which one?
>>
>>103486102
not enough people with 24GB cards, it was way more animated when the cloud models were out at first before being more censored
>>
>>103486130
Kek, I just posted that. I didn't know we had an AI porn thread until that anon just linked it
>>
>>103486130
Welp, I guess my life next month will be entirely centered around trying to get a 5090 without paying scalper niggers.
>>
>>103486148
I bet, my 4090 is worth more than when I bought it new
>>
>>103486141
the one from onediffusion
https://huggingface.co/lehduong/OneDiffusion/tree/main/text_encoder
>>
>>103485769
yeah in 80 years
cant wait literally
>>
>>103486171
that one could work on Hunyuan? interesting
>>
>>103486161
yeah it's crazy, I'm almost tempted to sell mine while waiting for the new 5090
>>
>>103486177
>that one could work on Hunyuan
no idea
>>
>>103486189
there was an anon trying to make it work with JoyCaption but the prompt understanding was completely ass, as expected I guess
>>
>>103486171
>t5-xl
But wasn't the point to change the text encoder to a regular multimodal LLM?
>>
>>103486201
But it does work, and it's as good as the default. You are just dumb.
>>
>>103486209
yeah it was the point, but we still got the ducktape MLLM, not the original one
>>
>>103486220
>it's as good as the default
good joke
>>
>>>/aco/8639708
Holy fuck dude...
>*Goes to e-bay to find a 4090*
>>
File: file.png (152 KB, 1279x1185)
>>103486209
Yeah and I have no idea why they did that. Lumina-Next used Gemma 1 2B which OneDiffusion based their architecture on. Updating that to Gemma 2 2B or Qwen 2.5 would've been better than using T5-XL.
>>
>>103486161
>>103486180
does the same apply for the 3090? i may finance my new 5090 with it
>>
>>103486313
>does the same apply for the 3090?
not really, the 3090 doesn't support fp8 torch compile and fp8_fast, that's why the 4090 is so valuable: a video that takes a 3090 20 min, the 4090 can do in just 6 min
>>
>>103486313
No, but its used value hasn't really changed since a year ago, which is also a sign it's still in demand.
>>
Happening alert

Hyvid lora training code released

https://github.com/tdrussell/diffusion-pipe
>>
>>103485569 (you)
Using picrel on a headless linux, I reach 43GB usage on RAM and 15GB VRAM on my 3090.
150 frames went through, I wonder if I can go higher. 175? 200?
>>
File: 9PFVqEN.gif (64 KB, 255x247)
>>103486109
>>
>>103486371
how much vram needed to train some videos on hunyuan with loras? more than 24gb?
>>
>>103486010
>start_step, end_step
So in other words, basically how controlnets work? They affect the frames for 50% of the steps if you set end_step to 0.5? Then for the final steps control is released for the sampler to denoise. if so, it will be best to leave start_step at 0 and only experiment with end_step
>>103485837
it's useful in vid2vid, higher values will alter it more, and lower values less, that is all i know anon
>>
>>103486371
>>103486392
>HunyuanVideo supports fp8 transformer. The example config file will train a HunyuanVideo LoRA, on images only, in well under 24GB of VRAM. You can probably bump the resolution to 1024x1024 or higher.

>Video uses A LOT more memory. I was able to train a rank 32 LoRA on 512x512x33 sized videos in just under 23GB VRAM usage
>>
>>103486371
I have a hard time understanding what lora training for a video model would even look like.
>>
>>103486371
so you'd be using the wrong text encoder to train a hunyuan lora? or do you not need the text encoder when training a model?
>>
>>103486407
the same thing as lora for image models, you want to add a celebrity, a pose, a style? you'll be able to do it on hunyuan
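conceptually it's the exact same trick as image loras: bolt a small trainable low-rank update onto the frozen weights. minimal torch sketch of the idea (generic, not diffusion-pipe's actual code):
[code]
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Generic LoRA: the frozen base weight W gets a trainable low-rank update
    # B @ A, so a concept/style is added without touching the model itself.
    def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base stays frozen, only A/B train
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
[/code]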
>>
>>103486340
where I live the 4090 is still around the same price it was a year ago. should I buy a 4090 and then sell it alongside my 3090 when the 5090 comes? is that a sound investment?
>>
>>103486409
We never trained t5 with flux either.
>>
>>103485837
>So... What the fuck is flow_shift
it's basically how you bend the noise curve of the scheduler, that's all
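if you want the actual math, the SD3-style flow-matching shift is a one-liner; I'd assume hunyuan's wrapper does the same or something very close:
[code]
import numpy as np

def shift_sigmas(sigmas: np.ndarray, flow_shift: float) -> np.ndarray:
    # SD3-style flow-matching shift: bends a linear sigma schedule so more of
    # the step budget is spent at high noise. flow_shift=1.0 is a no-op.
    return flow_shift * sigmas / (1.0 + (flow_shift - 1.0) * sigmas)

sigmas = np.linspace(1.0, 0.0, 31)    # plain 30-step schedule
print(shift_sigmas(sigmas, 7.0)[:5])  # early steps now sit near sigma=1
[/code]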
>>
How long does a single video gen take on a 4090?
>>
>>103486371
where is the AI to analyze the repository for backdoors?
>>
File: Card Crusher.png (460 KB, 780x586)
460 KB
460 KB PNG
>>103486084
>>103486102
patience
>>
>>103486433
960x544x97f -> 6 min
>>
>why is 4090 so expensive!?!
There is a country of over a billion people where the sale of 4090s is prohibited, but you can easily pop the core out of a 4090 to make frankengpus.
>>
>>103486433
about 3 fiddy seconds
>>
>>103486440
This. I will wait for furk to update his patreon.
>>
>>103486426
lmao
if you asked that seriously, no
wait a month
>>
>>103486433
6 minutes on 4090, ~13 minutes on my 3090
>>
>>103486467
damn. do you know if the rtx 4070 super had the same scalper problem the 4090 had? i remember the day it came out you could get one for official prices easily where I live. would be nice if the same happened to the 5090
>>
>>103486495
>rtx 4070 super
Mid-life refresh card. Those never have the same day-1 scalper problem (it's also a 12GB piece of shit)
>>
>>103486489
>~13 minutes on my 3090
how many steps at what resolution? i'm getting 17 minutes on 40 steps 960x544
>>
>>103486495
that is very sad
>>
>>103486526
>40 steps
Any enhancement expected when going to 40 steps instead of 30?
>>
File: HunyuanVideo_00049.webm (2.5 MB, 544x960)
2.5 MB
2.5 MB WEBM
i have absolutely no idea if the SAE vae has helped with prompt adherence btw
>>
>>103486536
i think so yeah
>>
>>103486371
Its over... for image-gen.
>>
>>103486526
Be happy, my 3090@300W takes 30 min for ~150 frames.
>>
>>103486526
I've overclocked the memory to the maximum it could get without crashing. 960x544x97 45 steps
>>
>>103486536
More steps is always better. We're just coping with 30 because gen times are so long
>>
>>103486552
imagine a hyper lora for hunyuan, like it could run fine at 4 steps, would be game over lol
>>
>>103486537
vae is the image decoder, has nothing to do with the prompt
>>
>>103486544
>>103486552
What's the difference? More clarity?
I do know that for images there are definitely diminishing returns depending on the sampler, and at some point there is no benefit to adding steps.
>>
>>103486570
he said SAE, you can change clip_l for its finetunes
https://huggingface.co/zer0int/CLIP-SAE-ViT-L-14
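assuming the repo ships in standard HF CLIP format (not verified), swapping it in is just pointing the text encoder at the finetune:
[code]
from transformers import CLIPTextModel, CLIPTokenizer

# Assumption: the SAE finetune loads like a stock HF ViT-L/14 text encoder;
# if it's only distributed as a bare .safetensors, you'd drop it into
# ComfyUI's clip model folder and pick it in the loader node instead.
repo = "zer0int/CLIP-SAE-ViT-L-14"
tok = CLIPTokenizer.from_pretrained(repo)
enc = CLIPTextModel.from_pretrained(repo)
out = enc(**tok(["a test prompt"], return_tensors="pt")).last_hidden_state
[/code]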
>>
>>103486552
>>103486576
I think the resolution is much more important than steps, especially for prompt understanding
>>
>>103486576
yes, fewer artifacts, less of that ai look, especially on the hair; makes movement more fluid, faces better, less wobbling etc.
>>
>>103486537
why does it keep genning them so young? prompt so i can avoid this?
>>
>>103486570
>>103486577
sorry yea i meant the clip not the vae
i havent noticed any improvements that could be more than placebo but it doesnt seem to hurt to have it on so ill keep it on
>>
>>103486576
From a few threads ago
>>103452649
>>103452659
>>103452675
>>
>>103486604
>A beautiful young teenage girl with long blonde hair, wearing a revealing bikini that showcases her ample bosom and toned midriff, lounges on a beach towel by the ocean. The camera movement is Low Angle as she applies sunscreen, her hands slowly rubbing the lotion on her skin. Realistic, Natural lighting, Casual
thanks to the anon who was hosting the prompt rewrite model
>>
>>103486445
Sheesh I don't think I'll be touching local video gen for a while then
>>
>>103486598
and you feel that 40 steps is the point where diminishing returns kick in?
>>
>>103486613
>100 steps
I'm not gonna do that, it's too long already!
>>
If I have a working kijai Hunyuan setup from a week ago is there currently a good reason to pull a newer commit? Like more features or coherence or speed or anything. Or would I just be risking introducing problems for no gain
>>
>>103486549
6 videos per kWh
if electricity is over 20 cents a kWh and you pay for your own electricity, you might as well rent something on vast for a similar price per video and gen over twice as fast
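back-of-envelope for the 6 videos per kWh figure (assuming a ~300W card and ~30 min per gen, like the 3090 anon above; both numbers are illustrative):
[code]
# Hypothetical numbers matching the 3090@300W / ~30 min reports above.
watts, minutes_per_video, price_per_kwh = 300, 30, 0.20
kwh_per_video = watts / 1000 * minutes_per_video / 60      # 0.15 kWh
print(f"{1 / kwh_per_video:.1f} videos per kWh")           # ~6.7
print(f"${kwh_per_video * price_per_kwh:.3f} per video")   # ~$0.03
[/code]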
>>
>>103486621
i guess, it just takes too damn long to gen anything more. for me it's the time/quality sweet spot. just did a 60 step, didn't look all that better desu so i'm sticking with 40
>>
>>103486637
yeah you can pull, there are more options for the vae now, so if you get an OOM during decoding you can decrease some of its parameters
>>
>>103486617
prompt rewrite model?
>>
File: HunyuanVideo_00053.webm (784 KB, 960x544)
784 KB
784 KB WEBM
for me it's 50, anything less and it feels like you're sacrificing a ton in coherency, but more doesn't add much in realistic scenes
>>
>>103486659
psycho mantis? >>103483910
>>
For 3090s: underclock your core but overclock your memory; you can generate up to 40% faster in HunYuan that way.
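on linux the conservative version of this is just locking clocks with nvidia-smi (the values below are placeholders, NOT a recommendation, every card is different; an actual memory offset past stock needs afterburner or nvidia-settings):
[code]
import subprocess

# Placeholder values only; needs root, and --lock-memory-clocks is Ampere+.
subprocess.run(["nvidia-smi", "--lock-gpu-clocks=1200,1400"], check=True)
subprocess.run(["nvidia-smi", "--lock-memory-clocks=9501,9501"], check=True)
subprocess.run(["nvidia-smi", "--power-limit=300"], check=True)
[/code]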
>>
>>103486653
>OOM during the decoding
yup, pic related is what the dev recommends, uses half the VRAM, solved my OOM problems
>>
>>103486690
what does changing the tile settings affect?
>>
>>103486682
Specific amount in afterburner?
>>
https://github.com/esciron/ComfyUI-HunyuanVideoWrapper-Extended
>Support for any llama model type LLM.
>Support for LLava and mLLava model_type.
>Support for Mistral model_type.
Really interesting
>>
>>103486706
tiles the decoding, uses less vram; it might introduce seams/edges but I've yet to notice any.
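the general idea, if anyone's curious (a generic image-space sketch, not the wrapper's actual implementation; real code also blends the overlaps instead of overwriting them):
[code]
import torch

def tiled_decode(vae, latents, tile=32, overlap=8, sf=8):
    # Decode overlapping latent tiles one at a time so peak VRAM scales with
    # the tile, not the whole frame; sf is the latent->pixel scale factor.
    b, c, h, w = latents.shape
    out = torch.zeros(b, 3, h * sf, w * sf)
    for y in range(0, h, tile - overlap):
        for x in range(0, w, tile - overlap):
            part = vae.decode(latents[:, :, y:y + tile, x:x + tile])
            out[:, :, y*sf:y*sf + part.shape[2],
                      x*sf:x*sf + part.shape[3]] = part
    return out
[/code]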
>>
>>103486740
So Hunyuan can both support LLMs and MLLMs?
>>
>>103486740
what other MLLMs exist besides llava-llama-3-8b? maybe we could go for something a bit bigger that can still fit on a 24gb card
>>
>>103486682
no way. exact amounts please
>>
>>103486740
>>Support for Mistral model_type.
Does nobody understand how this shit works? The text embeddings are the hidden states of the LLM. Anything based on the llama 3 8b base model is going to "work" to some extent. Everything else won't work at all (unless you explicitly train some kind of weird embedding projection adapter layer or something). I am reminded of retards in /lmg/ like a year ago who were convinced you could apply mistral loras to llama models and it would work.
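to spell it out, the "text embedding" the video model conditions on is literally a hidden-state tensor pulled out of the LLM, something like this (which hidden layer hunyuan actually taps is an implementation detail I'm not claiming to know):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any llama-3-8B-family model yields hidden states in the same learned space;
# a mistral-7B also outputs 4096-dim vectors, but in a *different*,
# incompatible space, which is why swapping families can't just work.
repo = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
ids = tok("a cat walks on grass", return_tensors="pt")
hidden = model(**ids, output_hidden_states=True).hidden_states[-1]
print(hidden.shape)  # (1, seq_len, 4096)
[/code]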
>>
>>103486776
What's a MLLM? I know llama 3.2 has image input, as well as pixtral.
>>
>>103486706
I'm not sure what method it's using, whether it's tiling individual frames one by one or tiling them all into smaller pieces (like top section, then middle, and so on). It's slower, obviously; it might be swapping tiles into system RAM in chunks, and if you're like me using a lot of swap space on an SSD it will lag the fuck out of everything for a few seconds, but it prevents OOM on the GPU. Smaller values mean less vram btw. They changed it slightly in the last 2 days and I don't know what it's actually doing now because I haven't read into it much. All I know is it works.
>>
File: 1708242460763857.png (67 KB, 1270x525)
67 KB
67 KB PNG
>>103486682
what values anon?
>>
>posters scared to do their own overclocking
/g/ has fallen
>>
>>103486835
i wouldn't bother with that desu, you might burn out your hardware... void the warranty etc.
>>
>>103486584
Yeah I meant all things being equal.

>>103486598
>>103486613
Oh yeah it's kind of obvious on anime style.
I didn't see it that bad on realistic stuff, but worth launching 50 steps before sleep I guess.
>>
>>103486827
>I am reminded of retards in /lmg/ like a year ago who were convinced you could apply mistral loras to llama models and it would work.
I mean, Hunyuan is using a text encoder it has never seen and it's working all right, we're still waiting for the official encoder though
>>
>>103486850
>void the warranty
Nigga I bought a second hand mining 3090 for $300 I don't give a fuck.
>>
>>103486869
>I didn't see it that bad on realistic stuff
I've seen more merging of humans in 30 steps than 50 I guess
>>
>>103486644
I pay between 12 and 15. And it's fine, not a big deal, I have the server, might as well use it.
>>
>>103486613
More thigh on 30 steps so that one wins
>>
>>103486830
>What's a MLLM?
https://medium.com/@cout.shubham/exploring-multimodal-large-language-models-a-step-forward-in-ai-626918c6a3ec
>>
>>103486850
>void the warranty
how would they ever know?
>>
>>103486835
undervolt to 300W for 90-95% of the same performance
>>
>>103486923
The LLM sitting in VRAM at the time of the crash snitches on you for overclocking and raping it.
>>
>>103486891
well, have a look online then, but every GPU is different; don't come crying here when your system randomly reboots itself after your screen messes up. You need more voltage as you increase the frequency. I haven't done any OCing in years, but look around on reddit, i'm sure someone will give examples of a successful overclock on your specific card. No card is the same, even if it's the same model; it depends on how well the gpu was made. You might have a good gpu that is stable, or you might only get like 10% out of it before it becomes unstable.
>>
>>103486947
*not undervolt, just power profile change
>>
>>103486918
A-anon...
That was a rhetorical question, as should have been clear since I named the LLMs right next to it in my answer, op. Because
>Multimodal can mean one or more of the following:
and then giving 3 different options for what it could actually mean tells you it's a shit coined term that doesn't help at all when you need to specify something, and only exists to make ai sloppers sound more sophisticated.
>>
>>103486952
how did we ever go along with large language models and unique IDs on our ram sticks and hidden processors inside our processors?
>>
>>103482892
https://files.catbox.moe/d19pqx.webm
>Tiktok dance video. Hatsune Miku is shaking her gigantic tits and huge ass. She's showing off her massive breasts, slim waist, and wide hips. She's jumping around and her boobs jiggle up and down. Her long, blue hair is twirling around as she dances. She's completely naked and her tight pussy and large nipples are in full view. #fyp #fy #foryou #foryoupage
TikTok seems like a relatively good base for videos of women.

>>103486433
~45 minutes if you go for 1280x720, 129 frames, and 50 steps.
>>
>>103486966
To clarify, MLLM isn't a real term. A multimodal LLM with [something] capability is.
>>
>>103486990
>relatively good base for videos of women
can you post an example
>>
>>103486433
Depends on settings.

I think if you're just trying stuff out I'd recommend something like 480x288 / 30 steps, or the other way around. Gives you about 25 frames per minute of gen time.
>>
>>103486872
>Hunyuan is using a text encoder it has never seen and it's working all right
Yeah, because the unreleased text encoder it was trained with is still based on llama 3 8b. That's why they can point us to a random llama 3 8b llava model and it kind of works. But it's not going to work with mistral or any other unrelated model family.
>>
>>103487099
>>103487099
>>103487099
>>103487099
>>
>>103487093
>the unreleased text encoder it was trained with is still based on llama 3 8b
how do you know that?
>>
File: Psycheswings .jpg (119 KB, 984x984)
119 KB
119 KB JPG
>>103482892
Good evening sirs
>>
>>103486682
This retard is trolling, you should do the opposite.
>>
>>103487107
Because otherwise the llava model we're using wouldn't work at all! It works because the hidden state vector space is compatible, because it's based on llama 3 8b, just like the official text encoder must be.

You guys are fucking retards. I will film myself drinking a gallon of horse cum and post it if the official LLM isn't llama 3 8b based. It literally has to be.
>>
>>103487132
someone managed to make JoyCaption work on hunyuan, no errors, and we got an output; the prompt adherence was complete horseshit but it was possible
>>
>>103487108
ugly titcow give me more
>>
>>103487159
JoyCaption takes a pretrained CLIP model, then projects the output of that into the llama 3 8b embedding space. Then feeds that into a llama 3 8b LLM and predicts the text. Once again, the reason that works at all is because it's based on llama 3 8b.
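roughly this shape, i.e. a small trained projector is the only glue (illustrative dims, not JoyCaption's actual code):
[code]
import torch
import torch.nn as nn

# ViT-L/14 @ 336px gives 577 patch tokens of dim 1024; llama-3-8b embeds
# tokens in dim 4096. The projector maps one space into the other.
clip_dim, llm_dim, n_tokens = 1024, 4096, 577
proj = nn.Sequential(nn.Linear(clip_dim, llm_dim), nn.GELU(),
                     nn.Linear(llm_dim, llm_dim))
clip_feats = torch.randn(1, n_tokens, clip_dim)  # stand-in CLIP features
soft_tokens = proj(clip_feats)                   # (1, 577, 4096) -> into llama
[/code]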
>>
>>103487287
oh ok, then it's good news if that's still llama3-8b, the size is acceptable for a 24gb card


