/g/ - Technology






File: HunyuanVideo_00235.mp4 (277 KB, 512x320)
277 KB
277 KB MP4
Discussion of Free and Open-Source Diffusion models.

Keys to the Castle Edition

Previous: >>103491402

>UI
Metastable: https://metastable.studio
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI

>Models, LoRAs, & Upscalers
https://civitai.com
https://tensor.art/
https://openmodeldb.info

>Training
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>HunyuanVideo
Comfy: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/
Windows: https://rentry.org/crhcqq54
Training: https://github.com/tdrussell/diffusion-pipe

>Flux
Forge Guide: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050
ComfyUI Guide: https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Misc
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Generate Prompt from Image: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two
Archived: https://rentry.org/sdg-link
Samplers: https://stable-diffusion-art.com/samplers/
Open-Source Digital Art Software: https://krita.org/en/
Txt2Img Plugin: https://kritaaidiffusion.com/
Collagebaker: https://www.befunky.com/create/collage/
Video Collagebaker: https://kdenlive.org/en/

>Neighbo(u)rs
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai

>Texting Neighbo(u)r
>>>/g/lmg
>>
Blessed thread of frenship
>>
Trani free zone
>>
I enjoy these threads.
>>
bog edition
>>
kek >>103492818
>>
File: HunyuanVi45564007 (3).mp4 (638 KB, 960x544)
638 KB
638 KB MP4
reposting the Rapeman intro I made:

https://files.catbox.moe/mc3nqd.mp4

Lyrics:
Rapeman! The crimson savior,
A warrior born to face behavior.
Through the smoke and fire, he stands tall,
A hero answering the city’s call.
Rapeman! The primal force,
Breaking through with relentless intercourse.
No face, no name, just justice's flame,
Rapeman, they’ll remember your name!

Every scar, every fight,
Every rape in the dead of night.
He bears it all, he feels their anal pain,
But through the darkness, he’ll remain.
>>
rapeman i...
>>
>>103495904
Now I want to see the turkish grifter as rapeman kek
>>
HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
https://openreview.net/forum?id=TwJrTz9cRS
>We propose Hadamard High-Rank Adaptation (HiRA), a parameter-efficient fine-tuning (PEFT) method that enhances the adaptability of Large Language Models (LLMs). While Low-rank Adaptation (LoRA) is widely used to reduce resource demands, its low-rank updates may limit its expressiveness for new tasks. HiRA addresses this by using a Hadamard product to retain high-rank update parameters, improving the model capacity. Empirically, HiRA outperforms LoRA and its variants on several tasks, with extensive ablation studies validating its effectiveness. Our code will be released.
is this the lora killer?
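For anyone curious, here's a rough sketch of the idea going purely off the abstract (the shapes, init and naming below are my guesses, not the paper's code): LoRA adds a low-rank delta B@A on top of the frozen weight, HiRA instead multiplies the frozen weight elementwise by that low-rank product, so the effective update isn't capped at rank r.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HiRALinear(nn.Module):
    # LoRA:  W_eff = W0 + B @ A          (update rank <= r)
    # HiRA:  W_eff = W0 + W0 * (B @ A)   (Hadamard product, update can be high rank)
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # keep the pretrained weight frozen
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))  # zero init so training starts at W0
    def forward(self, x):
        delta = self.base.weight * (self.B @ self.A)  # elementwise product against W0
        return F.linear(x, self.base.weight + delta, self.base.bias)

Same trainable parameter count as a LoRA of the same rank, the only change is the Hadamard product against the base weight.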
>>
>>103495800
I see loras are here!! nice gonna train a nice one in my 4090 and uoooh ToT the whole night.
>>
>>103496007
>uoooh ToT the whole night
If you train a lora worth sharing make sure to, code is speech after all
>>
>>103496040
drugged up, zoned out, drooling, living in decrepit despair
>>
oh look it's merged https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/pull/72
>>
File: HunyuanVideo_00085.mp4 (121 KB, 320x208)
121 KB
121 KB MP4
I feel like I'm back to 56k modem days
>>
>>103496064
But enough about trani
>>
>>103495932
dunno, how many nsfw and anime hira did they make to show it works better?
>>
>>103495932
>for Large Language Models
>>
>>103496132
the resolution and shortness of the clips is a bit like that, yea
>>
>>103496282
and the waiting lol
>>
>>103496132
I miss the 2000's aesthetic so much dude...
https://www.youtube.com/watch?v=y0zcDFtsw8A
>>
>>103496064
No anon that would make me sad :(

>>103496239
Loras were originally for LLMs too
>>
>>103496303
She has a nice voice
>>
>>103495932
>is this the lora killer?
member b-loras?
>>
in 2025 what can I do with a 3060 12GB and 48GB DDR3 RAM
>>
Can hunyan recognize celebs? Like could I prompt eva green blowing me a kiss?
>>
File: 1709034430907086.png (105 KB, 1685x753)
105 KB
105 KB PNG
now that we can do loras on hunyuan I guess that civitai will add it to that list right? can't wait to see how it'll flourish
>>
>>103496408
Illustrious, pony, ltx, flux.s, sigma and more, plus their loras, controlnets and stuff for the most part. Also low res hunyuanvideo or higher resolution flux.d if you're quite patient/batch a few over night.
>>
>>103496409
Hunyuan can do celebs but they have to be pretty famous. I have no idea who Eva green is so I'm confident that hunyuan won't know either
>>
>>103496633
https://www.youtube.com/watch?v=acCDliQ3GbY
>>
>>103496365
Still don't know what a DoRA is
>>
File: fa.webm (224 KB, 272x480)
224 KB
224 KB WEBM
>>103496409
why would she be trained any more than a random bond girl? i'm thinking some porn comic characters would have 1000x more exposure on most datasets
>>
>>103496409
prompt your full name date of birth and watch the magic
>>
>>103496715
Tried that and I got abducted by aliens, they told me about the future of all of this and took me to their home planet.

A year passed, but when I came back here only a few seconds had passed...
>>
>>103496675
i think you just load them like a lora anon, don't worry about it, just download and place in your lora directory
>>
File: 1704047342249971.gif (1.35 MB, 200x336)
1.35 MB
1.35 MB GIF
>>103496715
>prompt your full name date of birth and watch the magic
shit you're right, it got me in there, what the fuck??
Dunno why the video is black and white though I'm not that old lol
>>
Does hunyuan understand chinese prompts better than english?
>>
How good is this python diffusers library? Can a retard (someone who knows python but nothing about AI) implement his own AI GUI with it, or is it not possible to reach the performance and features of webui?
>>
>>103496825
if you mean "the" diffusers library that's been used in various of the common UI in the backend
>>
>>103496805
It probably doesn't. When they use the prompt rewrite model one of the instructions is to translate chinese to english.
>>
>>103496805
the image gen variant yes. but not generally the video generator as far as I can tell. maybe it's also partly because of the text encoder we use.
>>
>>103496928
I'm also guessing they might have trained parts of a booru or western ai generated images with tags or some other english training dataset as-is.
>>
>>103496932
>the image gen variant yes. but not generally the video generator as far as I can tell.
how do you know that? we haven't gotten the i2v model yet
>>
>>103496921
Really? There are examples how to generate an image with just a page of code. It almost looks too simple.
Why does webui even use gradio?
>>
>>103496955
>Why does webui even use gradio?
because the average /g/entleman doesnt read code
>>
>>103496825
Yes. New features or models are added slower because it's a professional library and they don't just copy and paste like forge or comfy. Experimentation is more involved, they don't support a plugin ecosystem like forge or comfy, but they are working on that with a modular system similar to comfy. There are already UIs based on Diffusers like InvokeAI or SD.Next. Every UI uses Transformers which is the sibling library of Diffusers.
>>103496955
Gradio is a UI library, also a HF project.
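To give a sense of what "a page of code" means here, a minimal Diffusers text-to-image sketch (the checkpoint name is just an example and it assumes a CUDA card):

import torch
from diffusers import StableDiffusionXLPipeline

# any compatible checkpoint on the hub works, SDXL base is just an example
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of a cat wearing a space suit", num_inference_steps=25).images[0]
image.save("cat.png")

A UI on top of this is mostly plumbing around that pipe() call.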
>>
>>103496950
I2v? No - the older imagegen model (HunyuanDit), that one was a case where you actually quite possibly wanted Chinese prompts.

>>103496955
>Why does webui even use gradio?
uh... voldy and others wanted to easily have a UI in the webui? gradio is for the buttons and sliders and such
>>
>>103496955
There are two kinds of people, frontend developers who want to dabble in the backend, and backend developers who want to dabble in the frontend. gradio is clearly made by the backend folks.
>>
>>103496928
>It probably doesn't. When they use the prompt rewrite model one of the instructions is to translate chinese to english.
that's weird don't you think? they're a chinese company and they focused on making this model good in english but don't seem to care about making it good in chinese, as if they don't want to target their own country or something
>>
>>103497003
it's probably capable of both chinese and english prompting, we don't have that actual model yet tho

also reality is that the best tagged datasources are english. in the end the image/video LLM trainings all sucked less with *booru and so on, kek.
>>
>>103496972
Sorry for being completely ignorant, I'm just trying to get the big picture before possibly getting into details.

>>103496982
So it's Transformers versus Diffusers? What's the role of PyTorch and TensorFlow?

>>103496983
So Gradio is both a toolkit to make webapps and somehow AI specific?
>>
>>103495891
What are Rapeman's superpowers?
>>
>>103497051
Gradio is a python package that lets you make a web UI with just python code (instead of having to write a bunch of HTML and JS for your UI).
It's not that Gradio is specific to AI. It's that AI projects tend to use python. So authors of AI projects use Gradio as an easy way to add a web UI.
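A minimal sketch of what that looks like, nothing AI-specific about it (the greet function is just a placeholder, for imagegen it would wrap the generation call):

import gradio as gr

def greet(name):
    # any python function can sit behind the UI
    return "hello " + name

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()  # serves a local web page with a textbox and a submit button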
>>
>>103497051
>So Gradio is both a toolkit to make webapps and somehow AI specific?
well yes https://github.com/gradio-app/gradio

but frankly any search engine could have told you that. better fire up comfyui and post some gens.
>>
>>103497048
>it's probably capable of both chinese and english prompting, we don't have that actual model yet tho
what do you mean? we got the t2v model
>>
>>103497051
>Transformers versus Diffusers
They're not competing, they're for different types of model. PyTorch and TensorFlow are the underlying machine learning libraries.
>>
>>103497080
Gradio describes itself as
>Gradio: Build Machine Learning Web Apps — in Python
Putting UI controls on a web app doesn't have anything to do with machine learning, so that confuses me.

>>103497099
That doesn't help.

>>103497110
So which part is used to generate images from stable diffusion models? They have to do things like parsing the model file and running the cuda programs.
>>
>>103497125
Both. For Stable Diffusion models Diffusers implements UNet and VAE, CLIP is implemented in Transformers. forge and comfy also use Transformers for CLIP.
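You can see the split if you load the pieces by hand, roughly like this (the repo is just an example with the standard subfolder layout):

from diffusers import UNet2DConditionModel, AutoencoderKL  # diffusers side
from transformers import CLIPTextModel, CLIPTokenizer      # transformers side

repo = "stabilityai/stable-diffusion-2-1"  # example SD repo, most use the same layout
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
# a "pipeline" is basically these four wired together with a scheduler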
>>
Is it true that Macs are really bad for local image and video AI stuff?
They kind of have a lot of Vram shouldn't they be good for the usecase?
>>
>>103497149
Macs got dey unified memories and shiet for dat CPP models aight?
>>
>>103497125
>That doesn't help.
it starts out with an explanation of what Gradio is AND shows what happens when you program with it, and it obviously goes into more detail in the links. read it as long as needed until you get it.
>>
>>103497149
it's CUDA that makes the speeds tolerable
>>
>>103497149
Macs used to have Radeon graphics so if it's still like that they are pretty bad.
>>
>>103497158
Yes, that's why I'm asking.
They're Okay for LLMs (not fast but at least you can load large models without a problem) but I've heard supposedly bad things about image and video stuff
>>
>>103497162
this, CUDA is the only reason Nvidia is dominating the market
>>
>>103497101
This is the TE they actually trained: https://github.com/Tencent/HunyuanVideo?tab=readme-ov-file#mllm-text-encoder

>>103497149
Depends on which one, for some they're not that bad. Generally speaking yes, AI is nvidia-dominated which gives it a software advantage too, and the Nvidia GPUs are more powerful anyhow.
>>
>>103497182
>This is the TE they actually trained:
oh ok, guess that we'll have to test it out on the API to see if it understands chinese better than english then, wonder why they still haven't released it yet
>>
File: s.webm (512 KB, 272x480)
512 KB
512 KB WEBM
>>103497218
dunno either. probably tuning it more or working on software or something?
>>
>>103495891
I LOVE YOU ANON
>>
>>103497148
So does Diffusers use Transformers for CLIP?
>>
>>103497149
Macs are struggling with compute-bound tasks which is what image and video generation falls under.
>>
File: ComfyUI_00007_.png (1.35 MB, 1024x1024)
1.35 MB
1.35 MB PNG
Is there any way of knowing if the model you are using was trained for a word in particular? I always wanted to know if a model I am using "knows" what something I am describing is or if I have to describe it in other ways.
>>
>>103497353
If only there was some kind of way we could prompt the model to produce images based on words.
>>
>>103497353
It's a bit more complicated than that even, you might need to find out whether a text encoder triggers a strong reaction of sorts in a video/image model. Or where it triggers a reaction vs not having the token, by looking at the output image (generate both...).

Pretty sure some people wrote heat-map type tools for SDXL/Flux but even those don't always make things entirely easy to understand.
>>
I can smell your loose anus from here trani
>>
Why the fuck did AnythingLLM not include a Gitea data connector? Like what, I am supposed to run local llm and then just push all my data onto github if I can have gitea?
>>
>>103497406
Wrong thread, sorry.
>>
File: HunyuanVideo_12112 (1).mp4 (595 KB, 960x544)
595 KB
595 KB MP4
>>103497057
nobody really knows
>>
>>103497296
Yes
>>
>>103497416
Prepare your butt for RAPE MAN. You won't make the same mistake twice.
>>
>>103497482
oh no
>>
File: 1704670682059367.mp4 (542 KB, 640x400)
542 KB
542 KB MP4
>>103497482
He raped?
>>
>>103497482
>RAPE MAN
as long as it's not pig man that's all right
https://www.youtube.com/watch?v=O2cl1P5HxR0
>>
>>103497510
Being Igor Bogdanoff (1999)
>>
https://www.reddit.com/r/StableDiffusion/comments/1hcwu57/i_installed_comfyui_wsage_attention_in_wsl/
>I Installed ComfyUI w/Sage Attention in WSL
>Generation went up by 2x easily AND didn't have to change Windows environment.
????
>1.30h video
OH COME THE FUCK ONNNNNNNNN
>>
>>103497730
HAHAHA someone's gonna have to generate a 2ldr for you faeggits
>>
>>103497730
I fully expected to see furkan
>>
>>103497730
ok in all seriousness, who's gonna try this? if this is a true 2x I'm ok to watch a fucking movie length out of this dude
>>
>>103497785
>>103497768
>>103497730

Temper expectations, this was probably his workaround to get triton working on windows. If you already have it you're not gonna see the gains.
>>
>>103497797
I don't remember having too much trouble getting triton to work on windows
>>
>>103497797
I don't believe there's a 2x speed difference between spda and sage though
>>
>>103497797
I don't remember ever having triton on windows (i dont have a triton compatible gpu)
>>
>>103497826
you need triton to make sage work though
>>
>>103497822
He's overselling it. It's like a 1/3rd increase.
>>103497826
There's literally a guide in the op.
>>
vramlets running linux, if your system is crapping out when system ram hits 0 or close to that as you start swapping use these nodes for freeing images and latents

https://github.com/ShmuelRonen/ComfyUI-FreeMemory

The wires must pass through them at every connection for it to be effective. You will then not experience system lock up and will be able to move your mouse and shit. I have no idea why this shit isn't just fucking built into comfy. Basically when linux hits a threshold it will shit the bed even if you have an entire SSD dedicated to swap. What I think is happening is that comfyui doesn't call garbage collection, so some data is never treated as trash, the system bus gets really bogged down and your cpu spikes like crazy, and then 20 minutes later you decide to terminate the server after waiting like 5 minutes just to get access to the terminal.

Then you will coredump and that takes fucking ages, i recommend Disabling automatic core dumps

https://wiki.archlinux.org/title/Core_dump

Saved me a fuck ton of bullshit with needing to restart the damn server after some hours, and fuck that bullshit.

This will also likely let you increase block swap without your system shitting the bed, plus higher resolutions or more frames = more power more fun. My system would hang if I tried higher resolutions and increased block swapping to avoid OOM on vram, but I think this will cure it, because ram use kept climbing to the point the system became unusable. Something isn't freeing up system resources like it should. But these nodes really did help, freeing up enough system ram that things don't lock up.

I'm not good at typing btw, sorry about that, but any help I can offer, i'd rather share it.
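if anyone wants the gist of what a free-memory node does between stages, it's roughly this (my guess at the mechanism, not the actual ComfyUI-FreeMemory code):

import gc
import torch

def free_memory():
    gc.collect()                  # drop python objects nothing references anymore
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached VRAM blocks back to the driver
        torch.cuda.ipc_collect()

comfy keeps node outputs referenced for re-use, which is presumably why the ram looks like it's still in use until something forces a collect like this.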
>>
>>103497730
Just use Linux, if you use Windows you deserve the gimped speed.
>>
>>103497730
>Be lost, noob, clueless, new thing comes along, decide to google how to install
>no definitive guide, only fucking asine ai generate slop videos by the dozen
>decide fuck it, challenge accepted
>3 mins into the video
mfw
>>
File: 1704064663076357.png (205 KB, 1410x988)
205 KB
205 KB PNG
>>103497880
>Just use Linux
I don't think I will
>>103497881
>no definitive guide
? https://rentry.org/crhcqq54
>>
>>103497891
>https://rentry.org/crhcqq54
yes I remember when i was young and could be full of energy to explain everything, but i've watched as they've completely destroyed the easy goto answer, just google it bro. Now I have to either scroll through or watch billions of useless shite that doesn't even explain how the settings work on the new nodes i've somehow managed to install. But its ok, man i get it, you younger cunts are lazy and all about just read the install bro...
>>
>>103497867
>I have no idea why this shit isn't just fucking built into comfy
lol
>>
>>103497926
the fuck is this schizo talking about?
>>
>>103497929
watch it buddie i'll hack your grandma's chair life and steal all her mhz
>>
look all i am asking is, when adding new nodes the documentation is provided, not everyone enjoys scrolling through git hub issues for answers.
>>
>>103497830
>>103497854
youve been epically trolled and didnt even realize it you fools, you absolute buffons, possibly pseuds even
>>
>>103497973
Literally nothing useful is built into comfy.
>>
>>103497992
>not everyone enjoys scrolling through git hub issues for answers.
that's why the fucking guide exists in the first place, we had to look at a lot of github issues to make this shit work, it's a collection of all the best findings from those issues and they're all on a single page, ready to be read
>>
and I really appreciate people's generosity and time but I'm talking about when I used google back in the day i'd find an answer in like 3 minutes. These days every nob jocky on earth wants clicks for a quick buck and i can't be arsed listening to their intro, the way they fucking speak is annoying to an autist, its just like get to the fucking point! I'll give you money if you help me but if you are going to waste my fucking time with your bullshit you get nothing.

/rant

Yes i do donate to people that offer something valuable with out the crap.
>>
>>103498102
you don't have to watch that 1.30h video, just use the guide, there's only text in there, zero bullshit, only instructions on how to get it done
>>
>>103498119
No, it's doing something, these readouts are your system ram not your swap

Garbage collector: collected 1565 objects.
is likely your swap and that is a lot. just trust me on this, i would not be posting here if I didn't do all this because my system would be shitting all over with lag... It just discards them, thus freeing that space up.
>>
You know a model is good when the vramlets are willing to kill their SSD with all those memory swap and people are willing to watch a 1.30h tutorial on how to make it faster or something kek
>>
>>103498119
>GPU VRAM: Initial usage: 6.95 GB, Final usage: 6.95 GB, Freed: 0.00 GB
this is your hardware status report

>Garbage collector: collected 1565 objects.
This would have been in virtual, thus it reports 0.00GB freed

it's hard for me to explain, i have both a comfyui resource monitor and the i3 window manager, and both report more available ram. That is what keeps my system running smoother, and my sanity also...
>>
>>103498118
i've had it installed for ages... Stop responding to me, jesus...
>>
>>103498212
wait, you think you can make a retarded rant without feedback? you think this place is your blog or something?
>>
i have a PSA to make here
everyone in this thread is on the spectrum that is all
>>
>>103498210
it should freeze a lot less and with shorter pauses, you might get a little spike when it starts genning and when it starts the decode but it will not be anywhere near as bad. There are other nodes that do the same thing you might want to look into in the manager; some 8GB card user was using a node called free all models, aye, and was using a trick i often use in tricky situations with the decode: batch processing his latents one at a time, then re-batching the images before video combine
>>
Are people really falling for the stupid video claiming a 2x increase in speed when all the retard really did was install triton?
>>
>>103498247
I'm both on the autism and ADHD spectrum (probably).
>>
>>103498223
i do yeah i can just say fuck and and walk away for the next 4 hours... It was just an opinion btw. I already posted what i cam to post fuck head now you go sit the fuck down.

>>103497867
>>
>>103498269
maybe the speed increase comes from going to WSL?
>>
>>103498275
Just because right?
>>
>>103498274
>i do yeah i can just say fuck and and walk away for the next 4 hours
and guess what, people can respond the fuck to you and call you a retard when you have retarded takes, isn't this place beautiful?
>>
>>103498279
why not? WSL is really different from CMD, maybe triton is more optimised on linux therefore more optimised on WSL
>>
>>103497891
>I don't think I will
Why not?
>>
>>103498295
have you seen the image? why would I want to support a retard like that? >>103497891
>>
>>103498292
Why don't you test both instead of speculating.
>>
>>103498299
he should've made a speed comparison first before going for the "trust me bro route" and expect us to go for a 1.30h long tutorial
>>
>>103498298
Are you mentally ill? Why don't you care what Bill Gates says? Or the thousands of Indians working for Microsoft? Also with WSL you're already using Linux, just that it runs on a VM. You're already using a shitload of tooling that was developed for Linux anyway.
>>
>>103498308
I haven't watched the tutorial but I bet 90% of it is just getting the linux environment running.
>>
>>103498315
>Also with WSL you're already using Linux,
I'm not using WSL, the guide works fine on cmd
>>
I guarantee the speed boost he got is purely from from sage attn+triton and has nothing to do with WSL. Probably just some retard who couldn't get it functioning on Windows
>>
>>103498328
>Probably just some retard who couldn't get it functioning on Windows
his settings to make it work on WSL is long and complex though, he definitely can follow the guide
>>
>>103498321
Thank you for supporting Bill Gates' agenda.
>>
>>103498343
better than troon's agenda that's for sure, that's why Trump was elected in a 2nd time, people prefer anything than woke communist propaganda
>>
>>103498334
>his settings to make it work on WSL is long and complex though, he definitely can follow the guide

Sometimes you have to be especially retarded to not be able to take the easy path presented before you.
>>
>>103498351
Enough of those troons and communists work for Microsoft. You're retarded for not considering an entire operating system just because a cherry picked screenshot made your poopy brain seethe.
>>
>>103498362
>a cherry picked screenshot
cherry picked? he truly believes that, what the fuck?
>>
So what LoRAs are you training? I've trained 4 so far.
Here's my takeaways

More images are better, 30-100 seems to be a good range.
1000-1300 steps seems to be enough
No idea about captioning yet
>>
>>103498368
If you don't want to be retarded, read the political opinions of every single developer of all software you use and stop using that software if you don't like the opinions of one of the authors. If you do that, I might stop thinking that you're a retarded fucking moron.
>>
>>103498374
and training on images translates into video?
>>
>microsoft vs linux war console discussion
BOOOOOOORING, I have a real question there, has someone made a lora porn of hunyuan yet?
>>
File: HunyuanVideo_00247.mp4 (490 KB, 640x400)
490 KB
490 KB MP4
>>103498381
correct
>>
>>103498391
Not even a console war, just plain retardation of the lowest order. Good bait anyway.
>>
>>103498395
lmao, I hope he've seen this
>>
>>103498374
I would be utilising my vast collection of Flux lora datasets but I can't get deepspeed working
>>
File: 0.jpg (475 KB, 1024x1024)
475 KB
475 KB JPG
>>
>>103498422
I know it's a bad time to say this given the discussion, but just use wsl2
It just werks for the training script. Got it up and running within 5 minutes of biting the bullet.

And I'm also the same person who thinks the 2x speed improvement in wsl for hyvid is bullshit.
>>
File: HunyuanVideo_00002.mp4 (686 KB, 544x960)
686 KB
686 KB MP4
>she's holding a sign that reads: i2v waiting room
Will we get functional text with the proper text encoder?
>>
>>103498482
Looks like she's trying to verify for a pornography subreddit.
>>
>>103498482
>Will we get functional text with the proper text encoder?
I think so, there was some API vs local comparison and the text was better on the API
>>
>>103495800
>>
>>103497939
About getting old and tired and being a cunt mostly was my takeaway.
>>
Guys what if they just... don't release anything more and this is all we get?
>>
>>103498724
desu if this is the end of the road, this isn't too bad, hunyuan t2v is an awesome model already
>>
>>103498724
That's the default assumption for everything at this point
>>
>>103498724
I'm sure we'll be fucked on something, whether it's the MLLM or the i2v model, which one would hurt less?
>>
File: 1726940495104737.png (3.63 MB, 2048x2048)
3.63 MB
3.63 MB PNG
Sup bros
>>
File: 0.webm (265 KB, 850x624)
265 KB
265 KB WEBM
saw artwork I liked, tried to re-create it in hunyuan, felt like I got kinda close
>>
>>103498752
we hack it obviously dur. i want to know what its doing to my prompts, that is what i want to know then i can try intercept it and correct it. Forgive me if I'm retarded because I am but i'm not, its using a MLLM, right so its translating my prompts, their is your limitation... Hence it can't do real porn, because a cucked MLLM will block it, so I want to know how and why and what.

We already know what the video model is capable of.
>>
>>103498907
>because a cucked MLLM will block it
llama3-llava3 (the duck tape encoder) is cucked too, there's no such thing as an uncucked llm model
>>
>>103498752
MLLM>Quants>i2v
>>
>>103498918
>>103498932
why does it need them, sorry i'm not fully clued on these things, why can't it just take prompts?
>>
>>103498947
>why can't it just take prompts?
because the text we write has to be transformed into numbers first, and to do that we need text encoders: they encode your prompt into numbers so the model can do its matrix math shit
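to make it concrete, this is what the CLIP half of it looks like (clip-vit-large-patch14 is just the usual SD text encoder used as an example, hunyuan also runs the prompt through its llava encoder but the principle is the same):

from transformers import CLIPTokenizer, CLIPTextModel

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

ids = tok("a girl walking down the street", return_tensors="pt")
print(ids.input_ids)               # the prompt as integer token ids
emb = enc(**ids).last_hidden_state
print(emb.shape)                   # something like [1, 8, 768], these are the numbers the diffusion model conditions on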
>>
what's its api like and what is it being sent? I'm trying to fully grasp if anything I envision is possible or not, or just really fucking hard. Nothing is too hard for me...
>>
>>103498954
>transformed into numbers
right, i'm with you on this, ok i think i could figure it out :-0

trust me...
>>
>>103498967
What's what's API like?
>>
>>103498975
a model only works with numbers anon, your whole computer only works with numbers, so for example if you write "A girl walking down the street", it'll be transformed into a set of numbers by the text encoder that represent this sentence for the model
>>
How the fuck did we get LoRA training before gguf quants?
>>
>>103499012
Will we ever get the gguf quants at some point? That's what I'm worried about
>>
>>103499019
GGUF quants are not a closely guarded secret. Just takes a motivated person a day or so to figure out what goes where and let it quant.
>>
File: Smith_Tries_To_Relate.png (1.1 MB, 1248x800)
1.1 MB
1.1 MB PNG
>>103498954
>first transformed into numbers, and to do that we need text encoders, it encode your prompt into numbers so that it can do its matrix math shit
https://youtu.be/JrBdYmStZJ4?feature=shared&t=162

mmmm, *brushes documents a side*
Tell Mr Hunyuan, what good is a phone call if you can... not... speak? hm?
*smirks*
>>
>>103499035
I loved that trioly, too bad the film makers became troons though
>>
but in all seriousness we can monitor its i/o and discover trigger words at least and begin to build a dictionary.
>>
>>103499051
for the moment it's useless to try something like that, we're working with the wrong text encoder, once they release the good one then we'll try to crack the code or something....
>>
>>103499051
such an attack would consist of identifying its header length, its packet length or data string, then begin by sending a random number not already checked against a random set of seeds and see what it's prominent result is and use a tagging model to id the image and then build a database of what each string of these numbers represents. That is my theory on how to hack it, but we don't know what any of these numbers are yet, so it will be random chance.
>>
>>103499073
The sentinels are already being deployed.
>>
File: HunyuanVideo_00007.mp4 (962 KB, 960x544)
962 KB
962 KB MP4
>>103495800
Linux poster. I suck at prompting but I have useful performance data to share. I will also report the results of changing the encoder and CLIP model, along with the Torch Compile soon.
Anyways, you can run 960x544 AND 1280x720@129 frames on a single 3090 and 30-64GB extra RAM. Use sageattention (1 is fine, 2 isn't needed) and blockswap. No need to quant the encoder on 3090.
These prompts use 960x544@129 at 50 steps, guidance=6, flow=7, framerate=24. Prompts 1 and 2 have 20 double blocks swapped, Prompt 3 has 10 double blocks swapped.
1: (realistic, sfw) Woman running on treadmill. prompt from >>103484226
https://files.catbox.moe/1ozfsl.mp4
Speed/VRAM/RAM: 2831s(47 minutes)/17.9GB VRAM/30GB ish? pushed to RAM.
2: (Hentai) stroking dick. good motion bad quality:
https://files.catbox.moe/x8x519.mp4
Speed/VRAM/RAM: same as above i believe (lost track of it)
3: (Hentai) stroking dick. bad motion good quality:
https://files.catbox.moe/rgk52h.mp4
Speed/VRAM/RAM: 2638s(44 minutes)/20.6GB VRAM/30GBish? RAM
Extra Experimental Info:
- 960x544@129f might work with just 5 double blocks/0 single blocks offloaded. 23.4GB VRAM + 26.9GB used. might OOM soon though....
- All gpus power limited to 250.
- PCIE 3x1 at 10 double blocks offloaded gives 69.46s/it but PCIE 3x4 at 5 double blocks gives 53.32s/it.
- With 4x3090s and 128GB ddr4 ram, it seems you can batch four 544p videos @50 steps in 45mins ish, or 11.25 minutes per video.
- 720x1280p@129f uses 21.5GB VRAM allocated and I think 30-40GB RAM (maybe more?). Takes 6564s(109mins) with 20 double blocks/40 single blocks offloaded and 50 inference steps.

WAIT!! WE ARE SO BACK!!! As I typed this up two of my gpus finished. I'm adding it above. 1st hentai gen is visually impaired but has ACTUAL STROKING MOTION! Second one looks beautiful but little motion. Any suggestions?
>>
https://github.com/Tencent/HunyuanVideo/issues/8
my theory is that they'll release the i2v with the MLLM, because it won't work with the duck tape and they won't have any other choice but to release it
>>
>>103499149
Google gemini helps make good work safe prompts anon, just give it the custom prompt template and what you want. Be sure it understands things like camera movement and cut scenes etc.
>>
>>103499149
>2 isn't needed
2 uses a new method to optimize the KV cache, so you get less memory usage with it
>>
>>103499151
How many days are we past day zero now? I think we should just be thankful and wait and see, it would be disrespectful to try and mess with their shit this early.
>>
>>103499159
>>103499160
thanks for the tips lads
>>
>>103499226
yeah it really works mate, just get into a nice conversation with that thing, it will make you a complex prompt for what you want. You will see the results are good. I suck at long complex prompts also, i should get to learning it. if you drag from your prompt node (i forget what it's called) some custom prompt template will be selectable, just spawn it and copy paste that somewhere. I don't know what that node does, perhaps it's a way to change the default directives of the MLLM or someshit. But anyway, copy that out into a note node so you have a template for how a prompt should be structured for a good video.

But i deleted that custom prompt node because I don't know if it affects the prompt or not.
>>
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers/discussions/2
>i was testing it parallel to llava-hf/llava-1.5-7b-hf, and noticed that the img-to-text wasn't performing as well as the llava-hf/llava-1.5-7b-hf.
What if we replace our duck tape with llava-1.5-7b
>>
anybody got any good Christmas card prompts?
>>
>>103499286
i really need to learn all this stuff, I'm being a lazy fucking cunt and i do hate myself for that, i'm not a stupid guy i could have a look at what and how etc and devise a strat. if you're talking about hunyuan that is.
>>
>>103499317
>i'm not a stupid guy i could have a look at what and how etc and devise a strat. if you're talking about hunyuan that is.
there's already a github that lets you choose any duck tape you want on kijai's node, if this what you wanted to do
https://github.com/esciron/ComfyUI-HunyuanVideoWrapper-Extended
>>
>>103499286
So let me get this straight, ok. it's generating numbers from tokens that trigger or activate the neural pathways, so to speak, of a neural network, which then causes it to infer from the random noise. So what came first (chicken or egg)? it had to be the language models first right? And so the image and video models are like a layer on top? Am I correct in thinking like this so far?
>>
>>103499307
just add "Christmas"
>>
>>103499332
i will look at that also thanks.
>>
>>103499332
>>103499286
all right I'll try it
https://huggingface.co/llava-hf/llava-1.5-7b-hf/tree/main
>>
>>103499342
that's too complicated for my smooth brain, the only thing I understood is that it's more efficient than a regular text encoder like t5
https://github.com/Tencent/HunyuanVideo?tab=readme-ov-file#mllm-text-encoder
>>
>New commits
>rf inversion fixes
Lame
>>
>MMLM is keeping us back from true nirvana
Come on guys. It can't be THAT big of a boost over a regular mmlm
>>
>>103499372
what I know is that llama-llava isn't really good at prompt adherence, and the API using the official text encoder understands my prompt way better, at some point I suppose they'll release their HunyuanMLLM but we can pass the time by finding a better duck tape, I'll come back with some results after trying llava 1.5 7b
>>
>>103499362
i'm getting the gist of it anon, the language models act as translators and thus filters, very clever NOT! because someone could bypass circumvent, unless they built in censors but its intriguing to think about *keeps falling backwards*
>>
>>103499391
>unless they built in censors
there's none, Hunyuan is completely uncensored
>>
>>103499368
yeah, never liked rf-inversion, this shit doesn't work that well
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/131
he should focus on gguf quants instead of going for those fucking gimmicks
>>
>>103499394
yeah then it's the text encoders, i've done a little with them and local LLMs and they are a pain in the neck but can have the primary system prompt changed, and so they must be taught to accept user input and produce a prompt. There is a whole general on this subject on /g/ but I don't know if I'm barking up the right tree or not. The problem is it requires a lot of vram to get these things the way you want because you need context and the bigger that context the more vram. But I don't even know if that is how it works with regards to interfacing with SD models or video or whatever.
>>
>>103499452
yeah, I'm a /lmg/ fag aswell, and yeah the llm models we're using are really uncensored, but when it's on decoder mode or whatever, looks like there's nothing we can do to trigger their goody2 mode, which is a good thing really
>>
>>103499452
>>103499473
>llm models we're using are really uncensored
*censored, my b
>>
and if you can't use this model commercially then why would you even bother? I never thought of it till now actually. can it be used commercially or not?
>>
so what are you creeps using the HY lora for?
>>
File: 1721366974948600.png (569 KB, 3454x1680)
569 KB
569 KB PNG
>>103499354
>all right I'll try it
>>103499332
>https://github.com/esciron/ComfyUI-HunyuanVideoWrapper-Extended
>Support for LLava and mLLama model_type.
my fucking ass
>>
>>103499518
>the HY lora
which one?
>>
>>103499534
the one people are training privately
>>
>>103499526
well is the clip model and precision right for that? I don't know, I'm just asking.
>>
>>103499553
you won't get an answer, it's private after all kek
>>
>>103499553

I wanna see the kino videos that people are making of their oneitis too anon, because that's exactly what I plan on doing
>>
>>103499560
went for fp16, still got the same error
>>
>>103499577
https://huggingface.co/models?search=llava-llama
maybe there's a better version of llama-llava, that would be safer to load it's the same exact architecture
>>
>>103499565
you what sorry? Are you a fed by chance? Don't worry we taught younger anons well, very well... You won't be seeing shit mate, waste of tax payers money you. trying to get anons to do wrong things again with ai officer?
>>
I dread to think what that anon is making poor Furkan do
>>
>>103499617
I'm still waiting for someone to use it as Rapeman
>>
>>103499518
>>103499553
>>103499561
can you explain this like I just came into the discussion
>>
>>103499623
we can make loras with Hunyuan now >>103497510
>>
File: 0.jpg (496 KB, 896x1152)
496 KB
496 KB JPG
>>
>>103499632
oh i thought there was some kind of secret code that only a few people had, not that the loras themselves are something kept private
im retard
>>
anons beware feds are posting shit to actively encourage you, do not fall for it. They post and then delete their own posts to make it like its serious, its not. they post images and videos of younger girls to encourage you to do the same and you slip up and make it too spicy, this justifies their pay checks...
>>
>>103499660
>They post and then delete their own posts to make it like its serious
it's more like the jannies are doing their job of removing this shit in the first place
>>
they have been tasked by government or others to shutdown AI.
>>
>>103499667
this is probably one of them, oh joy here we good again... guess I will talk another 4 hour time out because i don't care retard at all. Oh you do not like it, then stop posting faggot.
>>
Golden noise SDXL works, noticeable difference even for Illustrious, so this isn't placebo.
But what about Step-aware Preference Optimization / SPO lora https://civitai.com/models/510261 ?
Since I feel it might be placebo that means it is? Testing with Illustrious again.
>>
File: Furkan.png (641 KB, 1046x1164)
641 KB
641 KB PNG
He hasn't posted anything related to image/video AI in months. I doubt he's even aware of HunyuanVideo.
>>
>>103499634
Nice
>>
>>103499682
>Golden noise SDXL works,
you have to keep training the model to use golden noise right?
>>
>>103499682
>Golden noise
im more of a perlin or fractal noise man myself
>SPO
on n00b it functions, i stopped using it desu
>>
>>103499687
>I doubt he's even aware of HunyuanVideo.
he keeps posting random ass questions on github, there's no chance he hasn't heard of hunyuan
>>
File: 1706076088866272.png (52 KB, 462x407)
52 KB
52 KB PNG
https://arxiv.org/abs/2412.07730
>An 8.7B model with 512 resolution achieves 83.1 on VBench T2V, surpassing both leading open and closed-source models like CogVideoX-5B, Pika, Kling, and Gen-3. The same-sized model also achieves a state-of-the-art result of 90.1 on VBench I2V task at 512 resolution. By providing a transparent and extensible recipe for building cutting-edge video generation models, we aim to empower future research and accelerate progress toward more versatile and reliable video generation solutions.
>transparent
>doesn't share the weights publicly
yeah right
>>
>>103499695
You have to train "something" for a particular model to make the thing perform in the best way. But like I said I do see a difference using the provided SDXL model with illust and illust-based. Would be cool if somebody did make it for illust-based.
>>103499699
Okay, guess I'll stare at the pics with and without SPO some more. There is a subtle difference but it's like give or take a step of denoising, nothing drastic.
>>
>>103499723
>ctrl+f hunyuan
>zero results
>>
>>103499723
How can CogVideoX achieve those scores? this model fucking sucks, they need harder mememarks
>>
>>103499723
let me guess, an exaggerated paper with cherry picked results only to lure investors into some sham ai startup business
>>
File: HunyuanVideo_00248.mp4 (384 KB, 640x400)
384 KB
384 KB MP4
>>103499687
That's where you are wrong. He is aware of the model but doesn't seem to be aware of the wrapper we use. I don't know how he missed it.

The guy who wrote the training script for the LoRA is an anon so I am looking forward to seeing how he balances dealing with furk while also maintaining a veneer of professionality.
>>
File: 00006-3844423913.png (2.72 MB, 1144x2008)
2.72 MB
2.72 MB PNG
Anyone have any idea what the color strip on the side means? This is a very botched (but funny) flux img2img, but i was just surprised by that strip appearing.
>>
https://youtu.be/UD_pcB4AzR0?t=368
imagine you could do something like this on Hunyuan
>Transform this woman that's on the video into another woman
and boom, it only changes the woman and nothing else surrounding her
>>
>>103499817
it means they fucked up the model
the team is made up of former SAI employees afterall...
>>
File: HunyuanVideo_00023.mp4 (750 KB, 1088x960)
750 KB
750 KB MP4
>>103499819
rf-inversion
>>
>>103499833
it doesn't work that well though, and it's changing the background behind her, here we're talking about an editing process that's only changing the woman and nothing else, like you would do when doing some precise masking on i2i
>>
File: 00004-4019764512.png (2.73 MB, 1144x2008)
2.73 MB
2.73 MB PNG
>>103499831
It's just weird, because the nightmare fuel i made before this one didn't have it, but after that pic suddenly the next one had it too. Wondered if it was a memory error or something. I'd understand if it happened all the time.
>>
>>103499831
>the team is made up of former SAI employees afterall
you're talking about BFL?
>>
File: Capture.jpg (39 KB, 1065x360)
39 KB
39 KB JPG
>>103499811
Here he is pestering on the official Hunyuan repo for a Gradio demo
PhD Computer Engineer btw
>>
>>103499861
I would assume anon is, because that's who made flux.
>>
>>103499883
I don't get those vertical glitches on Flux, the fuck is this anon doing kek
>>
>>103499887
NTA it's happened to me at least once and ive seen it on other anons outputs
>>
>>103499880
>PhD Computer Engineer btw
is he working at all? he spends way too much time on github to have a job
>>
The "original" story with Fagkan was he pilfered code anon posted ITT, right? I believe that anon also posted it to Reddit (after posting here). Trying to find it in the archive but no luck.
>>
File: 00007-4158100966.png (2.78 MB, 1144x2008)
2.78 MB
2.78 MB PNG
>>103499887
I have gotten it today for the first time, I don't know what I did, that's why I'm asking. I've never seen this before. Here's the second one that had it, right after the first, same settings except different lora and changed the triggerwords. But I did several variations of this image before this without this issue on those same settings. It's a mystery.
>>
>>103499880
>Hello sir please do the needful and implement up the feature
kek
>>
>>103499931
show us a screen of your workflow maybe?
>>
>>103499931
put jpeg artifact in the negs?
>>
File: HunyuanVideo_00178.mp4 (479 KB, 1104x632)
479 KB
479 KB MP4
>>
>>103499939
that guy is turk but he's acting more like an indian than a real indian when you think about it kek
>>
>>103499921
I dont remember all the details but I think someone wrote a guide how to set up joycaption locally here and Furkan was selling it as his own on patreon
>>
>>103498374
I'm gonna wait for kohya's implementation, it probably has a better shot with my meagre vram
>>
Does anyone remember a software called DreamTime that was like an open source version of the first deepfake software that went viral? You could upload a girl in bikini and it would undress her and you could even tweak the tits size. Shit was so cash. Of course github deleted it, so now im paranoid of downloading any links. Was all of this deleted off the internet? It was just easy as fuck, all of this new shit seems too convoluted for anyone that has things to do.
>>
File: HunyuanVideo_00181.mp4 (532 KB, 632x1104)
532 KB
532 KB MP4
>>
>>103499941
I was using forge. I re-generated the image, it's there again.
>>103499944
I'd have to use flux dedistilled, flux doesn't have negatives. Waiting three minutes for an image this size is stupid enough, waiting for 6+ on the dedistilled is worse.
>>
>>103499947
What is the best top of the line software for NSFW faceswapping?
From what I heard Rope Pearl is the best but it has a NSFW filter. Theres Rope Unleashed which supposedly has no filter but the results are not as accurate.
>>
>>103500030
>I'd have to use flux dedistilled, flux doesn't have negatives.
yes it can, if you go for anti burner CFGs
https://www.reddit.com/r/StableDiffusion/comments/1eza71h/four_methods_to_run_flux_at_cfg_1/
>>
>>103500039
Oh interesting. Thank you anon I'll look into this.
>>
File: 1708829697930144.png (264 KB, 1806x467)
264 KB
264 KB PNG
>>103500030
the fuck is that name? it's a finetune or something? and don't use nf4 dude, Q4_0 has the same size and gives better quality
>>
>anti burner CFGs
biggest flux cope
>>
Ahh the deep fake face swap jeets have arrived
>>
File: 1730455783864405.png (591 KB, 673x680)
591 KB
591 KB PNG
>>103500056
they fucking glow if you ask me
>hello saar, please do the needful and give us tool to do nasty things saar
>>
>>103500050
https://civitai.com/models/638187?modelVersionId=713648
I don't control the naming schemes of madmen. Half the loras i download are named shit like maximar1paw-000003_v2_epoch_5
>>
>>103500067
>nf4
>hyper meme merge
no wonder you got glitches, you're doing anything to break flux dev at this point kek
>>
>>103500063
Indeed. In fact these threads have been glowing a lot since Hunyuan dropped, curious
>>
>>103500100
Stop noticing things!
>>
File: 00010-553544167.png (2.77 MB, 1144x2008)
2.77 MB
2.77 MB PNG
>>103500073
I've been using this since flux came out without error. I literally picked this only because of an anon recommendation shortly after flux came out, and just never changed because until today, it worked.
Oh well, next time I want to change an anime picture into 90s CGI I'll use Q4.

To be clear, I'm not using the hyper version, I'm using the regular one, that page has multiple versions on it. I think If I was using Hyper I'd be meant to be using less steps.
>>
>>103499880
He actually baffles me. Like a gradio interface is nothing. I don't do anything programming related for work and even I can set up a gradio interface.
>>
File: HunyuanVideo_00184.mp4 (467 KB, 632x1104)
467 KB
467 KB MP4
>>103500038
the best method is not being a lazy bum and train your own loras
>>
We will never rid ourselves of the sameface
>>
>>103500189
You can train loras for video models now?
>>
File: wwu.jpg (318 KB, 748x1056)
318 KB
318 KB JPG
https://www.youtube.com/watch?v=OMOGaugKpzs
https://www.youtube.com/watch?v=56hqrlQxMMI

<3
>>
File: HunyuanVideo_00267.mp4 (248 KB, 640x400)
248 KB
248 KB MP4
>>103500225
Kinda?
>>
>>103500245
How?
>>
File: HunyuanVideo_00268.mp4 (287 KB, 640x400)
287 KB
287 KB MP4
>>103500260
https://github.com/tdrussell/diffusion-pipe
>>
File: HunyuanVideo_00269.mp4 (334 KB, 640x400)
334 KB
334 KB MP4
>>
>>103500225
i find it hard to believe people do not understand the fact you can throw a bought of latent's and 100% de noise and not heavily influence the outcome of the result lora or not...
>>
>>103500289
>bought
bunch
>>
you don't even need lora, you just need a fucking brain.
>>
File: HunyuanVideo_00270.mp4 (155 KB, 640x400)
155 KB
155 KB MP4
This is not what I wanted at all but lol.
>>
the donyuan of loras, what a legend
>>
>>103500304
>>
its a video model ffs, so then just pack 30 frames of what you want and fire it into it, describe the person and boom.
>>
>>103500289
I'm not asking whether or not you can achieve a given result, because the video is proof such a result is possible. I don't need to ask "wow can I make this thing that has already been made and clearly exists?" I'm asking if X thing can be applied to Y thing.
If I asked "is it possible to use a pencil to trace an image" would you assume I was asking "is it possible to copy an image, by any means, no matter what that means is?"
>>
i'm going to bed
>>
File: HunyuanVideo_00271.mp4 (242 KB, 640x400)
242 KB
242 KB MP4
>The helmet was supposed to come off and reveal a bog
>>
>>103500307
他拉了?
>>
>>103500329
Dump it
>>
File: HunyuanVideo_00272.mp4 (235 KB, 640x400)
235 KB
235 KB MP4
>>
>>103500169
Then you are smarter than all the computer scientists in all of Turkey
>>
>>103500351
Did you test concept bleeding yet with loras?
>>
File: HunyuanVideo_00273.mp4 (139 KB, 640x400)
139 KB
139 KB MP4
>>103500371
It bleeds heavily.
>>
>>103498770
would
>>
>>103500394
Damn, expected tho
>>
File: HunyuanVideo_00274.mp4 (454 KB, 640x400)
454 KB
454 KB MP4
I don't know what I want to train next. Bog and Furk were the two things on my list and now they're done. I got some porn and shit on the backend but that's not really fun to post.
>>
File: HunyuanVideo_00275.mp4 (219 KB, 640x400)
219 KB
219 KB MP4
>>
>>103500441
Follow your heart anon. Your heart says you need CIA and Bane.
>>
>>103500441
What do you think would happen if you train it on say a collection from a specific artist rather than a face? Obviously a famous director is the first that comes to mind, but I'm more thinking along the lines of traditional paintings.
Or something like a regular style lora.
>>
File: bogsylvania.webm (211 KB, 720x480)
211 KB
211 KB WEBM
>>103500470
holy shit finally
now you know what you must do
>>
File: HunyuanVideo_00276.mp4 (298 KB, 640x400)
298 KB
298 KB MP4
>>
>>103500501
Cmon make the girl Turkish too and give her a beard
>>
>>103500512
I'd probably need to crank up the LoRA strength to the point of deep frying it.
>>
File: HunyuanVideo_00277.mp4 (303 KB, 640x400)
303 KB
303 KB MP4
Concept bleed isn't too bad so long as the other concept is well established, i.e. miku.
>>
>>103500501 >>103500532
is that a decent lora... so soon?
>>
>>103500484
Is this idea retarded
>>
>>103500545
it's the test lora, v3 I think, it was made last night through images alone
>>
>>103500569
No nobody has ever attempted to put an artists work into the form of a LoRA before. You're the first one.
>>
>>103500576
No ones done it with a video LoRA before, correct.
>>
>>103500545
https://gofile.io/d/1MU6lB

Feel free to test it if you want. No trigger words or anything. Just like 30 images of his face and no captions for 1000 steps.

I set the LoRA to around 1.2 strength when I run it.
>>
>>103500574
> it was made last night through images alone
few attempts, worked with images rather than video? even better.
>>
I will buy a 5090 just for Hunyuan.
Convince me not to buy it.
>>
>>103500589
More vram is never a bad idea.
>>
File: HunyuanVideo_00278.mp4 (184 KB, 640x400)
184 KB
184 KB MP4
>>103500496
Lemme try again lol.
>>
File: HunyuanVideo_00279.mp4 (247 KB, 640x400)
247 KB
247 KB MP4
God damn it lol
>>
>>103500589
You need to gen for 4 hours a day for 2 years to justify a 5090 just for hunyuan
If you're on an amd or 1000 series shitcard or a pedo this is a fantastic idea but otherwise just rent it or a 4090 for a dollar an hour, who knows what will exist in July 2025
>>
File: HunyuanVideo_00280.mp4 (287 KB, 640x400)
287 KB
287 KB MP4
>>
and remember the future of AI isn't GPUs but NPUs (neural processing units) so in two years we might have discrete NPUs 10x cheaper and 10x faster at inference than SOTA GPUs now
>>
>>103500607
>>103500614
>>103500632
lmao hunyuan is spiting you directly with this guy
>>
>>103500614
borderline incomprehensible meme value i need to invest
>>
>>103500441
>I don't know what I want to train next.
Unfortunately you can only do images because I would have asked for some animation style lora like claymation or Lego stop motion from YouTube videos etc
Would probably need captioning
>>
File: HunyuanVideo_00281.mp4 (215 KB, 640x400)
215 KB
215 KB MP4
>>
Can hunyan not do anime style?
>>
>>103500642
so enterprise graphics accelerators?
>>
>>103500673
check previous
>>
>>103500642
We live in a world where companies intentionally develop and maintain expensive and clunky hardware for the expressed purpose of making sure consumers rely on them to maintain it. These NPUs may as well be cars that run on water because the moment they get close to entering consumer hands the factory that makes them will mysteriously blow up.
>>
File: HunyuanVideo_00283.mp4 (272 KB, 640x400)
272 KB
272 KB MP4
>>
File deleted.
>>103500752
this isn't pizza dough karate
>>
>>103500767
You posted the png.
>>
File: HyVid_00038_.webm (1.81 MB, 960x544)
1.81 MB
1.81 MB WEBM
>>103500767
fuck. wrong file
>>
File: 1709386156620827.png (1.56 MB, 946x946)
1.56 MB
1.56 MB PNG
>>103495800
What's the use case for parquet datasets?
>>
File: HunyuanVideo_00284.mp4 (140 KB, 640x400)
140 KB
140 KB MP4
Prompt?
>>
throw some rutkowski in it ffs
>>
File: f.webm (204 KB, 272x480)
204 KB
204 KB WEBM
>>103500581
thanks anon! gonna see if I can gen him
>>
File: HunyuanVideo_00285.mp4 (244 KB, 640x400)
244 KB
244 KB MP4
>>
>>103499012
>How the fuck did we get LoRA training before gguf quants?
That's how it's been thus far, anon
>>
>>103500834
this is a lora right? is that why it keeps skipping?
>>
>>103500849
It is a LoRA yes. Not sure if that's why it skips though, but it probably is desu. Needs captions. The porn shit I captioned doesn't skip.
>>
File: AnimateDiff_00097_.webm (3.34 MB, 960x544)
3.34 MB
3.34 MB WEBM
>>103500859
if your framefrate isn't matching your training number of frames, it will ghost or skip like that. reminds me of the loras I used to train when there wasn't any vid models
>>
>>103500879
>if your framefrate isn't matching your training number of frames
Now I gotta do math??
>>
>>103500879
>stroke victim vtuber and fat asian man in the background

kino
>>
>>103500894
>framefrate
sorry, meant number of frames.
>>
>>103500905
I inferred. So if we are all choosing 24fps, that means there are some frame numbers we cant use if we want to avoid ghosting?
>>
File: HunyuanVideo_00286.mp4 (197 KB, 960x544)
197 KB
197 KB MP4
>>
how do hunyung Lora? how mayli into video?
>>
>>103500939
https://github.com/tdrussell/diffusion-pipe/commits/main/
>>
>>103500939
in op as well
>>
>>103500879
>the loras I used to train
What the before or after your once in a lifetime opportunity to blog post about visiting a lactating whore?
>>
>>103500918
if you inference for 3 second clips it would be ~73 frames right? so the dataset vids should be that many for training. less would have skips and more wouldn't inference out the whole clip
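the arithmetic, assuming frame counts have to be of the 4k+1 kind like the 129f default suggests (that constraint is my assumption):

# nearest valid clip length for a target duration, assuming "frames = 4k + 1"
def frame_count(seconds, fps=24):
    raw = round(seconds * fps)
    return 4 * round((raw - 1) / 4) + 1

print(frame_count(3))    # 73  -> ~3.0s at 24fps
print(frame_count(5.3))  # 129 -> the usual default clip length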

>>103500974
man we get it, you are a no life pathetic loser. cry piss and shit about my life more kek
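Sketch of the frame-count math from the post above, assuming the model wants clip lengths of the form 4k+1 (73 = 3 s x 24 fps + 1 fits that). The 4k+1 constraint is my assumption, not something stated in this thread, so verify it against the HunyuanVideo docs before trusting it:
```python
# Sketch: snap a desired clip duration to a "valid" frame count, assuming the
# model wants lengths of the form 4k + 1 (e.g. 73 frames ~= 3 s at 24 fps).
def snap_frame_count(seconds: float, fps: int = 24) -> int:
    raw = round(seconds * fps)
    k = max(1, round((raw - 1) / 4))
    return 4 * k + 1

for s in (1, 2, 3, 5):
    n = snap_frame_count(s)
    print(f"{s:>2} s @ 24 fps -> {n} frames ({n / 24:.2f} s)")
# 3 s -> 73 frames, matching the number quoted above
```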
>>
>>103500985
>vids
bruh I use still images for all my training.
>>
>>103500992
well that explains so much more kek
>>
>>103500501
miku would never have tits like that
>>
>>103500995
Are you the guy that wrote the script for training?
>>
File: HunyuanVideo_00057.mp4 (347 KB, 960x544)
347 KB
347 KB MP4
Anime hours
>>
>>103501010
no but loras are notoriously destructive
>>
>>103501021
Okay so of the four LoRAs I've seen so far, I've trained four of them. Why are you pretending to be so sure that the framerate and ghosting have anything to do with each other?
>>
>>103501028
>framerate and ghosting
frame count, not rate. it will hallucinate the in-betweens and has no sense of duration if you only use single images for anime. the realistic motion it was trained on carries over into the prediction, which doesn't work for anime
>>
>>103501059
Ok sure, go train a LoRA on some anime clips and figure out if that's what's causing it.
>>
>>103501076
I'm busy inferencing and I need a small amount of vram to test rendering some crap in the c++ app. I don't have the time to curate a dataset on top of all the other shit I have to do. you try doing clips instead
>>
File: 007419.png (2.68 MB, 1328x1944)
2.68 MB
2.68 MB PNG
>>
>>103501089
>you try doing clips instead
I am also using my GPU. I don't want to spend time training a LoRA just to test a hypothesis that might be bunk.
>>
File: gigakek.gif (170 KB, 360x346)
170 KB
170 KB GIF
>>103501028
>Okay so of the four LoRAs I've seen so far, I've trained four of them. Why are you pretending to be so sure that the framerate and ghosting have anything to do with each other?
>>
>>103501106
I don't get it. the guy who trained dozens of models prior has less experience than a guy who trained four loras and can't draw any conclusions?
>>
>>103501126
schizo anon isn't very smart
>>
File: HunyuanVideo_00002.mp4 (488 KB, 960x544)
488 KB
488 KB MP4
What are we arguing about
>>
>>103501131
training clips over images. hmmm I wonder which would be better for capturing motion hmmmmm
>>
>>103501140
You're making all kinds of crazy assumptions, dude.

1. Look at the issues on the github. Initial results from the measly 512x512x33f videos that fit on 24gb of vram do not seem to be turning out well
2. Hyvid itself was trained on still images first before video was added
3. We can demonstrably see still images in the LoRA dataset translate to the subject in the video

Am I saying that training on video data is impossible? No, not at all, but your assumption that the frame count and the skipping are linked was pulled out of your ass.
>>
>>103501163
1. just look at the op vid. it demonstrates the bad parts of training on images clearly
2. the range of motion is compromised in all your vids so far.
>>
>>103501175
Yeah it was completely baked. I posted it to give an overtrained example.
All of them?
>>
>>103496049
couldn't play with it, but I got datasets already
>>
>>103501183
pretty much all of the anime ones. the grifter ones work a little better since it's mostly trained on realistic
>>
>>103501193
>since it's mostly trained on realistic
You don't know why it turned out better. Why are you so sure? That's what's annoying me right now.
>>
File: 007430.png (2.59 MB, 1328x1944)
2.59 MB
2.59 MB PNG
>>
File: 20230301220316.webm (405 KB, 512x512)
405 KB
405 KB WEBM
>>103501203
>You don't know why it turned out better.
yes I do because I've seen it thousands of times before with anidiff, prompt travel, loopbacks, interpolations and training models specifically for those use cases. I trained foundational 3d models too and they also require basically photogrammetry data. also sorry it took so long to reply, I crashed four fucking times because these goddamn nodes can't offload memory properly holy shit
>>
>>103501309
If you say so, but I'm not gonna be the one to test it.
>>
File: 1691975528654994.png (227 KB, 589x767)
227 KB
227 KB PNG
>>103501322
then at least have a counterpoint before you start calling people's observations bogus doofus
>>
>>103501341
I did and you tried to squash them with your resume. Like dude, I'm not convinced it's entirely to do with the number of frames in the training data. Take a look at the other anime from hyvid. It's all kind of stiff.
>>
File: HunhuyanVideo_00030.mp4 (530 KB, 512x400)
530 KB
530 KB MP4
>>
>>103500939
>how mayli into video?
Real shit?
>>
>>103501131
Can this model do the thing with the colorful background streaming past whilst the character remains still to save animation budget but give the illusion of motion?
>>
File: AnimateDiff_00198_.webm (1.08 MB, 960x544)
1.08 MB
1.08 MB WEBM
>>103501350
>I did and you tried to squash them with your resume
for a good reason. I spent three years understanding the relationship between images, temporal models and 3d models

>Like dude, I'm not convinced it's entirely to do with the number of frames in the training data
you can't improve motion estimation with still images, it reduces the range of motion unless the model is heavily trained on certain videos. in this case, mostly realistic footage probably scraped off youtube

>Take a look at the other anime from hyvid. It's all kind of stiff.
I don't think people are trying hard enough. I've seen enough good ones to know it's possible to do something more
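If anyone does want to test clips instead of stills, cutting fixed-length, fixed-fps segments out of longer episodes is quick with ffmpeg. A minimal sketch, assuming ffmpeg is on PATH; the source path, timestamps, clip length, and scale target are placeholders, not values anyone in the thread gave:
```python
# Sketch: cut fixed-length, constant-fps clips out of a longer video for a
# video LoRA dataset. Assumes ffmpeg is on PATH; paths/timestamps are placeholders.
import subprocess

SOURCE = "episode_01.mkv"                                     # hypothetical source
CLIPS = [("00:01:23", 3), ("00:04:10", 3), ("00:12:05", 3)]   # (start, seconds)

for i, (start, dur) in enumerate(CLIPS):
    out = f"clip_{i:03d}.mp4"
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", start,            # seek to the start timestamp
        "-i", SOURCE,
        "-t", str(dur),          # clip length in seconds
        "-r", "24",              # force 24 fps so frame counts are predictable
        "-vf", "scale=960:544",  # match your training resolution
        "-an",                   # drop audio
        out,
    ], check=True)
    print("wrote", out)
```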
>>
>>103501385
I would be happy to be shown what a good dataset for hyvid looks like and I'll leave it there.
>>
>>103501397
>I would be happy to be shown what a good dataset for hyvid looks like
cool. don't have the time because it takes way too long and I'm passing out. gn
>>
>resident pedophile starts throwing a tantrum when challenged
>>
>on images only, in well under 24GB of VRAM
how well under
>>
>>103501604
20-22GB
18GB one day (lol)
>>
File: ai9.jpg (369 KB, 964x1025)
369 KB
369 KB JPG
Guys can we take a moment to appreciate how far we've come?
>>
>>103501671
No, I'd rather complain and wait for the next model
>>
>>103501671
considering you can't do that on any modern base without lora copium, nah.
>>
>>103501385
>I spent three years understanding the relationship between images, temporal models and 3d models
>>
File: ai5.png (357 KB, 764x780)
357 KB
357 KB PNG
>>103501682
fuck we were good at prompting back then
>>
>>103501671
for every step forward we take two steps back in sovl
>>
>>103501671
imagine one day they just let us have that model
>>
File: ai14.jpg (1.74 MB, 1966x1958)
1.74 MB
1.74 MB JPG
fuck I saved heaps of these, thank god, these are great
>>
File: ai16.png (749 KB, 588x643)
749 KB
749 KB PNG
>>
>>103501696
unironically this is a better greta than Flux can do
>>
>>103501724
>>103501728
>has a bigger and more diverse dataset than flux
man, what the FUCK!
>>
File: ai15.png (1.84 MB, 911x921)
1.84 MB
1.84 MB PNG
>>
File: ai12.jpg (190 KB, 779x834)
190 KB
190 KB JPG
>>
File: ai10.png (479 KB, 765x777)
479 KB
479 KB PNG
>>
File: 1689706738482004.png (1.1 MB, 1024x976)
1.1 MB
1.1 MB PNG
>>103501671
LK
>>
File: ai11.png (1.37 MB, 973x1056)
1.37 MB
1.37 MB PNG
>>
>>103501731
I said on day 1 that Flux was garbage the moment I tried it out against Dall-E 3. No character recognition, forced plastic aesthetic, no style recognition. Datasets have regressed thanks to local self-sabotage and synthetic captions erasing copyright. Data is king, and recent models have absolutely lobotomized their datasets.
>>
Is what animanon said about anime datasets true? Should I start collecting anime videos?
>>
File: ai6.png (1.03 MB, 769x834)
1.03 MB
1.03 MB PNG
>>
File: HunyuanVideo_00001.mp4 (534 KB, 960x544)
534 KB
534 KB MP4
Page 9, you know what they say about page 9
>>
>>103502003
give her armpit hair
>>
>>103502003
paggee ninny....?
oh, page nine
>>
This is the end of /ldg/
>>
should have given her armpit hair
>>
>>103502048
yeah im up at the ass crack of midnight genning fat ugly ginger girls
its so over
>>
>>103501766
>forced plastic aesthetic
not quite true
>No character/style recognition
true
>>
>>103502073
May I see it?
>>
File: 00003-4125978743.png (2.75 MB, 1536x1536)
2.75 MB
2.75 MB PNG
>>103502077
sure since realism engine is aids and can't understand my prompt this is the most sfw one i've got so far
gonna have to go pony realism or redownload lustify which is barely any better since it at least followed my prompt last time..
>>
>>103502086
Wait, it's not moving.
>>
>>103498395
I'm going to make gore videos of this grifty third worlder
>>
>>103502089
please don't hit me with that, i'm saving for a gpu that can do vidgen.
>>
>>103501848
I've seen uglier.
>>
File: AnimateDiff_00091.webm (170 KB, 512x288)
170 KB
170 KB WEBM
>>
>>103502184
is that real?
>>
File: HunyuanVideo_00302.mp4 (149 KB, 512x320)
149 KB
149 KB MP4
>>
>>103499634
I love this.
>>
File: 1658418596829251.jpg (368 KB, 1170x2532)
368 KB
368 KB JPG
>>103501724
>>103501671
ah... i remember the old days...
>>
File: ai7.png (569 KB, 768x833)
569 KB
569 KB PNG
>>103502255
a classic
>>
File: awegvagvae.png (721 KB, 754x634)
721 KB
721 KB PNG
>>103502255
great you guys are making me wanna watch the whole oneyplays a.i playlist from back during a.i dungeon and that discord bot im assuming used 1.5.
i remember fucking with a greta gen in late 2023 but i have no clue where that shit is on my drive (or the sfw stuff for that matter)
>>
>>103501809
Nothing he says is true.
>>
File: HunyuanVideo_00303.mp4 (353 KB, 720x480)
353 KB
353 KB MP4
>>
File: 1664485655111284.webm (331 KB, 512x512)
331 KB
331 KB WEBM
>>103501671
This is what a billion dollar company was doing not long ago.
>>
File: 1632914365228.webm (2.68 MB, 512x512)
2.68 MB
2.68 MB WEBM
>>103501671
and I forget what this weird morph thing was
>>
File: 1672931052783648.webm (2.87 MB, 1280x768)
2.87 MB
2.87 MB WEBM
>>103501671
this was late 2022 early 2023
>>
>>103502307
Stoners and people who make shit music videos loved this
>>
>>103502315
oh yeah didn't a bunch of people sperg out when an ai video won some Pink Floyd contest?
>>
>>103502321
Reddit freaks out at anything AI.
>>
>>103502334
nah it was real world people getting mad
https://guitar.com/news/pink-floyd-slated-after-ai-created-video-wins-dark-side-of-the-moon-animation-competition/
the video wasn't very good
>>
>>103502344
It was true deforum slop
>>
File: HunyuanVideo_00304.mp4 (318 KB, 720x480)
318 KB
318 KB MP4
Someone make a new thread
>>
Baking
>>
>>103502450
>>103502450
>>103502450
>>
>>103501671
we downgraded desu, we can't do Greta Thunberg with modern models anymore
>>
30 more epochs before I find out if I have wasted a day, so excite


