/g/ - Technology

File: 1733863508875180.mp4 (301 KB, 960x544)
Discussion of Free and Open-Source Diffusion models.

Last bread: >>103468004

>Local (Hunyuan) Video
Windows: https://rentry.org/crhcqq54

>UI
Metastable: https://metastable.studio
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI

>Models, LoRAs, & Upscalers
https://civitai.com
https://tensor.art/
https://openmodeldb.info

>Cooking
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
Forge Guide: https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050
ComfyUI Guide: https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Guides & Tools
Share the Sauce: https://catbox.moe
Perishable Sauce: https://litterbox.catbox.moe/
Generate Prompt from Image: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two
Artifact resources: https://rentry.org/sdg-link
Samplers: https://stable-diffusion-art.com/samplers/
Open-Source Digital Art Software: https://krita.org/en/
Txt2Img Plugin: https://kritaaidiffusion.com/
Collagebaker: https://www.befunky.com/create/collage/
Video Collagebaker: https://kdenlive.org/en/

>Neighbo(u)rs
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai

>Texting Neighbo(u)r
>>>/g/lmg
>>
Well at least it's "moving"
>>
It's okay to only gen images.
>>
what is a video?
A miserable pile of images
>>
File: HunyuanVideo_00007 (2).mp4 (615 KB, 960x544)
>>
>>103475601
>Rape Flash man, he rapes you in a flash you don't feel anything
>>
>>103475577
wtf
>>
I for one am glad the Flux era is over
>>
>>103475614
one could say it never happened at all, but you absolutely shouldn't
>>
you wouldn't have video if it weren't for images just remember that
>>
>>103475636
same, the same-face issue of all the trained models got sickening.
>>
Blessed thread of frenship
>>
File: HunyuanVideo_00017.webm (1.72 MB, 1280x720)
>>103475488
hanbaagaa
>>
>>103475777
that's like a cartoon mask with a human mouth, like the intro to spongebob
>>
>>103475777
can Hunyuan do more than just Migu? I hope it's not Flux all over again when it comes to characters
>>
>>103475794
it does asians really well, but not the arab kind
>>
File: 1717517218115238.mp4 (172 KB, 960x544)
>>
>>103475805
>it does asians really well
well it's a chinese model, seems obvious they trained it on a lot of videos local to them
>>
>>103475816
sounds intuitive but I'm not sure. Is there even 12 billion parameters worth of captioned chinese material on the web?
>>
>>103475806
Official comfy implementation soon then?
>>
>>103475827
why do we need that, kijai's node is good enough no?
>>
>>103475843
Gguf loader
>>
>>103475806
was that img to video?
>>
File: HunyuanVideo_00074.mp4 (456 KB, 880x656)
https://github.com/hkchengrex/MMAudio

video to audio kino coming soon, imagine a lora with sex sounds
>>
>>103475852
nope, pure text2video
>>103475849
>Gguf loader
Comfy never did any gguf stuff, we'd have better luck if city was up to the task
>>
can any kind anon share a good kohya_ss config file for training a 1-image lora with pony xl?
>>
>>103475855
>video to audio kino coming soon
I think hunyuanVideo can already do that
>>
>>103475794
IIRC they used a language model to construct the prompts for the images during training, so my guess is that a lot of knowledge about specific characters, places, etc. was lost during that process.
I definitely had issues with genning Kasane Teto.
>>
>>103475855
finally, i fucking hate it when the guy makes a noise. instant turn off
>>
>>103475882
>i fucking hate it when the guy makes a noise. instant turn off
yeah same, how hard is it to just shut the fuck up and let the women do the sounding
>>
So how much longer until I can prompt POV of a harem of my highschool crushes nude giving me a blowie
>>
>>103475861
>nope, pure text2video
looks pretty consistent with the usual cute foxgirl
>>
>>103475912
yeah, it's pretty good at anime... when it's not moving too much
>>
>>103475882
>hot scene
>male groans
>male dirty talk
I guess people find that hot, I don't like it.
>>
>>103475906
don't forget the virtual headset, this shit is amazing when you look at pov porn
>>
>>103475488
test
>>
>>103475930
welcome back
>>
>>103475906
when there's enough 360 videos to train a model on (never)
>>
>>103475488
Image Diffusion?
>>
>>103475948
no, it's written local diffusion, which means local image diffusion and local video diffusion
>>
>>103475948
inclusively yes
>>
>>103475906

Within the year we'll probably have fully customizable porn that can star anyone you can get at least 10 quality pictures of lol.
>>
>>103475938
if you type VR on sukebei you're going to be surprised
>>
>>103475906
god i miss her, why did i not make a move... now i just jerk off to the thought of her with my fucking imagination
>>
>>103475984
and v&?
>>
support for willow chips? when?
>>
>>103475906
LoRA training soon.
>>
>>103476042
doing porn lora on this model will be so easy, it already knows so much, can do pussy, dick, breasts... perfectly, all we need is to make it learn sex poses
>>
>>103476042
my terabytes of cached porn will finally come good, to make petabytes of cached porn
>>
They won't let us do loras because they know how much power we'd have. They do not want the little guy to win.
>>
>>103476069
neither did flux
>>
>>103476065
based
>>
>>103476099
sad, actually
>>
>>103475922
>>103475882
>hot scene
>male
I guess gays find that hot, I don't like it.
>>
>>103476141
don't pretend you have standards, you're in ldg
>>
LDG forefront of AI diffusion
>>
I vote to change the name of the general for
> /vdg/ - Video Diffusion General
>>
>>103476042
two weeks?
>>
>>103476214
submit a pr
>>
File: 1724306367960979.mp4 (1.98 MB, 960x544)
Oh no Migu!
>>
>>103475508
nice
>>
will there be optimizations for 3090s? 17 minutes a gen is just brutal, can't believe 4090 fags do it in 6. they don't know how good they have it.
>>
>>103476363
>will there be optimizations for 3090s?
the only optimization left is to get the Q8_0 version, that one lets you do torch.compile on a 3090
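the compile call itself is a one-liner; minimal runnable sketch of the pattern (a toy module stands in for the wrapper's video transformer, so the names here are placeholders, not the real API):

import torch
import torch.nn as nn

# toy block standing in for the video transformer
block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
compiled = torch.compile(block)  # graph capture + kernel fusion happen on the first call
x = torch.randn(8, 64)
print(compiled(x).shape)  # torch.Size([8, 64]), fast after the slow warm-up call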
>>
File: chairlizard.webm (732 KB, 960x544)
I can't get huny to put out good cartoons, it wants to do realism too hard. Anyone got some good keywords? "sketch" loves to go into black and white outputs but is still realistic.
>digital illustration, cel shaded, cartoon, animated
>>
>>103476374
how much faster would that be?
>>
>>103476397
I think it's 30% faster, which is a big deal, I'm also waiting 19 min to make a single gen, that's horrible
>>
>>103475577
Video gen is going to be so cool in five years when we actually have the video cards for it.

Or is 5 years too optimistic a timeline for that?
>>
>>103476401
try out 960x544 40 steps. looks good in about 15 minutes. that's what i'm doing.
>>
>>103476415
I'm already at 960x544x89f 30 steps
>>
>>103476405
>Or is 5 years too optimistic a timeline for that?
dude in two years time we are all going to have super grok chips that can do video gens in like no time. we are going to think it's crazy to wait over a minute for a 2 minute 1080p video.
>>
>16GB 40 series card
>Can use speed optimizations like fp8_fast and torch compile
>But can't fit huge frame counts
An interesting conundrum
>>
>>103476437
Once we get the 5090, we'll be set. Thing will be a beast and optimized for AI. all for the low low price of 3,500
>>
>>103476383
i didn't have much luck either in the short amount of time i tried.
https://files.catbox.moe/p4hvyf.webm got this randomly tho
>>
>>103476480
I was hoping for a 24gb 5080 but it's 16gb again.
>>
File: HunyuanVideo_00053.mp4 (320 KB, 960x544)
EATING a burger not talking to it.
MLLM NOW!
>>
File: HunyuanVideo_00118.mp4 (360 KB, 960x544)
>>
>Day ends with Y
>OH boy, time to generate 2 girls in bubblegum pink skin tight outfits doing jack shit for 3 seconds at a time for the next 12 hours
>>
>>103476607
you need more?
>>
>>103476569
>MLLM NOW!
this, give it to us now!
>>
Is the video hype dead already?
>>
>>103476437
I'm not paying more than $1000 usd for a video card. So it doesn't matter how super fast a 2026 card might be.

As long as video gen requires enormous expense the general is going to be pretty quiet. I'm sitting this one out.
>>
>>103476405
>Or is 5 years too optimistic a timeline for that?
next cards give us 32GB VRAM, the ones after probably will keep 32GB, and the ones after maybe 48GB, so 3+3 = 6 years at the earliest, 2031.
>>
>>103476778
and the next model will use 64gb vram
can't win, it'll always be on the edge of consumer grade
>>
>>103476795
>can't win
desu, you'd be fine sticking with hunyuan and flux, those are really good models
>>
What's the meta for speech to text
I absolutely fucking hate whisperx it actively ruins my virtualenvs
>>
>>103476778
Let me clarify. When I say "we" I literally mean us, the posters in this general. The average poster does not have a top of the line card—a few do, but that's not the norm. Prices have been trending up, so we can expect even fewer posters to have top of the line cards in future generations.

I am saying that video gen will be very cool when a 480p five-second video can be genned in 1 minute on an $800 card that the average poster owns. This is why I say 5 years might be too optimistic.
>>
>>103476830
assuming the whole market doesn't fizzle out because it's not actually profitable outside of hype. if it does, we're in the doldrums, stuck with whatever models the geeks with clusters of decade old teslas can shit out
>>
>>103476869
I don't think it's much of a fad. People like us can continue to work with the models and the open source projects will probably keep going regardless. But all the AI companies that provide APIs and shit will just fold.
>>
File: HyunVideo_00076.png (856 KB, 896x1152)
what's the word for "un-mannequinny"
>>
File: out.webm (126 KB, 480x640)
android or salvaged spinal cord transplanted into plastic body
also wake the fuck up people (except you)
>>
>>103476830
Just skip your latte for two weeks and buy a 5090, it's that simple.
>>
>>103476942
fleshy
>>
>>103477060
>Just skip your latte for two weeks
>Just skip your latte
>Just latte two weeks
>skip latte
>two weeks
>skip
>just
>>
>>103477060
>he buys latte
>he doesnt just make his coffee at home
lmao faggot, anyway hope you 5090 suckers are prepared for getting a second PSU + upgrading the ancient breaker system in your parents' house to afford the extra amperage
and the extra uptick in how much electricity you're paying for per month
>>
>>103477134
I don't buy latte, that's why I currently have a 4090, can afford a 5090 when it releases, and already have a 1600W PSU
>>
File: Switchblade.webm (2.49 MB, 1080x1920)
Knife tricks
>>
File: HunyuanVideo_00212.webm (860 KB, 544x960)
vast supports paying with metamask and i have a bunch of trash left over from my defi phase 7 years ago so i will 100% be trying to do hunyuan training
>>
>>103477171
why'd you blow up a 144p video to 1080
>>
File: 1706590077011637.png (529 KB, 638x747)
https://github.com/Tencent/HunyuanVideo/issues/93#issuecomment-2533257381
>the text Text Encoder is t5 flan xxl
https://github.com/Tencent/HunyuanVideo/issues/109#issuecomment-2533261573
>as it needs an 80gb gpu it seems unlikely to fit on a 24gb gpu unless it is at 4bit
>>
File: thisismylastresort.webm (3.62 MB, 1080x1920)
>>103477220
I was testing differing ways of post processing resolution upscaling, forgot I left it in.
>>
>>103477141
>1600w psu
this is sick. Just dont buy a new gpu and enjoy your latte.
>>
>>103477263
I WILL have that 5090
>>
They're still talking about how AI's going to come along and replace everyone. But the more I use ComfyUI the bigger the skill ceiling gets.

"AI engineer" is the new job title.

People have this idea in their heads that when the next massive new model is released, that'll be it, there'll be nothing more that humans can do. But I'm really not seeing that, everything is all spread across 1000 github repositories and probably always will be. The "ultimate god AI" is just some singularity bullshit that's not even real.
>>
>bought my 4090 for 1700€ in june this year
>now that 4090 costs 2700€
why?
>>
>>103475636
Agreed. Plastic shit. Same face fatties from 2023.
>>
>>103477295
>why?
everyone is buying one to make loras out of Flux
>>
>>103477295
>the more you buy
>the more you save
HOLY SHIT HE ACTUALLY WASN'T BULLSHITTING
>>
>>103477295
Crypto mining in full swing atm
>>
>110 replies
>19 files
>>
File: feeding.webm (433 KB, 288x512)
yum
>>
>>103477412
nobody is mining with GPUs anymore.
Ethereum has been proof of stake for years now
>>
>>103477419
If you want avatarfagging and image spam go to /sdg/
>>
File: HunyuanVideo_00216.webm (1.98 MB, 544x960)
>>103477419
video is demanding to generate
>>
File: out.webm (263 KB, 480x640)
>>
File: scars.webm (372 KB, 288x512)
>>103477419
eat a knife
>>
>>103477419
>oh no, why isn't this place populated by retards who spam images and video all day?
get the fuck out >>103477296
>>
>>103477419
everyone is gooning themselves to death. they're not going to waste valuable gen time to make sfw videos for blue cuck boards
>>
>>103477600
4090 go brrrr
>>
>>103477600
>everyone is gooning themselves to death
this, thats how AI kills all humans. it turns everyone into a gooner that withers away while generating 1girl slop.
>>
File: 1711686373990113.jpg (18 KB, 474x344)
>>103477600
>>103477631
>>
File: 1719091968650978.jpg (408 KB, 950x966)
Babe, wake up, new diffusion model
https://github.com/lehduong/OneDiffusion
https://huggingface.co/lehduong/OneDiffusion/tree/main
>>
>>103477666
Satan...

Comfy?
Controlnets?
IP Adapter?
Loras?
8gb vram?
2MP?
Comfy?
>>
>>103477666
LOL you need 80GB VRAM for it? top kek
>>
>>103477696
>80GB
wait what? how can it be, it's a 2b model, where did you find this?
>>
>>103477696
>Duong H. Le and Tuan Pham and Sangho Lee and Christopher Clark and Aniruddha Kembhavi and Stephan Mandt and Ranjay Krishna and Jiasen Lu
>>
File: one ton of diffusion.png (81 KB, 1642x551)
>>103477712
yeah i don't know what the fuck he's talking about, here's the model on HF
>>
>>103477717
so that's 10gb for the model + its vae + the text encoder?
>>
>>103477712
>>103477717
when I'm bored I spread disinfo to stimulate the thread
>>
>>103477729
>text encoder is 11gb
im gonna kill myself
>>
>>103477737
that's a big boi wtf, it's bigger than t5_xxl, dunno if my 2nd gpu (rtx 3060) will be able to run that
>>
>>103477717
>11.2gb
so that's a ~5b model?
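quick sanity check, assuming bf16/fp16 weights at 2 bytes per parameter:

# 11.2 GB of bf16 weights, 2 bytes each
params = 11.2e9 / 2
print(f"~{params / 1e9:.1f}B parameters")  # ~5.6B, so yeah, a 5-6B model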
>>
>>103477666
will i be able to run this on my 1650 4gb
>>
>>103477814
the text encoder and the model are both 11gb big...
>>
>>103477717
any comfy nodes?
>>
>>103477717
>>103477666
comfyUI wrapper when?
>>
File: out.webm (91 KB, 480x640)
time for a booster
>>
>>103477878
pepto bismol in the veins would burn like hell
>>
File: slop.png (37 KB, 396x196)
>>103477666
>We adopt the Next-DiT architecture
[72] in our model. By leveraging a full transformer-based architecture, our model can work with different numbers of views N.
Oh nice, that might...
>synthetic dataset
Throw it in the trash, it's going to just generate slop.
>>
>>103477983
why the fuck are they all doing that? fucking why, this is ridiculous
>>
>>103477987
Even hunyaun is poisoned with this garbage
>>
File: 1722378966662571.jpg (32 KB, 446x574)
>>103475488
Luigi Mangione Model dropped today:

https://civitai.com/models/1025986/luigi-mangione-or-fluxd
>>
>>103477983
There are MASSIVE amounts of photos and videos from before sloppa was a thing. Why the fuck do they keep training AI on AI slop? Fucking hell
>>
File: Hunyuan_00024.webm (460 KB, 640x480)
>>
>>103478002
>There are MASSIVE amounts of photos and videos from before sloppa was a thing. Why the fuck do they keep training AI on AI slop? Fucking hell
Ikr, the fuck is their fucking problem...
>>
>>103477987
>>103478002
>>103478019
>why
We know why, they're lazy. That's it.
>>
>>103476942
it creeps me out to see a still frame from hunvid
>>
>nearing halfway on posts
>32 files
It hurts remembering 2022
>>
>>103478221
You know where to go
>>
>>103478221
sdg is the image general
>>
>>103478221
big surprise comin in the second half trust the plan
>>
>>103478241
We had images too once
>>
>>103478255
ldg hit image cap maybe two times in its entire history
>>
>>103478263
but it does hit post cap almost every
>>
>>103478263
it hit cap at least twice within the last month:
>>103328816
>>103280382

It does still happen sometimes
>>
Quality over quantity
>>
>>103478288
Image output is a shitty metric to gauge the quality of a thread. Just look at /sdg/. It produces nothing, discusses nothing of note, they just jerk each other off. They're literally stuck in 2022.
>>
>>103478295
>"quality over quanity"
>posts neither
Yeah "quality" is when we have 350 low effort text posts about "how much VRAM do I need for X" or "I wonder how much a 5090 will cost" or "is there a new model yet" or "why is the text encoder so big".
>>
File: cyberpunk.jpg (646 KB, 1536x1536)
>>
>we
>>
>>103478346
Sorry to hear your 1girl didn't get any (you)s, anon
>>
>>103478344
Image output is a great metric to gauge whether you people have any interest in genning at all. As far as I can tell 90% of this thread is about "fuuuuck I wish I could afford a 5090" and impotently hoping the next model that comes out will cure your boredom. Completely worthless discourse. Spamming slop would be better

>>103478353
You will be
>>
File: 00031-1047009948.png (1.92 MB, 1024x1536)
>>103478346
yeah quality is when we have posts bitching about the low quality of the average posts in the thread when news is slow
im posting an old 1girl now just to spite you because i think it'd give me a gentle chuckle
>>
The last big happening was a video model that only a small percentage of anons can run. It's not that deep.
>>
>>103478385
>bitching about quality
I was explicitly replying to a post that implied the thread's low energy was preferable because "quality over quantity". You then pretend this was me, unprompted, bitching about the thread's low quality. You don't understand the discussion you're jumping into or who you are "spiting" or why.
>>
Limitations of hunyuan became evident. Even the video anons aren't posting
>>
remember when 70-80% of the images were one poster?
>>
>>103478346
You sound unaccustomed to the quality of the average 4chin post
>>
>>103478477
you sound unaccustomed to clicking the link in a post to see what a poster is replying to and then understanding the post *as a reply*, that is, having its meaning shaped by the context in which it was posted, that context being the post to which it is a reply
>>
YJK who that ^ is kek
>>
>>103478507
Haha yeah, absolutely... but why don't you say who we both think it is? So I know for sure we're on the same page.
>>
>>103476700
twas quick
now anon is restless for Next Big Thing
>>
>>103478574
the whole problem with video gen is that it wasn't quick
>>
>>103478574
Waiting for the literal who in the github PR for the wrapper to merge his LoRA inference PR and release his training script so I can get a bunch of Turkish grifter images and kick this whole thing off with a reddit post showing the grifter doing dumb shit.
>>
File: 00084-AYAKON_232219628.jpg (378 KB, 2560x1536)
Christmas Cat
>>
File: .jpg (104 KB, 456x889)
>>103475488
How do you go about getting a proper hourglass body shape in your output like pic rel? Trying to get the model to do this feels like pulling teeth at this point and I'm out of ideas.
inb4 issues w/ pic, hey its just first pic I had on hand relax lol
>>
>>103478627
nice
>>
>>103478656
You've seen that style a thousand times
>>
>>103478663
You're talking to the author.
>>
>>103478673
I know
>>
>>103477484
nice gore
>>
File: anubiswalk.jpg (1.51 MB, 2872x2152)
>>
File: HyVid_00022_.webm (1.7 MB, 960x544)
nom nom
>>
>>103477620
Kek.
>>
>>103478817
This is too smooth for most anime studios. Did you ever think about how many korean in between artists you'll be hurting?
>>
>>103478835
Why the fuck isn't the industry using framegen? Or they already are and it's the explanation for QUALITY
>>
>>103478843
Nah you can see the QUALITY are just rushed doodles.
>>
File: HyVid_00014_.webm (1.18 MB, 960x544)
>>103478835
>korean in between artists
interns that don't get paid? their life doesn't really change. maybe less suffering than being led by a lie that they will get a paid position eventually. They can just jump into keyframing with ai assistance. the model can be trained off the assistants.

>>103478859
this
>>
go back
>>
File: HunyuanVideo_00054.mp4 (651 KB, 960x544)
>>
>>103478872
Are you using STG at all?
I know Hun supports it, but thus far nobody seems to mention it. But what I've seen it do on other video models looks promising.
>>
is the plan to pray that the tech outpaces your lies?
>>
Some dude right now in both /sdg/ and /lmg/ throwing out random combative phrases to goad a response. Don't take the bait.
>>
>>103478651
hot pic
tho important to know is what model youre using
>>
File: AnimateDiff_00180_.webm (1.41 MB, 960x544)
>>103478843
>Why the fuck isn't the industry using framegen?
they have been using it for years now btw. can't remember the software tho. starts with a c

>>103478892
not atm. just spooling gens while I code. I wanted to try the new comfy gui but it's so shit the video preview isn't even in the node and the queue just gave up on video previews altogether. very strange stuff happening too as well as a worse runtime. very disappointing

>>103478929
I know, I never really do that.
>>
so text to image was stagnating after flux and now it's completely dead with these new video models. Have any interesting improvements been made to t2i in the last couple of months?
>>
>>103478945
hyunwunweiwun is good for still images
>>
nice avatarfagging btw
>>
>>103478943
Would desperately like a preview of the video gens I'm getting so I know to nope out early. Is it even possible? I know we cant actually see it moving, but surely the frames can be seen right?
>>
>>103478945
https://github.com/lehduong/OneDiffusion
>>
>>103478968
doesn't this need 80gb vram? It's also not that impressive
>>
>>103478651
I'm getting results like this after only a few minutes of trying and I'm using a guidance value of 1.37 (on flux-dev nf4). I think you can get there

prompt was
>Harlee Washington \(age 19\) nahhh she too damn cute... this angel popped up in my tiktok feed and her body is way too nice [eyes emoji x2] [fire emoji x 3] she gotta be photoshopped.

>u can tell this flabby hoe got lipo cuz why she got huge thighs and a tiny little waist. aint nobody that skinny with curves like that. men b stupid af thinking she look good. "she slimthicc" "she got a hourglass figure" no dummy she just a fattie who got lipo and a tummy tuck and she still saggy asf. the bikini is desperate too, she belong to the streets, and she got a ugly face ngl

>a curvaceous college student with a sexually mature body poses enticingly in a bikini in front of a mirror in her dorm room for her tiktok. Her thighs are enormous.
>>
>>103478945
Illustrious
>>
>>103478993
yes, people who gen cartoons are having fun.
>>
>>103478982
I think the trick to an hourglass, in short, is to emphasize big thighs and a small waist.
>>
>>103478932
> model
The three I tried were epicrealismXL, flux1-dev, & juggernautXL + some vae
>>
>>103478964
no, with the way temporal layers work it has to be a batch. it's really annoying that this is the way things are going, since prompt travelling, loopbacks and frame interp give much more control and you can just nope out of shitty results like you said

getting really shitty slop style atm and I ain't posting that. the interwebs is already full of shitty slop. why the fuck did they train this in? so fucking frustrating
>>
>>103478982
this one's kind of a bad gen but it does illustrate that the prompt can produce an "hourglass" of the type anon meant, even down at 1.37 guidance
>>
>>103478982
>>103479010
I've been able to get this semi-working side/3 quarters view, but imo it doesn't look quite right. Plus a straight on view seems nigh impossible, probably for limited dataset reasons
>>
>>103479019
actually, if there was some sort of taesd model but for video decoding that could work
>>
>>103479019
It's really annoying. I'll get some really good gens then I'll get a straight up 1.5 default pose slow motion pseudo animation. I'm praying that being able to train LoRAs and eventually img2vid lets us just circumvent the slop altogether.
>>
>>103479040
>>103478982
>>103479040
sorry for the lack of example output, but I already shut my pc down for the night and I'm too lazy to turn it back on at this point
>>
>>103479040
You can't easily force Flux to do perspectives except by prompting a kind of image that has a conventional perspective. E.g. a runway image will tend to have the model walking in a very exact way toward the camera, which is why runway prompts produce good results reliably.

If I wanted to eliminate side angles I'd change the "poses in front of a mirror" part of the prompt and try to think of a different sort of conventional image where people face the camera more directly. But remember that every piece of every prompt carries its own baggage—there's no free lunch.
>>
File: HunyuanVideo_00008.mp4 (1.35 MB, 960x544)
img2vid aint coming, is it?
>>
File: HunyuanVideo_00009.mp4 (1.2 MB, 960x544)
>Manifests cigarette
Nothing personnel
>>
File: out.webm (335 KB, 640x704)
>>103479246
vid2vid ain't great, I'd temper your expectations
>>
>>103479324
That looks good to me. What's the issue?
>>
>>103479324
you will finally be a woman
>>
File: HunyuanVideo_00225.mp4 (814 KB, 960x544)
>>
>>103479324
You made girl into different girl. Good enough
>>
>>103479372
>girl
anon..
>>
>>103479324
eh, with loras you can probably do pretty easy deepfakes
>>
>>103479386
>with loras
With WHAT training script?
>>
>>103479324
Looks like it mostly works. It's fine.
>>
>Look at the issues page on the github
>sage attention
>sage attention
>sage attention
>sage attention
>sage attention
>>
>>103479324
is it even an official implementation?
there's no mention of v2v on the repo
>>
>>103479443
>is it even an official implementation?
Nah just a feature added in the wrapper.
>>
>>103479324
Up denoising
>>
>>103479460
I know, I've used masks before
too much and it doesn't follow, too little and it doesn't change
>>
File: HunhuyanVideo_00001.mp4 (568 KB, 768x864)
I'm confused, whats the issue with vid2vid?
>>
File: 1705989576198585.mp4 (2.23 MB, 1920x1080)
>>103479324
hunyuan sama, please try to impregnate this robot
>>
>>103479472
framerate apparently
>>
File: HunhuyanVideo_00002.mp4 (1.21 MB, 768x864)
>>
>>103476374
So is that why troch compile doesn't work for me, because I have a 3060?
>>
>>103479543
Yes, it only works with 40xx cards
>>
File: HunhuyanVideo_00002.png (868 KB, 720x800)
>>103479482
Here you go dude
>>
>>103478888
This one is impressive to me
>>
File: HunhuyanVideo_00002.mp4 (924 KB, 720x800)
>>103479585
Oops, gave you the static image.
>>
File: HunhuyanVideo_00003.mp4 (938 KB, 720x800)
Lemme just lower that denoise a bit.
>>
>>103479603
It's a FURRY!!! RUUUUN!!
>>
File: 007361.png (2.57 MB, 1840x1432)
>>
File: 1720050654357471.png (156 KB, 301x537)
>>103479592
>>103479603
i... cant goon to this
>>
File: HunhuyanVideo_00006.mp4 (816 KB, 720x800)
>>103479649
I swear, I tried for real, but making her pregnant made her black.
>>
File: HunhuyanVideo_00008.mp4 (789 KB, 304x1088)
>>
File: HunhuyanVideo_00009.mp4 (811 KB, 304x1088)
I guess if you just mask out the face you can make people pregnant.
>>
File: HunhuyanVideo_00013.mp4 (903 KB, 720x720)
migu are you ok?
>>
File: HunhuyanVideo_00014.mp4 (666 KB, 640x640)
eww wtf
>>
>>103479822
>>103479878
lol
>>
>>103479822
>SPLOOSH!
>>
File: 1733904093.png (908 KB, 768x768)
>>
File: wolfuwinku.webm (896 KB, 352x352)
>>
>>103480207
chat is this real?
>>
>>103480207
This isn't even a gen. This is just a random webm
>>
>>103477295
IIRC NVIDIA stopped producing 40XX cards even before it was known that 50XX production had started.
Can't let those prices get too low!
>>
>>103480338
>the more you buy the more you save
he meant it, madman
>>
File: HunhuyanVideo_00018.mp4 (740 KB, 512x400)
>>
>>103477412
Yeah but the crypto is AI instead and they're mining prompts. Just like us.
>>
File: HunyuanVideo_00009.webm (3.15 MB, 720x1280)
>>103478221
I usually keep my weird fetishes to myself but if you insist...
>>
>>103480388
I hate outputs like this. The weird fetish is whatever, but it's clear from the composition that the base is AI slop. It's a shame that the model seems to be poisoned with this shit.
>>
MLLM text encoder when?
Image2Video weights when?
Multi-GPU support on Comfy when?
Turbo (4-step) hv model when?

I am going insaaaaane!
>>
>>103480405
>MLLM text encoder when?
May never come. This was never a given.
>Image2Video weights when?
Soon, maybe. We have no idea when this is supposed to be released or how far along it is. If you look at their website, you can see it does controlnets as well so there's a whole lot of stuff we don't have right now.
>Multi-GPU support on Comfy when?
When someone makes a PR for it on the main repo.
>Turbo (4-step) hv model when?
No idea.

I'll add some more questions.

>GGUF or those new super fast nf4 quants when?
>Lora inference and training code released when? (It exists but is not published and merged)
>>
File: wolfu.png (428 KB, 1680x1452)
>>103480242
>>103480292
>vramlet here
>>
>>103480424
If img2video happens, then Loras would be pretty much useless unless they are trained on motion and not just subjects/likeness
>>
>>103480466
Let's see how good/bad img2vid is before we discount the value of LoRAs.
>>
>>103480400
I mean, the AI slop aesthetic could just be an artifact of some patterns being easier or harder for the network architectures used to learn.
Though I definitely feel like the outputs look more like AI slop for vertical video and unrealistically large breasts or butts.
Unfortunately the latter two are also correlated with high body fat in outputs that don't look like AI slop.
>>
LoRA inference any moment now.
>>
File: HunhuyanVideo_00020.mp4 (1.09 MB, 704x544)
>>
>>103477230
>https://github.com/Tencent/HunyuanVideo/issues/93#issuecomment-2533257381
Anime pfp are more trustworthy in AI, as always.


>https://github.com/Tencent/HunyuanVideo/issues/109#issuecomment-2533261573
>FurkanGozukara
How the hell is this dude EVERYWHERE
>>
>>103477737
I wonder if it can be used on hunyuan
>>
File: HunhuyanVideo_00021.mp4 (703 KB, 512x400)
>>
>>103480534
How else will he find information to paywall
>>
>>103480534
Holy shit I hate furk so much.
>>
File: la_muralla.webm (455 KB, 640x704)
>>
File: 1732988416332558.png (2.99 MB, 2048x2048)
>>103480645
>>
File: 1731332793375818.png (2 MB, 2068x1249)
>>103479822
I don't think she's fine bros...
>>
>>103480558
that's really impressive when you think about it. what I'd love for Hunyuan is the ability to do it like Sora: instead of a simple v2v, you ask the model to just replace the guy with the girl, that way it only changes the character and not everything surrounding it
>>
>>103479400
>With WHAT training script?
he didn't add the training script in this PR? that's fucking retarded...
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/pull/72
>>
>>103478892
>Are you using STG at all?
>I know Hun supports it, but thus far nobody seems to mention it.
because STG uses the same mechanism as CFG > 1, so when you use it it's twice as slow, and it's already slow enough as it is
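rough sketch of why any guidance with scale > 1 doubles the per-step cost: the denoiser runs twice per sampling step. toy stand-in below, not the wrapper's actual API:

import torch

def guided_pred(model, x, t, cond, other, scale):
    # two forward passes per step -> ~2x the compute of scale == 1
    pred_c = model(x, t, cond)   # conditional pass
    pred_o = model(x, t, other)  # unconditional (CFG) or perturbed (STG) pass
    return pred_o + scale * (pred_c - pred_o)

model = lambda x, t, c: 0.9 * x + c  # toy denoiser
x = torch.randn(1, 4)
print(guided_pred(model, x, 0, torch.zeros(1, 4), torch.ones(1, 4), 6.0).shape)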
>>
>>103478438
>Limitations of hunyuan became evident.
the toy isn't complete, once MLLM and i2v are released it'll be even more hyped than now
>>
>>103481002
what's an MLLM?


Also I agree, if they keep reducing it to make it faster like they did with flux then it could have potential, especially once crazy people make loras for this
>>
File: 1722907250191503.png (1.89 MB, 1024x1024)
https://chendaryen.github.io/NitroFusion.github.io/
ehh...
https://huggingface.co/spaces/ChenDY/NitroFusion_1step_T2I
>>
File: 1724521537615855.png (132 KB, 2581x753)
>>103481014
>what's a MLLM ?
the text encoder we're currently using is just duct tape (llama3), the official one is the MLLM but they haven't released it yet
https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md#download-text-encoder
>>
>>103481026
ok lovely. Sorry for asking something I could have easily googled. Forgive me. and Thank you.
>>
File: Untitled.jpg (249 KB, 1024x1024)
>>103481020
uh huh
>>
File: 1722340877034058.jpg (497 KB, 3620x1664)
>>103481063
lmao, they managed to do it worse than SD3M, new record!
>>
https://github.com/Tencent/HunyuanVideo/issues/117
>More pressure about MLLM
You love to see it
>>
>>103480816
it captured his yellow teeth - a testament to his model training skill
>>
>>103480424
>May never come. This was never a given.
they said "we haven't released it YET", meaning that at some point they will >>103481026
>>
File: 1731505248922071.mp4 (359 KB, 720x824)
Sora is really disappointing not gonna lie
>>
>>103478872
>>103478943
>>>/g/sdg/
>>
Didn't the Mochi guys say that they'll release MochiHD or their i2v model in december or something?
>>
>>103481313
high fidelity tho, gotta give it that
>>
>>103481063
WTF IS THIS?!
>>
>>103481313
>Sora is really dissapointing
they're putting the turbo version on the API, the real deal would've been way too expensive for them, so as a consequence we're running a worse model than the one presented in february. their PR communication about that model is catastrophic, they should've just shown the turbo version in february and released the API the same month, the wow effect would still have been there because back then the best model we had was this
https://www.youtube.com/watch?v=Itbc12qXr30
>>
>>103481356
more interesting cinematography too, although it makes it look like an anti-drinking ad
>>
>>103481356
>>103481380
looks like some professional slop, I like that hunyuan amateurish look, that's how it looks in the real world, I guess I'm tired of pro pictures/videos because of Flux
>>
>>103481380
it probably enhances any prompt you give it. plus 720/1080p
>>
>>103481313
>>103481380
Add "camera moves around to the left" in the hunyuan prompt and it's going to be "interesting cinematography" as well.
>>
>>103480445
>>103480207
That's pretty good actually.
>>
File: 1728976757481231.webm (2.55 MB, 960x544)
JOHN CENA, THE MIGUNATOR IS COMING FOR YA!
>>
>>103481378
>their PR communication about that model is catastrophic

>our model is incredibly dangerous
>you can't ask for humans it's TOO DANGEROUS unless you pay more
>you can't ask for known IPs

Understatement of the century.
>>
>>103481587
>china just let people generate whatever the fuck they want in HD video
OpenAI is so fucking gay and corporate it makes me vomit
>>
>>103481605
yeah MiniMax for example understands what fun means, no one wants to do some boring ass zoom in on some boring ass fields. the worst part for OpenAI is that sora is supposed to be a tool for filmmaking, how can it be when it refuses humans in the first place? I overestimated OpenAI desu, maybe their chatgpt success was a fluke, they can't keep that momentum at all
>>
>>103481622
>sora is supposed to be a tool for filmmaking, how can it be when it refuses humans in the first place
the version for hollywood will probably be uncensored
>>
File: 1704217591718711.mp4 (926 KB, 960x544)
tf is happening kek
>>
Just a tip bros. Generate a single frame at a fixed seed at low steps, see if the blurry image you get looks like something you want,

refine your prompt and iterate until the blurry first frame looks good.

Then up the iterations and keep generating frames, see where it stops improving in quality, sometimes this can be at 30 sometimes at 100. Scale to the absolute optimum of quality.

Only then start increasing the frames.

This way you're not wasting 10 minutes on a gen. You can also batch 100+ "single frame videos" and see which seed has the best first frame to generate videos from.
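in pseudocode, roughly this (generate() is a made-up stand-in, not the actual wrapper/ComfyUI API):

def generate(prompt, seed, steps, frames):
    return f"gen(seed={seed}, steps={steps}, frames={frames})"  # stub for a real sampler call

prompt = "1girl eating a burger"
previews = {s: generate(prompt, s, steps=10, frames=1) for s in range(100)}
best_seed = 42  # picked by eyeballing the blurry single-frame previews
for steps in (20, 30, 50, 100):  # find where quality stops improving
    generate(prompt, best_seed, steps=steps, frames=1)
video = generate(prompt, best_seed, steps=30, frames=89)  # only now pay for the full gen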
>>
>>103481762
What? The number of frames completely changes the output of the seed.
>>
>>103481762
that's not how it works, you can have a fine image on a single frame and then when you go for video mode it looks like shit, even on the same seed
>>
>>103481784
>>103481788
Works on my machine.
>>
File: 1730527805406615.mp4 (378 KB, 960x544)
>An episode of Seinfeld with Hatsune Miku in it
Not that there's anything wrong with that!
>>
>>103480445
how much ram do you have? I think the blockswap maxes out my mere 24gb ram lol (normal ram not vram)
>>
>>103481805
Bros, I got a confession to make. I never prompt Miku, I actually prompt a Miku cosplayer. Sorry for tricking you all.
>>
>>103480445
Also why not use the bnb_nf4 setting for the textencoder?
>>
>>103481817
yeah you need a shit ton of vram to make this shit work
>>
>>103480445
also also, do you have 12gb vram or 16gb?
>>
>>103481817
>>103481826
*ram
>>
>>103481794
Post a catbox of a first frame that doesn't change when you increase the number of frames
>>
>>103481378
>they're putting the turbo version on the API, the real deal would've been way more expensive to them
So just like they silently downsized GPT-4o because it was too expensive, then. That's pretty in character for them, actually
>>
I've been using 4xNomos8k_atd_jpg to upscale some images and it's great, but sometimes there are these odd transparent black bands on the final upscaled image. Does anyone know a good way to stop that from happening?
>>
>>103480400
Just turn down your guidance value. It doesn't look like that at 4.0 and below
>>
>>103481784
>>103481788
>>103481855
You misunderstood what I meant. The first frame isn't directly what the video generates but what is visible in the frame is still represented in the final video. Seeds with better initial frames consistently produce better full videos. The quality of the steps also transfer to the final video.
>>
>>103481924
>The quality of the steps also transfer to the final video.
I don't believe that at all, I tried your technique days ago, I had a perfect first frame, then I went to 97 frames and it was ass, looks like it has better prompt adherence in image mode than in video mode
>>
>>103481930
Yes the video is lower quality than the image, that always happens. What you need to do is generate 100 first frames and then develop all of those frames into videos, 100 videos. Compare them all and you will see that when the first frame looks good the resulting video is STILL the best quality, even if it's worse than the first frame alone, compared to the full videos from the seeds with worse first frames.
>>
>>103481924
Nah dude, I think your technique is bunk
>>
holy shit, HunYuan absolutely rocks for /ss/ bros
>>
>>103481975
>for /ss/ bros
for what?
>>
>>103481984
Not elaborating
>>
>>103481975
I've prompted so much ss on anime models but I don't think I can stomach it in 3d.
>>
>>103481975
catbox or fake
>>
>>103481984
Super Saiyan dragonball enjoyers
>>
>>103482023
speaking of dragon ball enjoyers someone really needs to recreate vidrel
>>
File: 1725011097346426.mp4 (1.24 MB, 960x544)
Come on Tencent, playing around with llama3 was funni, but now you have to give us the real text encoder
>>
Is someone here smart enough to explain to me why Q8 quantization of the 13B HunYuan model wouldn't work?

It's not even about the size of the model being smaller for better VRAM utilization. It's about speed of generation.
>>
>>103482195
Q8 would definitely work, like on mochi, you get better quality and you'll be able to use torch compile on a 3090, but kijai doesn't want to do it :(
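for reference, GGML-style Q8_0 is dead simple, which is why the quality barely drops: int8 weights in blocks of 32 with one scale per block. rough numpy round-trip of the idea (my own sketch, not kijai's or city96's code):

import numpy as np

def q8_0_roundtrip(w, block=32):
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / np.where(scale == 0, 1, scale)).astype(np.int8)  # quantize
    return (q.astype(np.float32) * scale).reshape(-1)                 # dequantize

w = np.random.randn(4096).astype(np.float32)
print(np.abs(w - q8_0_roundtrip(w)).max())  # tiny vs the weight magnitudes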
>>
>>103481826
lol I was gonna correct you.

Time to get some ram then...
>>
Bakeryy
>>
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/commit/6ab3d0ae62bcdc878e4e68db35d5b9566745613c
>initial RF-inversion for testing
What's that?
>>
>>103482225
>kijai doesn't want to do it :(
how do we do it
>>
>>103482451
>how do we do it
step 1) Be a coding god
step 2) Profit
>>
>>103481762
Someone wrote that you should instead generate many low resolution videos, pick whichever looks best, and generate it again (same seed) at a higher resolution.
Would that work or is the result completely different depending on the resolution?
>>
>>103482511
Result is different based on resolution.

Mostly because the model correlates resolution with training data at said resolutions.

Meaning 544x960 is correlated with modern tiktok videos while 320p or something is correlated with old 2000s porn videos from across the web. So the AI model thinks it needs to create something like that.
>>
>>103482526
god damn it, so there is no way to create a bunch of them without using the full time every generation
>>
File: aanya.png (2.89 MB, 2088x1080)
>>103482446
>https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/commit/6ab3d0ae62bcdc878e4e68db35d5b9566745613c

RF-Inversion is amazing, this is RF-Inversion on flux, and it works amazingly with loras. if they make it work you'll be able to generate your own kinos with loras of whoever you want
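the gist as I understand it (toy Euler sketch only; the actual RF-Inversion paper adds a controlled guidance term on top of this, and v() stands in for the real transformer):

import torch

def flow(x, cond, v, steps=50, invert=False):
    ts = torch.linspace(0.0, 1.0, steps + 1)
    if invert:
        ts = ts.flip(0)  # integrate backwards: data -> structured noise
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * v(x, t0, cond)  # Euler step along the velocity field
    return x

v = lambda x, t, c: c - x                  # toy velocity field
img = torch.randn(1, 4)                    # stand-in for an encoded image/video
noise = flow(img, torch.zeros(1, 4), v, invert=True)
edited = flow(noise, torch.ones(1, 4), v)  # re-generate with a new condition
print(edited.shape)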
>>
if a model says 1x/2x/4x, is that supposed to be the max scale you can upscale the original image to?

>>103482539
>can't tell which one is the original
my ai sensors aren't tingling anymore...it's over
>>
>>103482446
what would be the input, another video?
>>
>>103482566
yeah, looks like a fancier way to do v2v
>>
>>103482539
so it's like an inpainting trick?
>>
>>103482552
the left one is the fake, look at the earrings and that typical ai necklace
>>
File: 1724807822700221.png (193 KB, 2065x1047)
>>103482446
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/blob/main/examples/hunhyuan_rf_inversion_testing_01.json
can't make the workflow work, I updated the node
>>
>>103482636
nvm i'm a fucking retard I forgot to refresh the page
>>
>>103482636
nvm i'm also gay too
>>
File: 1719270026152532.gif (360 KB, 400x293)
>>103482752
>>
new bread
>>103482892
>>103482892
>>103482892
>>
>>103482603
Bait? Look at the numbers and signage on the train car in the background, length of lightbulbs, and the evenness of the door plates.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.