/g/ - Technology

The First Output Edition

Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106722132

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
>nigbo
>>
I removed sage attention. 480s for a 720x720, lightning loras, 8 step gen. like 50s of that is color correction and interpolation.

Sounds about right?
>>
File: ComfyUI_00088_.png (2.15 MB, 1024x1024)
oh boy, I drank too much again
>>
File: WanVideo2_2_I2V_00459.webm (749 KB, 1248x704)
>>
>>106722862
it's literally called Hunyuan Image 3.0, they're advertising it like an image model and seem to falsely believe that anybody gives a shit about how many completely generic sameface asian ladies it can gen at 1280x768 when it's an 80 BILLION PARAM model. Like no, I don't want to see ANY shit that doesn't show off the full extent of your 80 BILLION PARAM model's capabilities; every single one of the examples should be impressive, when in reality none of them are.
>>
So HunyuanImage-3.0 is actually an autoregressive MoE LLM, not just some dense 80b image model
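For anyone wondering what that changes in practice, here's a toy sketch of the two sampling loops. Every function here is a made-up stand-in, nothing from the actual Hunyuan repo: an autoregressive model samples discrete image tokens one at a time the way an LLM samples text, while a diffusion model refines one continuous latent over all steps.
[code]
import torch

torch.manual_seed(0)
VOCAB, N_TOKENS = 1024, 16  # toy codebook size and token count

def fake_moe_llm(prefix):
    # stand-in for the 80B MoE transformer: logits over the codebook
    return torch.randn(VOCAB)

# autoregressive: one discrete token at a time, conditioned on the prefix
tokens = []
for _ in range(N_TOKENS):
    probs = torch.softmax(fake_moe_llm(tokens), dim=0)
    tokens.append(torch.multinomial(probs, 1).item())
# a VQ decoder would then map these tokens to pixels

# diffusion: one continuous latent, refined a little on every step
latent = torch.randn(4, 8, 8)
for step in range(20):
    latent = latent * 0.95  # stand-in for a denoise step
[/code]
The practical upshot: cost scales with tokens per image, and LLM-stack tricks (KV cache, MoE offloading, quants) apply instead of the usual diffusion tooling.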
>>
>>106723744
It could be a solid gold turd. I can't run it.
>>
>>106721558
>>106721465
mode/prompt/catbox?
>>
>>106723751
ramlet issue
>>
>>106723730
Idk anon, an anon tested it towards the end of last thread, and while not enough to make a conclusion yet, those gens look 80b worthy to me. They are better than anything similar I've seen so far from a local model, and its prompt following is very, very good. It is good enough to spark my curiosity and warrant possible API testing. Let's put our grudges against its steep requirements aside and see for ourselves through testing whether the model is good.
>>
File: 1735271275820775.jpg (910 KB, 1416x2120)
>>106723623
i do sexy sometimes
>>
>>106723813
You've got fresh model brain. It's slopped.
>>
>>106723815
nice style, what artist?
>>
File: 250928-130223(1).mp4 (1.63 MB, 676x2000)
>>106723717
Hopefully the girl retains her body proportions from the reference in the next wanimate version
>>
File: 1740148854987631.jpg (708 KB, 1416x2120)
>>106723826
fresh lora i trained on >>106723142
>>
>>106723839
She had to match Tuco's body shape, it was a hard ask.
>>
>>106723717
standing there like a silent gtav player during cutscenes
>>
File: WanVideo2_2_I2V_00460.webm (885 KB, 1248x720)
>>
>Do something in another workflow
>Preview on workflow that is currently rendering fails to display
When are they going to fix this stupid bug. What the fuck are they doing with that API money?
>>
File: 1743684409071253.jpg (30 KB, 600x753)
>2026
>still no dedicated wiki or pastebin for ksampler settings for each model
>>
>>106723995
euler/simple
>>
>>106723995
depends on the workflow you're using and steps, so its worthless, just find a popular workflow and use it
>>
>>106723995
Then do it yourself.
>>
File: image3.png (1.88 MB, 1024x1024)
Third output of Hunyuan 3.0
This one took 38:09. It's taking longer each time, so I need to figure out if it's some kind of leak or just higher run-to-run variability than diffusion models have.

I told it to use Yoshitoshi Abe's style and the output is still generic, so that's a bit informative.
>>
File: WanVideo2_2_I2V_00461.webm (2.7 MB, 1248x720)
>>
>>106724047
Fully chinese within 3 seconds. Impressive.
>>
>>106724038
this looks like fucking 2022 models. the random lines and chaotic fine detail
>>
>>106722281
why is this face familiar to me
>>
>>106724054
If you run mayli through a tagger it can't make up its mind. Half of the captions are asian, the other half are caucasian.
>>
ramtorch status?
>>
>>106724100
Ask yourself what the status of any of the furries projects are and you got your answer.
>>
File: 00219-1116224549.png (1.13 MB, 1216x832)
reduce steps to 4 to unslop. vrambloaters hate this trick
>>
>>106724124
cool 1995 CRT TV lora!
>>
>>106724092
i also have a difficult time expressing the uniqueness of her beauty through words alone
>>
https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-250928
is it better than the previous version?
>>
>>106724155
Looks a lot better. But also T2V with I2V coming "When an even better T2V LoRA is released".
>>
>>106724155
Movement is a lot better and there's some improvement on everything else, but the problem of not being able to make dark scenes remains.
>>
>>106724100
turn it into a custom node so people can test it
>>
File: 1741506654436589.png (54 KB, 895x461)
>>106724160
I'm not going to complain much, these guys are heroes, they've made AI video creation so much more fun by not having to wait half an hour anymore.
>>
>>106724189
im surprised they dont get paid by any video gen model labs to fully fine tune their models to work at 4 steps at almost full quality, given whats possible with just this lora
>>
>>106724183
no YOU do it
>>
>>106724200
I tried but it keeps giving me a tensor shape mismatch
>>
>>106724038
this reminds me of pixart except it's 40 minutes per image
>>
I hate Face Detailer in ComfyUI. It should always use the max res that the model was trained at. Instead, it uses values based on confusing sets of inputs which can fluctuate depending on the size of the face and where it's positioned.
I redid it manually with the standard face bbox detector, using 1024x1024 inpainting boxes, and it works a lot better. Fucking stupid that face detailer doesn't have that as a simple option: "always use 1 megapixel canvas sizes"
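A minimal sketch of that fixed-canvas idea, assuming you already have a face bbox from whatever detector you run; the function name and paste-back step are mine, not the Impact Pack API:
[code]
from PIL import Image

def face_crop_fixed(img: Image.Image, bbox, canvas: int = 1024):
    # Crop a canvas x canvas region centered on the face bbox, clamped
    # to the image borders, so the inpaint always runs at the model's
    # native resolution no matter how small the face is.
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    left = min(max(cx - canvas // 2, 0), max(img.width - canvas, 0))
    top = min(max(cy - canvas // 2, 0), max(img.height - canvas, 0))
    return img.crop((left, top, left + canvas, top + canvas)), (left, top)

# inpaint the returned crop, then img.paste(result, (left, top))
# (PIL pads with black if the source image is smaller than the canvas)
[/code]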
>>
>>106724038
>80B
>>
>>106724155
it looks less slopped and the speed is back again, seems to be a great improvement overall
https://www.reddit.com/r/StableDiffusion/comments/1nshjxg/updated_wan22t2v_4step_lora_by_lightx2v/
>>
>>106724242
>t2v
I sleep. I also want 8 step i2v. 4 step is too ass for my tastes
>>
>>106724225
I pretty much do the same thing though I use different detection models
>>
>>106724248
maybe they'll improve the 4 steps version to the point it'll be as viable as their old 8 steps lora
>>
Can't you just run any step amount with the speed loras?
>>
>>106724264
No, that's why the lightx2v guys release 4 and 8 step variants, the 8 step retains more information or something.
>>
File: 1727878773108907.png (104 KB, 1566x585)
>>106724155
>https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-250928
I'm not sure they're working on ComfyUi, I got this shit
>>
>>106724273
yep there's an issue there
https://huggingface.co/lightx2v/Wan2.2-Lightning/discussions/42
>>
>>106724222
Imagine if Pixart wasn't bought out by Nvidia. We'd have so much soul right now.
>>
>>106724273
Ruh roh, you usually get that when you try to load a lora that was made with a different model, ie 2.1 loras on 2.2
>>
>>106724285
Could be whatever whacky format they were using for training. I think it's an easy fix.
>>
>>106724038
Finetunes and LoRA's will fix it
Guffaw
>>
File: 1734782883309867.png (1.81 MB, 1280x768)
>>106724038
There could be something wrong with the implementation there maybe, just saw this output on reddit.
Apparently the prompt was "Draw the main villain Deku struggles with in the My Hero Academia Forest training camp arc"
>>
>>106724273
>>106724282
>>106724285
>>106724288
try the kijai's one, maybe they're compatible
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22-Lightning
>>
>>106724294
ugh, fine. I'll boot up t2v for the first time in a thousand years.
>>
>>106724292
>six fingers on an 80b model
deos mio
>>
>>106724305
Chatgpt and nano both give me 6-7 fingers every now and then, no model will ever fix this.
>>
>>106724160
>>106724189
>i2v later
I guess in a week or so, sad.
>>
>>106724294
>rank 128 for high
>rank 64 for low
why?
>>
>>106724305
They also trained at 1024x1024, there's only so much you can pack into that many pixels. Why they didn't train a smaller model at 2-4MP is beyond me.
>>
>>106724311
You figure i2v would be first since like 99% of videos I see posted anywhere are i2v, but whatevs I guess
>>
>>106724321
I've seen t2v released first for so many things at this point, I think it's just a simpler model to do stuff with or something.
Most people use i2v, so it makes no sense otherwise.
>>
>>106724311
>>106724321
you can try the t2v version to do a i2v video and see if it improves shit lol
>>
>>106724038
grim
>>
>>106724310
You know what never gives me six fingers? Wan, either t2v video, or when it brings a hand out that was hidden/obscured in i2v, or when I use it to generate images. Never.
>>
does any saas have hunyuan 3.0 yet? its not on lmarena or dreamina. there isnt even a api node yet! i just want to test the model
>>
>>106724340
Maybe has to do with it being trained on video, it sees hands from every angle in long-context form, so it learns better than an image model can with static single images alone
>>
>>106724340
Wan has melting fingers
>>
File: image4.png (1.89 MB, 1024x1024)
>>106724038
4th output of Hunyuan 3.0. I only changed the prompt to request Masamune Shirow style in another attempt to see if anything would be recognized. Obviously, it wasn't. It also accidentally a leg.

This one took 46:25 so something is definitely broken and causing successive generations to get slower. The first one at 13 minutes was still terrible but probably more correct.

>>106724292
I have been using only the bare transformers inference example code. Their webui has additional default prompts that might be helping it. I'll try one more with that active before I have to stop for now.
>>
>>106724352
It looks like the first anime LoRA I saw for SDXL when it was released
>>
>>106724340
I got alien hands in i2v, when the woman showed her hands in the video (it started without them).
No idea if it's due to 4 steps lora, q8, or inherent to the model.
>>
>>106724352
>cracks in ponytail
LMAO this model is pure ass
>>
>>106724359
>4 steps lora
Probably. I don't use lightning loras and I've never had weird hands. I have had a spare limb erupt from a body, or a tongue come out of an armpit, weird shit like that. Hand and feet anatomy is solid though.
>>
>>106724352
>46:25
Is there any worse feeling than knowing you could have saved over half an hour by just restarting the environment and genning again?
>>
>>106724362
that's actually amazing
>>
>>106724362
That's obviously very old dried cum.
>>
File: 1757809026988667.png (21 KB, 695x311)
>>106724155
People will post this non helpful shit and expect help.
>>
File: 80b parameters.png (1.31 MB, 751x965)
>>
>>106724393
turn it off and on again
>>
>>106724398
>it learned compression artifacts
>reproduces them unprompted
OH I AM LAFFIN', M8
>>
can i use the lightning lora with wan 5b?
>>
>>106724412
No.
>>
>>106724393
The output is a blur,
A hazy, ghost-like image,
What could be the cause?
>>
File: 1664103489782303.gif (445 KB, 280x280)
>>106724412
>5b
>>
>>106724294
>kijai
>>
What are your settings for non lightning lora workflows for 2.2?
>>
>>106724423
>the
>>
>>106724398
Did it output at that weird res?
>>
>>106724423
all right all right, we have to call him KJ Boss I get it
>>
File: 74.jpg (714 KB, 832x1216)
>>106724292
>>106724038
I wonder if the model is trying to approximate things it doesn't understand using its LLM or something. This was "A pencil sketch of Bayonetta". It looks nothing like the character, but looks exactly like what someone who never saw the character but read a description would envision. This is GPT's two-sentence description of the character.
>Bayonetta is a tall, striking woman with long black hair styled into an elegant beehive and sharp bangs framing her face, often wearing glasses that accentuate her confident, alluring look. She’s most recognizable in her skintight black outfit made of her own magical hair, adorned with intricate designs and red accents, giving her a sleek, witch-like yet stylishly modern appearance.
>>
File: lookin good.png (37 KB, 478x455)
>>106724398
>>
>>106724423
Kijai workflows are just better.
>>
>>106724425
the same as with lightning but with more steps
>>
File: 1758949704393966.png (2.44 MB, 1059x1115)
>>
>>106724434
I found this shit here lol
https://xcancel.com/JeepersMedia/status/1972211090961715512#m
>>
>>106724398
Not even the people in the reflection are correct.
>>
>>106724447
you have no idea how much I miss these kinds of ads
>>
>>106724443
mid 90's adventure game background aesthetic
>>
>>106724446
Even the cfg and sampler?
>>
>>106724447
wtf? even the text on the bottom left is perfect
>>
>>106724459
yeah
>>
>>106724447
>benis in bagina
KEK, what the fuck did you use to make this
>>
>>106724459
>1 cfg without lightning lora
kek
>>
>>106724445
maybe, but that's why I don't really like his implementation. You're pretty much stuck with whatever workflow there is for his nodes and his nodes only (and that's on top of comfyui)
>>
File: WanVideo2_2_I2V_00453.webm (355 KB, 1312x768)
>>106724471
>1 cfg without lightning lora
>20 steps

If you're curious.
>>
>>106724471
and KEKAROO to you too anon
>>
>>106724476
When the molly hits
>>
>>106724393
SAME HER
>>
>>106724473
His implementation is Wan's official implementation.
>>
>>106724476
>damn it shaggy this ain't weed
>>
>>106724470
It's from /v/ but the anon that made this was hanging out in the thread. He did explain how he made some of them though.
>>>/v/721838671
>>
>>106724476
Damn, didn't know Hercules liked to get wet.
>>
>>106724447
>All right 1995 was a pretty good year in video games
>Really the latter half of the 1990's saw some of the most iconic games coming out
yep, and I'm glad I was old enough to witness this in 4k (or in 360p if we want to keep the context accurate kek)
>>
>>106724486
That still doesn't change what I said tho
>>
>>106724494
>he uses a1111
more proof that cumshartui is good for nothing but mass production of assembly-line slop. nobody inpaints or puts effort into their gens with comfy because it's a pain in the ass. comfy is for non-creative individuals, such as those from the indian subcontinent
>>
>>106724494
>A1111
jesus christ, at least use forge, how is he doing the text though if he's only using noobai?
>>
>>106724476
Smoking the good shit
>>
chat, how do I improve the backgrounds on my 1girls?
[spoilers]there are actually 2 girls[/spoiler]
>>
Is everyone in this thread retarded? He's clearly shopping in the text, logos and a bunch of other shit. Christ...
>>
what an embarrassing post
>>
>>106724524
obviously pasting it on top in an image editor and applying a filter. so many tards here really waste time with shit like qwen and flux, 4+ minutes per gen only to get stuff that's 1/4 as good as simple sdxl with a bit of inpainting.
>>
File: swag.jpg (108 KB, 1280x720)
>>106724532
>spelling mistake
it's over
>>
>>106724537
Yes pretty much 90% of people in this hobby are next level retarded
>>
>>106724537
But... why would he do that?
>>
>>106724537
fluxbrowns lowered the average thread iq by 50 points, got even worse after the 4o tourists.
>>
>>106724543
Someone will call you an 'Indian' or 'third worlder' and the thread devolves into political infighting.
>>
>>106724543
memes from 2010s are cancelled anon
>>
>>106724532
maybe nano banana, "make this less shit", then img2img over it
>>
And when I get that feeling
I want sexual healing
Sexual healing, oh baby
Makes me feel so fine
>>
>>106724537
I thought that was clear. The main Lara image is what he genned, the rest is shooped, including the filter over the top. It's still cool nonetheless, but I'd be seriously concerned over anyone who genuinely thought the whole thing was genned.
>>
File: T2V.mp4 (3.96 MB, 720x720)
>>106724155
>is it better than the previous version?
definitely better, the slow mo shit is gone, now I wait for their I2V version
>>
>>106724498
kek
>>
>>106724567
My grandmother sent me a video of a bunny the size of a dog. It was clearly AI. She thought it was real
>>
>upscale by 1.5
>image is fine
>upscale by 2
>result is an abomination of limbs and multiple nipples
>upscale by 2 on a different checkpoint
>image is fine
>upscale by 2 on the same checkpoint with a different upscaler
>image is fine
is this related to a checkpoint's maximum output resolution or the upscaler not being compatible with that checkpoint past a certain level of upscaling?

i never changed the denoise level either so i dont know what causes this. im using illustrious/noobai
>>
>>106724593
Some checkpoints react very differently to the denoise level.
I have one that goes all the way up to almost 0.7 with good results at 2x upscale, while another does the same amount at 0.15.
>>
File: 1731121961754876.mp4 (991 KB, 1240x464)
https://phantom-video.github.io/OmniInsert/
I wish we could insert some shit on some I2V process, that would be much more fun
>>
>>106724612
>kling
kek
>>
>>106724602
i see, so its a case by case basis. this helps a lot, i thought my install was fucked or something, thanks.
>>
>>106724593
Are you using tile controlnet?
>>
>>106724593
>is this related to a checkpoint's maximum output resolution
this one
I tried a lot of upscaling with different noob/illust models, and illust v2 was the one I had the fewest limb abominations with, because it has the biggest output reso, but since the model is shit itself it fucks up the style every time
And that anon is right too >>106724602, different models will need different denoise levels, sometimes you are lucky and find the sweet spot between steps/denoise, but most of the time you will get a lot of abominations because they aren't trained for higher resolutions
>>
>>106724633
From what I've seen, a surprising amount of anons don't even know how to upscale, they literally try to upscale the entire image at once x2 in img2img.
>>
>>106724633
nope, just hires.fix, nothing else
>>106724638
got it, i'll just add notes with the upscaling info to each checkpoint that i use to avoid eldritch abominations
>>
File: AnimateDiff_00784.webm (3.19 MB, 1872x1088)
>>
>>106724625
Kling got the netflix treatment :(
>>
File: image(2).png (1.5 MB, 1024x1024)
Last Hunyuan 3.0 output for now. Using the included gradio webui with the default system prompt (instead of none). This took 16:56.

It was another attempt at extracting a recognizable style. I told it to make an Asuka from Evangelion, but as concept art for Serial Experiments Lain.

>>106724440
I think this is a good theory. It does remind me of what you get sometimes if you feed the output of a verbose captioner straight into an image generator.
>>
>>106724656
>nope, just hires.fix, nothing else
That's why. You need to learn how to use Ultimate SD Upscale. It cuts the image up into chunks that the model can manage, usually 1024x1024 chunks. It then upscales each one individually. You use a tile controlnet made for illustrious or whatever model you're using, and that makes it so that you can upscale at 0.6-0.7 denoise and keep the same composition, without sprouting new limbs or any other weird shit.
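For the curious, this is roughly the loop USDU runs; a sketch only, the pre-upscale and img2img calls at the bottom are placeholders for whatever upscaler/model you actually use:
[code]
from PIL import Image

def iter_tiles(img: Image.Image, tile: int = 1024, overlap: int = 128):
    # Yield overlapping tile boxes covering the whole image. Each tile
    # is diffused at the model's native res; the overlap lets seams be
    # feathered when the tiles are pasted back.
    step = tile - overlap
    for top in range(0, max(img.height - overlap, 1), step):
        for left in range(0, max(img.width - overlap, 1), step):
            yield (left, top,
                   min(left + tile, img.width),
                   min(top + tile, img.height))

# big = esrgan_2x(img)                              # placeholder pre-upscale
# for box in iter_tiles(big):
#     fixed = img2img(big.crop(box), denoise=0.6)   # + tile controlnet
#     big.paste(fixed, box[:2])                     # feather in practice
[/code]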
>>
>>106724667
there is no actual fucking reason for hunyuan 3.0 to take upwards of 40 minutes to gen 1girl slop holy shit. and thats at 1024x1024
>>
Hunyuan 3.0 reminds me a lot of vae-less 1.5 or pre-vpred/noise offset SDXL, when everything was washed out and grey. the outputs have this sort of 2022 look to them
>>
>>106724683
It takes that long because he's offloading an 80B model onto ram.
>>
>>106724667
The quality looks pretty good here, but since it's such a fucking huge model, I'd be interested in some more difficult subjects and compositions than 1 girl.
>>
>>106724675
ill look into it, i've never used controlnet once so i'll have to mess with that first.
>>
Didn't someone post a thing stating Seedream 4 was only 20b? Yet it generates at least 2k images that look way better than Hunyuan, plus their API spits them back faster too. What went wrong?
>>
>>106724693
Yeah I swear this place the creativity of a toddler. Anyway just waiting on some place to run it so I can test this shit myself.
>>
>>106724705
post a single creative gen you've made right now

one.
>>
>>106724701
The only way to get seedream local is to pose as a janitor at bytedance and work your way up the ladder until you have server access. I'll provide you with eye tape and a bronzing agent. The rest is up to you.
>>
>>106724705
makes you look like a superior cool guy to say that, but it's way more helpful to post common baselines that everyone has existing expectations for instead of trying to get too creative.
>>
those new light loras, they ain't so good.
how is it even possible that the 2.1 loras are still better for movement. wack.
>>
File: 1748609078204846.mp4 (3.17 MB, 1920x1080)
we got mogged right?
>>
> we
>>
File: 1758940462899554.png (930 KB, 996x666)
>>106724701
https://xcancel.com/bdsqlsz/status/1966034419183124527#m
>Didn't someone post a thing stating Seedream 4 was only 20b?
I'm seeing 26b here
>>
>>106724728
Every model ever released can do this man. That's like testing a sports car by seeing if it can keep up with a carriage lol. The point of the newer models is prompt adherence. If all you wanted was 1girl looking at the screen, and I mean this in no derogatory way, why tf would you use an 80B model?
>>
File: ComfyUI_06588_.png (3.13 MB, 2560x2560)
SHE IS SUPPOSED TO BE HAPPY OR SOME SHIT. People might like this though...
>>
>>106724683
It's so slow because it's probably expecting at least eight GPUs instead of one.
>>
>>106724745
it's the reward model (used for training) not the model itself
>>
>>106724737
both are awful desu
>>
File: WanVideo2_1_T2V_00209.mp4 (3.66 MB, 1872x1088)
New 4 step LoRA
>>
>>106724710
>no reply
i won
>>
>>106724778
yo wtf that looks smooth
>>
>>106724778
pretty good, if only Alibaba was as focused on QIE compared to Wan we would've gotten Nano Banana at home already
>>
is wan2.5 still in 5 second hell?
>>
>>106724791
10
>>
>>106724799
naisu
>>
>>106724800
10 for Wu but not for You.
>>
>>106724811
not naisu
>>
>>106724811
>b-but, the interview where the Alibaba engineer said that-
ACK
>>
Sage attention 3 is out.
>>
>>106724822
Beta test it for me then, slave.
>>
>>106724124
Ummm... based?
>>
>>106724778
really impressive, imagine telling your old self from a year ago you'd get this quality with only 4 steps he would've laughed at your face
>>
>>106724822
>only relevant for the 5090
*yawn*
>>
>>106723624
me as the red demon
>>
File: 00729-1584687002.png (279 KB, 384x704)
Very artsy and tasteful, so I think it's okay for blue board.
>>
>>106724822
You cannot do that for now :
>Note: SageAttention3 does not guarantee lossless acceleration for all models. For other video generation models, we recommend selectively using SageAttention2++ in certain layers or timesteps.
>For example:
>Apply SageAttention2++ only at the first and last timesteps,
>Use SageAttention3 for all the others.
>This hybrid approach may achieve lossless acceleration.

Once you can, it'll be worth trying for sure.
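The hybrid schedule from that note is trivial to express; how you actually swap kernels depends on the node or monkey-patch you use, so treat this as the selection logic only:
[code]
def pick_attention(step: int, total: int) -> str:
    # SageAttention2++ on the first and last timesteps, SageAttention3
    # everywhere else, per the lossless-acceleration suggestion above.
    return "sageattn2++" if step in (0, total - 1) else "sageattn3"

for step in range(8):
    print(step, pick_attention(step, 8))
[/code]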
>>
I know this is AI but how the fuck did they do it? it's so clean
https://vm.tiktok.com/ZNdG17uvs/
>>
>>106724873
It's not that clean? And I bet those camera flashes are covering some of the more egregious errors.
>>
File: 1732075223182787.mp4 (3.7 MB, 720x720)
>>106724822
>Sage attention 3 is out.
>>
>>106724822
nice, now waiting for the node updates to test it
>>
>>106723624
where's the rentry for Wan?
>>
>>106724898
dunno why they deleted it, but here it is
https://rentry.org/wan21kjguide
>>
File: WAN2.2_00148.mp4 (3.45 MB, 720x1280)
>>
>>106724938
Street shitting caught on camera.
>>
>>106724938
you don't want to use the film interpolation node to get something less choppy?
>>
File: WAN2.2_00149.mp4 (3.49 MB, 720x1280)
>>106724956
That thing smoothens it way too much.. It gives that vaseline dlss effect. So Im not really a fan.
>>
>>106724985
too bad the multiplier is either 1 or 2, it would be a good compromise to go for 1.5 imo
>>
>>106724822
How do I test it? I recall seeing a node that had sage 3 in it. I have a 5090.
>>
>>106724994
>2x interpolate
>24 fps
derp
>>
>>106725010
no, 2x interpolate is 32fps since wan is a 16fps model
>>
File: 1729724755853795.mp4 (3.7 MB, 960x960)
>>106724886
going for 960x960 (which has the same number of megapixels as 1280x720) gives really clean results, too bad it's like twice as slow now (took me 4 min for 720x720 -> 10 min for 960x960)
>>
>>106725028
Who the fuck is that at the end there?
>>
What is a good model for anime images with backgrounds that don't look terrible?
>>
they updated the light loras again. (40mins ago)

they work better now.
>>
>>106725038
it's Miku's twin duh!
>>
File: doa.png (89 KB, 1729x398)
>>106724822
DOA
>>
File: WanVideo2_1_T2V_00211.mp4 (1.63 MB, 1872x1088)
>Howdy Partners!
>>
>>106725066
the colors are a bit too fake, are you sure you're going for shift = 5
>>
File: WAN2.2_00153.mp4 (3.48 MB, 544x960)
>>106725038
yeah its a weird framerate
>>
File: 1728322195713570.mp4 (3.56 MB, 1280x1080)
>>106725044
>no info
thanks for nothing

https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-250928

(vid mega compressed for 4chan)
>>
>>106725076
damn.. meant >>106725018
>>
>>106725074
I think that's just light LoRAs being light loras. colors aside, I think it's a pretty good output for wan t2v
>>
>>106725078
https://huggingface.co/lightx2v/Wan2.2-Lightning/discussions/42#68d90d04340cbd3c0f3f2fa7

minor stuff
>>
>>106725078
they just made the format compatible with comfyUi (since this fucker doesn't want the diffusers format to be compatible on his own shit)
>>
File: 1743100305931510.mp4 (3.41 MB, 720x1080)
>>106725018
>completely missed the point
>>
>>106725109
ironic
>>
>>106725109
god this looks so retarded. how do you not abort that gen the second you see it in the preview
>>
>>106725124
>this looks so retarded
he probably likes that, because he's a retard lol
>>
>>106725136
zing!
>>
Lotta talk for nogen poorfags
>>
Why do I have to add myself to a github list in order to install sage attention 3, wtf.
>>
File: file.png (15 KB, 956x139)
that's it. i'm buying a 6000. an entire minute just for un/loading the models. go fuck yourself
>>
>>106725157
I think buying an nvme is a more sensible decision
>>
>>106725157
>that's it. i'm buying a 6000.
you'll be able to run HunyuanImage 3.0 with it right? kek
>>
>>106725166
hope your 1650 can do the 1.5 atleast
>>
File: 80b parameters.png (1.66 MB, 1280x768)
>>
>>106725170
>haha funny randumb and it can place multiple objects, please ignore the 2002 jpeg quality
>>
File: WanVideo2_1_T2V_00212.mp4 (953 KB, 1872x1088)
idk why it turned into a timelapse.
>>
>>106725174
you weren't gening in 2022 cleaarly
>>
>>106725185
he isnt genning in 2025 either
>>
>>106725185
>>106725191
every accusation is a confession
>>
>>106725191
>>106725185
cope chinkoid, the model is shit
>>
>>106725197
yet I was genning in 2022 and am genning in 2025
>>
>>106725182
>WanVideo2_1_T2V
why not 2.2?
>>
>>106725199
be grateful for what you're getting from the chinks. Cumskins havent done anything
>>
>>106725200
no you weren't
>>
>>106725202
>why not 2.2?
heh, don't believe everything you read, kid.
>>
>>106725203
>Cumskins havent done anything
they've done Flux and Kontext, have some respect
>>
>>106725208
you're right, I was too busy fucking your mom to gen
>>
>>106725214
>flux

its ded
>>
>>106725216
>brown skinned joke
https://youtu.be/0GCkhjDO-2s?t=27
>>
>>106725223
so this is how you defeat the chuds, just have a non white do something and they'll avoid it forever
>>
>>106725153
PLEASE, HOW DO I USE IT
>>
>>106725236
You just want an .exe file don't you.
>>
>>106725236
how should we know? it's been released really recently
>>
>>106725247
hehe
>>
File: 1756929760862825.png (259 KB, 1539x1593)
I wonder if HunyuanImage 3.0 (80b model) will crack the top 10? lol
https://artificialanalysis.ai/text-to-image/arena/leaderboard-text
>>
>>106725236
if you're going to be an early adopter, it helps to not be retarded
>>
was there a local model that could replicate that overall style you got from dalle vivid?
>>
File: 597438753.png (1.47 MB, 1326x747)
There will not be a Studio Session today.

Before you get that kleenex box out, I come bearing gifts.

Last week's Nano Banana studio session is live -

https://youtu.be/A4N2WBhpGyk

AND -- This list of Nano Banana capabilities/techniques is more than I could possibly showcase in a single session. Worth taking a look through - There are some AWESOME capabilities!

https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/main/README_en.md

We'll resume with Studio Sessions next week!!! See y'all then.
>>
>>106725286
>dalle vivid?
I don't know what that is.
>>
>>106725296
>style (defaults to vivid):
>The style of the generated images. Must be one of vivid or natural:
>– vivid causes the model to lean towards generating hyper-real and dramatic images.
>– natural causes the model to produce more natural, less hyper-real looking images.
>This param is only supported for dall-e-3.
my guess is that they used 2 models
>>
>>106725291
https://www.youtube.com/watch?v=qL1e67jm290
>>
Dear god, do not use high strength on the new t2v lora.
>>
>>106725341
well they're specifically made for 2.2 so you'd set the strength to around 1
>>
>>106725341
Yeah motion is better but it blows the fuck out of colours. Needs to be at like .8 or something
>>
>>106725341
yeah, it's not meant for that, the x3 strength trick worked only for the 2.1 loras
>>
>>106725078
can't wait for the I2V lora, they really nailed their new T2V one
>>
File: 1728653096965654.png (95 KB, 1554x1001)
Testing this, so far so good, vram usage seems down for me.
>>
>>106725420
on what?
>>
>>106725420
>vram usage seems down for me.
how much improvement? give some numbers
>>
>>106725427
comfyui latest commits
>>
>>106725445
Anons here will have to do a before/after, I only noticed the change after pulling and noticing that I could swap less blocks to ram on my videos without OOM.
>>
>>106725420
went from 21.6 to 20.4 gb vram usage, damn
>>
>>106725484
that might just be enough to mitigate the memory that's leaked
>>
Has anyone tried the new lightning lora on i2v?
>>
>>106725420
>>106725484
>the lightx2v fags finally managed to make good speed loras for wan 2.2
>a huge improvement on memory usage has been found
today was a good day
https://www.youtube.com/watch?v=h4UqMyldS7Q
>>
>>106725420
>>106725484
ok this is pretty cool
>>
>>106725497
>2.5 weights not released
zzzzzzzzzzzzzzzzzzzzzzz
>>
File: nice job.png (162 KB, 512x363)
>>106725420
>Testing this, so far so good, vram usage seems down for me.
just tested it, impressive as fuck, I had to offload 1.5gb of memory to the ram to test Wan 2.2 on 720x720, now I don't need to offload anymore, great shit
>>
>>106725420
>More VRAM savings to come.
I like this guy
>>
File: 1740273943845836.png (232 KB, 2280x1292)
>>106725420
https://github.com/comfyanonymous/ComfyUI/pull/10062/files
>changes 2 lines
>up to 2gb of vram saved
that's black magic dude
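Not black magic; I haven't gone through that exact diff, but peak VRAM is usually set by the moment two big buffers are alive at once, so a one-line switch to an in-place op (or dropping a reference before the next allocation) genuinely can shave gigabytes. A generic PyTorch illustration of the effect, needs a CUDA GPU with ~4 GB free:
[code]
import torch

x = torch.randn(512, 1024, 1024, device="cuda")   # ~2 GB of fp32

torch.cuda.reset_peak_memory_stats()
y = x * 2                                         # out-of-place: x and y alive together
print(torch.cuda.max_memory_allocated() / 2**30)  # ~4 GB peak

del y
torch.cuda.reset_peak_memory_stats()
x.mul_(2)                                         # in-place: no second buffer
print(torch.cuda.max_memory_allocated() / 2**30)  # ~2 GB peak
[/code]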
>>
morning, tell me what is babbys first video model I should try with 12gb of vram
>>
File: vibevoice error.jpg (174 KB, 1083x489)
i need help with vibevoice using tts audio suite. i have the small and large models already and put them in the right folders, but its saying i have to install some kind of package. pip install git+https://github.com/microsoft/VibeVoice.git produces an error since the repo isnt a python project. i dont know what its talking about, is it saying it cant find the model?
>>
>>106725420
just imagine how much vram we could actually save if we focused more on making Comfy's code optimal or some shit, I didn't expect that much improvement desu, I thought it was already optimized enough
>>
File: 80b parameters.png (1.47 MB, 1381x1538)
>HunyuanImage 3.0 fuses aesthetic skill with reasoning and world knowledge
>aesthetic
>aesthetic
>aesthetic
oh they dared to say that!
>>
>>106725555
catbox/prompt?
>>
>>106725555
>cyberrealisticPony

anon please. it's time to let go.
>>
>>106725624
>(massive breasts:3.0)
>>
>>106725668
and the rest
>>
>>106725420
Still using SDXL here. What does this new thing give me that I don't already have? Should I switch to Comfy? Or has everyone just forgotten about SDXL users?
If that's the case, I'll stick with Invoke, they keep updating their UI for image generation.
>>
File: 1742985000865580.mp4 (3.94 MB, 720x1264)
miku hatsune cooks hot dogs on a grill

nice
>>
>>106725674
>sdxl
irrelevant to you
>>
>>106725674
>Still using SDXL here. What does this new thing give me that I don't already have?
if the model you're running is too big for your gpu, that's a big deal, if you have enough vram for sdxl it's a nothingburger
>>
>>106725624
1girl, solo, sitting, massive breasts, tan lines, blonde hair, bloom, night, ugly, 3DPD, white lingerie bikini, cleavage
>>
>>106725674
These are vram savings, if you're not starved on vram when using SDXL, this won't make a difference
>>
Are there any UI's that aren't abandoned? Something that isn't cumrag UI?
>>
>>106725682
>720x1264
why not going for 1280 instead of 1264?
>>
>>106725652
not until you spoonfeed me sth better that I can use with sd next
>>
>>106725685
>>106725690
>>106725694
I'm not starving but I could always use faster image gen. Should I update and switch to ComfyUI? Or does this update hurt output quality?
>>
I have a request if anyone is bored. Baby Football. Animated would be great. Wanimate would be better.
>>
>>106725708
just try it by yourself? ComfyUi is know to be the fastest (and it's even faster with sageattention 2++)
>>
>>106725701
>sd next
Anon I...
>>
>>106725693
catbox?
>>
>>106725714
I want to know: if I press update.bat, will I get faster SDXL gens, or do I need to add a node in between or something?
>>
>>106725726
it won't be faster, it's just a memory optimization
>>
>>106725708
>Or does this update hurt output quality
Read the commit, it has nothing to do with sdxl generation like >>106725685 said.
>>
>>106725708
>use faster image gen
The potential speedup from these vram savings primarily comes from having to offload less from vram to system ram during generation; if you're not offloading when using SDXL this will have no effect, unless it allows you to use a higher resolution / batch size
>>
>>106725726
@grokk, zap this anon with lightning
>>
>>106725726
Oh I see.
Have a good day.
>>
>>106725685
>irrelevant to you
so it's only an optimization for flux and wan is that it? >>106725548
>>
Thanks to everyone who helped. I'll keep doing what I've been doing until ComfyAnon remembers SDXL users again.
Good Sunday anons.
>>
Old forge is still superior to all and gets the job done.
>>
>To use SageAttention 2.2.0 (containing SageAttention2++), please compile from source
i am filtered
>>
Does this speedup work with API nodes?
>>
>>106725584
i see on huggingface lots of people having similar problems with no answers
>>
https://huggingface.co/lightx2v/Wan2.2-Lightning/discussions/42#68d92370efc669a86a5fe3ca
>Our quantitative evaluation on an in-house test set shows that the motion dynamics scores for the base model (wan2.2-T2V-A14B), Lora-250928, and Lora-v1.1 are 10.66, 7.76, and 5.27, respectively.
>We are currently working on an improved version.
they're so fucking based desu
>>
On a local comfyui install, how can I do texture tiling? All the guides online seem to be for cloud implementations
>>
>>106725764
what os? gpu? python version? cuda version?
>>
>>106725774
there's a lot of tutorials on how to install sageattention, don't be that lazy anon
>>
Have used SDXL for a really long time, almost always below 8 steps. Gave Flux a whirl again, but it's way too slow in comparison. What optimizations are best in slot? or would you recommend some other arch?

>inb4 that guy
>>
>>106725817
its not sage attention its vibevoice. there is something wrong with the comfyui nodes that doesnt set it up properly for most people. maybe the set up only works on linux or something
>>
>>106725584
>>106725836

cmd.exe in ComfyUI_windows_portable\ComfyUI\custom_nodes\tts_audio_suite

..\..\..\python_embeded\python.exe install.py

cmd.exe in ComfyUI_windows_portable\

python_embeded\python.exe -m pip uninstall vibevoice
python_embeded\python.exe -m pip install vibevoice

python_embeded\python.exe -m pip install --upgrade transformers
>>
>>106725836
oh my b, it was meant for >>106725764
>>
>>106724038
>>106724292
Recognizes some popular characters, but no individual style tags unfortunately, similar to their slopped Hunyuan Image 2.1. A shame; in terms of output capacity this is basically an 80B version of that, and it's not special otherwise.
>>
https://github.com/lodestone-rock/RamTorch
How do I turn this into a comfy node?
>>
>>106725860
no one seems to care about his repo, there are no issues made so far, that's probably a huge nothingburger, like everything he's done so far (to be fair I like his willingness to make chroma VAE-less, that's the future of edit models for sure)
>>
>update to and try the new 2.2 t2v loras for my i2v
>all nsfw stuff now starts lactating
https://files.catbox.moe/1c31ya.mp4
Huh, I'll take it.
>>
>>106725899
>update to and try the new 2.2 t2v loras for my i2v
I tried to do that and I got some weird ghosting and duplications, it wasn't made for this at all lol
>>
>>106725899
it's not working well at all with i2v
better wait for their i2v version
>>
>>106725904
>>106725925
It seems to do really well for me, I see a quality improvement. But then again I am doing small motions aiming for seamless loops.
>>
>>106725843
thanks anon uninstalling and reinstalling worked
>>
>>106725939
No problem.
>>
File: bejita kneel HD vegeta.png (372 KB, 575x855)
>>106724494
>one singular /v/ faggot made a better image than anything i've ever seen shat out of cumfartui users with flux/SaaS in these threads and /sdg/

i.. i kneel.. and will probably download neoforge this morning honestly.
>>
>>106725420
When will this land in AniStudio(tm)?
>>
>https://github.com/woct0rdho/SageAttention/releases

How the fuck do I install this shit given all those versions of pytorch are deprecated and not even available anymore. Pytorch is up to 2.9 and 2.10.
>>
What are your opinions on qwen image edit 2509?
I want it to swap clothes but I can't make it work.
>>
>>106726014
compile it for your env
>>
>>106726022
>Use the girl from Image 1 as the primary subject, preserving her exact body, pose, facial features, hair, background, and the original lighting conditions of Image 1. Replace only her clothing with the outfit shown in Image 2, ensuring the new outfit matches the style, color, and details of the clothing in Image 2. Do not alter the girl's physical appearance, body shape, or any other elements from Image 1 except for the clothing. Ensure the new outfit is seamlessly integrated onto her body, maintaining proper fit and proportions as if she were naturally wearing the outfit from Image 2, while keeping the lighting, shadows, tone and highlights consistent with Image 1.
>>
>>106726014
>Pytorch is up to 2.9. and 2.10
bro everyone runs 2.8 or 2.7.x what are you smoking?
>>
>>106726022
>What are your opinions on qwen image edit 2509?
still plastic, still zooms in images, and it doesn't know how to do styles properly like the old version, big meh
>>
>>106726053
Skill issue?
>>
>>106725818
I tried a q5 quant of flux schnell on my 12 gb 3060 the other day... and it took 5 s/it for 1024 res
can't use any of the inference tricks on this card either, lmao
>>
File: pytorch.png (25 KB, 888x141)
>>106726040
I tried to install 2.7.1 or 2.8.0, it didn't work. It used 2.10.0 instead. Or maybe I'm from the future, fucked if I know.
>>
>install new thing for new model
>breaks my other work flows

so do you guys use a separate embedded python for each thing?
>>
>>106726072
>it didn't work
Bro how. Just force reinstall requirements.
>>
File: 1743306580097470.png (522 KB, 640x960)
https://www.reddit.com/r/StableDiffusion/comments/1nsowjd/i_trained_my_first_qwen_lora_and_im_very/
>makes a lora that finally unslops Qwen Image
>doesn't provide the lora
reeeeeeee
>>
>>106726058
if it takes even more practice to get good results with a new model, that's not a good thing
>>
>>106726022
Tried it on an anime girl and it didn't recognize the yellow shirt, left the sleeves and neck part.
Once managed to remove the shirt, but it left wrinkles from the shirt and added them onto the skin.
Couldn't remove a kitchen cabinet. All it had to do was paint that area grayish white, but apparently that's too much to ask. Not that it wouldn't take 15 seconds in photoshop to fix
>>
>>106726088
What model do I use then, nano banana is censored crap
>>
>>106726086
thats his girl. why would he provide his lora lol
>>
>>106726099
maybe that's just a colleague
>>
>>106726106
oh lets not go there. There was one of those here back in the day
>>
>>106725755
I've had it not work on some newer models for some reason, I don't know what changed, hadn't messed around with SDXL for a while, ended up having to get classic.
>>
Bros I wanna try out hunyuan 3 but 320 gigs is nuts, I have 10tb of ssd space and I'm always running out.
>>
>>106726177
It's absolutely not worth it
>>
>>106726177
if it looked awesome I would see the point, but it looks like shit desu >>106725170
>>106724398
>>
>>106726177
Just gen with qwen image then add jpeg artifacts
>>
>>106726196
kek
>>
>>106726177
>I have 10tb of ssd space
>I'm always running out
what in the name of how?

>>106726196
lol'd
>>
>>106726210
I dabble with LLMs too and those are freaking huge and theres so many finetunes. I think I have like 3+ tbs in llm stuff, 1tb of imagegen stuff and 2/3 tb of video gen stuff. Other space is various apps and stuff like that.

I guess I should start archiving old stuff or just deleting them, I'm a hoarder.
>>
>>106726271
i just wouldn't expect you to put all that on SSD's kek. i archive my old shit on HDD's and make copies of my daily drivers but i'm nowhere near 10tb of A.I models including gens.
though LLMs are fucking huge, it's ridiculous. no wonder /lmg/ is suicidal, that space hasn't moved in years.
>>
File: jillgle.webm (3.87 MB, 1128x920)
Postin (You) bait from another thread.
>>
File: 77308212.mp4 (3.49 MB, 1536x864)
>>
>>106725260
no qwen?
>>
File: FLX_0004.png (1.63 MB, 896x1152)
>>
>>106726325
qwen is 22th
>>
>>106726293
gonna bait (Me) to this later
>>
>>106725652
What do you recommend instead (i'm on comfyUI)?
>>
>>106726358
if you unironically enjoyed cyberrealistic, then upgrade to its illustrious version.
>>
>>106726368
I'm after uncensored realistic stuff, not necessarily porn. I used SDXL bigLust for a while
>>
>>106726411
Chroma
>>
>>106726411
well have fun searching civitai for like 100 hours worth of your month, its really all about personal preference at this point. no one has it down perfectly.
which is why you see dorks in this thread debating something as mundane as noise levels
>>
It's always fascinating to see the difference in intelligence between text AI people and image AI people. Most image people melt down when they can't run a model and understand absolutely fucking nothing. Hunyuan's outputs don't look spectacular, but the fact that we finally have access to an autoregressive model is cool as hell. It isn't likely to replace anything except maybe editing models, but it will be fun to play with. I'm hoping we'll see some efficiency gains or a smaller version in the future that puts this in range of 3090 vramlets.
>>
File: FLX_0011.png (2.08 MB, 1080x1344)
>>106726332
>>
>>106726440
>It isn't likely to replace anything except maybe editing models
Now with that output quality
>>
File: ComfyUI_temp_phaap_00001_.png (3.21 MB, 1152x1728)
>>
>>106726448
Can it generate non-asian people?
>>
File: FLX_0013.png (2.13 MB, 1080x1344)
>>106726448
>>
>>106726432
it doesn't take 100 hours to train a (text encoder +) LoRA. why do people insist on waiting for someone else to do it when they could have already had whatever they wanted?
>>
>>106726450
Unless I'm mistaken we haven't seen how well it handles outputs. QIE isn't exactly a high bar to clear.
>>
>>106726484
Because it's really just that easy innit bruv?
>>
File: 3329169236.png (1.25 MB, 896x1152)
>>
File: FLX_0014.png (2.2 MB, 1080x1344)
>>106726482
>>106726483
yes
>>
>>106726503
alright BUB you got my attention what checkpoint is this?
>>
>>106726496
buddy, you'll never know until you try
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe
>>
>>106726109
>posting his colleagues images here
that's just retarded
>>
File: FLX_0015.png (2.26 MB, 1080x1344)
>>106726508
Flux with realism lora
>>
>>106726484
sure let me boot up my 192GB VRAM computer I have lying around
>>
okay bros i got wan image to video working. i got voice gen working. now how does lip syncing work? can i take a video of a girl dancing and make her talk with the audio or do you have to take a still image and make her talk? local music gen would also complete what i need to do
>>
File: VirginApi.png (469 KB, 1892x1038)
>>
>>106726411
this sdxl checkpoint is good
>>
[spoilers]chroma[/spoilers]
>>
>>106726698
[spoiler]spoiler[/spoiler]
>>
>>106726733
[secret]wan 2.5 soon[/secret]
>>
File: 1744201956546689.png (1.22 MB, 1024x1024)
qwen edit understands openpose models natively. use openpose from AIO aux preprocessor node to get a pose/skeleton, then use it as the input for image2.

the anime girl in image1 is using the pose of image2. keep her anime appearance the same.

guess the pose source.
>>
[cope]B-bigma status?[/cope]
>>
>>106726773
what about nsfw open poses?
>>
>>106726781
why not? with a nsfw or clothes remover lora it would still work
>>
>>106726691
catbox?
>>
>>106726773
Wow, it even included Mi Maimen and Dr. Pavleheer (the crew of the Wreckage Brother).
>>
File: 1753790057878515.png (61 KB, 910x512)
>>106726805
this was the input for image2, kek
>>
>>106726698
>>106726733
>>106726762
[wrong board]wrong board[/wrong board]
>>
baker doko?

>>106726670
You want WAN 2.2 S2V, looking at the comfyui template it seems similar to i2v as you provide a reference image and audio track along with a prompt. I haven't fucked with it yet so I can't say more than that.
>>
>>106726691
>TgirlsAndMales
no thanks
>>
File: 1757216283311364.png (1.15 MB, 1296x886)
depth seems to work too.

change the pose of the anime girl in image1 using the depth map of image2. she is looking at the camera and the time of day is night.

notice the hair strands stayed intact:
>>
File: 1757175418417592.png (832 KB, 1024x1024)
>>106726876
this time with a random bocchi face:
>>
Bakermen
>>
File: 1749850691585385.png (1.05 MB, 1024x1024)
>>106726906
and again with the openpose, for you
>>
Bakerstreet
>>
Bakestreet boys
>>
*also set the resolution to 1024 for AIO aux preprocessor (to get openpose/depth/etc), works much better.
>>
>>106726990
>sets your resolution to 1024x768
heh nothin' personnel kid
>>
>>106726995
well, the qwen edit workflow scales inputs to around 1024 first, so it should be better; the output can be larger than that depending on input size.
>>
File: FLX_0026.png (2.09 MB, 1080x1344)
>>106726564
>>
>>106727009
yikes
>>
>>106727009
I THINK I'M TURNING JAPANESE
>>
>>106727009
>these monster hands
scary
>>
File: 1741622229969161.png (1.01 MB, 1024x1024)