/g/ - Technology


Thread archived.
You cannot reply anymore.


File: collage.jpg (1.64 MB, 3814x1984)
1.64 MB
1.64 MB JPG
Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107145378

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Neta Yume (Lumina 2)
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://neta-lumina-style.tz03.xyz/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 1751284365805507.png (1.73 MB, 1152x896)
1.73 MB
1.73 MB PNG
>>
From "Localsong" + a lora:

https://voca.ro/1cbIetpoY6Gv

I am telling ya, this shit has potential
>>
File: no lora vs lora.jpg (1.67 MB, 1968x2528)
1.67 MB
1.67 MB JPG
>>
>>107154861
based
>>
File: SDXL_00012_.jpg (469 KB, 1984x2480)
469 KB
469 KB JPG
blessed bred
>>
>>107154883
that's no language I've ever heard, sounds like gibberish
>>
File: 1762099606807161.png (2.4 MB, 1152x896)
2.4 MB
2.4 MB PNG
>>107154885
thanks. any OCR or VLM anons want to see if their model can read these?
>>
File: ComfyUI_temp_qmfoy_00048_.jpg (1.14 MB, 1824x1248)
1.14 MB
1.14 MB JPG
https://files.catbox.moe/9egs1f.png
>>
File: Bloodborne..jpg (679 KB, 1366x768)
679 KB
679 KB JPG
What program/model do I use to gen cool landscape images?
>>
>>107154891
Who cares when the melody sounds cool
Modern music is garbage precisely because artists put way too much emphasis on the lyrics
>>
File: ComfyUI_temp_qmfoy_00003_.jpg (823 KB, 1824x1248)
823 KB
823 KB JPG
https://files.catbox.moe/cqg2n9.png
>>
For those who missed it:
https://github.com/Lakonik/ComfyUI-piFlow
https://huggingface.co/spaces/Lakonik/pi-Qwen
https://huggingface.co/Lakonik/pi-Qwen-Image
https://huggingface.co/Lakonik/pi-FLUX.1
>>107154174
>Ok this thing is kind of insane. I made a workflow to compare it with normal Qwen, and it's basically the same level of quality while taking less than 10% of the time. Works out of the box with loras also. In fact, with a custom lora on a mediocre quality dataset, the results are arguably better with this thing at 4 steps. It is partially counteracting the shitty quality of my dataset. Absolutely the new meta for using Qwen, it will be impossible to go back with how fast it is.
>>
File: SDXL_00013_.jpg (501 KB, 1984x2480)
501 KB
501 KB JPG
>>107154886
>>
File: ComfyUI_temp_qmfoy_00043_.png (3.63 MB, 1824x1248)
3.63 MB
3.63 MB PNG
>>107154908
You can try regional prompting, so that one region of the image follows one prompt and another region follows a different prompt (conceptual sketch at the end of this post). You can also try inpainting

----
https://files.catbox.moe/2zhb62.png
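For the curious, a toy sketch of what regional prompting boils down to conceptually (not any specific node's implementation): the model's prediction under each prompt only applies inside that prompt's mask.

import torch

# Toy illustration of regional prompting: blend two noise predictions by a spatial mask.
# eps_a / eps_b stand in for the model's output under two different prompts.
latent = torch.randn(1, 4, 64, 64)
eps_a = torch.randn_like(latent)           # prediction conditioned on prompt A
eps_b = torch.randn_like(latent)           # prediction conditioned on prompt B

mask = torch.zeros(1, 1, 64, 64)
mask[..., :, :32] = 1.0                    # left half of the image follows prompt A

eps = mask * eps_a + (1.0 - mask) * eps_b  # combined prediction for this denoising step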
>>
File: ComfyUI_00018_.png (935 KB, 832x1216)
935 KB
935 KB PNG
>>107154918
>20s qwen gen
not bad, i would still give a little denoise with something to tidy it up.

if you gen with qwen then do wan denoise, where do you even post that on civit?
>>
>>107154937
>>107154904
how long do your WAN gens take anon?
>>
>>107154937
I just wanna go:

>landscape, big castle, atmospheric, dark clouds, lightning, mountains

What does that?
>>
File: ComfyUI_temp_qmfoy_00028_.png (3.28 MB, 1824x1248)
3.28 MB
3.28 MB PNG
https://files.catbox.moe/e3dk4s.png
>>
File: SDXL_00017_.jpg (460 KB, 1984x2480)
460 KB
460 KB JPG
>>107154920
>>
File: ComfyUI_00033_.png (1.2 MB, 832x1216)
1.2 MB
1.2 MB PNG
>>107154918
>6s flux gen with 4steps
>>
>>107154943
Takes about 3 mins per generation. The workflow I use has an upscaler that basically generates the image twice

>>107154944
Hmm, I see. Any image generator can do that. I thought you were going for a specific composition, etc

https://files.catbox.moe/tcgxrp.png
>>
>>107154883
alright, i'm gonna give this a try with some instrumental tracks and see what happens. this was convincing, lyrics aside (i know the page said it wasn't trained on lyrics)
>>
>>107154972
you got the patience for that? asking coz I don't. I can gen a 540p WAN video at least twice in that time. I know your gens are super good. it's just too long I feel.
>>
>ram prices skyrocketing
>rumors of 5000 series supers being delayed

bros... I'm about to give in. I'm tired of waiting. Should I buy a used 3090 or 5070ti? they are about the same price
>>
File: ComfyUI_temp_qmfoy_00023_.png (3.74 MB, 1248x1824)
3.74 MB
3.74 MB PNG
https://files.catbox.moe/sp4jkj.png

>>107154981
Yeah, I actually set up a bunch of them in a row and then go eat a snack or something, lol. Thanks for the compliment btw.
Also: You can cut generation time by half by skipping the upscaler/upres part of the workflow
>>
>>107154997
I'll give it a shot. I haven't reached your level of realism till now.
>>
File: wbkfmb.png (3.38 MB, 1824x1248)
3.38 MB
3.38 MB PNG
>>107155004
WAN is perfect to recreate the "modern digital" photography style, that you see with most photojournalism and some photographers
Also, it has pretty much perfect anatomical precision, but adding loras (e.g. porn loras) decreases this precision

https://files.catbox.moe/wbkfmb.png
>>
>>107155038
oh yeah the military ones look damn good.
>>
File: ComfyUI_00095_.png (1.58 MB, 1216x832)
1.58 MB
1.58 MB PNG
https://files.catbox.moe/rc3h45.png
>>
>>107155038
can you do images like this but with bikini thighhighs girls?
>>
File: ComfyUI_00009_.png (1.77 MB, 1216x832)
1.77 MB
1.77 MB PNG
>>107155050
I can, but I don't wanna get the banhammer. Also, I don't have access to the 5090 I use to generate the imgs rn.
I'll post some NSFW next post. I'll just post the catbox link, i won't upload the img in the thread
https://files.catbox.moe/3jpm5w.png
>>
>>107155050
+1
>>
File: ComfyUI_00024_.webm (958 KB, 480x768)
958 KB
958 KB WEBM
>>107154920
nta making the other wan gens
>>
>>107155050
>>107155072
I don't have access to the 5090 I use to generate images rn, sorry. The porn images I do have are mostly artsy-fartsy ones
>>
>>107155050
>>107155069
This gen is a rare one made in the "digital photojournalistic" style that I have on hand rn

https://files.catbox.moe/lei0s5.png
>>
>>107155050
>>107155069
An example of my typical "artsy fartsy" gens. lmk if you guys want more

https://files.catbox.moe/y93k43.png
>>
File: 3641827738.png (589 KB, 1216x832)
589 KB
589 KB PNG
>>
>>107155166
can you generate feminist protesting free nipples or something feminist but are actually hot babes with big tiddies in underwear and wearing thighhighs?
>>
>>107155178
Yess ofc definitely!
>>
man, all these dit models kinda suck. was raping ram really worth having nlp? everything was just fine if not better when we used controlnets and ipadapter. edit models were a mistake
>>
File: 1752442757507484.jpg (804 KB, 2048x2048)
804 KB
804 KB JPG
>>
>>107155069
Nice Redditor Gold there, kind stranger!
>>
>>107155178
i too would like more
>>
File: tmp7vfgm7y6.mp4 (1.87 MB, 832x576)
1.87 MB
1.87 MB MP4
>>
File: lora_00033_.jpg (367 KB, 1336x912)
367 KB
367 KB JPG
>>107155166
>>107155178
These are great
>>
File: WAN2.2_00472.mp4 (3.95 MB, 872x592)
3.95 MB
3.95 MB MP4
>>107155048
>>
>>107155234
What track is this?
>>
>>107155240
le circuit de wan
>>
>>107155240
this is going to be the first playable "world simulator" game. just an infinite race track. probably releasable by someone like deepmind right now
>>
>>107155217
>>107155190
https://files.catbox.moe/32hb6v.png
>>107155188
can't, sorry. this machine can't gen imgs
>>
>>107155225
Thanks a lot, fren!

>>107155234
>>107155222
Awesome gens, fren! Loved how the lead car went to the F-Zero shield recharge strip at the end there, lmao

>>107155240
Reminds me of the start/finish line from Imola, but it's not any particular track
>>
File: ComfyUI_00442_.jpg (30 KB, 1216x262)
30 KB
30 KB JPG
Fencing duel gens, complete pic(s) in the catbox
https://files.catbox.moe/10dpcm.png
https://files.catbox.moe/9g7xb8.png
>>
TW: suifuel (contains happy couple)
https://files.catbox.moe/ngt115.png
>>
last one for now, gtg work. another duel, this time to the death
https://files.catbox.moe/y7jlxy.png
>>
Blessed thread of frenship
>>
>>107155204
recipe for this bread?
>>
>>107154958
Does it work with Chroma since it supports Flux?
>>
>>107155437
try it and find out
>>
File: 1742420329343968.png (287 KB, 635x563)
287 KB
287 KB PNG
>>107154896
>>
Sega Genesis Sonic-style track on "LocalSong":

https://voca.ro/13U9LKll5na4

Things got a bit bad in the end, but overall pretty good
>>
File: ComfyUI_00060_.png (1014 KB, 832x1216)
1014 KB
1014 KB PNG
>>107155437
>60s with (30s -> face detailer), 12steps using 8step lora. no dice on chroma, it has hardcoded qwen and flux in the loader
>>
Need a wan lora from the Tylers poop festival video
>>
>happily gen some cute anime 1girls at the start of the year
>look away from the screen for a moment
>Huge fucking pile of optimizations happen
I feel like unless you're keeping up with this daily, you're just hopelessly left behind, because it's impossible to find information on whatever sage attention and these other -attention fixes are, how to use them, or what they're for; it all gets buried under a sea of new or conflicting information.
>>
>>107155866
that would be the case if anyone used said optimizations. unless it's merged into mainline comfyui, most of the good optimizations (both for speed and quality) just get ignored/forgotten.
>>
>>107154100
>>107154342
Nope, doesn't build with downgraded toolkit:(
Yaps about nvcc not existing after idling for half an hour. I guess the other anon who warned about incompatibility was right.
Gonna wait™ for official support or make a separate docker for it later.
>>
File: 1552572011.png (1.07 MB, 1152x896)
1.07 MB
1.07 MB PNG
>>
What do you want the most for a local model?

https://poal.me/7udx6s
https://poal.me/7udx6s
https://poal.me/7udx6s
https://poal.me/7udx6s
>>
>>107156022
anyone voting for anything other than video is retarded, images are already mostly there, the biggest thing we still need there is an edit model without a vae, while video has a long way to go in comparison
>>
>>107156045
>anyone voting for anything other than video is retarded
*or vramlet
>>
>>107156045
yep this was my take too
>>
Retards rise up
>>
>>107156045
Video models are less suitable for prompt alignment for a single frame
>>
>>107156045
I'm excited for video because I know video brings audio along with it immediately as well. ASMR, braps, sound effects, short dialogue lines, memes, swears and so much more get solved before we even get a good standalone text-to-audio model
>>
You know deep in your hearts that you will not be able to run Sora 2 grade stuff without 48gb vram and waiting 10+ minutes per video even with distillation and quants
>>
>>107154918
>ctrl f "edit"
>zero results
does it work for qwen-e
>>
>>107156130
correct, we will have something much better than dogshit sora lol
>>
>>107155799
Lame ty. Glanced at the code and it seems like there's a few places that would need adapting
>>
>>107156157
I am an openai hater as well, but come on anon, let's not cope that way
>>
File: sora 2.png (293 KB, 549x617)
293 KB
293 KB PNG
>>107156170
toy model for memes whose only great thing is the fact that they trained on the entire youtube dataset, without that its literally worse than wan 2.2
>>
>>107155614
well it got the genesis instruments right for sure
>>
>>107154918
Loaded this up and I'm getting 20 second Qwen gens even with my shitty setup, what sorcery is this
>>
>>107156269
vram?
>>
What is the current meta lora for speeding up wan 2.2 14b i2v?
>>
>>107156269
16GB, RX 9070 XT.
>>
>>107156194
It's still superior to any open video model in existence by a country mile, and that will remain true for a long time. To this day, there isn't a single local model that can pull off some of the stuff that dalle3 could in 2023
If you cherrypick things, Wan does mangled outputs just as often
>>
>>107154918
does it work with gguf?
>>
>>107156310
>If you cherrypick things, Wan does mangled outputs just as often
not by a mile
sadly for you, the apicuck model cant be tested 1:1 with local because its locked into a chastity cage, like all who shill for it
>>
>>107156335
>sadly for you, the apicuck model cant be tested 1:1 with local because its locked into a chastity cage, like all who shill for it
You do realize there are other possible prompts other than porn and politically incorrect stuff, right? So yes, they can be compared
>>
>>107156282
Let me be more clear.
Apparently I am still using this from 3 months ago:
https://huggingface.co/lightx2v/Wan2.2-Lightning/blob/main/Wan2.2-I2V-A14B-4steps-lora-rank64-Seko-V1/high_noise_model.safetensors
Is this:
https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
Or anything else better than it?
>>
>>107154918
>uses own ksampler
>uses own model loader
INTO THE TRASH IT GOES
>>
>>107156393
NTA compared =/= 1:1
>>
>>107156393
Wow. I didn't know that. You're telling me now for the first time
>>
>>107156485
You're welcome anon. It's enlightening indeed to know there are more prompts other than "1girl big bobs and vagene", who would have guessed!
>>
>>107156458
There also seems to be a moe distill lora...
>>
>>107156509
damn, gotta step my game up, i mean imagine a 1girl with smal bobs... it got my creative juices flowing
(and unretarding for a minute: curious how to set up those matrix comparison grids people post every now and then, since those can be scripted, i think?)
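something like this is probably all it takes, just compositing saved gens with PIL; assuming the gens were saved with the axis values in the filename (names below are made up, adjust to your own):

from pathlib import Path
from PIL import Image, ImageDraw

# Minimal XY-grid compositor: rows = samplers, columns = CFG values (example axes).
samplers = ["euler", "dpmpp_2m"]   # assumption: gens saved as <sampler>_<cfg>.png
cfgs = [3.5, 5.0, 7.0]
cell = 512                         # size of each cell in the grid

grid = Image.new("RGB", (cell * len(cfgs), cell * len(samplers)), "white")
draw = ImageDraw.Draw(grid)
for r, sampler in enumerate(samplers):
    for c, cfg in enumerate(cfgs):
        img = Image.open(Path(f"{sampler}_{cfg}.png")).resize((cell, cell))
        grid.paste(img, (c * cell, r * cell))
        draw.text((c * cell + 8, r * cell + 8), f"{sampler} cfg={cfg}", fill="red")
grid.save("xy_grid.png")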
>>
>>107156523
There also seems to be v1030 that got deleted
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22_Lightx2v/Wan_2_2_I2V_A14B_HIGH_lightx2v_4step_lora_v1030_rank_64_bf16.safetensors
I don't expect a wall of text spoonfeeding me strengths and weaknesses of all but just what are anons here using in their daily gens?
>>
>>107156157
we still don't have DALL-E 3 at home, stop coping
>>
what's a good free software for managing gens? preferably one that shows the metadata like prompts. I'm getting to have too many. bonus points if it does wan too, though idk if it actually has metadata yet. I only just started with that
>>
File: 1747494964850870.png (489 KB, 836x284)
489 KB
489 KB PNG
>>107156736
correct yet again, we have something much better than dalle 3, the possibility to train a lora on anything you want and generate with any parameters you want with no limits, including training a dalle 3 style lora itself like picrel
>>
>>107155204
It upsets me that I can't reproduce this solid vectorized style.
>>
nano banana 2 is too good
its over for local
>>
>>107156761
lora https://civitai.com/models/2093591
>>
File: 1754450011443883.png (1.29 MB, 768x1344)
1.29 MB
1.29 MB PNG
>>
>>107156772
The better the proprietarycuck edit models get, the better the outputs the new qwen image edit model can be trained on. thanks for spending millions so local can snatch it all up for free and train a clothes remover lora within a couple of hours lol
>>
>>107156804
based
>>
>>107156761
it's not about the style, or any specific thing object/concept, retard
that you thought it was tells me all I have to know about your intellectual level, you don't understand what dall-e 3 has that local still has not and you never will understand because you're a moron
>>
>>107156845
>no argument
oof, thanks for conceding
>>
File: ComfyUI_11606.png (3.02 MB, 1280x1600)
3.02 MB
3.02 MB PNG
>>107156462
This. I can't fucking use this in my workflow. I needs my snake oil!
>>
File: 1761864588001962.jpg (476 KB, 1264x1656)
476 KB
476 KB JPG
>>107154826
>not collaging the real braphog
>>
File: nano banana 2 map.png (1.84 MB, 1408x768)
1.84 MB
1.84 MB PNG
>>107156772
It still can't do maps. (Courtesy of some plebbitor.)
But yes the whiteboard math equation stuff is impressive.
>>
>>107156335
>not by a mile
No local model can gen multiscene videos WITH audio at the same time, so yes, nothing local comes close to it currently

The closest thing to it is this Wan fine-tune for multiscene, which has no audio:

https://holo-cine.github.io/


(and I haven't seen any anon use this)

Apparently they will release the weights for an audio component later though, so we'll see (there is a HoloCine-audio in the roadmap as well as an I2V version)
>>
>>107156920
no proprietary model is gonna allow you lora creation for whatever you want nor to tweak every gen parameter, that is the thing that actually matters, everything else can already either be done locally or can be done locally but with more manual work worst case scenario, but proprietarycucks literally CANT do these things and wont ever be able to in any way.
>>
File: 1756195819814295.png (1.21 MB, 896x1152)
1.21 MB
1.21 MB PNG
>a- aunt jemima... is that OK to wear in public?
>>
>>107156854
keep on coping, copeboy
>>
>>107157028
>no argument
already accepted your concession lil bro, keep crashing out
>>
>>107156982
Very nice anon
>>
>>107157052
you do whatever it takes to keep the cope alive
is this you?>>107156940
>everything else can already either be done locally or can be done locally but with more manual work worst case scenario
lol, lmao even
>>
>>107157076
>no argument
this has to be a bot, right? lol
>>
Most important things for new pc if I wanna do decent video gens in a non absurd timeframe?
I don't wanna reply to every webm in here asking for pc specs, but if someone wants to post some with their specs / how long it took, I'd greatly appreciate it
Budget is about 2.5k for new pc
>>
>>107157098
16gb vram is the single most important thing. more than that is better. less than that you're fucked.
>>
>>107157092
of course, anyone who laughs at your lack of intelligence is a bot
the argument is that you're a retard, you give more weight to what can be done locally just to poop on the things local can't do yet, that's moron behavior
>can be done locally but with more manual work worst case scenario
ANYTHING can be done locally but with more manual work, just grab a camera, hire actors, make a set, film it, pay jeets to VFX it and there you have it, no Sora 2 needed
it's a useless statement, you absolute shit for brains baboon
the whole point of AI is to have less manual work, if Sora 2 can do it without the manual work then it is (even if just for now) better
>>
>>107157098
nvidia gpu is the only thing that really matters. 16gb vram+. 24vram is practically required if you want top quality video gens. minimum 64gb ddr5 ram for offloading model cache if needed. cpu isnt important but you'll want something made within the past 10 years at least.
>>
Question to the anons using Wan2.2 text-to-video (not I2V), which lora are you using?
>>
>>107155166
crazy workflow, nice
>>107155364
im so lonely bwos
>>
>>107157280
There was this released two days ago if you're talking about lightx2v
https://huggingface.co/lightx2v/Wan2.2-Lightning/tree/main/Wan2.2-T2V-A14B-4steps-lora-rank64-Seko-V2.0
>>
File: 1757125369853283.png (592 KB, 1572x773)
592 KB
592 KB PNG
>>107157141
>be proprietarycuck

>you cant train a lora to add a style to the model
>you cant train a lora to add a character or a person to the model
>you cant train a lora to add a concept to the model
>you cant train a lora for anything at all
>you cant finetune the model
>no big company can finetune the model like many companies are doing right now with wan
>you cant have anyone research around the model at all to improve its architecture, find optimization avenues, fix issues, change specific layers, text encoders, vaes, learn how to make better models in the future and advancing the entire ai industry itself etc
>you cant generate gore
>you cant generate pornographic material
>you cant generate anything else someone else would deem "problematic", no matter how mundane it might be
>you cant generate anything they at any point in time say you cant generate in the future when they change their mind overnight
>you cant generate anything at all if their servers are overloaded, not online, or broken
>you cant generate anything without it being logged and all your data harvested and sold
>you cant control dozens of generation parameters that would allow you to have precise control over what you generate, no matter how specific
>you cant write nor test out new generation parameters like new specialized samplers and schedulers
>you cant do anything about it if they decide to lobotomize the model you are using or remove it completely overnight, never being able to truly recreate what you once did and liked
>you cant test out new papers coming out with new technologies like completely changing how an entire portion of inference works, like completely changing how cfg works, completely changing how negative prompting works (https://github.com/hako-mikan/sd-webui-negpip) etc etc
As a proprietarycuck you are paying to be in a limited and spied on cuck cage and you lash out when someone calls out your evil corpo master and your pathetic cuck predicament.
>>
>>107156982
how the fuck are you guys, like pancakechad for example, genning animateinanimate like this? fuck this is so good.
man i know my brain is rotted when i find pancake and syrup women hotter than any e-girl kek
>>
>>107157453
very carefully
>>
>>107157470
i asked how you gen them, not how you fuck them!

but true.
>>
>>107157311
No lora I found works well with Holocine (the multiscene fine-tune)
>>
File: 1751074644698500.jpg (73 KB, 735x739)
73 KB
73 KB JPG
WAN 2.2 anons: just bought a 5070ti and I've been playing around all weekend to get a good workflow for keyframing a longer animation
>Generate ~12 separate 'keyframes' in SD for character LORAs
>Inpaint poses/details - create depth masks to quickly delete background in photoshop to keep character in white void for WAN
>send color 'keyframes' 1 + 2, 2 + 3, to FFLF2V to get a crude timeline of 2-3 second clips (turning, raising, pointing, draining a pint glass, etc. )
>i2v Q_8 gguf in the comfy 'workaround' gets jarring "Flashes" on reaching last frame as it quickly tries to compensate for color degradation, but LORAs are made for i2v.
>Inpaint Q_8 gguf seems to go faster and solves the flashes, seems to take the LORAs but i'm still unsure how well it will work long term.

curious how to proceed here:
>finish all the 2-3 second clips in i2v and try to save it in premiere
>keep playing with the inp. to get it to follow styles so I only need to fix the front half in post or re-gens
>Learn how to use VACE and how to use the last and first 8 frames of each clip to preserve the motion
>Take the entire 24 second video with jank coloring and learn VACE v2v to depth mask the entire thing and regen.


>>107157199
>minimum 64gb ddr5 ram for offloading model cache if needed
I have 32 and have been holding off because prices are gay. is it actually super necessary?
>>
>>107157556
vace
>>
>>107157556
>is it actually super necessary?
No, but the excessive swap use you get with 32 gigs slows generation down considerably.
>>
File: 1743026961283844.jpg (19 KB, 409x160)
19 KB
19 KB JPG
Why are the vue nodes so fucking huge? I want to use them, but this is ridiculous.
>>
File: lora_00090_.jpg (325 KB, 891x1336)
325 KB
325 KB JPG
>>107157556
It's much faster with 64gb+ ram
>>
File: ComfyUI_01308_.png (1.12 MB, 1152x896)
1.12 MB
1.12 MB PNG
>>107157453
prompt for the original one:
>professional 4k high resolution hyperrealistic 3d render by Disney Pixar of a beautiful nude curvy woman slime girl who is made entirely out of maple syrup. Her whole body and face are translucent and seethrough syrup. Her hair is made out of melting butter. She sits cross-legged on top of a huge stack of pancakes. Her body melts onto the pancakes. The pancakes are on a modest porcelain plate in a 50s American diner restaraunt.
>raytracing, beautiful lighting.

standard chroma WF
>>
>>107157199
What does offloading model cache mean and what do you mean by 16gb vram + .24vram?
>>
File: lora_00094_.jpg (301 KB, 891x1336)
301 KB
301 KB JPG
>>
Easy Cache, Lazy Cache, Apply First Block Cache, Wan Video Tea Cache, Wan Video Mag Cache, Wan Video Tea Cache Native, Wan Video Easy Cache
Which cope cache node do you use and at what settings?
>>
File: 175498415651458.png (506 KB, 640x610)
506 KB
506 KB PNG
>>107157592
would 96 make any difference or is that just pointless? the price ladder from 64 is a lot narrower than it used to be due to being a weirder size + slower clocks for XMP

>>107157565
>Vace
what's the point of the 3gb "Module" Vace FUNs at https://huggingface.co/Kijai/WanVideo_comfy_GGUF/tree/main/VACE
versus the large models at https://huggingface.co/QuantStack/Wan2.2-VACE-Fun-A14B-GGUF/tree/main/HighNoise?

Do you load the modules in the same chain as the regular i2v (or inp) model to save on disk space while achieving the same result?
>>
>>107157642
For example, Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf is 15.4gb. If you only have 16gb of vram on your gpu, that leaves you with 0.6gb of vram. Keep in mind, the text encoder + loras + vae are also stored in vram. Since all that can't fit on a tiny 16gb card, you can set a specific amount of the model to be swapped to your system ram, e.g. 10gb of the wan model offloaded to system ram. This will allow you to gen without running out of memory. Offloading to ram is much, much slower, but it works.

Optionally, you can use a lower quant version of the model, like Wan2.2-I2V-A14B-LowNoise-Q6_K.gguf which is 12gb, but lower quants = lower quality.
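Rough back-of-the-envelope sketch of that budget (the model size is from the file above, the other numbers are just assumptions; swap in your own file sizes):

# Approximate VRAM budget for the offloading scenario described above.
vram_gb = 16.0          # card's total VRAM
model_gb = 15.4         # Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf
text_encoder_gb = 6.0   # assumption: umt5-xxl fp8, use your own file's size
vae_gb = 0.3            # assumption
loras_gb = 1.0          # assumption
overhead_gb = 2.0       # activations, attention buffers, CUDA context

needed = model_gb + text_encoder_gb + vae_gb + loras_gb + overhead_gb
offload = max(0.0, needed - vram_gb)
print(f"needs ~{needed:.1f} GB, so offload ~{offload:.1f} GB of the model to system RAM")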
>>
>>107157642
He's saying you should aim for 16gb vram minimum but 24 is preferable. Offloading is when you can't fit the entire model into vram so you use your system ram. wan 2.2 q8 is like 15 gigs(?) for one of the models
>>
>>107157685
For Wan2.2, you don't use any of them.
>>
>>107156022
imgchad rein eternal
>>
File: 1653334650116.jpg (46 KB, 750x1086)
46 KB
46 KB JPG
>>107157370
>>
>>107157713
Is there a reason why?
>>
>>107157705
>would 96 make any difference or is that just pointless?
Hard to say really. Depends of the motherboard combo I guess.
>>
>>107157740
Video generation is iterative.
>>
https://civitai.com/models/2114848/2000s-amateur-photography
As requested. Not perfect, but reduces vaginahorror and manfaces.
>>
File: wan2.2_00001.mp4 (3.13 MB, 832x480)
3.13 MB
3.13 MB MP4
I tried Holocine and I could not get the same results as their demo even with 15 seconds lol
I used the same prompt
I obviously had to make some sacrifices like using distillation models with 5bit quants

"b-but local is better than saas, trust me bro!"
"results are shit? It's your fault you are poor and don't own an H100, the pinnacle of LOCAL gpus :^)"
>>
>>107157641
thanks <3
>>
>>107154956
this is AI?
>>
>>107157987
That looks like a zoomer idea of what 2000s photography looks like, and some of the photos in the showcase don't look "amateur" at all. At least search for photos that used popular cameras from that time like Sony Cybershot, Olympus, Canon PowerShot etc, or search for old myspace photos or older photos from Flickr.

t. Millennial
>>
>>107157987
bruh moment, as the kids say. https://civitai.com/models/978314/ultrareal-fine-tune?modelVersionId=1413133
>>
>>107157987
wait regular chroma cant do vageen? wtaf
>>
>>107158042
Dataset is mostly from 2000-2010 era.

>>107158090
It can, but it gets confused.
>>
>cold weather
>gpu 100% to warm room
Ohh shit it is GOON season
>>
But for what shall i goon to?
>>
File: 1741043482920713.png (1.18 MB, 896x1152)
1.18 MB
1.18 MB PNG
>>
correct me if im wrong, but is there any reason to make a high-noise version of a character lora for wan? there's no motion involved, so what would be the point?
>>
>>107158114
>Dataset is mostly from 2000-2010 era.
I am a Millennial boomer who lived that era and at least the showcase images don't resemble the amateur pics from that era at all
>>
>>107158147
It's less about "motion" strictly but denoising strength.
You might be able to make do if your character looks like a normal human with just low denoising lora. But for something like say Kirby or Sonic, you probably want for both.
>>
File: ComfyUI__00002_.mp4 (479 KB, 832x640)
479 KB
479 KB MP4
>>107155187
>>
>>107158193
I see, thanks. I've been experimenting with my character lora while using other NSFW loras, and I noticed that using the low-noise half of some loras forces my character (person) to look like whatever person that lora was trained on. How can I avoid that? Increase the strength of my character's LOW lora? remove the NSFW's low model? I've tried both but haven't found anything solid that works. I can't get rid of the low lora for some NSFW loras because wan needs that data to create, for example, a penis or cumshot.

The twerk lora for example, always makes the ass bigger and i don't want that. it's so annoying. lowering the strength of the nsfw lora helps but also reduces the motion
>>
File: ComfyUI__00003_.mp4 (587 KB, 640x832)
587 KB
587 KB MP4
n00n0
>>
>>107158244
nani kore wa yameto my ramenu betta stoppa acting up i'm gonna nækædæshi my ramanu
>>
>>107158224
>How can I avoid that?
I should note that I never trained a WAN lora, but this seems like a generic lora compatibility issue to me. Try lowering the strength of the other lora?
>Increase the strength of my character's LOW lora?
Maybe just a bit if you are desperate.
>remove the NSFW's low model?
Probably not.
>The twerk lora for example, always makes the ass bigger and i don't want that.
This just means the person who trained it trained on big asses.
Train your own with a diverse dataset of asses of all sizes?
>>
File: 1750330283072872.jpg (746 KB, 1536x2688)
746 KB
746 KB JPG
>>
flux/chromosome users, how do you handle your text encoders? do you use specific quants? i'm starting to wonder if my shit gens are a product of what i'm using, but i'm not sure. cumfartui is very confusing as well so that's a variable. the default flux krea workflow is 3 whole seconds slower than an old workflow i was using earlier this year..
>>
File: ComfyUI__00006_.mp4 (514 KB, 640x832)
514 KB
514 KB MP4
>>
>>107158330
keep t5 at fp16 imo.
>>
File: ComfyUI__00007_.mp4 (642 KB, 832x640)
642 KB
642 KB MP4
>>
>>107158330
q8 chroma, fp16 clip, 26-35 steps, euler simple/beta
try "aesthetic 1" in negative
>>
File: ComfyUI__00009_.mp4 (596 KB, 832x640)
596 KB
596 KB MP4
it doesn't understand left/right but far/near seem to work
>>
>>107158350
>>107158378
thanks. i guess i was trying too hard to save on vram by lobotomizing the text models.
>>
>>107158385
Why do text encoders struggle with directions? That's not an isolated incident.
Quick theory:
Is this because right/left can mean both viewer's right/left and character's right/left, which ends up confusing the UNET during training?
>>
>>107157987
thank you for your hard work
>>
>>
>>
>>
File: ComfyUI_temp_xgutr_00011_.png (2.11 MB, 1024x1248)
2.11 MB
2.11 MB PNG
>>
>>
So it sounds like it's worth going down a generation to the 40xx cards if I want 24gb vram at a more reasonable cost
>>
>>107158475
>>107158542
neat
>>
>>
>>
File: peasant girls.jpg (508 KB, 2688x1536)
508 KB
508 KB JPG
i literally gooned for 12 hours today
>>
>>107158554
so a 4090 then? aren't they like 1500 dollars
>>
>>107158162
I guess I could rename it, fair point

>>107158435
npnp
>>
File: 1732200222847408.jpg (960 KB, 768x1344)
960 KB
960 KB JPG
>>107158607
Can you catbox your picrel or a similar gen?
>>
>>
>>
>>107158665
https://files.catbox.moe/0andv6.png
>>
>>
>>107158723
Thanks!
>>
>>
>>107158781
>>107158726
>no large breasts, wide hips
>>
>he doesn't (large breasts, wide hips, thick thighs:1.5)
>>
>>
>>
>we may be getting a $2k stimulus check
and i'm 100% going to use that money to buy a 4090, kek
>>
File: ComfyUI__00010_.mp4 (1.06 MB, 832x640)
1.06 MB
1.06 MB MP4
>>
>>107158919
You should by stocks, dummy. Preferably OpenAI stocks of course lol
>>
>>107159071
but i want faster gens right NOW
>>
Hello, I am from the TouHou AI general on >>>/jp/2huAI/. It is a very nice and good quality general, but it is slow to answer simple questions. I have a question about making a lora. Is this the right place to ask?
>>
>>107159091
you could've just asked the question instead of wasting a post asking for permission to ask a question
>>
ooooh baby SongBloom is cookin up some songs real nice

hear that sizzle & smell dem onions
>>
>>107159161
https://vocaroo.com/1myH3aeJX4hT

Trying again, the tricky part is trying to get lyrics adherence but the right amount of song stealing.
>>
File: ComfyUI__00022_.mp4 (322 KB, 640x832)
322 KB
322 KB MP4
>>
>>107159282
nice, can she SING?
>>
>>107159091
Going to bed now might answer your question hours later if it makes sense, if no one else answered it and if I don't feel too lazy.
>>
File: ComfyUI__00028_.mp4 (1005 KB, 640x832)
1005 KB
1005 KB MP4
>>
File: ComfyUI__00023_.mp4 (267 KB, 640x832)
267 KB
267 KB MP4
>>
>>107159296
nobody cares bro.

bro go to the poop festival, in your dreams.

bro
>>
>>107159294
dunno
>>
>>107159308
>slow motion shit
lightx2 crap
>>
>>107159256
https://vocaroo.com/1cHl7bT8AHk0
>>
> no new better cards
> no new better models
it's so over for local
>>
>>107159481
I'm literally posting SongBloom gens, dearest sir of the African persuasion.
>>
>>107156778
cth-uwu.
>>
>>107159495
do you know how you would prompt this?
https://www.bbc.com/news/articles/c1wl5jp94eno

I genuinely have no idea
>>
Is normal forge still the only UI that uses Gradio 4?
>>
>>107159504
you could try this maybe? it does alright
https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
>>
>>107159519
>https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
>This photograph features a large, shiny, blue, abstract sculpture of a humanoid figure with a rounded, bulbous body and simplified, elongated limbs. The sculpture has a smooth, glossy texture, reflecting the surrounding environment. It stands outdoors on a paved area with a grassy patch behind it. In the background, there are palm trees and a building with a white facade and a red horizontal stripe near the top. The sculpture's head is slightly tilted downward, and its expression is indistinct due to its abstract nature. The bright blue color contrasts with the green grass and the white and red building.

I may try it.
>>
>>107159452
nogen crying
>>
File: 1757137130565758.jpg (174 KB, 1086x386)
174 KB
174 KB JPG
>>107159562
I gen 24/7 but ok
>>
File: ComfyUI__00035_.mp4 (1.18 MB, 720x1280)
1.18 MB
1.18 MB MP4
>>
>>107159618
cry moar kid
>>
>>107159618
win VOMIT dows
>>
local still can't compare to grok imagine
>>
>>107159643
can grok do explicit porn? no? who cares.
>>
>>107159618
WHOA lookout, we got a WINDOWS guy here
>>
Any good alternatives to lightx2v, other than waiting 10 minutes for a gen?
>>
>>107159679
Personally, I think SongBloom is the best replacement, but some may disagree, for very stupid reasons.
>>
>>107154861
lol
>>
>>107159561
Running it
>>
File: 1742645567943191.jpg (467 KB, 1028x1421)
467 KB
467 KB JPG
>>107159652
Yes. I use freeBSD, headless debian and primarily Windows, anon.
>>
>>107154883
thanks

SongBloom is way beyond lol
>>
>>107158919
>4090
Shit, $2k is like 8GB of RAM these days.
>>
File: 1755826040903345.jpg (234 KB, 1011x694)
234 KB
234 KB JPG
>>107159727
>>
>>107159452
>lightx2 crap
let's hope the new 4-step distillation method will make the slo mo shit disappear >>107154918
>>
>>107159737
why not just buy a 5090 at that point
>>
>>107159816
I've said it a billion times but slow-mo isn't the only problem with lightx2. it nukes the liveliness of animations. everything is simply less animated. less things move.
>>
File: waifu located.png (1.46 MB, 1024x1024)
1.46 MB
1.46 MB PNG
>>107159702
image & text go together, but cba to make a video, so just play it and stare at the picture. or ask someone to wan it, I guess.
https://vocaroo.com/1otWsnlu7TwZ
>>
>>107159835
That's an extra $700-900.
At that point may as well buy a RTX 6000 Pro. Just a few extra $$
May as well buy an H100
Shit may as well buy an H200
>>
>>107159835
:^)

so, apparently the 5090 sucks for ai, anyway.
>>
So the only recent local developments are speedcrap for turdworld poorfag shitskins? Local really died with wan2.5, what a letdown
>>
>>107159840
5090 is $2k bud
>>
a pickle, for the knowing ones
>>
>>107159308
>Uses ai to create something that basically doesn't exist. An east-asian with giant tits that aren't fake. This is the future of stable diffusion.
>>
File: wan22___0004.png (1.69 MB, 832x1216)
1.69 MB
1.69 MB PNG
>another day, another lora
>>
File: Video_00137.mp4 (1.56 MB, 544x960)
1.56 MB
1.56 MB MP4
>>107159910
>2.2, still slow mo
>>
File: spics fear it.png (1.19 MB, 1024x1024)
1.19 MB
1.19 MB PNG
https://vocaroo.com/1fh5yWC322DT

used joy caption on the top image from
https://www.artforum.com/features/yuk-hui-daniel-birnbaum-interview-1234733869/

>Photograph of a clear, rectangular ice cube suspended in mid-air against a bright blue sky with scattered white clouds. The ice cube is transparent with visible internal crystal structures and slight surface imperfections. In the background, there are blurred green trees and a tall evergreen tree, indicating an outdoor setting. The image has a sharp focus on the ice cube, with a shallow depth of field that blurs the background. The sunlight illuminates the ice cube from the front, highlighting its transparent and textured surface. The overall composition emphasizes the contrast between the sharp, detailed ice cube and the soft, blurred natural background.

chroma hd and SongBloom
>>
>>107159910
What lora is that
>>
>>107159960
who cares, they're all literally the same.
>>
File: wan22___0009.png (1.48 MB, 832x1216)
1.48 MB
1.48 MB PNG
>>107159960
https://civitai.com/models/2063310?modelVersionId=2334783
just published the wan version
>>
>>107157987
how many images for the dataset?
>>
>>107159982
Nice
>>
>>107157370
Unfathomably based
>>
>>107157290
don’t worry anon that’s just your dumb hormones
women are overrated
>>
>>107160004
600, next version has 648
>>
File: 1740414827149879.png (1.5 MB, 1800x606)
1.5 MB
1.5 MB PNG
>>107154918
this is way closer than the lightning method, impressive
>>
File: 1745128418003985.png (2 MB, 1344x1728)
2 MB
2 MB PNG
>gooning to your own gens
isn't this just a more convoluted and expensive way to goon to your own imagination? what's the point?
>>
>>107160054
damn dude, i rarely go over 20 images. have you done any tests with smaller training datasets?
>>
>>107160083
aphantasia
>>
>>107160106
>have you done any tests with smaller training datasets?
Yeah. I prefer larger datasets for more variation
>>
>using supervacetools to make long video
>long pauses between each gen
>swap the "patch sage attention kj" node with "model patch torch settings" node
>no more retarded long pauses in between gens
>near double the speed

fp16 accumulation is pretty dope, wonder if it'll work on wan2.2
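my guess (haven't read the node's code, so this is an assumption) is that it just flips PyTorch's reduced-precision matmul flag; in plain PyTorch that looks like:

import torch

# Assumption: the "model patch torch settings" node effectively enables this flag.
# fp16 matmuls then accumulate in fp16 instead of fp32: faster, slightly less precise.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c = a @ b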
>>
>>107159932
is this i2v or t2v? which x2v are you using?
>>
>>107157370
train a lora to make Wan have the prompt adherence of Sora 2
>>
https://github.com/wallen0322/ComfyUI-Wan22FMLF

Improved tech just dropped.
>>
And a qwen edit upscaler.
https://huggingface.co/vafipas663/Qwen-Edit-2509-Upscale-LoRA
>>
>>107160391
>Wan fuck my life

Start, middle, and end frame is a nice addition.
>>
>>107160391
> - Dual MoE conditioning outputs (high-noise and low-noise stages)
> - Multi-motion frames support for dynamic sequences
> - Automatic video chaining with offset mechanism
> - SVI-SHOT mode for infinite video generation with separate conditioning
> - Adjustable constraint strengths for each stage

Interesting.
>>
File: 1737619368908935.png (841 KB, 896x1152)
841 KB
841 KB PNG
SPARK chroma fixed chroma.
>>
>>107160523
Thought svi was for 2.1 and 2.2 5b? Would it properly work with 2.2 14b? Also pretty sure there's either going to be dedicated nodes or comfyui native implementation, soon hopefully
>>
>>107160582
Native comfyUI nodes oftentimes just ride on free ideas from others or are poorly supported, just for market capture. Dedicated nodes from other people tend to get faster updates and work better. More than once I've been disappointed with Comfy's implementation, e.g. inpainting and wan2.2
>>
>>107160565
Show realism, then we talk
>>
>>107160391
I was just asking about something like this a few threads ago. There is a ton of multi image pixiv illustrations as well as my own ai ones that would make for great animation with this.
>>
What's the state of the art for local photorealistic video gen?
>>
PSA: pi-flow combined with loras seems to be slightly last slopped than the normal Qwen-Image experience. To get rid of plastic skin slop and "cinematic" stuff, avoid using words like "a photograph of (...)" or "an image of", and use "Amateur footage of (...)" instead and you will consistently get better photo-realistic results. Recommended model: Lenovo Ultrareal
>>
>>107160622
makes me wish a different UI got all the community attention. anons are too doompilled on comfy since it focuses on saas more than anything nowadays
>>
>>107160622
>>107160804
It's not too late for (You) to contribute to stable-diffusion.cpp
>>
>>107160772
>Lenovo Ultrareal
My favourite LORA
>>
>>107160772
>slightly last
*Slightly less. I am sleepy
>>
>>107160083
it's gooning combined with gambling on the chance that the prompt matches what you had in mind beforehand. twice the degeneracy and an excuse to edge endlessly
>>
>>107156856
now that is just ugly
>>
File: 00017.png (62 KB, 1024x1024)
62 KB
62 KB PNG
>Try to generate a headshot
>The top of the girl's head is always out-of-frame
How can I fix this?
>>
>>107160837
tell the Chinese to switch onto it, it's run by one of their own so I don't get it. it would also gatekeep western companies since they hire Indian slaves to poothon all day. would be hilarious if they changed their minds and cucked america by doing that though
>>
In case you guys want to be disappointed of the state of local: prompt Qwen-Image for centaurs.
>>
>>107160965
She looks like your average Bong woman
>>
File: tmp6mjeba9y.png (1.11 MB, 1024x1024)
1.11 MB
1.11 MB PNG
>>107160980
If putting "out of frame, cropped" in negative prompt isn't enough you can always just outpaint.
Another option is to resize and draw in the hair color in an editor then inpaint that to match the rest of the image.
Failing that you could always generate a taller aspect ratio full body shot and crop out what you don't want to use.
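For the outpaint route, the canvas prep is just padding the top and masking the new strip; a minimal PIL sketch (filenames made up), which you'd then feed into whatever inpaint workflow you normally use:

from PIL import Image

# Pad the top of the gen so the inpaint model has room to finish the head.
img = Image.open("headshot.png")     # assumption: your cropped gen
pad = 128                            # pixels of new canvas above the head
canvas = Image.new("RGB", (img.width, img.height + pad), "gray")
canvas.paste(img, (0, pad))

# White = area to inpaint, black = keep.
mask = Image.new("L", canvas.size, 0)
mask.paste(255, (0, 0, img.width, pad))
canvas.save("headshot_padded.png")
mask.save("headshot_mask.png")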
>>
>>107160980
Add more details to the prompt of what you want to see: eyes, hair, etc.
>>
Does anyone know the default wan2.2 settings without light loras? 20 step (10 h + 10 l) and euler + simple, high start step 0 end step 5 / low start step 5 end step 10000?
>>
>>107161034
Best I've got.
>>
>>107160083
>isn't this just a more convoluted and expensive way to goon to your own imagination? what's the point?
For me it's a combination of aphantasia like the other anon said (I can't visualize things) as well as "playing with dolls" (some anon once mentioned that genning 1girls is the same delayed brain development as people who play with dolls and I 100% agree because I independently came to the same realization/conclusion)

>>107161131
"Default" wan 2.2 to me sounds like 50 steps (25 each) on unipc sampler with default CFG and flow shift values
>>
>>107161160
Got it to work with the 20 step but only gen once. I tried genning again and get an error of

CLIPTextEncode

'GGUFModelPatcher' object has no attribute 'named_modules_to_munmap'


I've already updated everything to the latest version. Doesnt seem to want to work without light loras, kek
>>
>>107161034
It can, however, do the reverse (prompt was 'a headless horse', after some seed hopping)
>>
>>107160582
The node doesn't work even without svi lora.
>>
>>107161206
Never seen that error before, you can open your ComfyUI folder in Visual Studio Code and give Copilot the error and see if it can help figure it out.

Since this is a clip error I'm assuming you're doing image2video? Since only wan i2v should be using a clip model (clip vision to read your input images)

Text to video and image to video have different settings and nodes required. The default comfy workflows on the GitHub for wan are fp8 scaled and don't have any lightning enhancements so you can use those to do your full-step gens I guess
>>
>>107156291
on linux?
please share your setup, what distro and kernel version
>>
guys is qwen fp8 better or q8?
>>
>>107161341

Yes, it's i2v. I just switched to the multigpu unet and clip nodes instead and that solved the problem. Yeah, there's always some kind of new error every update
>>
>>107161397
>guys is qwen fp8 better or q8?
Q8 is fp8 with some layers kept unquantized so it should be strictly better
>>
>>107161470
thanks. I don't know what it is with qwen, but it takes so long to gen. Wan videos are so much faster!
>>
>>107161470
>Q8 is fp8 with some layers kept unquantized
No. Q8 is basically FP16. It is much better than FP8. This is common information you can google.
>>
>>107161497
What you said didn't invalidate what I said. Q8 is some of the blocks at int8 and some blocks at f32.

I'm planning on testing t5_xxl with the different fp8 versions versus Q8_0 today
>>
>>107161474
Don't you use a speed-up lora?
>>
>>107161642
theres so much quality loss tho
>>
>>107161648
are you real? finally a real person that agrees lightx2 and speedup loras are fucking dogshit
>>
>>107161666
umm yeah, if u dont think theres quality loss then ur blind af
>>
>>107161666
No one is denying that. But also a genuinely improved t2v version came out recently, and so did an i2v version but it was worse than previous ones so only t2v got an improvement recently
>>
>>107161341
i've been doing i2v without clip vision and it seems just fine, would i get better results if i add it?
>>
>>107161730
I have absolutely no idea to be honest since I hardly do i2v locally

I never used it either but it's supposed to be required. It works on a GGUF workflow without clip vision but broke for me using kijais nodes iirc

But I am also out of the loop of what nodes to use nowadays. My 2.2 workflow is full of deprecated and beta nodes since it just works, it's GGUF and there's no actual benefit I get from remaking it until until there's a new model to run, which will have its own nodes and workflow needed anyways probably
>>
>>107161730
>>107161775

Asked perplexity about clip vision for wan2.2...

>Clip Vision is generally not necessary or beneficial for WAN2.2 workflows, according to user reports and in-depth testing from the image-to-video AI community. WAN2.2 is designed so that it no longer relies on CLIP embeddings; this marks a shift from previous models like WAN2.1, which did have image cross-attention layers that could utilize CLIP vision. When Clip Vision is supplied in a WAN2.2 workflow, it is simply ignored, so it does not improve generation quality or prompt adherence, and may actually slow down video creation times by several minutes.
>>
>>107159091
Don’t mind the mean people
>>
>>107161648
There's a new one that promises to be better, look it up in the thread.
Alternatively, gen in stages, and only use the full model during the critical ones. I don't think I could tolerate genning with Qwen at all, but at less than 10 seconds per preview without optimizations, I'm chugging along merrily. But then, I've split my workflow into so many sampling stages that it's becoming unwieldy by itself.
>>
>>107161882
>and may actually slow down video creation times by several minutes.
Good to know.
>>
>>107161882
Then kijai is more of a vibe coder than I thought lol. Pretty sure it's not a false memory that I needed to download clip_vision_h in order to get one of his nodes to stop complaining, even if that node ended up never using it

I'm bored and my new job starts in a month so I'll spend today making an "opinionated 2.2 t2v guide/recommendation" rentry as well I guess
>>
>>107161926
Probably a hallucination. Clip vision is like a 70mb model so even if it's being loaded and unloaded every generation without doing anything it can't be adding more than a couple of seconds max
>>
>>107156462
>Stanford University, Adobe Research
yeah i'm not going to be installing that adobe research shit on my machine. I really just don't trust them not to sell my data for research purposes using some fuckery inside of their nodes. also no gguf support?

TRASH
>>
>>107161947
This post activated the neurons in my brain that reminded me that Tel Aviv University made a really good text to video model and put out a paper and then never released it. I think this was either before or during the wan 2.1 era
>>
>>107161938
> Clip vision is like a 70mb model
>>
>>107161995
>>107161938
clip_vision_h used for wan21 is like 1gb. still, on an ssd you'd barely notice it. if wan22 doesn't use it then there's no reason to have it.
>>
>>107161995
I was wrong but I also swear I downloaded a tiny clip vision h as a .pt before

>>107162021
>clip vision h for wan 2.1 only
That explains it. Thank God 2.2 got rid of the double text encoder autism that hunyuan introduced. Too bad we got refiner autism instead
>>
do you get better prompt adherence with fp16 clip compared to fp8 scaled?
>>
File: 1743731954230389.jpg (181 KB, 793x598)
181 KB
181 KB JPG
>>107162068
fyi, just ask claude these types of questions. higher precision models will always be better; how much of a difference that makes in quality/prompt adherence will always be subjective and debatable because it entirely depends on the prompt and model.

https://claude.ai/
>>
>>107162109
please go away
>>
>>107162123
you asked a question and got an objectively correct answer. if you're upset that it was ai generated while also posting in a general about generating ai content then you're a fucking retard.
>>
>>107162140
>>107162109
> is model A good?
> according to benchmarks model A is the best...
>>
>>107159646
why don't you just have sex?
>>
new
>>107162296
>>107162296
>>107162296
>>107162296
>>
>>107162259
the kind of sex i want is forbidden.
>>
>>107162300
stop lusting after horses
>>
>>107162300
ask the friendly fbi agents to kindly break all of your limbs
>>
>>107160349
t2v
>>
>>107162606
check the new thread, but is this with the new seko v2.0 version of lightx2v? in my testing slow motion has gotten much better most of the time



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.