/g/ - Technology
v7 Edition

Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107001451

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://civitai.com/models/1790792?modelVersionId=2298660
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
what is the opinion on pony v7, krea video, and the new lightx2v lora
>>
Blessed thread of frenship
>>
>>107006468
>not a single anime girl
:(
>>
>>107006514
anime died with pony
>>
File: 1743952343601771.webm (641 KB, 570x854)
>>107006514
>but there's a clown girl
a massive upgrade, then.
>>
File: fae_1292.jpg (1.18 MB, 1560x2008)
>>107006326
It is more associated with "pointy ears" than elf, as I can get it with "demon girl" or "fairy" as in pic related.
>>107006523
Wow those do look a lot like the earrings I keep getting, especially when not trying to prompt around it. But does Chroma know "Frieren"? I thought it had characters and styles removed.
>>
>It isn't that half bad desu
it makes sense that the guy who shills netadogshit would think that ponyv7 isn't half bad. horrendous shit taste.
>>
my prompts are too strong for you, anon
>>
>>107006544
>But does Chroma know "Frieren"? I thought it had characters and styles removed.
Yeah chroma knows many characters and styles, you just need longer prompting than the name alone
>>
anon, i tell you i am going to 1girl, and i want only your strongest prompts
>>
(you) can't handle my strongest 1girl
>>
>>107006585
masterpiece, loli, photorealistic
>>
>>107005962
Same reason Princess Peach always has an Iron Man core embedded in her chest no matter what clothes she's wearing, if almost 100% of the examples of a given object have that particular feature then as far as that model is concerned it is an inherent aspect of that object. It's not like these models make any inherent distinction between clothing and body parts.
>>
so did astralite comment about why v7 shat the bed so hard or is he pivoting straight to another grift
>>
>>107006647
Pony v6 was simply a fluke.
When he announced v7 and what was going on with it, that was already a clear sign the model was going to be a failure.
>>
>>107006545
>singular
>>
Anybody who defends NetaLumina or Ponyv7 is deranged and cannot be trusted.
>>
File: dot.jpg (629 KB, 1536x2304)
>>
>>107006647
The latter
>>
File: elf-skelly-wat.jpg (1.09 MB, 1464x2144)
>>107006629
Well I know there are plenty of elf images that don't have earrings like that. And other models like the SDXL-based ones can make earring-less elves easily enough. It might be that I am pass in a cartoon style image in the workflow, that does not itself have earrings but this nudges the model as well. Perhaps it can realistic pointy ears or other styles just fine without sticking earrings on. Or perhaps if I fed it an empty latent image it wouldn't have the problem. Haven't tested it. Just noticed it was strange. At any rate, I have added frieren to negatives (along with earrings) but I don't start losing the earrings until the cfg gets all the way up to around 5.0, at which point it is looking rather fried, so I guess it is not going to fix the problem.
>>
>>107006704
> It might be that I am passing in a cartoon style image*
and
> Perhaps it can do realistic pointy ears or other styles just fine*
>>
>>107006704
bretty kino gen
>>
>>107006647
>pivoting straight to another grift
this. as to why anyone would support him after v7: the amount of retards far outweighs the people with common sense.
>>
>>107006544
The annoying thing is I can get the earrings to go away by turning down cfg below the correct value (1), but then the image fails to denoise correctly.
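For anyone wondering what turning cfg below 1 actually does: classifier-free guidance extrapolates from the unconditional prediction toward the conditional one, so at exactly 1 the unconditional (negative-prompt) term cancels entirely, and below 1 you blend back toward the unconditional output. A minimal sketch, with numpy standing in for the model's noise-prediction tensors:

```python
import numpy as np

def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by `scale`."""
    return uncond + scale * (cond - uncond)

# Toy one-element "noise predictions" just to show the behaviour.
uncond = np.array([0.0])
cond = np.array([1.0])

print(cfg_combine(uncond, cond, 1.0))   # scale 1: uncond cancels, pure cond
print(cfg_combine(uncond, cond, 0.85))  # scale <1: pulled toward uncond
print(cfg_combine(uncond, cond, 5.0))   # scale >1: overshoots ("fried" look)
```

This is also why a negative prompt does nothing at cfg 1: the uncond term drops out of the combination, so the only way to make it bite is to move cfg away from 1, which then fights the distilled model's expected schedule.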
>>
>>107006544
I finally succeeded. I was uhh only trying to accomplish no earrings, nothing else in particular
>>
>>107006793
prompt?
>>
>>107006793
Nice job and nice tits. Did you do anything in particular to make them go away or was it just a lucky gen?
>>107006731
Thanks, you can get some weird results when you use wd14 to interrogate an image, then use its prompt to make a new picture.
>>
>>107006829
>boobas, bazoongas, over the shoulder boulder holders
one can only imagine
>>
File: view.png (1.56 MB, 896x1024)
No finetune is going to fix that trash pony v7 model. Pic unrelated
>>
>>107006840
Turning cfg down to ~0.85 had a lot to do with it, but largely just luck

>>107006856
In this case it was "tall and voluptuous" plus "[her] big flabby sagging breasts are tightly bound in her fraying smock and squeezed together for a ton of cleavage"

I found this was um necessary to make the earrings go away
>>
Local models should unironically be banned
>>
>nigbobumping this thread
>>
is the style cluster info documented ANY FUCKING WHERE for ponyv7? I searched the HF and civitai pages, NOT a single fucking link to check these fucking clusters.
Yes I know that pony is shit, but I still wanted to experiment a bit with this new toy.
FML
>>
>>107006947
and what about the comic style name?
>>
>>107006840
Actually I don't know why I said it was only luck, I did a lot otherwise to try to make it happen (luck was still important of course)

I added no-makeup hashtags, removed anything signifying richness or ornateness, used a lot of words like "plain" "natural" "rustic" "barefaced" etc., tried to force a feral/pauper/tattered appearance, tried to get a retro pulp fantasy aesthetic to avoid modern "character design" slop, described the character as boyish and avoided things suggesting a stately older elf, etc.

But all those things failed until I turned cfg down a little bit. Now some of the gens have earrings and some don't.

>>107007052
"A blurry grainy scan of an old pulp fantasy illustration from 1957." I'm sure there's a lot of room for improvement there.
>>
File: file.png (2.64 MB, 1280x1536)
res2s, beta57, 20 steps, 4 cfg.
man these fucking hands
my maximum permitted gen time is around 100s, this gen took 110s.
>>
>>107007058
>I'm sure there's a lot of room for improvement there.
E.g., I am now going to try "pulp adventure" instead of fantasy because the word fantasy is too closely associated with modern slop
>>
how do i gen girlfailures?
>>
>>107006975
>t. /de3/ vramlet jelly of /ldg/ chads' epic booba gens
lmao
>>
File: file.png (2.55 MB, 1280x1536)
>>107007076
adding this negative:
deformed hands, bad anatomy, extra limbs, poorly drawn hands, poorly drawn face, mutation, deformed, extra eyes, extra arms, extra legs, malformed limbs, fused fingers, too many fingers, long neck, cross‑eyed, bad proportions, missing arms, missing legs, extra digit, fewer digits

seems to have fixed some of the issues actually. I preferred the older image's overall composition and tone tho.
>>
File: 1745396937741.png (89 KB, 1714x539)
Can't make Minthy/Rouwei-T5Gemma-adapter_v0.2 work. Provided workflow requires full Gemma so I add a gguf loader node, but then I get a picrel error. LLM SDXL nodepack updooted to 3.0.1.
>>
File: 1739337936477317.png (2.61 MB, 1536x1536)
Pony v7 q8 , fp32 vae and clip,
official comfyui workflow, 30 steps
stress test

style_cluster_1610, score_9, Detailed photograph RAW of seven smiling friends of different races that are at a nightclub concert with dim lighting that is shining on their faces, behind them is a crowd of people dancing while fighting with large swords, everyone is holding a sword in their left hand and an intricate beer glass with differently colored beer in the right hand. Far behind them above the DJ there is a sign which has "Minimum drinKing age 021!" written on it in stylized cursive letters.
>>
File: 0064.jpg (1.15 MB, 1112x2816)
>>107007058
I tried some of those but in my case I still got the earrings, and even lowering the denoising to 0.5 didn't get rid of them. Interestingly, I don't have the problem with the non-flash Chroma models, such as Chroma-DC-2K-T2-SL4-bf16
>>
>>107007022
that v6 tagmine spreadsheet wasnt created by the author so
>>
File: 1739355381960379.png (2.86 MB, 1536x1536)
>>107007243
Different seed
>>
>>107007256
since I made that post I've read on the colab that the style groups go from 1 to 2048
>>
File: 1745791288575489.png (2.54 MB, 1536x1536)
>>107007257
Different seed and without "style_cluster_1610" in the prompt
>>
>>107007243
>>107007257
>>107007269
takes me back to good ol' SD1.4 days
>>
the style_cluster thing for sure is cumbersome, but at least it has artists in it in some form. I don't mind if I have to look up a table. wish we had that in chroma instead of fucking NOTHING AT ALL.
>>
File: 1752473720551238.png (2.76 MB, 1536x1536)
>>107007269
Different seed and without "style_cluster_1610, score_9" in the prompt
>>
>>107007297
i assume the clusters do not allow for prompting individual artists which is a huge fucking kick in the nuts for no reason other than muh morals
>>
>>107007243
>>107007257
Goddamn, the sd1.5 and chroma merge lookin fire
>>
Stop it with the Pony posts that's like seeing gore
>>
>>107007307
what the fuck? I assumed that was the whole point of clusters. Maybe I should actually read the docs.
>>
>>107007338
You must have knowledge of the turd to appreciate the beauty of better models like Pixart Sigma.
>>
>>107007297
Chroma is a base model. Why should a base model have ridiculous style tags?

Chroma can be tuned by anyone for any purpose. That's what makes it special.
>>
File: 1739366408357497.png (2.62 MB, 1536x1536)
>>107007299
score_9, medieval magical intricate and detailed world, princess taking a selfie in a pink ball dress, long ginger hair, pale skin, huge breasts, smile
>>
>>107007361
30 steps is too low, try 40
>>
File: 1733811948766349.png (2.79 MB, 1536x1536)
>>107007361
Same seed without "score_9"
>>
File: 1753642459104436.png (2.14 MB, 1280x1536)
randomizing cluster styles now
>>
ponyv7 is atrociously bad what the actual fuck
>>
>>107007361
>>107007369
horrible
>>
>>107007341
the author has a strange heretic perversion to releasing models to the public which can be prompted with artist names. his own secret versions however do not have this problem
what a faggot
>>
File: Ponyv7_20251025_00001_.png (2.01 MB, 1280x1536)
>>107007375
same seed
>>
>>107007385
>perversion
*aversion
>>
File: Ponyv7_20251025_00002_.png (2.59 MB, 1280x1536)
>>107007388
>>
File: Ponyv7_20251025_00004_.png (2.02 MB, 1280x1536)
>>107007405
this model is so fucking bad.
gradually losing ALL hope
>>
File: 1611853324372.png (943 KB, 1024x1024)
>>107007243
Netalumina v3.5, without style and score
lol
>>
>>107007385
if he released the artist ids, I bet people would even overlook the massive flaws
>>
>>107007353
Wasn't it supposed to have them but they fucked up the captioning or something? I don't know, just something I read. You think we're gonna get a chroma finetune? As a dumb user, I don't really care what type the model is. All I know is chroma with artist styles would be sweet.
>>
>>107007443
it's not something you can mess up by mistake
>>
File: 1746136109386459.png (2.7 MB, 1280x1536)
>>107007369
>>107007365
40 steps

score_9, Attractive medieval princess taking a selfie in a pink ball dress, long ginger hair, pale skin, large breasts, smile. She is at the top of a tall stone tower, with a large window behind her that overlooks a huge and crowded medieval city at sunrise.
>>
https://xcancel.com/JustinLin610/status/1982052327180918888#m
>Alibaba's CEO is asking himself why Open Source doesn't have udio at home
be the change you want to see, make Qwen Audio or something lol
>>
is this stupid faggot going to post every single gen he makes? fuck off already
>>
File: 1739566236477949.png (69 KB, 1944x433)
https://github.com/fal-ai/flashpack
Then do it Comfy, I'd like to load my models faster, especially with Wan 2.2, which is all about unloading/reloading between the HIGH and the LOW model
>>
>>107007499
>we r working on it and it won't be far. i am just curious about the status

Why talk? Talk is cheap. Give me something that is Udio tier, Apache 2 licensed or I sleep. We don't want another Songbloom or ACE Step.
>>
>>107007244
Yeah flash is harder, which is partly what makes it fun.

As frustrating as models like that can be, fighting against them feels more like a game. Whereas with something more broad like Chroma base it's hard to know what you can do other than wait and get lucky
>>
>>107007538
this, they can definitely do it, do it chinks!
>>
File: Ponyv7_20251025_00015_.png (2.69 MB, 1280x1536)
>>107007427
>>
File: 1760941687907711.png (3.52 MB, 1536x1536)
>>107007467
Same except 1536x1536, which takes ~7s per step on a 3090, making this take 4-5 minutes per image. Almost the same time it takes to generate a full coherent 5s 32fps video today with Wan 2.2 lightx2v.

Unless the model somehow gets saved by "proper" prompting that takes out the style knowledge, somehow fixes the detail gore, and all but turns it into a completely new and better model, it's sadly DOA.
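The 4-5 minute figure follows directly from the per-step time quoted here (40 steps as suggested upthread, ~7s/step at 1536x1536 on a 3090):

```python
steps = 40          # step count suggested earlier in the thread
sec_per_step = 7.0  # ~7s per step at 1536x1536 on a 3090 (this anon's numbers)

total = steps * sec_per_step
print(f"{total:.0f}s, about {total / 60:.1f} min per image")  # 280s, about 4.7 min
```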
>>
butiful
>>
File: 1756522413972141.png (3.18 MB, 1536x1536)
>>107007572
Different seed.
>>
so what's the best wan 2.2 lora combo with the new loras?
>>
>>107007590
New HIGH:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22_Lightx2v/Wan_2_2_I2V_A14B_HIGH_lightx2v_MoE_distill_lora_rank_64_bf16.safetensors

Old LOW:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/LoRAs/Wan22-Lightning/old/Wan2.2-Lightning_I2V-A14B-4steps-lora_LOW_fp16.safetensors

4 steps, cfg 1, unipc
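The way these HIGH/LOW lora setups usually run is two samplers splitting the step count, switching from the high-noise to the low-noise model partway through the schedule. A rough sketch of that bookkeeping (the halfway boundary is illustrative, not taken from the linked files):

```python
def split_steps(total_steps, boundary):
    """Assign each denoising step to the HIGH (high-noise) or LOW
    (low-noise) model. `boundary` is the first step run on LOW."""
    plan = []
    for step in range(total_steps):
        model = "HIGH" if step < boundary else "LOW"
        plan.append((step, model))
    return plan

# 4 steps total, switching to the LOW model halfway through.
for step, model in split_steps(4, boundary=2):
    print(f"step {step}: {model} model, cfg=1, sampler=unipc")
```

In ComfyUI this corresponds to two advanced samplers chained on the same latent, each loading its own model plus lora and covering its half of the step range.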
>>
>>107005507
Nice, did anyone try these SVI loras with wan2.2? What weight did you use? How did you make it work for longer videos?
>>
>>107007598
1 strength for both?
>>
>>107007076
>res2s
res3m should be superior and faster too
>>
>>107007603
Yes
>>
>>107007499
good if he makes something
>>
My friend's cousin works for OpenAI and he says they have a secret internal model not ready for public release yet, it's so powerful that you can type in your street address and it will show you pictures of your house, you can even prompt inside and you'll see yourself
>>
>>107007619
i will finally know what my oneitis' vagina looks like
>>
File: 1757969225956232.jpg (883 KB, 1336x2008)
>sky, up in the clouds, heaven, pearly gates, the kingdom of heaven
>>
File: file.png (1015 KB, 4163x3563)
>>107007536
Does that also apply to gguf? Or files in general?
>>
>>107007619
My uncle works at Nintendo and he said the next Zelda is gonna be fully dynamically generated by a next-gen GPT model that runs on a VR brain implant
>>
>>107007590
>lora combo
the one with the stuff you want in your video
>>
File: HANKH.jpg (8 KB, 309x111)
>>107007536
Model load is the most frustrating thing about comfyui...

>wan2.1
>takes minutes at the sampler then starts genning or clip keeps offloading then loads forever or memory leaks after 5 gens where I have to force close comfy
>all-in-slop
>constantly offloads the entire fucking model and have to wait another 10 minutes for it to all load again
>wan 2.2
>while the fastest and least pain in the ass, constant and increasing pausing in between high and low generation
>>
>>107007647
it would still be better than nu-Open World slop zelda
>>
File: 00011-2854334830.png (2.96 MB, 1152x1440)
>>
>>107007666
how could you do this to me?
>>107007665
>>
File: 1740937397271342.jpg (1.57 MB, 1248x1824)
>>
>>107007526
Discussion of free and open source models, faggot
>>
>>107007643
>gguf
No.

>>107007643
>files in general
No, it's for safetensor files and you need to convert them to a flashpack format.
>>
File: 5121255412154.png (122 KB, 450x614)
>>107007565
Local has a lot of work to do.

Audio inpainting. Audio upscaling/etc... The bar is literally just a decent model that can do it all.
>>
>>107007715
>you need to convert them to a flashpack format.
Comfy said you don't need to convert them, just use their methods on safetensors (look at pircel) >>107007536
>>
>>107007715
Can't wait to load my sdxl models super fast!
>>
>>107007718
udio is the sota on this, but it has the same limitations as any other closed model:
- you can't ask it to make music "in the style of" (make me music like michael jackson "man in the mirror" -> moderation backend and blocked), though now you can send music to it, but it's not the same.
- you can't train it, make specialized "loras".
- anything sexual is moderated (think a sensual song).
>>
>>107007724
Would be nice but I'm not sure anyone would work on that.
It can shave off quite a lot of time with complex multistage sampler setups.
>>
>>107007754
>- anything sexual is moderated (think a sensual song).
that's why it's the most hated audio software of female rappers
https://www.youtube.com/watch?v=1Gt9TTjAMvw
>>
>>107007777
Sure, but that's not sensual, that's just crude and vulgar, never been into these songs. Zero eroticism.
>>
>>107007777
yeah I guess kek
>>
Are they ever gonna make shorter GPUs? I can't fit anything longer than 300mm in my midtower so I'm stuck with 10GB of VRAM
>>
So does anyone have recommendations for a replacement for this?
https://github.com/1038lab/ComfyUI-JoyCaption
It refuses to use CUDA and does inference on the CPU. Taking a whole minute.
>>
File: 1741893554978335.png (114 KB, 1979x654)
>>107007536
kek
>>
>>107007849
any VLM model will use cuda, if you're gonna figure out how to make it work for one then it may as well be joycap
>>
>>107007687
he's talking to himself and spamming the same 1girl, kill yourself, no discussion is being had
>>
>>107007843
With the way things are going, I doubt it.
Well they will, but you'll get the less powerful stuff.
>>
if I gen a 10s (161 frames) video on wan, is there a way to prompt it to do one thing then another without the second taking over immediately?
"she types on a computer for 3 seconds, then she gets up and walks away"
>>
>>107007890
what if the second thing is waiting, then the actual second thing becomes the third thing
>>
File: 00017-1949654307.png (2.23 MB, 1152x1440)
>>
>>107007849
the joycaption repo has a gradio interface right?
>>
>>107007843
no, we reached the limits on the size of a transistor, so the only way for them to get more powerful gpus is to make them bigger, the gold rush is over
>>
Does anyone here know about super-resolution models?

I want to train a model with my own dataset, because my dataset shares the same colours, patterns and style, but it has low resolution images, so I want to upscale them as faithfully as possible.

Please somebody help me
>>
>>107007882
the only options for me are the ada 48GB which is not worth it and theres one 5070 ti thats like 285mm but also not worth it. I guess I'll have to wait because I happen to think the 5090 is also a bad investment
>>
>>107007879
I have no idea what you are trying to say.
It doesn't load anything to VRAM, has no GPU usage, CPU at 100% and is slow.
Maybe some other bug or whatever but it's not working as intended.
I asked for alternatives for joy caption inference.
>>107007914
If you are referring to the Hugging Face one, that has usage limits.
I am trying to mass tag images for lora training.
That's why I am trying to set it up locally.
>>
>>107007920
just play with seedvr2 to upscale them
>>
>>107007927
>If you are referring to hugging face one that has usage limits.
no, I mean the github repo
>>
>>107007754
I do recall people making stuff in the same style just by inputting lyrics back when that was allowed.

Look at this
https://www.404media.co/listen-to-the-ai-generated-ripoff-songs-that-got-udio-and-suno-sued/

Obviously there's more, including a popular one
https://www.udio.com/songs/nDKNwPUB6GrMhEfvM6v2u1

Though it's more like a cover
>>
>>107007904
ok worth a try, thanks anon
>>
>>107007934
Yeah, this is where local would shine.
>>
>>107007933
Gradio interfaces are typically hosted at hf and github repo links to hf for online demo as well.
Unless you are referring to something else.
>>
>>107007967
a1111 and all it's forks are using gradio locally. you are being retarded
>>
For anons using torch nightly wheels: when I updated from the early October build to the 22nd one (2.10.0.dev20251022+cu130), sage attention broke completely; it looks like an issue where everything defaults to CPU instead of CUDA, making the sampler throw an error.
Going back to the 1002 version made it work fine again.
>>
>>107007979
Oh you mean this?
https://github.com/fpgaminer/joycaption/tree/main/gradio-app
I guess I can try that.
When you said it like that I expected some sort of link to somewhere.
>>
>>107008007
I am too lazy to look for you but you figured it out. gold star for you
>>
>monthly pytorch mismatch between custom nodes that requires a reinstall
here we go
>>
>50s/it WAN with random crashes on ROCM 7
>100s/it WAN with guaranteed stability on ROCM 6
suffering
>>
>>107008079
>he broughtered'ed AMD
why??
>>
>>107008096
Because fuck nvidia. Also gaming under Linux is less of a hassle with AMD.
It's fine, I don't have a fried attention span. I can cope.
>>
>>107008111
>It's fine, I don't have a fried attention span. I can cope.
I'm not sure you cope this well, you literally complained about the lack of speed here lool
>>
>>107007598
thank you anon the pajeet doesnt deserve your grace
>>
>>107007994
I had the same issue, if it's this : https://github.com/pytorch/pytorch/issues/166104

then it's "working as expected" apparently, so it means we need to get sage attention team to update or be stuck with early october torch
>>
File: 1760966609883945.jpg (728 KB, 1464x1824)
>>
>>107008111
Why did you make a post kvetching about speed and stability if you were going to immediately get defensive and coping lol.
>>
File: 00023-3922286591.png (3.39 MB, 1344x1680)
>>
>>107007931
That's not what I need, I want to train a super-resolution model with my own dataset.

The idea is to have pairings of images and teach the model what pairings are a correct upscaling.
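If the goal is pairings, the standard trick is to synthesize them: treat your high-resolution images as ground truth and degrade them to produce the matching low-resolution inputs, then train on the (LR, HR) pairs. A minimal numpy sketch of building one pair by 2x box downsampling (real SR training pipelines, e.g. Real-ESRGAN-style ones, stack blur/noise/JPEG degradations on top of plain downsampling):

```python
import numpy as np

def downsample_2x(hr):
    """2x box downsample: average each 2x2 block. Stand-in for the
    degradation step that turns an HR ground truth into its LR input."""
    h, w = hr.shape[:2]
    h, w = h - h % 2, w - w % 2       # crop to even dimensions first
    hr = hr[:h, :w]
    return hr.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

# One synthetic (LR, HR) training pair from a toy 8x8 RGB image.
hr = np.random.rand(8, 8, 3)
lr = downsample_2x(hr)
print(lr.shape)  # (4, 4, 3)
```

The point is that the model only ever sees degradations you can actually produce, so the closer the synthetic degradation matches how your real low-res images were made, the more faithful the upscales will be.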
>>
File: 1596061505572.png (740 KB, 1001x581)
>>107007499
>make Qwen Audio or something lol
they will do it, but for api, kek
>>
Does ComfyUI patch lora weights into the model by default? Doesn't seem so, why isn't this the case? Wouldn't it help a lot with vram when running multiple loras? Can it be enabled somehow?
>>
>>107007849
skill issue literally. install a llama-cpp-python version that has CUDA compiled in, otherwise manually build the wheel using the correct compile flags (literally contained in this node repo through a script):
https://github.com/1038lab/ComfyUI-JoyCaption/blob/main/llama_cpp_install/llama_cpp_install.py
you're fucking retarded and should kys unironically retarded faggot brown
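For what it's worth, the usual way to get a CUDA build of llama-cpp-python, assuming the node really is falling back to a CPU-only wheel (flag per the llama-cpp-python README; recent releases use GGML_CUDA, older ones used LLAMA_CUBLAS):

```shell
# Rebuild llama-cpp-python with the CUDA backend, inside the venv ComfyUI uses.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```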
>>
File: 1755978351307033.png (3.88 MB, 1336x2008)
>>
>you still have to wait a few minutes to OOM on the first comfy video gen before the second allocates properly and works from then on
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>107008334
stop being poor and buy a proper video card faggot
>>
>>107008242
>The idea is to have pairings of images and teach the model what pairings are a correct upscaling.
Elaborate
>>
File: 1747274203092719.png (920 B, 75x57)
>>107008364
Write proper memory alloc code faggot
>>
File: file.png (513 KB, 765x733)
>>107008376
>24gb
>not vramlet cope tier
stop being poor jamal, you're embarrassing yourself
>>
>>107008393
Write proper memory alloc code faggot
>>
File: frames.jpg (9 KB, 328x186)
i..I'M GOING TO OOM, AHHHHHHHH
>>
>>107008393
>0% utilization
embarrassing indeed
>>
>>107008393
>all that compute for 1girl, standing, ai-generated
>>
>>107007599
didn't work at 5 strength for me so i gave up. wan 2.1 is shit and wan 2.2 5b will be even worse: 5b has no speedup loras as far as i know, the wan2.2 vae is fucking slow and it OOMs sometimes unless --reserve-vram 1.0 is used, and 5b produces shit at resolutions below 1280x704 or whatever.

total waste of time unless they train it on wan 2.2 14b 720p or heck even 480p. what's more, we don't even know if we are using it correctly. Yeah i think it will be forgotten about and never heard of again.
>>
>>107007599
well reading that post maybe it will be fixed and working, but i won't hold my breath because it's already limited in what it can actually do with wan 2.1, and it's limited to 832x480 or it's slow as fuck.
>>
>>107008393
>600 watts idle
waow
>>
>>107008425
im actually running REAL LLMs in here, sadly at q8 quant (GLM 4.6, 400gb~), fully in VRAM unlike you poors who make do with q2ks poverty tier quants and offload to CPU anyway lmao. I use the spare 200gb~ to load FULL precision video/audio/image models to deliver a superior and immersive chat experience, with SOTA imagen/textgen/audiogen/voicegen all happening automatically as I rp with my waifus.
>>107008448
these are rented in a datacenter, no way I can run this shit at home. Also memed my company into doing it, bunch of clueless retards, im feeding them shit from bedrock itself while using the real cluster for myself.
>>
>>107008261
you can minimize the vram use by merging, but I don't think it's possible to do on the fly so we are stuck with model + lora sizes
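Merging is just folding the low-rank product into the base weight once, after which the lora tensors can be freed, which is where the vram saving comes from. A sketch with numpy standing in for the weight tensors (in ComfyUI this happens on torch tensors, and the exact `alpha/rank` scaling convention varies per trainer):

```python
import numpy as np

def merge_lora(W, A, B, scale=1.0):
    """Fold a LoRA into the base weight: W' = W + scale * (B @ A).
    A is (rank, in_features), B is (out_features, rank)."""
    return W + scale * (B @ A)

rank, d_in, d_out = 4, 16, 16
W = np.zeros((d_out, d_in))
A = np.random.randn(rank, d_in)
B = np.random.randn(d_out, rank)

W_merged = merge_lora(W, A, B, scale=0.8)
# After merging, only W_merged stays resident; A and B can be dropped.
print(W_merged.shape)  # (16, 16)
```

The trade-off is exactly what the post says: once merged you can't change the lora strength or hot-swap loras without re-patching from the original weights.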
>>
>>107008475
enjoy it while it lasts bruh
>>
>>107008475
maybe if you put in as much effort into real life you could have a real girl friend?
>>
Every now and again I dream about picking up a data center GPU for fire sale prices after the crash happens and then I remember the insane power draw and the fact that they apparently use total loss water cooling.
>>
>>107008334
I don't have the issue, what video gen resolution and how many frames? Can you share your wf?
>>
>>107008475
>Also memed my company into doing it, bunch of clueless retard, im feeding them shit from bedrock itself while using the real cluster for myself.
They don't even notice they're paying twice?
>>
>>107008508
>after the crash happens
Keep dreaming
>>
>>107008509
Maximum resolution and frames, Q8
https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper
>>
>>107008430
>total waste of time unless they train it on wan 2.2 14b 720p or heck even 480p, what's more is we don't even know if we are using it correctly. Yeah i think it will be forgotten about and never heard of again.
They released the training code so maybe some rich anon will do it.
>>
>>107008430
>>107008443
>>107008529
OK I guess we'll wait then.
>>
File: psxhr_flux.krea_0009.png (883 KB, 896x1152)
>>107008519
>believing larpers
>>
File: 1749447214725219.png (22 KB, 571x321)
>>107008524
>Maximum resolution and frames
720p and 81 frames? Do you blockswap?
Try block swapping 5 blocks for example, see if it works. As long as you have enough ram, the difference in speed is minimal.
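Block swapping just means parking the first N transformer blocks in system RAM and shuttling each one onto the GPU only for its forward pass. A toy sketch of the bookkeeping (dummy blocks instead of real torch modules; `.to()` stubs out the device move):

```python
class DummyBlock:
    """Stand-in for a transformer block; .to() mimics torch's device move."""
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

def run_with_block_swap(blocks, num_swapped):
    """Keep the first `num_swapped` blocks on CPU between uses, moving
    each one to the GPU only while it runs, then evicting it again."""
    log = []
    for block in blocks:
        swapped = block.idx < num_swapped
        if swapped:
            block.to("cuda")          # upload just-in-time
        log.append((block.idx, block.device))
        if swapped:
            block.to("cpu")           # evict to free vram for the next block
    return log

blocks = [DummyBlock(i).to("cuda") for i in range(8)]
for b in blocks[:5]:                  # the 5 swapped blocks rest on CPU
    b.to("cpu")
print(run_with_block_swap(blocks, num_swapped=5))
```

Since the uploads overlap with compute on the other blocks, the slowdown is small as long as system RAM can hold the parked blocks, which matches the "difference in speed is minimal" claim above.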
>>
>>107008545
wtf bfl, this was almost unsafe
>>
>>107008541
sorry for the black pill bro but i've done some testing of the base wan 2.2 5b model today. it doesn't work with lightx speed loras so it's actually slower than just using wan 2.2 high and low on my machine. The vae is a pain in the ass as well, it's really slow to decode on my rtx 3060. it might be alright for some people but the quality is shit at resolutions below 1280x720

so yeah it's slow and i really don't know why people with lower vram use it when they could just use Q4 gguf high and low models and get much better quality and speed thanks to speed loras. So I'm not gonna bother when they release the wan2.2 5B version.

tl;dr it was DOA
>>
>>107008564
we definitely keep it safe around here

>https://files.catbox.moe/aasfd1.png
>>
>>107008597
wish it was good at making filled used condoms
>>
wan gen :
[Subject Description] + [Scene Description] + [Motion Description] + [Aesthetic Control] + [Stylization]

Example: "A young woman in a red dress (subject), standing in a bustling neon-lit city street at night (scene), walks forward then stops to look up at the rain, slow motion tracking shot (motion), cinematic lighting, moody atmosphere (aesthetic), film noir style (stylization)"
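The template is easy to keep consistent with a tiny helper (purely illustrative, nothing wan-specific about it):

```python
def build_wan_prompt(subject, scene, motion, aesthetic, style):
    """Assemble a prompt following the [Subject] + [Scene] + [Motion]
    + [Aesthetic Control] + [Stylization] ordering above."""
    return ", ".join([subject, scene, motion, aesthetic, style])

print(build_wan_prompt(
    "A young woman in a red dress",
    "standing in a bustling neon-lit city street at night",
    "walks forward then stops to look up at the rain, slow motion tracking shot",
    "cinematic lighting, moody atmosphere",
    "film noir style",
))
```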
>>
File: 1745458766296140.mp4 (1.45 MB, 832x480)
>>107007598
forgot to set size for the vid but it still worked pretty good.

the anime girl runs to the left out the door and closes it.

ty anon
>>
>>107008475
How many wan frames can you load and how fast is your wan gens?
>>
>>107008614
I just wish wan could handle longer prompts for more actions. It hardly ever works for me even when using context windows and 161+ frames at 81 frame chunks with overlap.
>>
>>107008079
My ROCM 7 has been rock solid after upgrading to the official stuff. ComfyUI amd memory management update also swagged my shit out.
50s/it seems pretty good for wan with amd, which card you got?
>>
>>107007599
https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1519#issuecomment-3440759925

I tried telling anons this when it first dropped, it can't be as easy as just 1 lastframe because base wan does not have any concept of the previous genned video. It treats each as a new video, so to me it looks more like how wananimate works in taking 5 previous frames to continue the motion since 1 frame isn't enough to continue motion with.

People were making videos with it but those would have been placebo gens.
>>
>>107008680
9070 xt. Not sure what you mean by "official stuff", I'm using the nightly URL from the official Pytorch page. I haven't upgraded Comfy in a while either.
>>
>>107008508
>and the fact that they apparently use total loss water cooling
is this one of those cases of turbo-niggering the environment to save $0.01?
>>
File: 1754900427998614.png (27 KB, 559x499)
>>107008684
How do you even feed 5 frames to the next video? I don't think it's possible right now; last time I asked, only 1 frame could start the next video.
So on top of their lora, we need some node to "inject" 5 frames instead of 1 as latent into a sampler.
Basically picrel but with 5 frames.
>>
>>107008710
Gigawatt in equals gigawatt out. All that energy has to eventually turn into heat, and there ain't an air conditioning system on this earth that can dissipate 1 GW. That being said, I don't know the specifics, only the basic laws of physics.
>>
>>107008719
feed first 5 frames 0 denoise then the rest on the next sampler
>>
>>107008710
it's water in, water out, they don't inject waste from the Ganges into it anon
>>
>>107008699
you should upgrade your comfy to at least 0.3.65, very good amd improvements
https://github.com/ROCm/TheRock
I think it's these, but I see you're on linux so nevermind. I meant official windows support.
>>
>>107008726
Frames in wan aren't processed in series; all the frames are processed at the same time. It's just that the first one is "fixed" in the latent while the rest is sent as noise.
What we need is to "fix" the first 5 instead.
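The "fix the first k frames" idea boils down to a per-frame denoise mask. A conceptual sketch, not real ComfyUI node code (the function name is made up, and a real node would also have to map 5 video frames onto wan's compressed latent frames):

```python
# Conceptual sketch of a per-latent-frame denoise mask:
# 0.0 = keep the conditioning latent untouched ("fixed" frame),
# 1.0 = frame starts as pure noise and gets fully denoised.
# A hypothetical continuation node would apply this when mixing the
# previous video's latent with the sampler's initial noise.
def frame_denoise_mask(num_latent_frames, fixed_frames=5):
    """Mask that pins the first `fixed_frames` latent frames."""
    return [0.0 if i < fixed_frames else 1.0 for i in range(num_latent_frames)]

# e.g. 81 video frames compress to 21 latent frames in wan;
# pin the first 5 latent frames, leave 16 free.
mask = frame_denoise_mask(21, fixed_frames=5)
```

This is why one anon's "0 denoise on the first 5" suggestion needs actual code: the sampler has to see the mask per frame, not per video.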
>>
>>107007598
willing to give this a try, i'm assuming scheduler simple?
>>
>>107008752
Yes
>>
File: psxzstyle_0018.png (911 KB, 896x1152)
>>107008604
train a lora man, it takes like two hours
>>
>>107008726
won't work, this is shit we tried months ago I'm sure, it just ignores them. and how would you even do this? zero-denoise the first 5 in an advanced sampler? it won't matter, because wan does all frames at once as a new video. it won't magically know there are 5 frames already done. KJ was wrong to assume it needed no code changes, we're gonna need a new node.
>>
>>107008748
>>107008788
vibe code it
>>
File: 1749642696327592.mp4 (1.47 MB, 480x832)
>>107008619
the new 2.2 MoE high lora works so much better for motion/fluidity. ty kijai for fixing it
>>
>>107008769
thanks, but why KJ loras? Is there something different about them, or are they just extracted from their model?
>>
>>107008792
I tested different loras and this combination had the best result
>>
>>107008788
>we gonna need a new node.
Yep, and even better if we import the latents corresponding to the last 5 images instead of images degraded through vae decode, but I'm not even sure that's possible.
>>
File: 1756528451241047.mp4 (1.61 MB, 720x1280)
>>
>>107008846
it started so well, but we didn't get glorious cleavage bouncing
>>
>>107008792
NTA, but the LoRAs released by Lightx2v were initially extracted wrong, so you had to use the KJ-extracted LoRAs. The Lightx2v ones were then re-uploaded as correctly working versions at some point. KJ didn't extract the newest I2V Lightx2v LoRAs because they actually got it right on the first try this time. https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/tree/main
>>
>>107008797
well it's nowhere near as good as my settings and setup, it's fucking blurry at 720x720
>>
>>107008853
lol it's still shit, i will prove it...
>>
File: 1736965521980822.mp4 (866 KB, 832x480)
>>107008791
>>
>>107008878
im using q8 2.2 with https://civitai.com/models/1818841/wan-22-workflow-t2v-i2v-t2i-kijai-wrapper
>>
Ok, I needed to figure out how to integrate the CUDA Toolkit into my docker setup, but Joy Caption is now working with GPU acceleration, 4 times faster.
You are the thread schizo who regularly shits it up. As such I won't give a (You), but credit where due, thank you, bastard.
>>
>>107008900
>>107008890
>>107008878
wait a minute, i forgot to change the god damn steps start and end... This is probably why it's so blurry lol. I'll check it again.
>>
>>107008913
I tried those LoRAs as well and I got some weird hyperspace zoom effect, but I'm just using whatever quants of WAN22 come with Comfy's default workflow.
>>
File: output_t2v_refine_1.mp4 (3.13 MB, 1280x704)
I was able to run the LongCat Video demo. This is the stock prompt:

>prompt = "In a realistic photography style, a white boy around seven or eight years old sits on a park bench, wearing a light blue T-shirt, denim shorts, and white sneakers. He holds an ice cream cone with vanilla and chocolate flavors, and beside him is a medium-sized golden Labrador. Smiling, the boy offers the ice cream to the dog, who eagerly licks it with its tongue. The sun is shining brightly, and the background features a green lawn and several tall trees, creating a warm and loving scene."
>negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

Generation is in 3 stages (initial, distilled, refined) that each output a video. It took 24 minutes to generate this and 74gb of vram (FP32).

Going to try the long video (1min) generation next and expecting it to take hours.
>>
>>107008927
I'm doing a quality-improvement study tonight, so we will see which is best.
>>
>>107008959
That is good quality, how long did it take?
>>
>>107008966
>It took 24 minutes to generate this
>>
>>107008966
nvm i didn't read the full post. 24 mins, 74GB vram, I'm gonna cry :(
>>
>>107008959
>Generation is in 3 stages (initial, distilled, refined)
ok that might mean it could run on smaller cards?
>>
>>107006468
can you guys make me some realistic apustajas?
>>
>>107008984
https://civitai.com/models/175781/apu-apustaja-model-sd-xl
https://civitai.com/models/679189/apu-apustaja
>>
File: psxannie_0007.png (856 KB, 896x1152)
>>107008959
thanks for doing this anon, i ran oom on my 3090 multiple times before giving up
>>
>>107008992
scully?
>>
File: output_t2v.mp4 (1.36 MB, 832x480)
>>107008983
It already uses 55gb on the first pass, but keep in mind that's at FP32. At Q8 the 74gb peak should drop to about 18.5gb, so it would work on a 24gb card.

The first two passes aren't really meant to be used as-is, anyway. First stage here.
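The 18.5gb figure is just linear weight scaling: Q8 stores roughly 8 bits per weight vs 32 for FP32, so the weight footprint drops ~4x. A back-of-envelope sketch (treat it as a rough lower bound, since activations and the VAE don't shrink the same way):

```python
# Rough VRAM estimate when quantizing model weights.
# Assumes peak usage scales linearly with bits per weight,
# which ignores activations/VAE overhead - a lower bound only.
def quantized_peak_gb(fp32_peak_gb, bits=8):
    """Scale an FP32 peak VRAM figure down to `bits` bits per weight."""
    return fp32_peak_gb * bits / 32

est = quantized_peak_gb(74)  # the 74gb FP32 peak quoted above -> 18.5
```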
>>
Kind of annoying how the lightx2v ruins videos with end frames. It distorts right at the ending but without the lora it works fine
>>
>>107009003
allison brie you fucking cretin
>>
File: output_t2v_distill.mp4 (1.27 MB, 832x480)
>>107009009
>>107008983
Second stage (distill)
>>
>>107009009
>letting your dog lick chocolate syrup
fucking retarded kid
>>
>>107009014
Everyone complaining about lightx2v color distortion, flickering, blurriness or anything else: it's a workflow issue. I never had any of those with >>107008900
>>107007598
>>
>>107009016
she looked better younger
>>
>>107009030
I don't have them either with latest version, pretty nice.
>>
>>107009009
the first stage looks alright desu.
>At Q8 the 74gb peak should be down to 18.5, so it would work on a 24gb card.
Yeah I think we will be eating good again soon.
>>
>>107009038
they all do
>>
File: output_long_video_0.mp4 (2.84 MB, 832x480)
Running the LongCat 1min demo. It generates 11 segments and chains them together. I'm guessing it'll take about 4.5 hours if it doesn't fail. Here's the initial step of the first of the 11.

>prompt = "realistic filming style, a person wearing a dark helmet, a deep-colored jacket, blue jeans, and bright yellow shoes rides a skateboard along a winding mountain road. The skateboarder starts in a standing position, then gradually lowers into a crouch, extending one hand to touch the road surface while maintaining a low center of gravity to navigate a sharp curve. After completing the turn, the skateboarder rises back to a standing position and continues gliding forward. The background features lush green hills flanking both sides of the road, with distant snow-capped mountain peaks rising against a clear, bright blue sky. The camera follows closely from behind, smoothly tracking the skateboarder’s movements and capturing the dynamic scenery along the route. The scene is shot in natural daylight, highlighting the vivid outdoor environment and the skateboarder’s fluid actions."
>negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
>>
File: ComfyUI_temp_dbeyl_00046_.png (2.65 MB, 1440x1120)
>>
File: ComfyUI_temp_dbeyl_00047_.png (2.63 MB, 1440x1120)
captchas are failing, 4chan is going down
>>
>>107009152
[audience] wooooOOO
>>
File: 1743105653542370.mp4 (933 KB, 832x480)
>>107008892
>>
File: ComfyUI_03653_.png (1.32 MB, 832x1216)
>>
>>107008959
6 second gen took 24 mins to do? that's rough
>>
>>107009180
Yeah, but so is wan without speed loras.

now imagine what this thing could do in the future.
>>
>>107009243
what model?
>>
File: 1730557881155378.mp4 (1.07 MB, 832x480)
the man in the blue shirt turns and fires a blue energy beam at the plane, which explodes into fire and smoke.

live action dragonball. used unipc instead of euler this time.
>>
>>107007598
I'm guessing you mean 8 total steps, 4/4? because with only 4 total steps it's blurry. I'm now trying 8 total steps with your settings.
>>
File: ComfyUI_temp_dbeyl_00050_.png (2.24 MB, 1440x1120)
>>107009301
chroma
>>
>>107009316
No it's 4 steps total, unipc, 720x1280, 81 frames, q8 wan, umt5 bf16
>>
>>107009330
catbox?
>>
>>107009330
>Chroma
Ew
>>
>>107009333
it's not enough for 720x720, that's for sure. it's looking a lot better using 8 steps total, but I am using q4, umt5 fp16
>>
>>107009364
Wan was trained primarily for 1280x720 and 720x1280, and Q4 is too low even for full res anyway
>>
>>107009378
>low even for full res anyway
i think not :)


