/g/ - Technology


Thread archived.
You cannot reply anymore.




Discussion of Free and Open Source Text-to-Image/Video Models

Prev: >>107217949

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://rentry.org/wan22ldgguide
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2298660
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd
https://gumgum10.github.io/gumgum.github.io/
https://huggingface.co/neta-art/Neta-Lumina

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 2151711482.png (990 KB, 896x1152)
>>
File: 2634042630.png (1006 KB, 896x1152)
>>
>>107227677
nice, did you manually add that filter or is it genned like that? prompt?
>>
>>107227307
Let's just say that the only reason this site exists is close monitoring by three letter agencies.
>>
>>107227709
I wish they could monitor us without being massively gay. Glowies should be more worried about what's happening on discord.
>>
How much longer will it be before a new wan version drops? 2.2 has been great and a huge upgrade on 2.1 and I'm looking forward to seeing how it will evolve.
>>
File: 4279123561.png (1.13 MB, 832x1216)
>>107227694
It's cuz of the lora
https://files.catbox.moe/gkfm68.png
>>
Just wondering if it's possible to make nsfw sound effects and voice acting with AI? Like is there an equivalent software for it like Stable Diffusion?
>>
>>107227771
Audio is extremely behind locally, so no sadly. I wish we were able to make lewd asmr at home already
>>
>>107227771
I tried doing this with Stable Audio and had no success. Maybe there's a way to prompt it right but in all honesty I didn't try too hard.
>>
>>107227747
Thanks!
>>
>>107227781
>>107227782

Huh, I would have thought it would be easier to advance with sound and voice compared to image/video. Is it because there's not as much demand?
>>
File: ComfyUI_18740_.jpg (568 KB, 2432x1664)
holy shit bros
I upgraded from a 1070 to a 5090
before
>3 mins to gen one 720p image
>15 mins to upscale it to 1440p
now
>2 seconds to gen one 720p image
>14 seconds to upscale it to 1440p
so from 18 minutes to 16 seconds.
insane.
>>
>>107227807
I think it's mostly been skipped over in favor of going straight to video with sound. The potential is there, but there's a lack of trainable, quality base architectures.
>>
>>107227807
>Is it because there's not as much demand?
Audio is difficult to work with. Making the dataset alone is a nightmare. There is demand for audio but I don't think people care if it's local or not.
>>
>>107227833
>Making the dataset alone is a nightmare
Gemini is really good at both describing and transcribing audio. Just need to develop a structure for it to use
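To make "develop a structure" concrete, here's a sketch of a per-clip record you could ask the captioner to fill in. The field names are made up for illustration, not any trainer's required format:

```python
import json

# Hypothetical per-clip record for an audio dataset; one JSON object
# per line gives you an easy-to-stream .jsonl file.
record = {
    "file": "clip_0001.wav",
    "duration_s": 9.7,
    "transcript": "hey, welcome back",   # verbatim speech, if any
    "description": "soft female voice, close mic, light room reverb",
    "tags": ["speech", "whisper", "indoor"],
}

line = json.dumps(record)
```

The split between a literal transcript and a free-form acoustic description matters because "what is said" and "how it sounds" are different training signals.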
>>
Blessed thread of frenship
>>
>>107227816
welcome to the club, fren
>>
>>107227849
Gemini benefits from unfettered access to the entirety of YouTube, at least some of which is manually subtitled.
>>
>>107227816
Is this SDXL or Qwen?
>>
>>107227871
glad to be here
>>107227908
waiNSFWIllustrious-SDXL v150
>>
File: 2169459040.png (863 KB, 1216x832)
>>107227816
hayai
>>
File: ComfyUI_00065_.png (2.67 MB, 1136x2048)
>>
File: ComfyUI_00068_.png (2.59 MB, 1736x1344)
>>
>>107228268
cute
>>
File: 00122-3478665731.png (2.43 MB, 1024x1536)
>>
File: ComfyUI_00070_.png (2.42 MB, 1256x1856)
>>
>>107227816
Post ur mahiro lora bro.
>>
>>107228303
>IllustriousXL - Oyama Mahiro 緒山 まひろ - onii-chan wa oshimai!
seaart.ai/de/models/detail/a42bec290f2cc9aa1c7fd4304a743b1e

for some reason the guy made a lora for every outfit he wears in the anime.
>>
File: 339182120.png (973 KB, 1344x768)
>>107228287
thx
>>
>>107227636
Why are those family guy toes soooo good?
>>
>>107228482
pony v7
>>
>>107228518
nope, it's my custom lora/model for illustrious i'm working on.
>>
>>107228529
...for pony v7
>>
File: 1763326335439.jpg (3.08 MB, 2048x2048)
Where can i find more of this content? Did the anon that made it have a website with more of this stuff
>>
>>107228649
if I knew, I wouldn't tell you
>>
>>107228649
I'm convinced this is just a real photo.
>>
>>107227771
There's an mmaudio nsfw finetune. It's not great and you might need to roll a couple dozen times to get a decent result but it's something.
>>
>>107228649
looks like gpt slop
>>
>>107227636
what base models are people using for 1girl nowadays? is SDXL still the best one?
>>
>>107228885
He was posting it on this thread i think. It was some ai art thread. Like a couple weeks ago
>>
>>107227636
Moar Bonnie from family guy
>>
want some?
https://files.catbox.moe/ramfs2.webm
>>
>>107229130
I hope that's yogurt.
>>
>>107227816
nice, reminds me of when i upgraded from local to saas. now i'm generating images at 4k in 4 seconds with seedream 4
>>
File: 1734303060436.png (737 KB, 1572x773)
>>107229153
Oh wow that's really cool anon! Say, can you generate yourself some rope to hang yourself with? Go kys in 4k.
>>
WANfags, is there a video version of t2i low res gamblegen to i2i high res workflows? What is the realistic upper limit on quality for 3DCG/anime gens?

>sequenced together ~30 seconds of video with SDXL > 2.2 FUN INP FLF2V > 2.2 FUN VACE (transitions)
>i2v fine details tend to stay noisy.
>hand moving to point at camera, faces near edge of frame, eyes, etc.
>VACE color shift is pretty bad and motion transfer leaves a lot to be desired. am using q8 though

suggestions on the best v2v method to improve detail/quality? I don't want to replace the character or motion/expressions, just basically mimic the way you seed hop a handful of lowres SDXL gens then upscale the best one. Do i inpaint and pray the color/brightness saturation stays the same or do a full v2v with the lowres as a controlnet?

>T. 16GB VRAMlet and 32GB RAM (96 more in the mail)
>pic unrelated, just a wan-animate mix i had the HF generator make while i downloaded models
>>
File: 1747532360082405.jpg (722 KB, 2016x1152)
>>
>>107229245
>WANfags, is there a video version of t2i low res gamblegen to i2i high res workflows?
why? t2i is dogshit fotm every time
>>
>>107229261
>t2i
*t2v
>>
File: 4176290622.png (667 KB, 1344x768)
>>
>>107229237
>that image
kek
>>
>>107229261
>>107229271
gpt sama, if you process the rest of the context tokens you will see i am an i2v chad. I am only referencing t2i in that you use it to shit out several SDXL gens before i2i upscaling. agreed t2v is retarded, but my point is the workflow of low>high quality is the same
>>
>>107229237
>Grok Imagine is absolutely amazing!

I tried making a local only community on X. As of now it still has 1 member
>>
File: 3335339010.png (918 KB, 1216x832)
>>
File: nxyz_Nov11-1763331678_0.png (2.62 MB, 1400x1704)
1girl? 1girl.
>>
>>107229420
*sniffa*
can you do one with soles n ass?
>>
File: 86192421.png (900 KB, 896x1152)
>>
File: 638826101.png (809 KB, 1216x832)
>>
>>107229420
More Bonnie
>>
File: sarc_.png (2.06 MB, 1600x1220)
>wake me up when it's over
>>
>>107227636
Have any of you guys managed to run ComfyUI on macOS? It's trying to run CUDA, the instructions for running ComfyUI on macOS are A MESS, what the fuck am I supposed to do? It's all so confusing.
>>
>>107229909
I didn't even know MacOS could even run Comfy or gen at all. Good luck brother.
>>
>>107229942
I just want to build a Windows machine at that point but GPUs are so expensive I don't see the point.
>>
so what's the skinny on this kapersky video model? nobody talking about it doesn't sound promising
>>
>>107229769
Can you add Louise in next to her? Same pose? Maybe kissing?
>>
>>107229909
You're better off running ComfyUI in docker or using a virtual machine with linux than trying to get it to work properly on MacOS.
>>
>>107230001
Gotcha, I guess I'll build an AI Linux machine then.
>>
>>107229955
>kapersky video model
fucking kek
>>
>>107230025
you still need to buy a gpu -_-
>>
>>107230045
Yeah I know... at that point I'll just bite the bullet and get a decent GPU.
>>
File: patrician.jpg (444 KB, 768x1344)
im considering training a lora using screenshots from HBO rome. does this work well? or if i wanted harry potter outfits could i do the same thing?
>>
File: nxyz_Nov11-1763338785_0.png (2.56 MB, 1400x1704)
>>107230239
Screenshots work. Just make sure they're crisp and don't have a whole lot of motion blur going on. If you just wanted to do outfits you need to crop out any faces/heads.
>>
i heard that i shouldnt be using the taggers that come with OneTrainer. should i be using joycaption or something else for loras?
>>
File: nxyz_Nov11-1763327106_0.png (1.97 MB, 1400x1704)
>>107230304
https://github.com/jhc13/taggui
This uses joycaption by default. I recommend it.
>>
>>107229420
>>107229958
Bros... I think that was too much.
>>
>>107230279
>If you just wanted to do outfits you need to crop out any faces/heads.
you mean just a closeup of the outfit? wouldnt that cause everything i generate to make heads out of frame?
>>
>>107230321
>wouldnt that cause everything i generate to make heads out of frame?
Yes if you over train. It's a balancing act.
>>
>>107230319
Worth it
>>
Fucking slow-ass VAE on AMD, aaaaaaaaaaa
>>
where are the yume gens
>>
>>107230321
To add to what I said, if you don't want to crop out faces just make sure there's different people wearing the same outfit. That should work as well. Just have the captions focus on the clothes.
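For reference, a minimal sketch of the sidecar-caption layout kohya-style trainers and OneTrainer can consume — one .txt per image with the same basename. The captioner here is a stub standing in for joycaption output or hand-written tags, and the filenames are made up:

```python
import tempfile
from pathlib import Path

def write_captions(img_dir, caption_fn, exts=(".png", ".jpg", ".webp")):
    """Write one sidecar .txt per image (same basename)."""
    for img in sorted(Path(img_dir).iterdir()):
        if img.suffix.lower() in exts:
            img.with_suffix(".txt").write_text(caption_fn(img), encoding="utf-8")

# tiny demo with a stub captioner; dir left undeleted for inspection
d = Path(tempfile.mkdtemp())
(d / "0001.png").touch()
write_captions(d, lambda p: "roman soldier, red cape, segmented armor")
caption = (d / "0001.txt").read_text(encoding="utf-8")
```

Keeping the captions focused on the clothing (and varying everything else across images) is what steers the lora toward the outfit rather than the wearer.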
>>
>>107230318
ill take a look then, thanks.
>>
>>107230452
>>107230318
It's not the default, you have to select it from the internal model downloader, but otherwise what he said. Also you'd want to load the 4bit version if you have <16G vram
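Rough arithmetic behind the <16G advice — a weights-only estimate for a hypothetical ~8B captioner, with a made-up 1.2x fudge factor for activations. Illustrative, not measured:

```python
def vram_gb(params_billion, bits, overhead=1.2):
    """Back-of-envelope weights-only VRAM estimate in GB.
    `overhead` is a guessed fudge factor, not a measurement."""
    return params_billion * bits / 8 * overhead

# a hypothetical ~8B vision-language captioner at different precisions
estimates = {bits: vram_gb(8, bits) for bits in (16, 8, 4)}
```

16-bit lands around 19 GB and won't fit on a 16 GB card; 4-bit comes in under 5 GB and leaves headroom.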
>>
>>107227737
Anyone?
>>
>>107230509
No one knows. Probably not for a long time considering WAN 2.5 was released in September.
>>
>>107230509
2.5 already released. it's api only. I wonder what all the other releases will be?
>>
>>107230523
I refuse to try cloud.
Is 2.5 any good though? How does it compare to 2.2?
>>
>>107230534
No idea. As you said, I do not care about API/Cloud shit, so it may as well not exist to me.
>>
File: 1757702020726448.jpg (192 KB, 1339x673)
Why did they even make a category for Wan 2.5? Literally no one uses it. Maybe they were confused and thought it was available locally.
>>
File: AniStudio-08181.png (1.59 MB, 1088x896)
>>
>>107230580
wrong thread
>>107225374
>>107225374
>>
>>107230585
you forgot the rest of the tranime posters
>>
>>107230585
Shut up nigger
>>
>>107230594
you were the one that made /adt/ in a vain attempt to kill /ldg/. now fuck off back to your containment thread to shill your ui
>>
>>107230608
Wrong guy. I did create the first /sdg/ schism thread though.
>>
>>107229705
sovl
>>
>>107230551
Same as Seadream or Sora 2.
It's for API use where you give them shekels.
>>
>>107229408
>>107229526
nice wish these were higher res
>>
>>107227737
>new wan
the next big thing wont be from alibaba
>>
File: ComfyUI_00133_.mp4 (1.04 MB, 480x640)
>>
>>107231150
Not using your russian garbage mate
>>
>>107231165
NTA, but why can't you see the very obvious writing on the wall that Alibaba is pulling out of local video generation?
>>
please amd just invest more into rocm already

i don't want to buy novideo
>>
>>107231226
Explain why.
>>
>>107231245
Can you provide any examples of Chinese AI companies close sourcing the next step in their open source models and then turning around and going back to open source later on?

Best we can expect in the future is maybe a fine tune of 2.2 or more meme shit like wan animate.
>>
>>107231259
Chinks quitting local video won't give us magically another model from somewhere else.
>>
>>107231157
nice I thought it was going to be the same old breast bounce slop
>>
File: ComfyUI_00076_.png (2.27 MB, 1392x1688)
>>
>>107231165
>>107231245
I don't care about Kavinsky, but name a single lab or organization who released something wildly successful/popular and didn't pivot to closed source or completely fuck up all subsequent open releases.
>>
>>107231266
>Chinks quitting local video won't give us magically another model from somewhere else.

You might be confusing me for someone else. All I'm saying is Wan(Alibaba) is likely pulling out of the open source race for the foreseeable future.
>>
>tfw you forget the settings you used in some good gens you had

thank god im a gen hoarder
>>
>>
>>107230493
It's by default for me. Dunno what to tell you. Maybe because I'm on Linux, idk.
>>
>>107231288
I'm not complaining but why are you img2imging my gens? lol
>>
>>107231389
Lol don't pretend you are an artist. You are using other peoples work. Still, thanks for your gens
>>
>>107231389
He does that to some anons gens. No idea why.
>>
>>107231412
I just asked a question. I don't care if you claim you made them yourself or if you lied and told people you hand drew them. Just curious.
>>
>>107231412
>You are using other peoples work
This is the most dimwit take I always see when people argue against AI generated art, how about you learn some art history before writing those stupid arguments.
>>
>>107231325
>tfw you load up an old wf and the output is completely different
dam updoots
>>
>>
>>
>>
>>107231452
nice balls
>>
File: 1738851475862262.png (1.82 MB, 1024x1024)
>>107231325
qwen-image-edit-2509
>"make a collage of this model with different angles; left, right, above, below, etc."
>>
>>107231440
>This is the most dimwit tak
>said by an anon who posted the most basic 1girl instagram gen ever
pottery
>>
>>107231525
>said by an anon who posted the most basic 1girl instagram gen ever
>implying you could generate the same type of image
lel, please try it
>>
Can we all agree that Alibaba actually will release wan2.5 next year?
>>
File: ComfyUI_00078_.png (2.52 MB, 1392x1688)
>>
>>107231165
did a russian steal your girlfriend or something?
>>
>>107230239
also consider spartacus series
>>
So how do I get the AI to undress pictures?
>>
>>107231649
by saying please and thank you
>>
>>107231649
Qwen image edit with some nudity lora probably but never did that.
I prefer making wan videos of girls undressing instead.
>>
File: 1732050247528002.png (1.41 MB, 1024x1024)
>>107227636
What are the current rules on uploading loras of people (i.e. celebrities) onto Huggingface? Do they care about them being uploaded there?
>>
>>107231675
Celebs are a no go on HF or Civit. I miss when you could but I totally get why they stopped allowing it. Swifties really ruined that for everyone.
>>
>>107231687
Damn.... Do ANY on-site generator sites support it? I hear Sea.AI allows Lora uploads but I'm unfamiliar with their TOS
>>
>>107231661
>Qwen image edit with some nudity lora
Okay so I'm really behind on this AI stuff because while I can figure out Qwen image edit, I think, I don't know what a nudity lora is or how to apply it.
>>
>>107231738
Nobody here will handhold you from zero to 100. Check the guides in the OP. You can also search civit for additional guides, of varying quality.
>>
>>107231738
>nudity lora
they were all nuked from the interwebs
>>
>>107231702
>Do ANY on-site generator sites support it?
support a potential federal crime? none of them
>>
File: 1734685397189775.jpg (366 KB, 1024x1024)
>Lain Diffusion General
>no Lain
>>
>>107231738
https://civitaiarchive.com/search?base_model=Qwen&platform_status=deleted&rating=explicit
Check here for deleted loras. I have all the deleted ones but I haven't actually gotten around to using Qwen Image edit at all yet, so I don't know how well they work.

If you need a lora and it has no mirror just ask and I can re-upload
>>
>>107231805
Do you have this one?
https://civitaiarchive.com/models/398093?modelVersionId=444280
>>
>>107231782
you should visit and post in the lainch ai image thread btw
>>
>>107231818
Sure do! Sent ;)
>>
>>107231822
I forgot that was a thing, thanks
>>
>>107231818
>pony model from 2 years ago
No. I only have XL, WAN, Chroma & Qwen loras. I've never even used pony before.
>>
>>107231836
Me neither, I assumed it could only generate horses.
>>
>>107231836
Illustrious is better than Pony, imo especially when it comes to prompting.
>>
>>107230368
you are using cpu
>>
>>107231858
there's absolutely no reason to ever use pony currently. even ponyv7 was a complete waste of time.
>>
Do you guys miss SD1.5?
>>
>>107231865
score_9, score_8, score_7, score_6, hires, absurdres, high quality, best quality,
1girl, (thumbs up:1.2), large breasts
>>
File: 1744747857794680.jpg (358 KB, 1106x580)
>quiet
>only 3k downloads
Your average NSFW wan 2.2 lora gets more downloads than that lmao. What a sad fate for Pony.

>>107231876
No? It's not like it went anywhere. The model is still there and you can use it anytime you want.
>>
>>107231876
No because it's still readily available and never went away.
>>
File: 1735789039020470.jpg (231 KB, 1024x1024)
>>
>>107231876
Can it do Lain?
>>
File: 1757728222121249.jpg (228 KB, 1024x1024)
Last one sorry, okay bye
>>
>>107231950
with a lora, yes.
>>
>>107227636
>Kandinsky drops a new image edit model
>Still no Comfy or any signs anywhere on how to run this thing
>>
File: 1752406989267238.png (689 KB, 1068x746)
wtf reddit is based?
>>
ramtorch status?
>>
>>107232292
why do you think ram prices have been going up?
>>
>>107232134
Where do Russians get their GPUs from anyway?
>>
is it possible to run an img2vid model with a 12gb vram and 16gb ram?
>>
>>107232405
using quantized models, sure. is it worth your time is another question
>>
>>107232411
I'm cool with like..maybe 5 minutes per gen
>>
>>107232405
>16gb ram?
I have 48 and it is often not enough. Go grab a 64+ kit before the prices go completely retarded.
>>
File: WanVideo2_2_I2V_01396.webm (3.25 MB, 704x1280)
domina...
>>
>>
>>107229237
kek, vramlets keep taking the hits
>>
>>107229237
Most SaaS moment ever for me was when I was trying out Udio and I put in the lyrics to the Allen Toussaint song "Southern Nights", trying to get a cover of it, and it said "cannot generate, copyrighted lyrics detected" or something to that extent
>>
after about a week of using chroma I have to say that it might be slightly better at realism with 1girl gens but its coherency seems to break down big time as soon as you move towards something more complex than that. Flux still has plasticky skin and lacks realism but it's better at prompt adherence. I think, with loras, flux likely beats out chroma - though flux still has difficulty with genitalia
>>
kandinsky 20B is crazy

https://files.catbox.moe/erjcru.mp4
https://files.catbox.moe/ixbint.mp4
https://files.catbox.moe/enbhmg.mp4

https://github.com/Ada123-a/ComfyUI-Kandinsky
>>
File: drinking alone.png (1.06 MB, 832x1216)
>>
>>107232708
I2V? T2V? If latter, does this mean it knows NSFW?
>>
>>107232764
T2V, yes
>>
File: 812853472.png (758 KB, 1216x832)
>>
File: demon_sbs_1sec_2sec.mp4 (479 KB, 1440x480)
more kandinsky 20B, it also knows gore btw
>>
>>107231805
So these need to be plugged into ComfyUI while using Qwen, somehow....
>>
>>107232364
No idea, but don't forget Russians made the best image search engine to date before it was neutered (Yandex), so by that logic they likely have insane image models.
>>
>>107232772
It seems inferior to wan 2.2 from what I have seen but this might give it a niche then.
>>
>russians end up making the best video gen model because they want to produce the most believable propaganda material

It's always war that brings out the best of our ingenuity.
>>
>>107232829
it has MUCH better motion than wan 2.2 and knows more and is 24fps, and its video quality is better. The 2B is incredible for its size as well, better than wan 2.1 14B imo
>>
>>107232708
what is comfy waiting for? why isn't he implementing it?
>>
>>107232851
>The 2B is incredible for its size as well, better than wan 2.1 14B imo
yeah right
>>
more kandinsky 20B
https://files.catbox.moe/pyoe7d.mp4
https://files.catbox.moe/efcrps.mp4
https://files.catbox.moe/pq5c8g.mp4
https://files.catbox.moe/3s4ot0.mp4

third one is a hybrid merge with the distill model's blocks
>>
>>107232839
>best video gen model
Every model we've gotten so far is still a toy compared to Sora 2 though. I want to know, how much copyright does this Russian model know? Can you name characters or celebrities? I know that Russians can do it, but I won't get my hopes up yet unless I see them release a model that can do stuff like this
https://files.catbox.moe/3jmasp.mp4

It would be a huge win for open source even if really hard to run.
>>
here is some 2B ones
https://files.catbox.moe/b31taq.mp4
https://files.catbox.moe/5xhjed.mp4
https://files.catbox.moe/ovddcz.mp4
>>
>>107232870
this, Sora 2 set the bar so high I'm not hyped by local video models anymore
>>
I might have bothered to check it out if it wasn't so slow to run.
>>
>>107232886
working on distill merge, you can get it working at 16 steps>>107232864 the third one here is the hybrid
>>
>>107232881
>not hyped by local video models anymore
I'd counter that Ovi 1.1 is a step in the right direction, though still far from Sora 2's raw knowledge and capabilities; the issue so far is that it needs scaling.
>>
File: 3642812852.png (826 KB, 1216x832)
826 KB
826 KB PNG
>>107230619
This style does have a certain *je ne sais quoi* to it.
>>
File: sdfsdfsdfssgggg.jpg (128 KB, 1280x720)
"reee, ive been genning this one image for days for a good loop, ill do one final gen and then give up.."
>gen turns out almost perfect

Never give up, anons.
>>
>>107227636
what is the difference between this general and /sdg/ exactly
>>
>>107232864
can it do 1frame images like wan or is it just for vids?
>>
>>107233195
image is just 1fps video
>>
>>107233234
I think the question is still valid, as if it will produce anything coherent.
>>
This thread is the local diffusion general, that thread is stable diffusion general.
>>
>>107232708
i see goofs commits, where are the GOOFS GOOFS
>>
>>107233286
I have link in WF but https://huggingface.co/Ada321/Kandinsky-5.0-T2V-Pro-sft-5s-Q4_K_S.gguf

also working on a distill merge using 2B distill's single blocks
>>
>>107233307
>q4
>t2v
bruh
>>
>>107233311
this T2V model does porn out of the box, and you can kind of use it as a I2V model as well
>>
File: 1748680521628398.jpg (3.5 MB, 6554x1990)
>>107233307
>Q4
get that shit off me!
>>
>>107233316
also I already have FP8 support that auto converts and blockswapping, gguf is for vramlets
>>
>>107233307
Does it run in comfy or I shouldn't bother for now?
>>
>>107233346
that is the whole point of >>107232708
its even named comfyui
>>
>>107233352
What text encoder does it use?
>>
>>107233359
the WF has download links for everything but qwen VL 2.5
>>
>>107232708
>the russians will save us
never expected that to be on my bingo card lmao
>>
>>107233368
sorry, it has links for that as well. I meant: it is qwen VL 2.5
>>
>>107233325
gguf q8 is superior to fp8 cope conversion, like there's no comparison.
>>
>>107233417
this
>>
>>107233417
my upload speed is painfully slow, just uploading 4bit takes more than an hour
>>
maybe once I get my distill + 20B pro merge I'll gguf that for both 4 and 8 bit
>>
File: AnimateDiff_00002.png (462 KB, 736x496)
>>107233436
https://litter.catbox.moe/x6aom4h5gylsleww.mp4
ok bro what the fuck is this shit (catbox because 10mb)
>>
>>107233451
oh wait I have to run distill models at 1cfg
>>
>>107233451
could be 1 of 3 things:
1. Not using the correct res for 2B; 2B needs 768 x 512, 20B is more forgiving.
2. Not using the correct scheduler scale; like the tooltip says, use 5.0 for the 5s models, 10.0 for the 10s / I2V ones.
3. Using a variant that doesn't match the model.
>>
File: file.png (815 KB, 1729x1142)
>>107233472
https://litter.catbox.moe/stz3hd6bbv8gkgkw.mp4
yeah I also changed the scale, sadly no knowledge of migu, also still a bit artifacty
I'm not sure about the prompt following capabilities
>>
>>107233493
hmm, use 5.0 scheduler scale
>>
>>107233493
2B is a bit weaker prompt following than wan 14B with 5 cfg, 20B is about the same but has better motion / visual quality. Plus 24 FPS is nice. It also knows more such as full nudity. 2B is far more limited there. Still lightyears better than wan 5B
>>
For new 2024/2025 model is flux schnell still the best option for vramlets?
>>
>>107233507
yeah figured as much, but someone was claiming 2b is wan 14b tier sooooooooooo yeah
>>
>>107233510
its not that far off and being only 2B it would be perfect for a big finetune
>>
>>107233516
im sorry bro but im gonna wait for q8 goofs of the big model, it better fucking know migu.
>>
>>107233493
Where do you get the distill? Also 5s/10s are step versions?
>>
>>107233694
they're all linked in the HF page, ask the totally not rep here in the thread. I'm experimenting with the 2B SFT model at 50 steps now.
The 2B variant doesnt know:
miku
niggers
black people
indians
Hopefully the 20B has more knowledge
>>
>>107233703
its 2B nigger, its knowledge base its gonna be its weak point, its motion quality and ease to train is what is great about it. You could do a big full on porn finetune at full precision on local hardware
>>
>>107233731
it doesnt do violence? I asked for a punch (2b sft 50 steps) and nothing. I SLEEP
>>
>>107233757
20B does full on gore
>>
>>107233759
ok bro can you or your fucking team produce the q8 goofs? WHERE ARE THE Q8 GOOFS
>>
>>107233763
>team
do you think I'm the one who made the model? sorry to disappoint. I am making the gguf though
>>
there, it's uploading. now give it like 4 hours
>>
>Almost 2026
>Best local model we got is still fucking SDXL
Dead hobby.
>>
is there a tool/workflow/UI plugin yet that checks inconsistencies of a character or whatever in a set of images?
>>
>>107233827
you could ask any vision model
best models for that would be qwen3vl and gemma3
>>
>>107233316
>this T2V model does porn out of the box
So it's uncensored with Chroma?
>>
>>107233840
uncensored like chroma*
>>
>>107233860
>>107232708
>>
File: 17633669472942.jpg (338 KB, 1248x1824)
>>
File: 629.jpg (47 KB, 382x334)
Do I pick sft or pro if the filename has both?
>>
>>107234649
pro
>>
File: images(1).jpg (19 KB, 457x437)
>>107227636
Are we getting netayume garbage again in the OP? I thought we were over that
>>
>>107234762
fuck off retard
>>
>Bump stable portable to cu130 python 3.13.9 by @comfyanonymous in #10508
Does this break anything or can I pull?
>>
>>107234845
the last time they went from cu128 to cu129, it resulted in massive performance loss. it's been fixed, but i'd be skeptical of updating. wait a few days and check github for issues people have before updating.
>>
>>107234871
It's a two week old update so idk. Haven't been here in a while.
>>
>>107234879
there's almost no benefit from upgrading cuda. furthermore, cuda 13 only benefits blackwell gpus. if you don't have a 50xx gpu, then it is pointless.
>>
File: ComfyUI_00280_.png (842 KB, 1280x1120)
>>107234773
>>
File: ComfyUI_00260_.png (1.21 MB, 1280x1120)
>thread infested by 1girl half baked anime garbage
Yep, its NetaYume time
>>
File: AnimateDiff_00079.mp4 (1.17 MB, 480x480)
Using the firstmiddlelast frame method with just the first frame hooked up seems to be nice.
>>
File: WAN2.2_00612.mp4 (3.85 MB, 832x624)
>>
>>
File: AnimateDiff_00086.mp4 (3.26 MB, 480x480)
"the chimpansee with a huge grin on his face submerges himself under the water he is in then the camera follows airbubbles surfacing from where the monkey submerged and towards the end the chimpanzee emerges very fast with a huge splash out of the water and scares the camera"

ok fuck you then
>>
File: WAN2.2_00617.mp4 (3.69 MB, 544x960)
>>107235596
seems fine to me
>>
any vibe voice experts? i am trying to use a voice to narrate a story, and it actually changes voices for the characters based on the context of the prompt. if i say "she has a high pitched voice" it will make the quotes high pitched. all i am using is a 10 second wav file from a video game, this is incredible
>>
lol, thought some of you might get a chuckle out of this. I surely did.
>>>/v/726067598
>>
>>107233814
bro wheres it
>>
>>107235669
Yeah I got that too earlier. But it's meant to track him as he is getting closer to you for an attack. Doesn't seem to understand it.
>>
>>107233014
Love it when that happens. On the other hand
>almost perfect
The closer to perfect a gen is the more annoying the imperfections are.
>>
>>107235477
Kino
>>
>>107232870
dalle 3 (a 2023 model) is still superior to any local image gen model in terms of "knowing things" but that doesn't stop people from using them

I think "Sora 2 at home" will still happen in terms of a multiscene video model with audio, but I don't think we'll ever get a local video model that knows everything
>>
>>107235731
The problem is not "using AI" per se, but retards making terrible use of AI and not knowing which model to use and how to prompt the slop away
>>
i'm playing with uncensored seedream 4.0, i honestly can't go back to using flux, qwen or chroma. Local seriously needs a new uncucked image generation model for photorealism.
>>
>>107235881
thats what i think. ai can be used to do all the bitch work like rendering. the old masters used groups of students to paint for them: the master would plan out the scene and prepare it, the students would do the bulk of the work, then the master would fix the hands and faces and stuff. ai should be used the same way, but right now companies are using people who cant draw to generate images that look barely good enough to sell a product.
>>
>>107235533
grift chink killed comfy fennec and comfy clapped
>>
>>107235920
I can't draw yet I can make better use of AI than most normies in those companies by actually having common sense and taste
>>
>>107236015
>the proompter artist
kys
>>
Any advice on fixing weird eyes? The rest of the picture is fine, but I always struggle with eyes, is it a matter of using the right LoRA? I tried one for eyes specifically and it looked even worse.
>>
>>107235910
Is that supposed to be impressive? It has the AIslop look and her mirror reflection doesn't even look right.
Go shill somewhere else, chang
>>
>>
>>107236023
Not just "prompting": I actually know my tools and the models I'm using, I know that inpainting and controlnet exist, I can make proper use of the editing models, and I train my own LoRAs
>>
>>107236058
I 1-shot my coom gens or my work related gens
kys "artist"
>>
>>107232851
>The 2B is incredible for its size as well, better than wan 2.1 14B
hard X to doubt on that one bud
>>
File: ComfyUI_temp_hqpsf_00004_.png (2.21 MB, 1664x1216)
2.21 MB
2.21 MB PNG
>>107236025
look into a facedetailer node if you're using comfy
>>
>>107236072
There are lewd photos of women (real and 2D) on the internet with bodies of all shapes and sizes in all sorts of positions, so why bother dedicating AI "sessions" to it if it's not deepfakes? Why bother doing gacha with mangled bodies and plastic skin?
>>
>>107236107
I'm a Macfag and ComfyUI is a mess to set up, stuck with easy-diffusion for now.
>>
>>107235178
that's just standard i2v retard
>>
File: flux_0183.png (1.23 MB, 832x1216)
1.23 MB
1.23 MB PNG
>>107236116
yeah i got nothing there. the idea is that the face is too small to be properly detailed; a detailer scales up the area, denoises it, then scales it back down
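the crop → upscale → denoise → downscale-and-paste loop can be sketched in plain python. this is a toy nearest-neighbour version, not what a FaceDetailer node actually runs; `fake_denoise` is a stand-in for the real partial-denoise diffusion pass:

```python
def upscale(img, f):
    """Nearest-neighbour upscale of a 2D pixel grid by integer factor f."""
    return [[px for px in row for _ in range(f)] for row in img for _ in range(f)]

def downscale(img, f):
    """Take every f-th pixel to shrink the grid back to its original size."""
    return [row[::f] for row in img[::f]]

def fake_denoise(img):
    """Stand-in: a real detailer re-runs the diffusion model on this crop."""
    return [[px + 1 for px in row] for row in img]

def detail_region(img, y0, y1, x0, x1, f=4):
    """Crop a small region (e.g. a face), blow it up so the model has
    enough pixels to work with, refine it, then paste it back in place."""
    crop = [row[x0:x1] for row in img[y0:y1]]
    refined = downscale(fake_denoise(upscale(crop, f)), f)
    out = [row[:] for row in img]  # leave the rest of the image untouched
    for dy, new_row in enumerate(refined):
        out[y0 + dy][x0:x0 + len(new_row)] = new_row
    return out
```

only the cropped box changes; everything outside it is copied through unmodified, which is why detailers don't disturb the rest of the gen.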
>>
Are you guys ready to get mogged by Nano Banana 2 and cry knowing local models will never ever be on that level?
>>
File: flux_0131.png (1.07 MB, 832x1216)
1.07 MB
1.07 MB PNG
>>107236113
>nooo don't have fun with technology
>>
>>107236189
Why do you people act like we care about closed models? You guys have your own general for that.
>>
>>107236180
Thanks, played around with LoRA and prompts, got much better results already, but I appreciate the tips!
>>
File: 1732019918825691.jpg (1.82 MB, 1632x1632)
1.82 MB
1.82 MB JPG
That second controller is waiting for me...
>>
Any idea why the results are like this?
Tried to remove the sage attention, the resize, light loras. Makes no sense.
>>
>>107236189
ohnononono localbros how will we cope when the bubble crashes and SAASissies lose their APIs???????
>>
>>107235477
based angel cop enjoyer
>>
>>107236469
the problem is you're using the moe ksampler with 4 steps so it's probably doing only 1 step on high. that's my guess anyway
>>
File: wan2.2_00104.mp4 (669 KB, 480x528)
669 KB
669 KB MP4
>>107236499
It's worked with steps that low before. It's like nothing I've seen before.
>>
>>107236475
>bubble crashes and SAASissies lose their APIs
You do realize the message you are replying to is mentioning a Google model, right?
>>
File: flux_0009.png (962 KB, 832x1216)
962 KB
962 KB PNG
>>107236206
people will say literally anything for a (You)
>>
Can Chroma or Qwen do text in the exact font I prompt?
>>
>>107236469
-not only do you have the sigma_shift value set to 0.9 (should be 5.00) on the moe ksampler, but you also have the modelsamplingsd3 nodes, which are redundant because that's exactly what sigma_shift on the ksampler does
-use euler or lcm for your sampler
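for reference, a minimal sketch of what that shift value does, assuming the SD3-style flow-matching time shift (the same mapping a modelsamplingsd3 node applies, which is why stacking both is redundant):

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """SD3-style time shift: sigma' = shift*sigma / (1 + (shift-1)*sigma).
    Endpoints 0 and 1 map to themselves; shift > 1 pushes the schedule
    toward the high-noise end, shift < 1 toward the low-noise end."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# with shift=0.9 a mid-schedule sigma barely moves, while shift=5.0
# lifts it well toward the high-noise end of the schedule
print(shift_sigma(0.5, 0.9))  # ~0.47
print(shift_sigma(0.5, 5.0))  # ~0.83
```

with only 4 steps, where each sigma lands decides how many steps the high-noise model gets before the handoff, so a too-low shift can plausibly starve it.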
>>
cozy bread
>>
File: wan2.2_00114.mp4 (1000 KB, 480x528)
1000 KB
1000 KB MP4
>>107236544
>mfw the shift was the issue

I was swapping back and forth between different samplers and encoders, completely blind to the shift nodes...
Thanks, bro.
>>
File: wan2.2_00134.mp4 (316 KB, 480x496)
316 KB
316 KB MP4
"the man is wearing a winter coat inside a windowed clinically clean room and raises hid hand to start pointing with his index finger and starts to chuckle for a moment as if he heard a good joke"
>>
>>107236547
that would mean comfyorg dies too kek. local just can't win
>>
>>107237290
meant for >>107236475
>>
What does local lose if comfy goes under though...? I still have my models and hardware
>>
>>107237290
>noooooooooooooo not the avatartroon software
genuinely who cares?
>>
File: wan2.2_00141.mp4 (293 KB, 480x304)
293 KB
293 KB MP4
"a young boy is looking at a laptop monitor facing away from the viewer and he is very shocked and very afraid, his expression then turns to disgust and he then looks at the camera and spins the laptop around revealing a bright laptop monitor displaying an image with the text (/ldg/:1.2) as his expression remaind disgusted."
>>
>>107237323
kek
>>
>>107237310
>What does local lose if comfy goes under
less shitty webapp slopapps is something to be celebrated
>>
>>107237323
>remaind
>>
>>107237323
Screen should say UOH! instead.
>>
Does anyone use the enhanced lightning v2 model? I'm using wan2gp, and there's a default option and then an enhanced lightning v2 option. Can't find anything that even talks about the latter, though.
>>
>>107237319
does he even post here?
>>
Fresh when ready
>>107237888
>>107237888
>>107237888
>>
>>107237896
kill yourself. i'm baking
>>
File: grok video master race.mp4 (2.1 MB, 560x560)
2.1 MB
2.1 MB MP4
>>107237290
>local just can't win
HIDE LOCAL THREADS
IGNORE LOCAL POSTERS
NEVER REPLY TO LOCALFAGS
(this post was made by API gang)
>>
>>107237907
The collage is okay so it's fine
>>
>>107237963
>sneaking in tranistudio
kys
>>
new real
>>107237999
>>107237999
>>107237999
>>107237999
>>
>>107238007
Fuck off
>>
File: 1760817124070810.png (1.86 MB, 1280x1313)
1.86 MB
1.86 MB PNG
>discordian schizo troons are desperately trying to disrupt every thread I lurk
>literally all day for the rest of their miserable lives (3 months)
>>
where can i find more nsfw wan loras that aren't on civitai or the archive? i see there are some on hugging face but it's hard to find anything through the search


