/g/ - Technology


Thread archived.
You cannot reply anymore.




Everyone Is Giving You Wrong Answers Edition

Discussion of Free and Open Source Diffusion Models

Prev: >>107784474

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>107787932
thanks for the bread anon
>>
>>107787928
What the hell, are you running the full fp16?
>>
>hates debo
>is just a worse version of him
>>
>>107787967
no, ltx 2 at fp8 + the text encoder at fp16, why the fuck is comfyui asking for so much memory just to load them
>>
>>107787932
>TFT unironically using pony v7 in 2026
How the mighty hath fallen
>>
>>107787984
135GB for FP8?? wtf
>>
Let's pretend LTX2 is awesome so wan niggers open source 2.5
>>
File: ZiMG_00044_.png (2.25 MB, 1280x1440)
>>
>>107787932
>a new video model with sound just came out
>no video collages
ltx 2 is that bad? lool
>>
>>107787932
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
hey is there a reason these are in the OP? it's just off topic schizobabble
>>
File: zimg_00232.png (1.39 MB, 864x1280)
i did not think this was gonna turn out at all
>>
>>107788014
make the same without tattoos
>>
I waited for LTX2. I wanted something to be better than wan 2.2 so badly, and what we got was something censored at its core and even more resource hungry than wan already is. My smile and optimism: gone.
>>
>>107788014
what? you thought z-image couldn't do 3girls, sitting?
>>
File: img_00597_.jpg (1.17 MB, 1520x1824)
>>
>>107788013
the schizo ban evading baker will have a meltdown if we try to get rid of them. it's a reminder that /ldg/ is monitored by someone we originally left /sdg/ for but couldn't handle being an undesirable
>>
>>107788014
now make them look regular and not like turbo whores
>>
>>107788032
it's too big and the censorship part is making me feel like I'm using an API model, I don't want to feel like a cuck on local, fuck this model
https://files.catbox.moe/mgkbzy.mp4
>>
>>107788032
wait until training is possible, we'll see if censorship can't be beaten at least partially
>>
>>107788048
I like this style.
>>
File: 00036-3572371329.jpg (508 KB, 1728x1344)
>>107787897
Yes the finetune was the issue. lexivisionII is better.
>>
>>107788051
why can't he just take the hint already and fuck off?
>>
ltx2 kissing audio quality?
>>
>>107788090
he insists on posting his ZIT dogshit 2girls that look like a shitty photoshop layer filter
>>
>>
okay thats enough samefagging for now anonie
>>
>>107788070
>lexivisionII
what's that?
>>
File: TensorArt_00046_.png (1.18 MB, 768x1536)
>>107788013
He is a literal rdrama fag. He used to link to rdrama threads way back when there was just one /sdg/ thread, thinking anyone actually gave a literal fuck about that juvenile bullshit.
>>
>>107788103
is the donkey gonna be okay? please gen a follow up where the donkey is okay
>>
>>107788102
at least it isn't uggo gilfs he normally makes
>>
>>107788108
https://civitai.com/models/1607200/lexivision-ii-and-lexivision-z
>>
File: zimg_00240.png (1.33 MB, 864x1280)
>>107788029
>>107788034
>>107788053
i happen to like turbo whores
>>
>>107788115
what a disgusting cretin. nice image btw
>>
>>107788127
>an experiment with generation of synthetic data using SDXL LexiVision AIO VNSFW
>This is another merge of two private models that are being used to generate synthetic training data for other models.
jeet levels off the charts in orbit even
>>
who are you guys talking about? I'm new here
>>
File: img_00602_.jpg (1.28 MB, 1520x1824)
>>107788069
James Jean. Picked it up decently, but could be better.
>>
Has anyone tried the new version of Zeta-chroma?
https://huggingface.co/lodestones/Zeta-Chroma/blob/main/zeta-chroma-x0-proto.safetensors
>>
>>107788162
Purchase an advertisement
>>
>>107788145
the baker that the mods refuse to do anything about
>>
>>107787984
mine loads with about 10gb free, I think it just highballs it and asks for 90% of your ram regardless of what it actually needs
>>
File: 1754725129850677.png (893 KB, 776x518)
>>107788162
lodestone talked some techno babble i didnt understand other than it's in "pre training". i think this is from the latest version
>>
File: 00046-2970201777.jpg (413 KB, 1728x1344)
>>107788140
idc what works works
>>
File: 1742422453750104.jpg (1.06 MB, 4164x1623)
>tfw 32 stars and failed
>>
>>107788184
I don't have 150 gb of ram though, only 64, it's using the pagefile, comfyui's memory management is so fucking ass when it comes to ram
>>
>>107788192
>idc what works works
based philosophy
>>
>>107788133
much better
>>
File: zimg_00246.png (1.37 MB, 864x1280)
every time i want to post new loras civit is down, this is horseshit
>>
>>107788195
what failed?
>>
File: 1737464964720605.png (555 KB, 1280x720)
>>107788133
>turbo whores
oh that's why it's called Z-image turbo!
>>
>>107788210
tell me about it
>>
>>107788226
it's not down
>>
>>107788162
So is there a workflow for this or can we use the regular chroma workflows? And no, not joining or ever using discord
>>
>>107788190
desu I think we have a better chance of undistilling turbo and making it good than of finetuning the base to be as good as turbo, they've hidden the RLHF process details in their paper and that's for a reason, it's an important secret sauce
>>
File: img_00611_.jpg (1.01 MB, 1520x1824)
>>
File: 1766529041028215.png (1.85 MB, 1216x1728)
>>107788269
>finetuning the base
what base?
>>
>>107788116
he's just taking a nap
>>
File: 1738203192151811.png (415 KB, 857x1200)
>>107788269
>>107788285
people are still hoping for base to be released in the year of our lord 2026
>>
>>107788190
>random 'ecker cameo
>>
does anyone know how to re-enable automatic vhs preview? after the update, the vhs preview no longer appears automatically. fucking comfyui team
>>
>>107788054
https://files.catbox.moe/wdipti.mp4
>>
File: zimg_00177.png (1.77 MB, 864x1280)
>>107788236
hell yeah brother
>>
>>107788241
models can't be rated/released
https://civitai.com/changelog?id=129
>>
>>107788338
i'm sorry for lying then :( get well soon
>>
File: 1763068878769611.mp4 (929 KB, 1280x704)
5070ti with 64gb of ram here. default settings comfy workflow ltx2 took 225 seconds from button click to "It's raining it's raining", subsequent gens take 104 seconds (the text encoder offloading is painful)
>>107788154
>James Jean
based. now that you mention it i recognize it. i almost bought a bottle of johnny walker blue label just because it had a dragon drawn by him on it
>>
Does anyone have a workflow to generate a character in different outfits and poses?
>>
File: STOP BREAKING THINGS.png (310 KB, 736x736)
>>107788305
and they removed the search bar on the settings so you can't even write "preview" to find what you want quickly, those Ui jeets are so incompetent I swear to fucking god
>>
File: 1746145017111317.png (53 KB, 1040x496)
>>107788305
>vhs preview
i dont know what that is but you activate preview in the settings now
>>
Blessed thread of frenship
>>
>>107788393
nigger
>>
>>107788398
kek
>>
File: z-image_00024_.png (1.74 MB, 864x1280)
>>107788348
ty fren
>>
>>107788013
those are our lolcows / schizos and there is a rich history as to why those rentries are needed newfren
>>
File: 1766163613788790.jpg (11 KB, 350x341)
>>107788359
I still see it
>>
>>107788133
>>107788323
https://files.catbox.moe/ufzzwb.mp4
those powerpoint (((glitches))) are so annoying, it removes the fun of that model
>>
>>
https://www.reddit.com/r/StableDiffusion/comments/1q5jgnl/ltx2_runs_on_a_16gb_gpu/

it is indeed possible on 16gb
>>
File: z-image_00017_.png (1.75 MB, 864x1280)
>>107788420
i'm gonna wait at least a week before i bother with that model
>>
File: we're so back.png (1.41 MB, 1673x1082)
>>107788435
>powerpoints on 16 gb
let's goo!!
https://files.catbox.moe/bf7l3w.mp4
>>
and there you have it
https://files.catbox.moe/yuswxf.mp4
>>
>>107788409
nice try lolcow
>>
File: that was good.png (351 KB, 619x599)
>>107788477
>in the soon
lmaooo
>>
>>107788477
>in the sun
are you sure this is a good idea Kim Jong Un? :d
>>
>>107788477
>re-erased soon
>>
>>107788495
that I can believe
>>
>>107788477
his mouth is blurrier than his eyes
can you feed this thing encoded audio?
>>
>>107788477
https://github.com/huggingface/transformers/pull/43100
you have no idea how much I want GLM image to be actually good so that we can move on from (((Alibaba))) once and for all
>>
>>107788434
very wet
>>
>>107788477
>even gets the engrish right
good model.
>>
File: 1745067912305775.png (87 KB, 301x168)
>>107788477
In the end, he can't even pretend not to find it funny lool.
>>
>>
File: found an hidden gem.png (409 KB, 736x907)
>>107788477
that's why I lurk this place, it knows the news before anyone else and there's some hidden gems here and there
>>
Hey guys, I'm looking into some Wan2.2 video gens. I'm looking at the list of diffusion models and there's like 30 different ones. Is there any info anywhere regarding what the differences are?

Like: high noise vs low noise? bf16 or fp8?

Please and thank you.
>>
File: 1755557513394055.jpg (2.04 MB, 7961x2897)
>>107788560
>Is there any info anywhere regarding what the differences are?
>bf16 or fp8
>>
>>107788477
lol
catbox?
>>
>>107788560
there used to be a wan 2.2 rentry on the OP to help newfags out, don't know why it got removed though
>>
>>107788585
it already is a catbox anon
>>
>>107788560
If you're using ComfyUI then start with the example workflow and adapt it to your needs once you get it working. Wan 2.2 consists of a high noise model followed by a low noise model so you need both. Just download what the example workflow tells you to download.
Hope you have at least 16 GB VRAM.
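The high-then-low handoff described above can be pictured as a plain split of the step schedule. A conceptual sketch only, not ComfyUI's actual API (`split_schedule` and `boundary` are made-up names):

```python
def split_schedule(total_steps: int, boundary: float = 0.5):
    """Conceptual sketch of Wan 2.2's two-expert sampling: the noisiest
    part of the step schedule goes through the high noise model (broad
    strokes), the remainder through the low noise model (refinement)."""
    cut = int(total_steps * boundary)
    high_noise_steps = list(range(cut))              # broad strokes
    low_noise_steps = list(range(cut, total_steps))  # refinement
    return high_noise_steps, low_noise_steps

high, low = split_schedule(20)
# high covers steps 0-9, low covers steps 10-19
```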
>>
>>107788574
Okay how do I go about recreating the image for ZIT? Just do it in a straightforward way and that'll be fine?
>>
Is Nikolas' relative here? I have a question.
>>
I am new to ComfyUI and installed everything from the wan22ldgguide and it works fine. I used Wan2GP before that and there was a continue video option. I have no clue how to do that with ComfyUI. Is there a workflow for that? Can someone please point me in the right direction? Any help would be appreciated.
>>
Why is kij saying his 24gb 64gb ram setup is not eating pagefile when it is for basically everyone else?

Like I don't care. I can just make a fuck huge pagefile on my nvme but why?
>>
could someone share a working ltx 2 folder for comfyUI portable?
>>
>>107788657
linux vs windows memory mgmt maybe?
>>
>>107788597
32GB, I'm looking at this page, https://comfyanonymous.github.io/ComfyUI_examples/wan22/ from the OP, and it quickly puts me in a page with a lot of different wan2.2 models. I see, high noise to give it some broad strokes then low noise to refine the high noise output, in a nutshell?

Reading between the lines, use fp16 if I can.

What are the inpaint, camera, and controls models for?

wan2.2_ti2v_5B_fp16.safetensors
looks like a non-specialised but easy to begin model?
>>
File: 1750892061697776.jpg (998 KB, 3791x1623)
https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning/tree/main
slop vs slopitty slop
>>
File: ComfyUI_01038_.jpg (425 KB, 1792x2304)
>>
>>107788477
https://files.catbox.moe/eukq6x.mp4
Come on dude, they even powerpoint the jeets
>>
>>107788699
Don't use 5B, it'll output crap. Especially since you have a 32GB card.
Open ComfyUI, click Templates, select Wan 2.2 14B Image to Video. Download what it tells you to download.
Don't do anything else until you get that working.
>>
>>107788587
it was dated and poorly written
>>
https://files.catbox.moe/ill7c8.mp4
a checkpoint in $current_year that can't do text really feels like a regression
I mean the video is no wan2.2 either, but at least it's fun
>>
>>107788734
I answered my own questions after looking at the templates list (i must have an old comfy, it's only got 2.1)

inpaint - fill in the gap from a start and finish image,
control net, use an existing clip as a guide for the prompt,
camera? don't know
>>
>>107788760
https://files.catbox.moe/ovc59x.mp4
yeah the text is absolute dogshit, at least the audio is consistent
>>
another Q, if i overwrite a folder with the comfy github will it erase files missing from the source (delete all my models)?
>>
>>107788734
Thank you by the way, you've actually been a great help!
>>
File: 1741299157759971.png (2.16 MB, 2000x3008)
>>107788552
close but no cigar
>>
File: 1746986739567675.png (59 KB, 1296x394)
>>107788809
btw, why does the image input have to be jpeg compressed for it to work? that's dumb
>>
File: file.png (3 KB, 172x70)
>hear about ltx2
>open comfy
>check if there's a ltx2 template already
>oh cool there is
>click
>pic related
Where do I find an actual workflow?
>>
>>107788886
update, it was released today
>>
>>107788886
what? I'm using the template and it's not using api nodes
>>
>>107788760
Are you prompting anything related to style or era? Zoomer here, it looks like it could've come out of the 2000's or something.
>>
How do i get free ComfyCredits (tm)?
>>
>>107788880
kek
>>
LTX2 is the only model I can think of that actually cannot generate a miku t2v.
>>
>>107788898
nope, pretty barebones prompt you can check yourself
I get the feeling that a lot of the training data was sitcoms, it kinda pulls it in that direction
>>
>>107788906
kandinsky can't either
>>
>>107788032
They released the full model didnt they? You can probably train it in just need a horny chink or saudi oil baron lol
>>
File: 1739756314986780.png (7 KB, 366x188)
>>107788906
the size, the powerpoint (((feature))) and no Migu is gonna kill the model, I feel I'm doing an humiliation ritual, having to spare more than 100gb of pagefile just to run this garbage
>>
File: 1741551712526781.png (898 KB, 1504x1000)
>>
https://files.catbox.moe/unkrlq.mp4
>>
>>107788950
Lol
>>
Zit is actually inferior to Qwen/Flux 2 if you're trying to do anything multi-subject. It's retarded like SDXL
>>
>>107788967
when you go for NAG + boomer prompt it works fine
>>
>>107788950
migu :(
>>
File: chroma_00002_.png (1.78 MB, 768x1344)
sometimes you need to dust off chroma and get some 1girl asian footpics to remind you what it's all about
>>
>>107788936
That's a lot to criticize the model for, but the powerpoint thing is a skill issue.
>>
https://files.catbox.moe/o5et5j.mp4
>>
File: 1742075422770515.png (912 KB, 1504x1000)
>>
File: ComfyUI_00995_.png (1.21 MB, 896x1152)
>>
File: 1751364144687926.png (918 KB, 1062x698)
>>107788988
>>
>>107788999
https://files.catbox.moe/93vp5x.mp4
went from 0.5 megapixels (49 sec) to 0.92 megapixels (1.23 mn) and it's definitely better, the biggest strength of that model is the speed
>>
>>107788973
>NAG
I need the node link, cant find it
>>
File: 1741331799918699.png (88 KB, 967x844)
>>107789048
https://github.com/scottmudge/ComfyUI-NAG
go for those settings for Z-image turbo, it's really important
>>
Is the MultiGPU anon a schizo trying to rugpull with a crypto miner? Comfy already has off-loading to RAM.
>>
>>107789076
>Comfy already has off-loading to RAM.
and it's shit, and when an automatic feature is shit, you go for the manual one
>>
I have no fucking idea what my RAM and pagefile are doing when running LTX 2. Sometimes it's comfortably around 40-60gb and other times it's maxxed out and raping my nvme drive and there is no pattern to it.
>>
>>107789031
we need an AI board more than ever, it's gonna be annoying to always have to upload videos through a catbox, c'mon 4chan, get your shit together
>>
https://files.catbox.moe/9mgyue.mp4
>>
Anon talk about nsfw T2V but don't use Kandinsky 5 pro? X)

Kijai made a cumfart version

No LoRA, 24fps, 100% nsfw.
>>
File: 1588629222545.gif (933 KB, 220x220)
it's crazy how useless this thread is when you need help...
>>
File: file.png (102 KB, 1998x681)
>>
>>107789182
30% of the time, it is because you are bad at asking for help and no-one is bothered to figure out what your problem even is
>>
just woke up, is it still not possible to run this ltx2 shit on a 4090?
>>
>>107789182
Most of the questions asked here could be answered by an LLM
>>
File: 1746417982909469.png (227 KB, 500x378)
>>107789165
>100% nsfw.
I don't believe you, show a video then
>>
>>107789161
>>107788760
literally no different than "gibs buzz pls". it's tired.
>>
>>107789196
this was generated on my 4090
https://litter.catbox.moe/vehmzaqnfbs4xlxj.mp4
>>
>>107789196
you can run on a 24gb vram card, but you have to do some manual offload to make it work, like
--reserve-vram 4
>>
File: 1743801234553335.png (3.67 MB, 1248x1872)
>>
https://files.catbox.moe/kpo5em.mp4

Having to upload this shit to catbox is annoying.
>>
File: 1765510403004917.png (1.05 MB, 1504x1000)
>>
>>107789208
How much time?
>>
>>107789241
NTA but shit is ludicrously fast. Under 2 minutes on a 3090 at 720p and 200+ frames.
>>
>>107789182
let me guess you were the anon who didnt know how to
 pip install 
things
>>
>>107789212
wish i knew about this earlier. i have it at 1.5 but i can surf 4chan, watch porn and youtube videos while running heavy models all at the same time
>>
>>107788886
>see where ltx/light tricks is based
>see pic

lol, lmao
>>
>>107789241
nta but on my 3090 with the distilled version I have this time
>0.92 megapixels
>125 frames
>8/8 [01:22<00:00, 10.33s/it]
it's really fast, the slow part is loading/unloading the models unfortunately :(
>>
>>107789216
Nice.
>>
>>107789249
you also have multiGPU if you want to some manual offloading
https://github.com/pollockjj/ComfyUI-MultiGPU
>>
https://files.catbox.moe/q44ttz.mp4
>>
>>107789252
>>107789245
Pretty good!
>>
File: 1749629675089253.png (74 KB, 186x271)
>>107789280
my ears...
>>
>>107789280
kekd
>>
>>107788936
>no migu
i2v miku in then
>>
It's cool that with edit model, you can make your character do an "A-pose", edit to get the other sides, and then generate a 3D model. I wish it came with textures though...
>>
https://files.catbox.moe/ub810c.mp4
>>
>>107789303
yeah but even on i2v I had my fun by making miku appear on the screen, can't do that with that model, sad
>>
audio + image to video with ltx2 is fucking insane btw. And these take less than 2 minutes to gen. Could be ever better with more time im sure.
https://files.catbox.moe/eea5wn.mp4
https://files.catbox.moe/wunip1.mp4
https://files.catbox.moe/m3tt74.mp4
https://files.catbox.moe/k29y60.mp4
>>
>>107789307
kek
>>
>>107789308
loras work, there is nothing stopping a lora from doing off screen mikus once they are made
>>
>>107789271
I thought about running multiple GPUs like with an RTX 4000 pro but I would need a new PSU
>>
>>107789309
>https://files.catbox.moe/wunip1.mp4
ok that's pretty cool
>>
>>107789309
so that's a yes to >>107788510 ?
>>
>>107789309
I had thought sora 2 would be some massive 1T model or something, but ltxv2 has made me think it could be closer to like 50-100B
>>
>>107789305
Which image edit model and which model for 3d?
>>
>>107789325
WF: https://files.catbox.moe/f9fvjr.json
>>
>>107789208
do I need to modify the template workflow a lot to make it work?
>>107789212
>--reserve-vram 4
how much vram would that reserve?
4gb or 400mb?
>>
>>107789326
sora 2 is much better though, and with way more pop culture knowledge, and what LLMs taught me is that the model needs to be big to remember all those concepts
>>
>>107788133
based turbowhoremaxxer
catbox/prompt?
>>
lol, comfy fixed the previews
he disabled them
>>
Is there a general on this site for video generation? I remember a few threads on /gif/. Or is this the only thread?
>>
>>107789329
I use my finetune to get a specific type of non-flat color 3d model (but its Qwen Edit so 30~GB of GPU RAM). Alternatively use Grok Edit, that works too and you get like 1000 images per hour?
>>
>>107788585
based 4chanXtard
>>
>>107789326
No it could still be an autoregressive LLM model of like 1T, but probably MoE so it doesn't cost so much it's impossible to run.
>>
>>107789341
He thinks on a level beyond the average programmer. He finds solutions for problems others deem unsolvable.
>>
>>107789332
>how much vram would that reserve?
4gb, you can take my workflow if you want, I removed a lot of useless bullshit from the official template (fuck upscalers)
https://files.catbox.moe/lclc9t.json
>>
>>107789344
I meant which model that generates 3d models from images. All the ones I've seen before weren't very good and I don't think I've seen any that take 4 images as input.
>>
We are living in the future.

https://files.catbox.moe/t3emm7.mp4
>>
>>107789309
>And these take less than 2 minutes to gen. Could be ever better with more time im sure.
Since the current workflow can't use sageattention somehow
>Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
yeah it can be even better
>>
>>107789362
spell it phonetically, smoke-you-lease
>>
>>107789355
Hunyuan3DMV. It generates voxels then smooths them out, so you might get holes. I then go into sculpt mode in blender to fix it up.
>>
>>107789363
I meant the quality could be better with way more steps but that too. Still a lot of speed that could be had as well. LTXV2 is black magic in that regard
>>
>>107789371
>LTXV2 is black magic in that regard
they tried to get speed by compressing the vae latents or some shit on ltx 1 but it didn't work out, glad to see they didn't give up on that idea, it looks better than Wan 2.2 while being much faster, that's what I love to see, actual architecture improvements, and not just "stack moar layers bro"
>>
https://www.reddit.com/r/StableDiffusion/comments/1q5k6al/fix_to_make_ltxv2_work_with_24gb_or_less_of_vram/

there is a fp8 gemma encoder too it seems, 12gb
>>
Still remains to see how it trains but for some reason people last night were freaking out because it kept generating indians.
>>
File: greenland.png (3.49 MB, 1824x1248)
>>
https://files.catbox.moe/qoooe4.mp4
>>
>>107789387
fp8 gemma encoder only works on the LTXV2 nodes WF btw, native comfy does not support it yet, it just loads it as fp16 anyways
>>
>>107789396
>"death to all kikes"
>uses a model made in Jerusalem
kek
>>
>>107789309
desu if that model can be pruned to 14b and have its censorship layers removed it can definitely be Wan 2.2's replacement
>>
>>107789182
Yeah, it's pretty terrible. It's usually filled with combative elitist contrarians who'll tell you how something they never built actually works. The only helpful advice from ldg was a suggestion to use chroma last year. Mind you, it's the same place that will throw an aisle 7 tantrum when their favorite variation of a model doesn't get released on time. I come here for the keks now.

>>107789192
>>107789201
kek
>>
>>107789387
>>107789400
fp8 on text encoders have never been a good idea, I'm waiting for its Q8 gguf
>>
>>107787989
>falling for obvious shitposting
>>
>>107789406
https://files.catbox.moe/41k7n2.mp4
>>
https://files.catbox.moe/s6k434.mp4
it can do trump's voice really well lol
>>
>>107789458
lmao, true true
>>
>>107789458
I assume everyone who's ever asked for a nsfw TTS is either retarded or pretending to be retarded
>>
https://files.catbox.moe/jmus4n.mp4
it's so biased towards indian shit :(
>>
>>107789483
you can prompt the accent, dialect, even the tone
it's actually an impressive tts on its own. In my limited testing it knows angry, sarcastic and worried
>>
>>107789352
thanks bro, I'll try it out.
>>
The LTXV WFs are WAY better btw https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows
You get much better results, something in comfy is broken
https://files.catbox.moe/mpd5u9.mp4
https://files.catbox.moe/lvnhqk.mp4
https://files.catbox.moe/htkh8y.mp4
https://files.catbox.moe/55tof3.mp4
https://files.catbox.moe/jgj4md.mp4
>>
how much better would the tech have been if everything wasn't python?
>>
>>107789540
I dunno, save an extra litre of water per thousand gallons
it adds up
>>
>>107789518
the ltx workflows all crap out at the text encoder stage for me.
>>
>TXVGemmaCLIPModelLoader
No files matching pattern 'tokenizer.model' found under E:\ComfyUI_windows_portable_nvidia-latest_20\ComfyUI_windows_portable\ComfyUI\models

you piece of shit, I have gemma_3_12B_it.safetensors
>>
ltx gens regularly have extra limbs, this was never an issue with wan 2.2
>>
>>107789540
pyTorch is C++
>>
>>107789547
use the fp8 encoder, and switch out the vae decode, their WF is made for 5090 but can work on 4090
>>
File: 1752570522354587.png (223 KB, 3235x1201)
>>107789518
:(
>>
workflow says to use:

text_encoders/
comfy_gemma_3_12B_it.safetensors

and the workflow link doesnt have it:

https://huggingface.co/google/gemma-3-12b-pt/tree/main
>>
>>107789518
>something in comfy is broken
pottery
>>
>>107789548
you have to download the entire gemma 3 folder and put that inside of it. https://huggingface.co/google/gemma-3-12b-it/tree/main
you dont need the checkpoints from this, just all the smaller files
>>
>>107789555
all niggardry that surrounds it is python. 3/4 of pytorch is python abstractions. it's so bad comfy made a separate module just for skipping over python abstractions
>>
>>107789540
not better, everything actually important is not in python
>>
>>107789568
That's just the API. The heavy work is done in C++ and Cuda.
>>
>>107789560
>>107789567
the ltxv2 WF's need the full gemma folder and the checkpoint put inside of it
>>
>>107789547
I had to start comfy with the "--reserve-vram 4" flag
>>
>>107789567
>inside of it.
inside of what? I make a new folder on the text_encoder folder?
>>
>>107789562
How do I do that on windows? It only says git
>>
>>107789578
then it's not pytorch anymore, it's libtorch
>>
>>107789584
no, put the comfy_gemma_3_12B_it.safetensors
inside of the https://huggingface.co/google/gemma-3-12b-it/tree/main folder
>>
>>107789562
I use this: https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/blob/main/gemma_3_12B_it_fp8_e4m3fn.safetensors

you can also use

https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it.safetensors
>>
>>107789592
no
>>
>>107789572
>>107789578
why is the API 3x more code than the actual kernels?
>>
File: 1760967533693706.png (182 KB, 3633x1429)
>>107789601
>https://huggingface.co/google/gemma-3-12b-it/tree/main
is this a fucking joke?
>>
>>107789567
thanks, was worried I had to download 30gb of shit again
>>
>>107789583
That works fine for the cumfart workflows. Not so much for the ltx ones.
>>
https://www.reddit.com/r/StableDiffusion/comments/1q5r23b/comment/ny3fedo/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>Create a subfolder inside the text_encoder folder, for example "gemma3fp8", move the gemma12b fp8 file inside the newly created subfolder, and download all the json files from this this link (google's huggingface page) inside the newly created folder where the fp8 file is.
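The steps quoted above amount to the following shell sketch (paths and filenames are taken from the thread and are examples only; adjust to your install, and fetch every small .json from google/gemma-3-12b-it the same way as the commented wget line):

```shell
# Create the subfolder for the fp8 gemma text encoder.
mkdir -p ComfyUI/models/text_encoders/gemma3fp8
# Move the fp8 weights in (skips quietly if the file isn't here).
mv gemma_3_12B_it_fp8_e4m3fn.safetensors \
   ComfyUI/models/text_encoders/gemma3fp8/ 2>/dev/null || true
# Then download google's config/tokenizer JSONs into the same folder, e.g.:
# wget -P ComfyUI/models/text_encoders/gemma3fp8 \
#   https://huggingface.co/google/gemma-3-12b-it/resolve/main/tokenizer_config.json
ls ComfyUI/models/text_encoders/gemma3fp8
```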
>>
>>107789621
try with the fp8 text encoder?
>>107789603
>>
>>107789613
yes, now choose your preferred pronoun and safe word anon
>>
>>107789609
Doesn't matter.
>>
File: 1767748979750677.png (509 KB, 1062x698)
>>107789382
>it looks better than Wan 2.2 while being much faster, that's what I love to see, actual architecture improvements
>>
I've never seen an application that makes people jump through so many hoops like this before. it's truly awful
>>
>>107789613
get the files here I guess
https://huggingface.co/unsloth/gemma-3-12b-it/tree/main
>>
>>107789629
You could find any number of high motion wan frames to clown on the model too.
>>
>>107789615
and now:

ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating

wtf
>>
>>107789628
you don't find it odd the python wrapper API is more bloated than the actual inference kernels? I thought python was supposed to be convenient
>>
only anons who used early LLMs ooba would understand actual annoyance
installing ltx2 is nothing
>>
>>107789629
I already answered to you about that >>107789031
>>
>>107789609
jeets like this >>107789628
>>
>>107789641
ah, disable the prompt enhancement node, you gotta do a whole load of stuff to get that working
>>
>>107789641
nm it's the stupid enhancer node.
>>
>>107789643
It doesn't matter, the heavy lifting is in the kernels. Everything else is just API. There's nothing that can't be provided by Python fast enough.
>>
>>107788435
>16gb
it works
https://files.catbox.moe/mqkjkn.mp4
>>
>>107789659
>>107789661
it just works for me with 24GB vram
>>
File: 1757065898988566.png (239 KB, 2382x1169)
>>107789636
>https://huggingface.co/unsloth/gemma-3-12b-it/tree/main
it should be working now if you do this shit, btw the "model.safetensors" is this file
https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it.safetensors
>>
Oh. AND USE Res_2s NOT EULER!!!
>>
Do I just have some problems with my workflow's prompt adherence or is LTX just not trained on choking? Or gut punches?
>>
can I get a basic workflow without all this enhancer garbage? these retards think a BASIC WORKFLOW should add all this crap, and it won't work with it bypassed.
>>
>>107789678
bro why is it so slow to load the encoder, it's taking ages to fill up my ram
>>
>>106977329
is it possible you could reupload this? sorry for replying to something ancient, i've been looking for something that works to do this
>>
File: DXW1RM9WAAA1BbM[1].jpg (38 KB, 1000x581)
https://files.catbox.moe/sipc7m.mp4
>>
>>107789702
>>107789352
>>
Is there a point in using gemma3 12b abliterated?
https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated
>>
>>
>>107789722
if we're lucky, using the abliterated version would remove the censorship and the powerpoint shit lol
>>
>>107789702
>LTXVGemmaEnhancePrompt
Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating

and if I bypass the enhancer:

CLIPTextEncode
mat1 and mat2 shapes cannot be multiplied (1024x62208 and 188160x3840)

I wonder how retarded normies figure out comfy
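That mat1/mat2 error is just an inner-dimension mismatch: the checkpoint expects a text-embedding width the loaded encoder doesn't produce (62208 vs 188160), which is the classic symptom of pointing CLIPTextEncode at the wrong text encoder file. A tiny stdlib sketch of the rule torch is enforcing:

```python
def matmul_shapes(a, b):
    """Return the result shape of (a @ b), or raise like torch does."""
    (m, k1), (k2, n) = a, b
    if k1 != k2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({m}x{k1} and {k2}x{n})"
        )
    return (m, n)

# the pairing from the error above: inner dims 62208 != 188160 -> wrong encoder
try:
    matmul_shapes((1024, 62208), (188160, 3840))
except ValueError as e:
    print(e)

# with a matching encoder the inner dims agree and the projection works
print(matmul_shapes((1024, 3840), (3840, 3840)))  # (1024, 3840)
```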
>>
>>107789712
kek
>>
>>107789736
>I wonder how retarded normies figure out comfy
They don't, they use grok.
This stuff is too technical and cutting edge for most normies.
>>
>>107789664
you keep saying this but it's literally a pile of shit all over and on top of it. just make bindings; we don't need any of this fucking python shit.
>>
>>107789748
@grok is this true?
>>
>>107789664
>It doesn't matter the heavy lifting is in the kernals
if it doesn't matter, why force python in order to use it? why write the processing utilities in python? you say it doesn't matter, but that would mean using python at all is pointless
>>
>>107789352
this is loading/working normally unlike the shit template workflow in comfy, waiting to gen but it's working, along with --reserve-vram 4 in launch options

update it comfy anon
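For anyone copying the setup: the flag goes in the ComfyUI launch command (path here is hypothetical; `--reserve-vram` takes an amount in GB to keep unallocated):

```shell
# keep ~4 GB of VRAM free so the card doesn't OOM mid-gen
python main.py --reserve-vram 4
```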
>>
>>107789768
it's for researchers first
>>
>>107789768
What a pointless discussion.
Because researchers in ML have used python since forever; it's the de facto language, and no amount of bitching from a random anon will change that.
If you are trying to argue that without python the generation would be faster, that's just false.
>>
late to the party, can I run ltx 2 with 16gb vram and 32gb ram?
>>
>>107789518
>RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
sigh... I guess it won't work if you use --reserve-vram, right?
>>
>This model will only work on a 5090 at the bare minimum
>Actually err, a 24gb gpu will work.
>Well maybe 16gb
>okay 8gb is fine too

how does this happen every time?
>>
>>107789795
>>107789798
if researchers put out a model worth a damn they wouldn't need to waste time learning python. that's how you know a code assist model is garbage: its training and inference are written in python
>>
fresh when ready
>>107789820
>>107789820
>>107789820
>>107789820
>>
>>107789798
you clearly are a junior intern or something
>>
>>107789822
why do you bake early? what the fuck is wrong with you?
>>
>>107789823
I'm sure I am anon, secretly we all use cobol.
>>
>>107789847
obsession over his male crush
>>
>>107789854
kek
>>
>>107789854
cuda uses fortran not cobol

>>107789861
samefag. it really isn't that funny when he went for the low hanging fruit
>>
>>107789794
the anime girl runs towards the camera and says "miku miku miku" in a cute anime style voice.

sticky anon's workflow cause the comfy template one sucks dick and this works, also this model is pretty fast, 16GB 4080 64GB ram and it works just fine.

proof it works: https://files.catbox.moe/v1qdpo.mp4
>>
>>107789823
>>107789865
mr catjak doesn't have a junior position, he can't get a job
>>
>instant crying
>>
Qwen Edit 2511
"Remove the jacket. Keep everything else the same"
It doesn't remove the jacket; instead it zooms out and puts the girl at half scale in the middle. What causes this?
>>
>>107789880
hope Mr catjak feels better soon
>>
>>107789918
if you read the last 15 minutes of posts it appears that someone is crying because of "Mr catjak"
>>
File: Qwen2511Edit.png (145 KB, 1001x391)
>>107789912
>>
i remember hearing news over the past few days about a z-image lora fix inside of either z-image or ComfyUI.
not the lora that fixes other loras, but something inherent that was resolved in either the model or the GUI.
i was working when i read about it.
anyone know what it was, or am i hallucinating?