Miku Edition
Discussion of Free and Open Source Diffusion Models

Prev: >>>/g/107791088

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/kohya-ss/sd-scripts
https://github.com/tdrussell/diffusion-pipe

>Z Image Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>WanX
https://github.com/Wan-Video/Wan2.2
https://comfyanonymous.github.io/ComfyUI_examples/wan22/

>NetaYume
https://civitai.com/models/1790792?modelVersionId=2485296
https://nieta-art.feishu.cn/wiki/RY3GwpT59icIQlkWXEfcCqIMnQd

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
wrong board
BASED
>>6067153this is nicer and nobody will ask me to install gentoo
>>6067724lul
I still don't get why /g/ doesn't let you upload video with sound, it's retarded
>>6067725
>>6067729can you try and see if ltx 2 can do some ytp kino like sora 2
>>6067732
>ytp kino
qrd
>>6067732>>6067731I mean in the context of sora
>>6067734this model sure loves to do some powerpoint shit, I wonder if going for an abliterated version of gemma 3 could fix it
>>6067740Oh for sure, I'm getting a fuck ton of powerpoints and posting the least bad ones
When base
>>6067746they're saying boo-urns
>>6067746
>When base
if they don't release it before Chinese New Year (Feb 17, 2026) it's definitely over
fucking powerpoints
>>6067746
>>6067750
I give you the original image input so that you can get a better result kek (we can't upload images on this place? this sucks wtf)
https://files.catbox.moe/1jwczb.jpg
>>6067756thanks, already had this one
>>6067750>>6067757absolute kino, love those ending transitions
https://www.reddit.com/r/StableDiffusion/comments/1q6zb57/comment/nycrhpl/
seems like it's working better on Wan2GP
migu left :(
>>6067773lmaooo, I guess you tried to stitch the videos together by going for the last frame but it's getting more and more horrific for each iteration kek
>>6067774got so bad miku left the video and made me end it, svi when
>>6067773
>migu left
catch her back! without the sacrifice we won't get Z-image turbo!
>>6067768i'm having a tinker with it, I'm upping the res and frames each time but the resource usage never moves, could it be infinite?
https://github.com/modelscope/DiffSynth-Studio/commit/0efab85674f2a65a8064acfb7a4b7950503a5668
Oh, looks like we'll finally get it!
https://files.catbox.moe/lney3m.JPG
>>6067795
>https://github.com/modelscope/DiffSynth-Studio/commit/0efab85674f2a65a8064acfb7a4b7950503a5668
oh shit it's from Modelscope, finally something is happening
waow
>>6067795
I thought they would've released it right before Chinese New Year, but if it's sooner than that I'll definitely take it, gimme gimme gimme
did I do something wrong?
>>6067807sounds correct to me
>>6067795
>>6067803
>>6067811
that powerpoint zoom/de-zoom is killing this model, without that it would be way more fun to play with
>>6067803
>>6067789
>could it be infinite?
there has to be a resource usage increase, but maybe they found some tricks to make it minimal, this is a huge deal desu
>>6067823seed lotto or is this a good prompt
>>6067845starting to notice this music in a lot of videos
>>6067863Probably the most generic suspense sounds all mashed into one homogeneous suspense slop.
>>6067859
My first try. I am using qwen 8b to enhance the prompt; other than that it's the standard comfy flow for the distilled model.
>>6067789
it happened
looks like the limit is 960x960_240, or more frames for fewer pixels and vice versa
pretty good, especially considering comfy won't even try at 832x480_121
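(quick back-of-the-envelope check, assuming the ceiling really behaves like a flat pixels-times-frames budget, which is just my guess from the single working data point above:)

budget = 960 * 960 * 240        # the working limit reported above
per_frame = 832 * 480           # the resolution comfy chokes on at 121 frames
print(budget // per_frame)      # -> 553, so hundreds of frames of headroom at 832x480 if the budget really is flat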
>ltxv2
>input picture of woman
>prompt her to say something and do a simple action
>every single gen it hangs on the static input image for several seconds while audio plays then the last second it cuts to show an unrelated woman doing the action I prompted (while also being garbled slop)
what the fuck gives?
is chroma better than lumina?
>>6067970Side grade
>>6067970lateral step
>>6067964
>what the fuck gives?
they censored the model, so we're getting the API cuck treatment, but in local!
bruh
good enough I guess
>>6068165>>6068167it's terrible when the movement is fast, not a big fan of the blurry shit lol
gens really shouldn't look this fake in year of our Lord 2026. Even on good rolls everything always goes a bit blurry. Colours change. Weird motions.
Is it comfy's fault?
>>6068170its ai slop but the sound makes it funny
>>6068171
>gens really shouldn't look this fake in year of our Lord 2026.
I agree, Z-image turbo showed that you can make good and small models, the others need to learn a thing or two from Tongyi
>>6068171>dried cum moving when she moves tummy
>>6068171I mixed height with width, grim
>>6068179eh
>>6068171
>>6068181
are you using the upscaler? if yes, remove that shit and go for a vanilla render with more pixels (like 0.9 megapixels)
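(if anyone wants actual numbers for ~0.9 megapixels, a quick sketch — my own math, not from the anon above, assuming a 16:9 target and dimensions snapped to multiples of 32, which most latent models want:)

import math
target_px = 0.9e6                          # ~0.9 megapixels
aspect = 16 / 9                            # assumed aspect ratio for the example
h = math.sqrt(target_px / aspect)
w = h * aspect
w, h = (int(round(v / 32) * 32) for v in (w, h))
print(w, h)                                # -> 1280 704, about 0.9 MP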
>>6067150d*bo status?
>>6068189you can't post images there so you can't be an avatarfag, we're safe from those fuckers lol
I want to try this out even if I'm a 16+32 ramlet
What's the best UI to pick up?
>use LTX-2 to create the audio
>then use Wan 2.2 S2V with the audio for better video quality
I'm too lazy to set it up but someone should try this.
>>6068462wangp
>>6068462Pinokio + wan2gp if you are lazy and/or have no idea what you are doing. ComfyUI for more speed, but you need to learn a few things first.
>>6068165lmao
breh why
i give up
ltx hates migu
>>6068647>>6068641>>6068639>>6068625wtf 4chan supports audio now?
>>6068650Only chad boards like /wsg/ do
>>6068651i thought i was on /g/ lol
Not what I asked, but kinda cute ngl
cozy bread
>>6068656is this real?
>>6068639SONGIK
>>6068655moar
>>6068655Would watch.Hiroshima is a greedy gook.
Attempt 1
>>6068655Attempt 2
>>6068655
>>6068655Ok this is awesome.>Glad you could bake it, Uther.
>>6068675
thanks for the powerpoint
Just a heads up that if you aren't using the Q8 ggufs yet, you might want to consider it.
>>6068696link?
>>6068698https://huggingface.co/Kijai/LTXV2_comfy/tree/main/diffusion_models
>>6068702>>6068700thank
>3 different threads
bruh
>>6068704this is the shelter from schizos plus we got audio
desu, I'm finding having the audio ready and genning the i2v over it gives some pretty awesome results.
>>6068702>>6068696Getting " LTXVEmptyLatentAudio 'VAE' object has no attribute 'latent_frequency_bins'" when I run it, any idea what's going on?
>>6068721
Looks like you are trying to use a GGUF model? It's not yet fully supported; the GGUF model loader node needs an update. Either wait a little bit with patience until it's out in the public version, or use git to pull the PR.
Although you have a VAE error, maybe you connected something wrong with the VAE loaders (if you already have the PR update for the GGUF model loader). But that's beyond my "pay grade", not sure why, will leave that for the experts ;-)
https://github.com/city96/ComfyUI-GGUF/pull/399
i just got my 5090 last week and am paranoid about those 12vhpwr issues so i run it at 70% power the whole time. it keeps it dead silent at least. need to look into undervolting
>>6068709ear torture
>>6068733
you did buy the asus with pin monitoring right? if not you better get the der8auer cable that does it, else goodbye gpu
>>6068736
no, that one was out of stock and almost $1000 more expensive
i am literally just YOLOing it
powerpoint again gee
>>6068737
>>6068739RIP finety-ninety
>>6068739that would be just my luck and wouldn't surprise me desu.
>>6068736
>der8auer cable that does it
It's a pretty cool product for a self-imposed problem by nvidia, but it doesn't support the FE card, sadly.
>>6068744grim
>>6068738
3 things that help:
- the compression node LTXVPreprocess + the node that changes resolution, Resize Images by Longer Edge
- the node LTXVImgToVideoInplace
- long and detailed prompts
>>6068745
It's fine, mine is running almost every day and there was no problem last I checked a month ago when adding ram.
It's capped at 460W though, which gives roughly the same performance while consuming 20% less.
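(if anyone wants to set the same cap, a minimal sketch — assumes an nvidia card and admin rights, and 460W is just the figure above, not a recommendation:)

import subprocess
# set the board power limit in watts via nvidia-smi (usually resets after a reboot)
subprocess.run(["nvidia-smi", "-pl", "460"], check=True)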
>>6068171
What causes powerpoints?
>>6068774doesn't happen on their api
>>6068784i undressed this bitch yesterday
dunno why the dramatic pause at the start
>>6068790isn't it obvious?
sloppa
lmao
quick test
that's one feature down I was really hoping for, now just video continuation with audio encoded
you're really loving this inspector gadget shit despite ltx having piss poor special effects
>>6068821True story
>>6068821prompt adherence too, unless the person is facing the camera they won't speak most of the time, at least that's my experience as you can see in the previous and this one
>>6068828
yes, getting the correct person to speak the correct lines has been the bane of my gens
I wish I knew the trick because sometimes all it takes is 'the person on the right' and other times it needs their social security number and mother's maiden name to specify them
>>6068829ltx regional prompting
>>6068699this is fucking sick
wish it could do good 2d, maybe some lora will save us
>>6068675
Just in case someone missed it on /g/, this quant of gemma3 is stupid fast on the 50xx series
https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/blob/main/gemma_3_12B_it_nvfp4_uncalibrated.safetensors
>>6068851will try thanks king
>>6068851
shid
wanted to post a better one
see what I mean about the sitcom vibe
>>6068774
>What causes powerpoints?
they censored their model, desu it ruins the fun for me, I feel like an API cuck who's told what's acceptable to generate or not
>>6068851
>fp4 on a text encoder
bruh... using fp8 on the text encoder is already a bad idea but fp4 is on another level lmao
>>6068702
>https://huggingface.co/Kijai/LTXV2_comfy/tree/main/diffusion_models
you have to activate this PR to make it work right?
https://github.com/city96/ComfyUI-GGUF/pull/399
>>6068851my shit is not working with that, can you share your workflow?
>>6068859is this real?
>>6068870ye
>>6068851
Pretty decent considering it's 1/3 the size.
Left is og, right nvfp4
https://civitai.com/models/2292336/ltx-2-nsfw-text-encoder-gemma-3-12b-abliterated?modelVersionId=2579572
looks like the censorship/powerpoint shit can be removed if we go for the uncucked version of gemma 3?
>>6068895but it powerpoints even normal shit
>>6068897
>it powerpoints even normal shit
that's false positives, the model is so censored it sees anything as NSFW
>>6068898lmao if true
>>6068903is that ace-step?
>>6068906
nope, it's ltx2 disobeying my prompt and making music lol
https://github.com/city96/ComfyUI-GGUF/pull/402
there's a way to go for ggufs of the text encoder now
>>6068733same, but watercooled. I run it at 80% and have a max of 500w. Thinking about going lower.
>>6068928why
>>6068929I get 95% performance for - 100w.
>>6068870
>https://github.com/city96/ComfyUI-GGUF/pull/399
>>6068925
>https://github.com/city96/ComfyUI-GGUF/pull/402
to get both PRs into your repository you can do this:
git checkout -b temp-test-branch main
git fetch origin pull/399/head:pr-399
git fetch origin pull/402/head:pr-402
git merge pr-399 -m "Merge PR 399"
git merge pr-402 -m "Merge PR 402"
>>6068877u might need to upgrade to cu130 mby? ive not tried ltx ever but needed it for ZIT nvfp4 to work properly otherwise it would output a bunch of geometric shapes
>>6068936
what is each video supposed to represent though? what are you comparing it to?
>>6068937just seeds
>>6068936does i2v not care about portait dimensions?
fun stuffthis has ruined WAN for me, I need audio now
>>6068939no
>>6068932
you also have to go to the ComfyUI\custom_nodes\ComfyUI-GGUF\loader.py file and change this code
>sd, arch = gguf_sd_loader(path, return_arch=True, is_text_model=True)
to
>sd, arch, metadata = gguf_sd_loader(path, return_arch=True, is_text_model=True)
and you're good to go, you'll be able to use GGUFs for both ltx2 and its text encoder
and don't forget to download the tokenizer.model
https://huggingface.co/unsloth/gemma-3-4b-it/blob/main/tokenizer.model
and put it in ComfyUI\models\text_encoders
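(if you'd rather not have that manual edit break when you switch back to master, here's a slightly more defensive version — my own variation, not from the PR; it just assumes gguf_sd_loader returns two values on master and three once the PRs are in:)

# hypothetical drop-in for the line in ComfyUI-GGUF/loader.py discussed above
result = gguf_sd_loader(path, return_arch=True, is_text_model=True)
if len(result) == 3:
    sd, arch, metadata = result   # PR 399/402 behaviour: metadata comes back too
else:
    sd, arch = result             # current master: two return values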
>>6068952thank you
>>6068932
>>6068870
>>6068952
it's amazing how much more memory efficient GGUF is compared to .safetensors, before I was peaking at 130 gb of ram usage, now it doesn't go over 80gb
So we need kijai and gguf repos to update?
>>6068967what do you mean exactly?
>>6068940I just can't go back to 10 gens for slo-mo tiity bounces. LoRas for LTX can't come fast enough
>>6068970*10 min gens
>>6068968my mom will be mad at me if I use unmerged commits, so do i just wait till those two I mentioned merge?
https://github.com/modelscope/DiffSynth-Studio/commit/0efab85674f2a65a8064acfb7a4b7950503a5668
https://xcancel.com/bdsqlsz/status/2009566444632334479#m
>>6068973if you don't want to use unmerged commits you'll have to wait yeah
>>6068974zimage
>>6068974
>>6068989kek, the music was pretty decent on that one
>>6068989was THAT ace-step?
>>6068993Just ltx, I don't even have ace installed
>The woman throws her phone on the ground and then runs to jump into a swimming pool.
its prompt adherence is kinda garbage, I guess it's because we use the distilled model, but going for the regular one + 20 steps is gonna take too long just for some shitpost desu
Where ace step 2 btw?
>>6068998i forgot
>>6068974
>https://github.com/modelscope/DiffSynth-Studio/commit/0efab85674f2a65a8064acfb7a4b7950503a5668
looks like only Omni Base will be released first, they also promised "Z-image" which has the SFT finetune applied to it
https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-model-zoo
bro what
>Error running sage attention: Input tensors must be in dtype of torch.float16 or torch.bfloat16, using pytorch attention instead.
Do you also have that when making ltx2 videos? can this be fixed at all?
>>6069007i do
>>6069006qrd? is this image from some news or something?
>>6069012i think its the woman who got killed by ice the other day
>>6069007
>>6069011
https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1192#issuecomment-3259588742
>That error is not coming from these nodes as it mentions falling back to pytorch attention. It's probably from some other model in the workflow, not the model itself that reports it.
>It's probably from the audio encoder and it should not matter.
>>6069015thanks king
>died for nothing award
>The woman is playing with her guitar
I'm tired of this fucking powerpoint shit, fuck this model
>>6069018
just saw the video, goddam she's so fucking retarded
https://xcancel.com/EricLDaugh/status/2009682363077509453#m
>>6069022welcome to the club
>>6069022
>>6069027this is so fucking bad lmao, they said they were going for a 2.1 version, but I hope they realize how much improvement this model actually needs, it's a really rough model at the moment
>>6069030loras will fix this in two weeks
From what I’ve seen of this model it absolutely loves Loras and gobbles them up. My hopes are high.
>>6069033do you have that lora anon posted for nsfw? the download got deleted
>>6069034My dude I just woke up like ten minutes ago.
https://github.com/huggingface/transformers/pull/43100/files
>glm_image.md
>This model was released on 2026-01-10 and added to Hugging Face Transformers on 2026-01-10.
interesting, maybe that's why the Tongyi fucks started to wake up, maybe they see GLM image as a threat
>>6069044is it?
Mom, the chinks are making fun of us on discord :(
https://files.catbox.moe/s6h9l2.mp4
>>6069049it do be funny
>>6069022
>I'm tired of this fucking powerpoint shit,
maybe going for the abliterated version helps
https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated-v2-GGUF/tree/main
https://civitai.com/models/2292336/ltx-2-nsfw-text-encoder-gemma-3-12b-abliterated?modelVersionId=2579572
>>6069052im feeling abliterated
>>6069051>Me when someone brings up z-image base
>>6069062You become a hot devil woman? please be in london
>>6069022>>6069052it barely changed anything, sad
>>6069071she moved a finger
>>6069071Try a different seed it works for me
>>6069071What model are you using?
>>6069078
ltx2?
>>6069077
the point was to try to remove the powerpoint seeds, to not have this shit anymore
>>6069080
>he thinks he can fully undo mossad censoring by changing the encoder
lol
>>6069082:(
>>6069080
>ltx2?
No, like which quant: gguf, fp8 or fp8 distilled?
>>6069083wait for loras
>>6069085Q8 + distilled
>>6069088use the q8 distilled instead
>>6069088
Must be the prompt then. The only times I've gotten powerpoint slides were when I didn't prompt enough actions to fit the time.
>>6069088are you using negatives?
>>6069090with ablit?
>>6069088
>Q8 + distilled
>>6069089
>use the q8 distilled instead
?
>>6069091
how can I use negatives? it's distilled, it's at cfg 1
>>6069092
>with ablit?
Ima be real. I have no idea if ablit does anything or not.
>>6069095gazebo
>>6069086uhmm my prompt was to make the planet gain a face
>>6069099>128x64lul
>>6069101each pixel assembled with care
>>6069098
>>6069098sounds different to how I remember it
>>6068169Wow. ltx 2 is really bad. Is that a super quantized distilled fp1 version?
>>6069109you wish lol
>>6069111
ahah, it do be like that mr stancil
https://www.youtube.com/watch?v=l_oaNrAl83Q
>>6069112kino
I tried adding voices from VibeVoice but it seems to struggle with multiple people speaking in an audio clip. It kept making different people speak from the same voice.
>>6069119regional prompt when
This was the worst one kek
>>6069124lol
>>6069124i watched this 4 times, too funny
which gemma 3 12b am I meant to download?
>>6069131all
>>6069131
the normal one, I tried the "uncucked" versions (abliterated and heretic) and it didn't do shit >>6069071
>>6068925
>>6068932
problem is now it won't load the ltx2 gguf ugh... I need both the clip and checkpoint ggufs to load, the unet loader is giving:
ValueError: too many values to unpack (expected 2)
can anyone get both the model and clip in gguf format to actually work?
>>6069138maybe
>>6069136the 24gb one? Is there a smaller version that will work with the LTXV Audio Text Encoder Loader?
>>6069142use https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/blob/main/gemma_3_12B_it_nvfp4_uncalibrated.safetensors
>>6069138>the unet loader is giving.>ValueError: too many values to unpack (expected 2)it shouldn't, did you do that? >>6068952
>>6069142>>6069145performance was the same in my tests
>>6069146i didn't do this to merge both >>6068932I'm a bit hesitant to run those git commands because I don't know what directory I need to be inside of, i don't want to break my custom nodes directory.
>>6069150you aren't using claude code?
>>6069150
>I don't know what directory I need to be inside of
those git commands have to be done inside the gguf custom node folder, and if you want to go back to the "normal" branch again you do this:
git checkout master
>>6069156
error: Your local changes to the following files would be overwritten by merge: loader.py nodes.py
Please commit your changes or stash them before you merge.
Aborting
how do?
>>6068828kek
>>6069160git stash (and LLMs exist, use them if you have a coding problem)
I guess LTX just sucks at interpreting new audio because it can understand this stupid prompt better when using the model's own audio.
>>6069167what happens if you include the dialogue from the audio in the prompt?
>>6069169nta, but I've found it to be helpful when it just doesn't get it.
>>6069169That's what I did earlier but it still got the actors confused every time or they weren't speaking to the right person
>>6069145Missing weight for layer gemma3_12b.transformer.model.layers.0.self_attn.q_proj
>>6069160>>6069164you have to do the merge shit BEFORE modifying the script by yourself
>>6069174fixed, I had to update comfy...again
This key framing with audio shit is finicky but I'm getting close.
>>6069175
DualCLIPLoaderGGUF
Unexpected text model architecture type in GGUF file: 'gemma3'
Back to the original problem, fuck it I'll wait. I guess I can't have both...
>>6069167
Try raising the denoise strength on the encoded audio latent so that the model thinks that it's generating it. You will get better results that way.
The "real" way to do it is to reverse sample on the latent like with RF-Edit, but it's too complicated to do right now.
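(for anyone unsure what "raising the denoise strength" on the audio latent means, a very loose sketch of the idea — my own illustration, not LTX's actual API; it's just the usual img2img-style trick of starting the sampler from a partially noised latent:)

import torch

def partially_noise(audio_latent: torch.Tensor, strength: float) -> torch.Tensor:
    # blend the encoded audio latent with fresh noise; the sampler then denoises
    # from t = strength, so the model "thinks" it generated the audio itself
    noise = torch.randn_like(audio_latent)
    return (1.0 - strength) * audio_latent + strength * noise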
>>6069206not bad, but can you make a 3d version? i hate live action
ltx is pretty fun, trying the kijai Q8 distil model, works on a 4080 (16gb) and 64gb ram just fine:
Donald Trump wearing a black suit walks in from the right and says "she has great genes, you know. the best genes! the best."
>>6069211
also, this gen didn't work as planned but it's still hilarious, cause it captures his emotes and appearance so well:
>>6069206Make a Tiktok account and just post these with "Warcraft if real"
>>6069214
I used to have a youtube back when harry potter balenciaga was a thing, even got monetized within three days then the dickheads at youtube banned me from earning on my account for no reason. "Reuploading content" they said. IT WAS MY CONTENT.
there we go, prompt edit fixed it.
"the camera pans to the right" is what I needed to add.
lmao
the camera pan is 10/10
>>6069218Maybe a lot of jealous people report everything. There's money involved after all.