Shift Scheduling Edition
Discussion and Development of Local Image and Video Models
Previous: >>108807440
https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>mfw Resource news
05/13/2026
>AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers
>RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation
https://github.com/ShmilyQi-CN/RealDiffusion
>OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
https://zghhui.github.io/OmniNFT
>Logit-Attention Divergence: Mitigating Position Bias in Multi-Image Retrieval via Attention-Guided Calibration
https://github.com/brightXian/LAD
>Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
https://github.com/JD-GenX/Uni-AdGen
>Elastic Attention Cores for Scalable Vision Transformers
https://github.com/alansong1322/VECA
>LychSim: A Controllable and Interactive Simulation Framework for Vision Research
https://lychsim.github.io
>Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation
https://image2code.github.io/vision2code
>ComfyUI-ppm Implements NegPip on the Z-image series
https://github.com/BigStationW/ComfyUI-ppm#clipnegpip
>DreamX-World: A General-Purpose Interactive World Model
https://github.com/AMAP-ML/DreamX-World
>FLUX Identity Adjuster
https://github.com/Magirad/Flux_ID_Adjuster
05/12/2026
>Pixal3D: Pixel-Aligned 3D Generation from Images
https://ldyang694.github.io/projects/pixal3d
>SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
https://github.com/ShanwenTan/SWIFT
>Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
https://zju-jiyicheng.github.io/Forcing-KV-Page
>Masked Generative Transformer Is What You Need for Image Editing
https://weichow23.github.io/EditMGT
>Micro-Defects Expose Macro-Fakes
https://zbox1005.github.io/MDMF-project
>mfw Research news
05/13/2026
>CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
https://yihao-meng.github.io/CausalCine
>EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation
https://arxiv.org/abs/2605.11722
>STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models
https://arxiv.org/abs/2605.11494
>UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis
https://arxiv.org/abs/2605.12169
>Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
https://arxiv.org/abs/2605.12305
>Couple to Control: Joint Initial Noise Design in Diffusion Models
https://arxiv.org/abs/2605.11311
>MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
https://arxiv.org/abs/2605.12134
>Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm
https://yaofang-liu.github.io/V2V_Web
>One-Step Generative Modeling via Wasserstein Gradient Flows
https://arxiv.org/abs/2605.11755
>FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity
https://arxiv.org/abs/2605.11869
>UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
https://arxiv.org/abs/2605.12088
>Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models
https://arxiv.org/abs/2605.11939
>L2P: Unlocking Latent Potential for Pixel Generation
https://nju-pcalab.github.io/projects/L2P
>Principled Design of Diffusion-based Optimizers for Inverse Problems
https://arxiv.org/abs/2605.11506
>AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
https://huangrh99.github.io/AlphaGRPO
>A Mimetic Detector for Adversarial Image Perturbations
https://arxiv.org/abs/2605.11492
>>108816682
Nice crow.
>>108816682
Tell me about Crow, why does he wear the randoseru?
kill ani in real life, intercept him on his way home and stab him multiple times
>>108816725
What the fuck?
>>108816764
Pretty sure it's a mental patient sending threats to himself
Blessed thread of frenship
Someone bake the artemis photos lora
>>108816976
Artemis photos? Elaborate
>>108817094
the moon mission
>>108816976
I'm making one for Z Turbo
>>108817334
>dat grill
https://github.com/resemble-ai/DramaBox
Holy shit, this thing is the best TTS I've ever tried. It beats absolutely everything.
It's based on LTX-2.3, handles NSFW content perfectly, and can do just about anything you can imagine with near-perfect cross voice cloning. Even though it was released as English-only, it handles my language perfectly.
SAM-Audio can extract nearly any voice from any series or whatever.
Now I'm going to start building my perfect dataset with emotions for a single-voice model using Qwen3-TTS as the backbone, which handles all emotions for real-time inference.
I thought I'd have to wait another 1-2 years for this moment. What a time to be alive.
>>108817334
Nice
>>108817360
>Holy shit, this thing is the best TTS I've ever tried. It beats absolutely everything.
What are the newest other models that you tried? Fishaudio S2P, Kokoro, Chatterbox, Qwen 3.5?
>>108817360
>It's based on LTX-2.3
interesting. i already knew ltx was good at voices since i used it to generate low quality talking videos just to extract the audio from them. but can it generate only spoken audio? i would like something that can generate sound effects as well
>>108817423
None of them are expressive, generate sound effects, ambiance, BGM, etc.
I honestly don't know why there are so many pure TTS models but so few T2A models.
>>108817360
Upload some examples?
>>108817496
qwen's default voices and fishaudio s2p are expressive, omnivoice IIRC also had a few tags
sure, t2a models should have appeal
Big Russ is GONE and we are ALONE
King Russ sits permanently on the throne in my heart
anima4...
v3 is unironically all you need
>>108817511
There are some on their hf pages:
https://huggingface.co/ResembleAI/Dramabox
The output quality is shit but the tone is really good.
>>108817360
VibeVoice is better. DramaBox has better non-speech controls but the quality is awful. Sounds like they're in a tin can, which isn't surprising since it's based on LTX.
>>108818172
It's based on LTX2, not even 2.3, so no surprise it sounds really bad.
y no flo from progressive
>>108817360
There's also this one.
https://github.com/ScenemaAI/scenema-audio
>>108818393
Kino gen idea but she looks way too plastic.
anyone got a SIMPLE ltx eros i2v workflow lying around?
still having fun with Anima
behold my opus
>>108811965
>>108811978
>>108812292
I just got the newest Convrot Int8 quant for Chroma and it runs at the same speed as nunchaku (half of q8).
Quality seems much better, although I won't make comparisons with bf16 (I guess I might for something faster like Anima).
Int8 inference for diffusion seems to have matured significantly recently. (Surprising how long we've been sitting on int8 acceleration and did nothing worthwhile with it all these years it's been available.)
This post on plebbit brought the newest int8 technique to my attention:
https://www.reddit.com/r/StableDiffusion/comments/1tazxqz/int8_in_the_age_of_mxfp8_an_investigation_into/
Seems worth considering if you are on a 3000 or 2000 series.
>>108818610
ooh la la
Any suggestions on how, in ComfyUI, to quickly toggle a turbo lora on/off?
I want to use Anima Turbo for quick prototyping, but it is messing up the style I like, so I want to quickly turn it off to get a quality gen.
I can bypass the LoRA loader, but I also need to change the sampler settings. A way to do it with a single switch would be great.
>>108818778
People use switches all the time in workflows but I don't know precisely how it's done from memory either.
All I can say is that it can easily be done by pressing a single toggle. (Bypass node + change cfg and step count)
>>108818818 >>108818778
the easy to understand method many use is to just put the respective nodes into one group / subgraph each and group-bypass one or the other
>>108818818
* my personal recommendation would be to also have the other cfg and step count in each of the groups or subgraphs you toggle. YMMV.
>>108818731
>if you are 3000
That's-a me. Thanks for the info.
>>108818778
just ask claude code/codex to build a subgraph for you inside your current workflow. You'll end up with something totally insane like this, but somehow it will work.
>>108818778
if you just want on/off
We might actually be back.
>https://arxiv.org/html/2605.12964v1
>https://hanshengchen.com/asymflow/
>>108818955
And how does switching change sampler settings?
>>108818962
Cool
>>108818962
that does look pretty cool, also based on the sample images
but I barely get what they claim to have achieved... a better technique to relate 256*256px space to the actual latent space?
>>108818962
Busy now, only briefly skimmed, but it seems interesting.
Also interesting is that they did this on the 9b with the more iffy license.
Will probably post more thoughts later after I finish reading their paper.
>>108818962
hol up
>>108819035
* demo gen on their demo site not yet THAT amazing, but perhaps I'm testing for the wrong capabilities
>schizophrenic situation
https://huggingface.co/spaces/Lakonik/AsymFLUX.2-klein
>generate one (1) image
>You have exceeded your ZeroGPU quota
>>108819058
>ZeroGPU
they call it that for a reason
>>108819058
seems like they released the model? https://huggingface.co/Lakonik/AsymFLUX.2-klein-9B/tree/main
>>108818962
>Finetuning Latent Models into Pixel Models
>Hansheng Chen, Jan Ackermann, Minseo Kim, Gordon Wetzstein, Leonidas Guibas
so which one is lodestone
>>108819069
>707mb
obviously do not download this
>>108819069
That's the adapter
>>108819085 >>108819076
isn't that the thing they made for use with klein 9b?
>>108819071
They don't even consider what he does actual research lmao.
Never seen his stuff pop up in papers.
>>108819089
Yes. There is a usage example in the model card. You load this adapter with the base flux2 klein 9b model.
>>108818962
>>108819100
yes, as far as I can tell that's how you'd use asymflow for local inference right now?
i don't get the complaints >>108819085 >>108819076
i mean sure, you can wait and see how the comfyui implementation will be done but I also wouldn't be surprised if they just kept the klein9b+adapter setup
>>108818731
I also made a bf16 baseline comparison.
I would say it's holding up reasonably well for quanting just a 2B model. You are going to get less divergence from the half precision baseline with larger models, and possibly a larger speed boost if you need offloading for bf16/fp16.
You have to enable the dynamic lora option and take a 10-15% speed penalty over usual int8 speeds when using loras though. Otherwise loras have very minimal effect.
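for anyone wondering what the int8 weight quant itself actually does (rotation/outlier tricks like the Convrot one are extra machinery layered on top of this): a minimal numpy sketch of plain symmetric per-channel int8, all function names here are made up for illustration, not any repo's real API:

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-output-channel scale: the largest |weight| in each row maps to 127
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8(q, scale):
    # real inference keeps the matmul in int8 and rescales after;
    # this just reconstructs the approximate fp weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int8(w)
# rounding error is bounded by half a quantization step per weight
max_err = np.abs(dequantize_int8(q, s) - w).max()
assert max_err <= s.max() / 2 + 1e-6
```

the reddit post linked above is about getting actual speed out of the int8 matmuls on older tensor cores; the quant step itself is this boring, which is why the quality hit stays small for well-behaved weights.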
my niggas
is it possible to train a zimage lora on a 6 gb VRAM, 32 gb RAM machine?
thinking of doing the oneitis
>>108819146
Should indeed be possible with offloading, paged 8bit optimizer memes, maybe going for fp8, int8 or something for the weights, and gradient accumulation, any combination of these until it's enough.
Whether the quality and speeds are desirable is another question.
Desu you can get a 5090 on vast for half a buck an hour.
I would rather train on that than use heavy VRAMlet workarounds.
>>108819146
i think so, with offloading to RAM
maybe you even just need to change one offloading slider on onetrainer or ai-toolkit or whatever you use
>>108819190
not sure you need to massively quantize, i think just taking the speed hit from offloading to system RAM will probably do?
the quality then shouldn't be worse, it's just not going to be fast training
>>108819205
I do not know how much of a speed hit you will take for the significant offloading you are going to do.
If you are not dipping significantly slower than 10 seconds per step, that's still overnight-run-while-you-sleep territory.
I guess you can try and see what you get.
I would temper expectations though.
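rough napkin math on why offloading is the deciding factor here. all numbers are illustrative assumptions (a ~6B base model, a ~20M-param LoRA, plain Adam at 8 bytes of optimizer state per trainable param, activations/temporaries not counted):

```python
def lora_vram_gb(base_params_b, base_bytes_per_param, lora_params_m=20.0,
                 optim_bytes_per_param=8):
    # frozen base weights resident on the GPU (0 bytes/param = fully offloaded to RAM)
    base = base_params_b * 1e9 * base_bytes_per_param
    # trainable LoRA side: fp32 weights + fp32 grads + Adam moments
    lora = lora_params_m * 1e6 * (4 + 4 + optim_bytes_per_param)
    return (base + lora) / 1e9

print(lora_vram_gb(6, 2))  # ~6B base in bf16 on GPU: way over 6 GB before activations
print(lora_vram_gb(6, 1))  # int8/fp8 base on GPU: still over budget
print(lora_vram_gb(6, 0))  # base fully offloaded: the LoRA state itself is tiny
```

the LoRA's own training state is a rounding error; it's the frozen base weights that blow the budget, which is why one offloading slider plus patience can be enough.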
>>108818984
laziest way would be chaining two ksamplers together and just bypassing the one you don't want when you toggle turbo on/off
https://files.catbox.moe/7gva2i.png
>>108818731
>2000 series
*cough cough*... wake up, my ol' betsy...
>>108819237
usually it's a linear type of slowdown with a not too extreme slowdown factor, which is not ideal but also not prohibitive
i'd actually recommend trying it, it's probably not a lost cause unless you have a large training data set and want to train it all at full resolution and all the other stuff that can also pump up the requirements
>>108819237
I have to admit I haven't tried how much impact offloading has on training speeds.
I will keep this in mind next time I decide to train a lora for a larger model.
>>108819409
holy plastic
not too great at prompt adherence, but i don't hate the hidream o1 randomized anime 1girls
>>108818776
holy kino
>>108819642 >>108819658
is there some noise pattern or am I imagining things?
>>108819642
They lied by colossal margins about what the model is.
It's like someone promising to buy you a brand new 5090 and then bringing a second hand RX 6600 that needs to be repasted and get its broken fan replaced.
It's fast enough to get some curiosity from experimenting with it, I suppose.
But also don't zoom in on your HiDream gens.
Once you see it you won't stop seeing it.
>>108819662
It has 32x32 patch artifacts.
>>108819662
i think it has either jpeg flaws trained in or some other artifacts, not at all sure if it's just my settings or the model
>>108819666
>They lied by colossal margins about what the model is.
could be, I didn't actually hear the promises before release
but i think it's not bad for 1girl, questionable rating
it's 2026 and people still use wan and illustrious
ltx ooms on my 8gb gpu
>>108819690
because they're nicely trained for anime 1girl, among other things.
>>108819692
probably just offload more?
>>108819694
>probably just offload more?
isn't offloading automatic already
>>108819680
>i think it has either jpeg flaws trained in or some other artifacts,
Pixel space diffusion is done in patches of 32x32 pixels. You need to smooth out the transitions between different patches some way.
Well, it seems they simply didn't bother to for this garbage.
Zeta-Chroma also has them (alongside its million other issues).
llada also has them.
GLM, despite its shit quality, is the only local pixel space model I know of that doesn't have them.
>not at all sure if it's just my settings or the model
It's the model. Well, maybe they KNOW a way to prevent them, but they didn't bother to include it in the inference code, so it's still the model.
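for anyone who hasn't seen what the 32x32 patching looks like in code: a pixel-space DiT flattens the image into independent 32x32 tiles like below, so any per-patch bias shows up as a grid exactly on those tile boundaries. minimal numpy sketch of the usual patchify/unpatchify reshape (generic DiT convention, not any particular model's actual code):

```python
import numpy as np

def patchify(img, p=32):
    # (H, W, C) -> (H//p * W//p, p, p, C): each patch becomes one token,
    # and at the embedding step a token only sees its own tile's pixels
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p, p, c)

def unpatchify(patches, h, w, p=32):
    # inverse reshape: the output is stitched back tile by tile, so
    # per-tile errors land precisely on the 32px grid lines
    c = patches.shape[-1]
    x = patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

img = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
assert np.array_equal(unpatchify(patchify(img), 64, 64), img)
```

the reshape itself is lossless; the seams come from the model treating each tile as a separate token and nothing explicitly forcing neighbors to agree at the boundary.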
>>108819700
maybe with some tools/workflows? idk.
but since you OOM, clearly something didn't work, so either reserve moar RAM on whatever automatic mode you're using or just decide manually how much is offloaded in advance of running.