Discussion and Development of Local Image and Video Models
Previous: >>108517229

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
Blessed thread of frenship
At the end of 2025, we received our last batch of decent new models: Flux Klein, Z-Image, and Qwen 2512. 4 months later, where are the finetunes?
>>108520023
lodes (chroma) is fine tuning Z-Image.
>>108520023
Z-Image base released in January of this year. Previous finetroons took up to 8 months.
>mfw Research news
04/03/2026

>Modular Energy Steering for Safe Text-to-Image Generation with Foundation Models
https://arxiv.org/abs/2604.02265
>Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation
https://arxiv.org/abs/2604.01700
>Why Instruction-Based Unlearning Fails in Diffusion Models?
https://arxiv.org/abs/2604.01514
>SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing
https://arxiv.org/abs/2604.01715
>MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation
https://arxiv.org/abs/2604.01864
>Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters
https://arxiv.org/abs/2604.01888
>HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
https://arxiv.org/abs/2604.01881
>UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
https://arxiv.org/abs/2604.01479
>Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining
https://junxuan-li.github.io/lca
>Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance
https://arxiv.org/abs/2604.01848
>Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation
https://arxiv.org/abs/2604.02289
>Reinforcing Consistency in Video MLLMs with Structured Rewards
https://arxiv.org/abs/2604.01460
>Model Merging via Data-Free Covariance Estimation
https://arxiv.org/abs/2604.01329
>Steerable Visual Representations
https://arxiv.org/abs/2604.02327
>Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
https://arxiv.org/abs/2604.01989
>ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline
https://arxiv.org/abs/2604.02182
>Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
https://arxiv.org/abs/2511.18123
>>108520023
just stop posting. go for a walk or take a nap.
>>108520053
i was told that 4 months ago. still no finetunes
>>108520063
sdxl took longer than 4 months to tune
>>108520071
SDXL released July 26 2023
NovelAI v3 (sdxl) released November 15 2023
only 3.5 months, and they converted it to vpred too
>>108520101
what does NAI have to do with how long local tunes take
>>108520063
what do you want from a finetune? clearly all the finetunes for zit/zib, flux 2 and qwen don't count. is there some hyper specific criteria you need met before it counts?
For anon who was looking for help at the end of the last thread
>>108520043
I can't speak as to whether or not your comfy workflow is broken though.
>mfw (mono filament wire)
cozy breas
>>108520101
>SDXL released July 26 2023
It's hard to believe it hasn't even been three full years. Feels like an eternity since SDXL came out.
>>108520160
and you don't blame the software itself shitting the bed for months?
>>108520004
Thank you for baking this thread, anon
>>108520014
Thank you for blessing this thread, anon
damn, things have progressed so far since SDXL. We had Dall-E 3, GPT-o, Nano Banana 2, and now per-pixel intelligence with Uni-1. Even ComfyUI and CivitAI grew up, both started only supporting small local models but now feature the best API models the world has to offer. It's great coming back to our roots and remembering where things started with SD1.4, back when we were all stuck using local. Now we can gen 4k outputs with Seedream in under 10 seconds, how crazy is that?
>>108520507
Nope. All my gens with anon's chosen artist were busted too, on Forge Neo. Checkpoint didn't know their artist.
>>108520527
going to be interesting to watch it unfold over the next few years, a lot of the smaller companies or companies that don't have brand recognition are going to go tits up. autodesk, adobe and pixologic will all run their own models that get fully integrated into their applications. chatgpt, x and google's models are already mspaint for boomers and normies to make memes. hollywood studios will run their own proprietary models. ironically the local coomers will be the biggest winners, nothing drives tech adoption like pornography. just look at grok.
https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/older_comfy_pre_feb2026/LTX-2%20-%20I2V%20and%20T2V%20Basic%20(Custom%20Audio).json
ltx is pretty great at lip syncing with custom audio: 20s even worked
https://files.catbox.moe/20mzie.mp4
https://huggingface.co/SanDiegoDude/JoyAI-Image-Edit-Safetensors
Did anyone try that new edit model?
>>108520789
see how good ltx works for sync audio? harry potter is saved.
https://files.catbox.moe/kh3rpd.mp4