Discussion and Development of Local Image and Video Models
Previous: >>108528950

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>108535367
kino
MYTH: api models are censored
FACT: api models are less censored than local models and are in fact trained on NSFW imagery
MYTH: api models are too expensive
FACT: it's actually quite cheap to use API through ComfyUI API Nodes. the price for api has gone down in comparison to the price of hardware
MYTH: api nodes collect your data and are unsafe to use
FACT: api is safer than local because nothing is stored on your hard drive. with local models, you need to download hundreds of loras and custom nodes, any of which could be infected
MYTH: an api can pull the plug at any time, why use something like that?
FACT: everything you generate can be saved to your desktop so nothing is lost
MYTH: it's impossible to train a custom style or character with api, loras make local way better
FACT: api can learn any style or character with a single image reference, which is much faster and smarter than loras
MYTH: if i buy api credits and don't like the model, that's money wasted
FACT: comfyUI's API nodes credit system allows you to prompt hundreds of cutting-edge api models. the credits are shared between models so you aren't locked in to any one ecosystem
MYTH: api users are poor and from third world countries
FACT: the top hollywood productions and anime studios all use api models. api is the weapon of choice for everyone world-wide
MYTH: discussion of api models is off-topic
FACT: api models are part of the comfyui experience and are relevant to this thread. combining api models with local workflows is still local
im becoming too powerful
Blessed thread of frenship
>thread collage has actual hand-drawn hard work from non-AI artists in it
lmao
>>108535397
based anon finally switched to api
>thread is baked
>anon immediately seething
How does baker do it?
>>108535438
By being a retard who can't tell an AI gen from a real painting, I guess.
https://x.com/Nyte_Tyde/status/1909771508697964672
>>108535453
No, I meant this anon >>108535396
>>108535562
>>108535599
nice
>>108535622
nice to see a 2023 ai nostalgia thread
>mfw Resource news

04/05/2026
>ComfyUI-ZImage-Triton: Triton-accelerated W8A8 quantization
https://github.com/newgrit1004/ComfyUI-ZImage-Triton
>ComfyUI Assets Manager v2.4.4 update
https://github.com/MajoorWaldi/ComfyUI-Majoor-AssetsManager/releases/tag/v2.4.4
>From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI
https://blogs.nvidia.com/blog/rtx-ai-garage-open-models-google-gemma-4
>FLUX.2-klein-9B — PolarQuant Q5: 9B rectified flow transformer
https://huggingface.co/caiovicentino1/FLUX.2-klein-9B-PolarQuant-Q5
>Qwen3.5-9B-Neo-PolarQuant-Q5: 9B on any GPU with PolarQuant
https://huggingface.co/caiovicentino1/Qwen3.5-9B-Neo-PolarQuant-Q5

04/04/2026
>STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative
https://github.com/escapistmost/Storyboard-Anchored-Generation
>Regularizing Attention with Bootstrapping
https://github.com/ncchung/AttentionRegularization
>LTX2.3-Multifunctional: Functionality optimization based on LTX desktop version
https://github.com/hero8152/LTX2.3-Multifunctional
>Gemma 4 31B IT NVFP4 model is quantized with NVIDIA Model Optimizer
https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4
>AP Netflix VOID – ComfyUI Custom Nodes
https://github.com/adampolczynski/AP_Netflix_VOID

04/03/2026
>JoyAI-Image: Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation
https://github.com/jd-opensource/JoyAI-Image
>Netflix VOID: Video Object and Interaction Deletion
https://huggingface.co/netflix/void-model
>OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
https://huggingface.co/tencent/HY-OmniWeaving
>Bias mitigation in graph diffusion models
https://github.com/kunzhan/spp
>Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion
https://dedoardo.github.io/projects/control-dino
>FlowSlider: Training-Free Continuous Image Editing via Fidelity-Steering Decomposition
https://huggingface.co/spaces/dominoer/FlowSlider
>mfw Research news

04/05/2026
>PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
https://vdkhoi20.github.io/PANDORA
>A Benchmarking Methodology to Assess Open-Source Video Large Language Models in Automatic Captioning of News Videos
https://arxiv.org/abs/2603.27662
>Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers
https://arxiv.org/abs/2603.27666
>NeedleDB: Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex Natural Language Queries
https://arxiv.org/abs/2603.27464
>Domain-Invariant Prompt Learning for Vision-Language Models
https://arxiv.org/abs/2603.28555
>MolmoPoint: Better Pointing for VLMs with Grounding Tokens
https://arxiv.org/abs/2603.28069
>AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models
https://arxiv.org/abs/2603.29410
>LivingWorld: Interactive 4D World Generation with Environmental Dynamics
https://arxiv.org/abs/2604.01641
>Efficient Inference of Large Vision Language Models
https://arxiv.org/abs/2603.27960
>Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning
https://arxiv.org/abs/2603.27866
>A Robust Low-Rank Prior Model for Structured Cartoon-Texture Image Decomposition with Heavy-Tailed Noise
https://arxiv.org/abs/2603.27579
>CDH-Bench: Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
https://arxiv.org/abs/2603.27982
>Rényi Entropy: New Token Pruning Metric for Vision Transformers
https://arxiv.org/abs/2603.27900
>HSFM: Hard-Set-Guided Feature-Space Meta-Learning for Robust Classification under Spurious Correlations
https://arxiv.org/abs/2603.29313
>LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation
https://arxiv.org/abs/2603.27693
>Jagle: Building a Large-Scale Japanese Multimodal Post-Training Dataset for Vision-Language Models
https://arxiv.org/abs/2604.02048
>>108535715
thread schizo
>>108535361
https://huggingface.co/circlestone-labs/Anima/discussions/112
Really interesting and thoughtful discussion about Anima’s obvious issues, Qwen’s attention, memory, and the whole artist tag dissolution debate. Feels like it’s time to take a step back, be a bit more realistic about this model, and figure out if it’s actually worth it.
>>108535731
Anima white knights will tell you that @artist tags are outdated tech like loras.
don't care still using anima
>>108535731
The only solution I can think of is for tdrussell to rebuild Anima from scratch, but make it style agnostic and move all styles into small loras. That way, he could free up memory to focus only on characters and concepts, and then extract loras for those styles so we can apply them ourselves with different weights, something like DLC in video games.
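The "extract loras" step this post imagines is a real, well-known technique: take the per-layer weight delta between a style-tuned checkpoint and the base model, and compress it with a truncated SVD into the low-rank A/B pair a LoRA file stores. A minimal numpy sketch (toy shapes, not tied to any actual Anima or diffusion-pipe tooling):

```python
import numpy as np

def extract_lora(base_w: np.ndarray, tuned_w: np.ndarray, rank: int = 8):
    """Approximate (tuned_w - base_w) with a rank-r product b @ a,
    which is exactly what a LoRA pair stores for one linear layer."""
    delta = tuned_w - base_w
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]          # (out_dim, rank), singular values folded in
    a = vt[:rank, :]                    # (rank, in_dim)
    return a, b

# toy check: a genuinely low-rank delta is recovered almost exactly
rng = np.random.default_rng(0)
base = rng.normal(size=(64, 32))
true_delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
a, b = extract_lora(base, base + true_delta, rank=4)
err = np.linalg.norm(b @ a - true_delta) / np.linalg.norm(true_delta)
print(err < 1e-8)
```

Real extraction tools do this per attention/MLP weight across the whole checkpoint; the rank controls the size-versus-fidelity trade-off, which is why extracted style loras can be tiny compared to a full finetune.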
>>108535361
https://huggingface.co/circlestone-labs/Anima
>Any LoRA you train on a preview version should be considered a "throwaway" LoRA. There's no guarantee it will work well on the final version.
Any word on when this "final version" will be finished and uploaded? Or is that what preview-2 is supposed to be?
>>108535855
THIS, just like saas! we all know that models like GPT-2 Image use microloras for concepts, which is why they're able to accurately display the hands of an analog clock or fill a wine glass to the brim. They load up loras based on your prompts. I'd even bet they're really all just running Flux.1 Dev in the background
>>108535821
This same issue would’ve probably happened to the Rouwei dev with their model that adapts the T5 text encoder to SDXL. CLIP was a gift from God for anime models.
https://huggingface.co/lodestones/Zeta-Chroma/tree/main
this is so bad lmao
That's some pretty good and thoughtful FUD desu. It matches my thoughts on blending multiple artists being a giant pain in the ass.
>>108535889
Sometimes I forget CLIP is from OpenAI, before they became a giant scam.
>>108535855
>>108535882
SaaS models probably ran into this same issue much earlier. At some point, having the model know every artist’s style all the time became unnecessary, especially since many art styles contradict each other and just end up confusing the model.
>>108535890
i see why they call it pixel space!
>>108535930
>many art styles contradict each other and just end up confusing the model.
You’ve got a point. It could also be that MoE (Mixture of Experts) technology from LLMs gets applied to diffusion models, where the model doesn’t always activate all its parameters, but instead uses different ones depending on the prompt.
>>108535890
Lodestone should retire.
>>108536014
it's not finished. why are you judging underbaked models?
>>108536038
>it's not finished.
https://xcancel.com/LodestoneRock/status/2040745179372818437#m
chroma will never be finished because >>107962458
>>108535890
why doesn't he try a video model?
>>108536051
money doesn't grow on trees
>>108536069
Neither does talent, he vibe trains models
>>108536038
no model is ever truly finished because you can always make it better.
in kekstone's case, you can always make it worse!
SPARK Chroma is very promising even at 512 resolution. I'm looking forward to the 1024 version.
>>108535731
the idea of embedding tables and removing artist strings to avoid fucking up the semantics is interesting, has any model done this before?
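For what it's worth, the closest existing precedent is probably textual inversion / pivotal tuning: a learned vector per concept that bypasses the text encoder's word semantics entirely. The embedding-table version of the idea would look roughly like this numpy sketch (the table, artist names, and dims are all hypothetical):

```python
import numpy as np

# Sketch: instead of feeding "by some_artist" through the text encoder
# (where those tokens also carry unrelated word semantics), each artist
# gets a learned row in an embedding table, and that vector is prepended
# to the prompt's token embeddings as an extra conditioning token.
rng = np.random.default_rng(2)
dim = 16
artist_table = {"artist_a": rng.normal(size=dim),   # learned during training in practice
                "artist_b": rng.normal(size=dim)}

def condition(prompt_embs: np.ndarray, artist: str) -> np.ndarray:
    """Prepend the artist's vector to the (seq_len, dim) prompt embeddings."""
    return np.vstack([artist_table[artist][None, :], prompt_embs])

prompt_embs = rng.normal(size=(5, dim))             # stand-in for text-encoder output
cond = condition(prompt_embs, "artist_a")
print(cond.shape)  # (6, 16)
```

Because the artist vector never passes through the tokenizer, it can't collide with ordinary words, which is exactly the "semantics" problem the post is getting at.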
>>108536069
not starting from scratch. he can begin with loras, then merge models, and so on. it's better than spending money on weird new image models
>>108535731
Fuck...
For the guy that wanted a "lewd" Jennie, ZIT gave me exactly one (and only one!) that wasn't too horrible... scaled down further than usual to help smooth over any flaws. Happy Easter!
https://files.catbox.moe/7kr6oy.png
>>108535890
...aaaaaaand DONE!
>>108536181
I wish he made gguf versions too
>>108536181
>SPARK Chroma is very promising even at 512 resolution.
can you showcase some images?
>>108536200
why don't you make them yourself?
quick rundown of why CLIP is at the same time outdated and outperforming other encoding methods?
>>108536194
??
>>108535731
Anything that isn't CLIP will have a similar issue, it's not Anima specific.