Discussion and Development of Local Image and Video Models
Previous: >>108659074
https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
What happened to local?
Blessed thread of frenship
>>108664800
may 2024 you say
>>108664800
the worst part is that Alibaba is still here, feeding the llm fags, but they seem to have abandoned us :(
>>108664796
Am I supposed to be upset that a company is no longer supporting local? It's happened before and every time someone else steps up.
>>108664800
>>108664809
>GPT was able to deduce Alibaba's sellout plan 2 years ago, we just didn't listen
holy shit, saas is actually insanely powerful
Why are the api niggers so upset at local? They mad they can't gen boobies or what?
>>108664820
at this point its instigating or participating in a flame war desu
>armchair jannying as if jannies give a single fuck about AI threads
>>108664817
>It's happened before and every time someone else steps up.
yeah anyone crying about it is either very very new or just trolling
>>108664817
>It's happened before and every time someone else steps up.
what if no one else steps up?
>I am supposed to be upset she slept with another man? It's happened before
they really are localkeks after all!
Are you equating a company with a woman because you have never had sex before
>>108664800
sharty alert
sharty alert
sharty retard
>>108664833
go on then post some nippies or vagene
>>108664820
>They mad they can't gen boobies or what?
have you not seen the grok threads? we can get our fill of boobs whenever we want
Local Diffusion?
>>108664862
Then why is you so upset little nigga
>>108664784
Thank you for baking this thread, anon
>>108664802
Thank you for blessing this thread, anon
>mfw Resource news
04/22/2026
>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC
>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing
>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn
>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon
>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter
>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation
>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF
>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw
>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow
>TRELLIS.2-stableprojectorz: Trellis.2 optimized to fit inside 8GB gpus
https://github.com/IgorAherne/TRELLIS.2-stableprojectorz
>Fizgig — Klein 9B LoRA Studio
https://github.com/shootthesound/Fizgig
>Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
https://huggingface.co/datasets/TaobaoTmall-AlgorithmProducts/Tstars-VTON
04/21/2026
>MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Style Mapping
https://jeoyal.github.io/MegaStyle
>UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
https://github.com/Yovecent/UDM-GRPO
>Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
https://github.com/NA-HMC/NA-HMC
>>108664868
im not upset, im happy as a clam genning these cool pics with gpt image 2
>mfw Research news
04/22/2026
>Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation
https://arxiv.org/abs/2604.18215
>Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval
https://arxiv.org/abs/2604.19135
>ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
https://arxiv.org/abs/2604.19720
>Long-Text-to-Image Generation via Compositional Prompt Decomposition
https://arxiv.org/abs/2604.18258
>HP-Edit: A Human-Preference Post-Training Framework for Image Editing
https://arxiv.org/abs/2604.19406
>Geometric Decoupling: Diagnosing the Structural Instability of Latent
https://arxiv.org/abs/2604.18804
>CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
https://arxiv.org/abs/2604.19632
>Allo SR $^2$: Rectifying One-Step Super-Resolution to Stay Real via Allomorphic Generative Flows
https://arxiv.org/abs/2604.19238
>Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
https://arxiv.org/abs/2604.19234
>Deep sprite-based image models: An analysis
https://arxiv.org/abs/2604.19480
>LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models
https://arxiv.org/abs/2604.18803
>Hierarchically Robust Zero-shot Vision-language Models
https://arxiv.org/abs/2604.18867
>Rethinking Dataset Distillation: Hard Truths about Soft Labels
https://arxiv.org/abs/2604.18811
>Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
https://arxiv.org/abs/2604.19009
>Benign Overfitting in Adversarial Training for Vision Transformers
https://arxiv.org/abs/2604.19724
>BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation
https://arxiv.org/abs/2604.16514
>>108664887
is that GW2?
Why do people throw a fit about Grok/GPT/NBP gens? They are supported fully in ComfyUI now and integrate well into local workflows. Despite this, the same freetards continue to cry over them. This isn't linux general you boomers.
>>108664901
>>108664887
>>108664862
>This isn't linux general you boomers.
it literally is, it's a local thread, if you want to spam your API garbage you're simply off topic, what's hard to understand about that? get the fuck out
>>108653190
imagine local living so rent free in your mind that you cant stop lurking and posting here KEK
>>108664901
>>108653190
>try Images 2.0 to add some text on an image that just happens to have a shirtless man in the background
>I'm focused on the text don't even notice the man. prompt to "make it a little bigger"
>We’re so sorry, but the image we created may violate our guardrails around nudity, sexuality, or erotic content. If you think we got it wrong, please retry or edit your prompt.
local will still be needed.
>>108664993
i dont think gpt image is that strict, i just genned this a minute ago
try playing around with the prompt a little
>>108665013
>>108653190
>>108664901
is this really what your life has come down to? Doing le epic troll 8+ hours a day on /ldg/? Surely there are better ways of spending your limited time on this planet?
>>108664993
>We’re so sorry, but the image we created may violate our guardrails around nudity, sexuality, or erotic content. If you think we got it wrong, please retry or edit your prompt.
>change the prompt to say make the text a little bigger
>just werks
maybe another day local will be needed
>>108664901
>grok
Enjoy your 480p vertical video gens lmao
>>108664993
>search for court cases
>chatgpt spitting out text
>deletes reply
>Stopped searching
>ask it the same question
>I have to time it and click stop before it deletes reply
Censorship is cancer.
Realism lora for anima
https://civitai.com/models/1662740/lenovo-ultrareal
>>108665062huge
>>108665062It'll be so easy to pick up a fuck ton of cred by making a very simple style lora for realism with anima. Fuck I should do that.
>>108665062
>Only trained on 30 pics.
No way is that enough. I sleep.
>>108665089
Where does it say 30 images?
are there any good frame interpolators? RIFE just blurs between frames
ai could never
>>108665062
my crystal maidens are all chubby now
>>108665115
In the lora metadata.
So on tdrussell's Rutowski lora config it says 1000 epochs on a 153 image dataset.
Surely that doesn't mean anima needs 153k steps for a lora?
How many epochs did it actually go through?
>>108665149
Going off epochs is for chumps. It's 2k - 4k steps depending on the style.
>>108665149
Apparently I can't read the civit page
>This version corresponds to 40 epochs (120 passes over the data when considering the 3 resolutions)
120x153 steps.
>>108665161
He sets LR low so it needs more steps I think.
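For anyone confused by the epoch math in the posts above, here is the arithmetic spelled out. It's a sketch using only numbers quoted in the thread (153 images, 40 epochs, 3 resolutions); batch size and how a given trainer actually counts steps are not accounted for.

```python
# Sanity check of the step math discussed above.
# Numbers from the thread: 153-image dataset, 40 epochs, 3 training
# resolutions (so each epoch makes 3 passes over the data).
dataset_size = 153
epochs = 40
resolutions = 3

passes = epochs * resolutions          # "120 passes over the data"
samples_seen = passes * dataset_size   # total images processed

print(passes)        # 120
print(samples_seen)  # 18360
```

At batch size 1 that works out to roughly the 18k ballpark figure anons mention, which is where the "120x153 steps" estimate comes from.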
>>108665149
I'm literally overfitting with 250 steps bs4
>>108665149
Anima training is for Linux, retvrun to mugen
>>108665169
Just get your LoRA to 2k steps and start saving every 500, go until 4k and pick which version you think looks the best.
>>108665169
>>108665188
To clarify, I mean when using his default hyperparams. It works quite well.
Has anyone experimented whether anima responds nicely to timestep shifts during training btw?
>>108665188
I am skeptical you actually trained a decent anima lora with the method you preach.
I had a very bad time trying to train lora for anima with high LR and lower step counts.
All epochs looked shit.
Why did tdrussell only release his training on Linux? This is some kind of Triton bad faith move...
>>108665198
>>108665192
I also tried a low LR run (coincidentally similar to his, tried before he published his example lora) with 8-10k steps (I think, I don't remember too well), that also looked bad. Maybe the 18k ballpark figure is needed, I intend to try that.
>>108665198
Anima trains fine with a decent dataset, his sane defaults, and somewhere between 2k and 4k steps. Rarely do I have to go up to 4k. Often around 3k is fine. I have yet to feel the need to change any params from his example LoRA.
>>108665200
He's a piece of shit. Obviously gonna put his training scripts behind Linux since he knows Linux users are less artsy and more sloppers than Windows Mac artists where it just werks.
>>108665205
Just looked it up.
I tried 8k steps with 0.00004 LR. I am planning to try 0.00002 or 0.00003 with 18k or slightly below that.
Oh btw I just remembered his 18k steps is with gradient_accumulation_steps = 4
So does that loosely equal 4500 "real" steps?
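Rough sketch of how gradient accumulation maps micro-steps to weight updates. This is an illustrative loop, not diffusion-pipe's actual code, and it assumes the reported 18k is counting micro-batches rather than optimizer steps; if the trainer already counts optimizer steps, no division applies.

```python
# Gradient accumulation, schematically: with gradient_accumulation_steps = 4,
# four forward/backward micro-steps are averaged into a single optimizer
# update. Under the micro-batch-counting assumption, 18000 micro-steps
# correspond to 18000 / 4 = 4500 weight updates.
gradient_accumulation_steps = 4
micro_steps = 18_000

optimizer_updates = 0
for step in range(1, micro_steps + 1):
    # loss.backward() would accumulate gradients here
    if step % gradient_accumulation_steps == 0:
        # optimizer.step(); optimizer.zero_grad() would run here
        optimizer_updates += 1

print(optimizer_updates)  # 4500
```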
anima is really separating the wheat from the chaff
>>108665205
>>108665247
Just use his trainer and the config from his example. All I'm meaning is it just werked for me.
>>108665200
You should already be running Linux desu.
>>108665149
I did about 5k steps with 200 images for this stonetoss lora, using big russ's example config. Prior to that I did a couple with prodigy optimizer and they turned out ok too.
>>108665062
Seemed pretty bad when I tested it.
>>108665126
Have you tried InterpAny-Clearer? It's some stuff built on top of RIFE, I never really noticed blurring that much with RIFE, just awful artifacting with fast motion which InterpAny-Clearer fixed for me. It's a bit slower than RIFE I think.
what are you guys using to train anima anyway?
>>108665307
my computer
>>108665200
keep seething wintroon
>>108665310
This PC*
>>108665062
He needs a bigger dataset if he wants to go this dimly lit grainy shit route with Anima
>>108665307
very funny anon
may your loras overfit and your gens fail
remember all the retards claiming it couldnt be trained due to catastrophic forgetting
good times
>>108665304
>I did about 5k steps with 200 images for this stonetoss lora, using big russ's example config. Prior to that I did a couple with prodigy optimizer and they turned out ok too.
Thanks for the reference point anon.
>>108665374
don't forget the retard claiming it wouldnt be trained due to licensing
OneTrainer support FUCKING when
>>108665129
aaaaiiiieee i'm not paying. fuck the police
>>108665198
I know I can ask a chatbot but I'd rather ask a real anon with real experience. What are the benefits of timestep shifts? Or, what are they used for?
What we gennin' tonight, GPTgods??
doing anima3 -> zit, works pretty well for creating realistic images
but I'm struggling with anima3 -> klein9b, has anyone had any luck getting good outputs from this kind of workflow?
i'm getting stray heads/body parts with the anima turbo lora, is there anything i can do to fix it? already tried editing the prompt and changing steps from 12 to 8 but the problem persists. it doesn't happen without the lora. it also doesn't happen if i use the highres aesthetic boost at a certain value (definitely doesn't on 1) but i dislike the aesthetics of turbo + high res, they remove too much detail together
>>108665451
It comes from the SD3 paper. Shift values above 1 make the model spend more time on higher timesteps/sigmas, which improves composition for flow models (anything SD3 or newer). Going too high makes the image blurry/fucks up details so you can't crank it indefinitely. This is for genning images. If you never touched it, Comfy automatically uses the value 3 for most image models, feel free to experiment with the model sampling node sometime.
I am not too well versed about the precise impact for lora training, but it might help under some situations I believe.
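For reference, the remapping described above looks roughly like this. It's a sketch of the SD3-style time shift; treating sigma as living in [0, 1] and assuming this exact form is what Comfy's model sampling node applies.

```python
# SD3-style timestep shift (assumed form). For shift > 1 the remapped
# sigma stays high for longer, so the sampler spends more of its step
# budget at noisy/high-sigma timesteps (composition); shift = 1 is the
# identity mapping.
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a sigma/timestep in [0, 1] by the given shift factor."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# shift = 1 leaves the schedule untouched
print(shift_sigma(0.5, 1.0))  # 0.5
# shift = 3 (the common default mentioned above) pushes mid-schedule sigmas up
print(shift_sigma(0.5, 3.0))  # 0.75
```

Note the endpoints are fixed (0 maps to 0, 1 maps to 1), so only the interior of the schedule is redistributed.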
>>108665473
nevermind, i tested it and still happened with both loras at 1.0. fuck. i guess it's the turbo lora. i like the aesthetics too, i wish i could get them even without dumbing down the model
>>108665501
>I am not too well versed about precise impact for lora training, but it might help under some situations I believe.
you adjust it regarding dataset complexity and training resolution
>>108665473
>>108665490
can you post image, not sure what you mean exactly
>>108665524
So what shift value would you use for full 1024p anima lora training on circa 100 images for a decently complex style?
>>108665524
No clue, I haven't done testing with anima. I'd stick to settings russel guy uses
If v3 isn't the final version, what more is he going to add? Just further highres training? If v3 is the final version, e621 injection wen?
I got a 5090. What's the best model I can run?
Ohh, I still have some of my old 1.5 gens on my PC
>>108665644
More hopeful days