Discussion and Development of Local Image and Video Models

Previous: >>108664784

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
is it over or are we back
>>108668948
why is it so brown
>>108668954
ghibli niggers did this
>>108668948
>GreasePT
Why is civitai full of new accounts literally named "abc123abc" commenting on every single z-image lora asking for an Ernie version? For fuck's sake, just look at the Commodore64 lora for Ernie, it's disgusting, it makes me puke just to stare at the images.
my gpu fans are starting to rattle. the end is near
>>108668948
get out! >>108653190
>civitai split between red boards and blue/green boards
QRD on Ernie? Is it a meme or can it actually save local?
>>108669029
infographic generator
>>108668948
that's a lot of inpainting and many hours in gimp
>>108668972
chink shill army, nothing new
they are also shilling chink models in r/localllama right now
>>108668954
the room was prompted to be bathed in warm light with a dusty color palette because it looks cozy
>>108669037
facts. i really like what it did with grok's coffee cup
Why do AI images look like AI? I can't pinpoint the exact reason. And how do you then make your gens look less like AI?
>mfw Resource news
04/23/2026
>ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
https://shelley-golan.github.io/ParetoSlider-webpage
>DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
https://github.com/Adamlong3/DynamicRad
>Normalizing Flows with Iterative Denoising
https://github.com/apple/ml-itarflow
>LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
https://github.com/inclusionAI/LLaDA2.0-Uni
>Illustrious XL & NoobAI-XL Style Explorer
https://github.com/ThetaCursed/Illustrious-NoobAI-Style-Explorer
>AI Model & ‘MAGA’ Influencer Emily Hart Unmasked as Indian Man
https://www.yahoo.com/news/articles/ai-model-maga-influencer-emily-091027504.html
04/22/2026
>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC
>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing
>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn
>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon
>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter
>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation
>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF
>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw
>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow
>>108669070
put "AI" in the negative prompt
>mfw Research news
04/23/2026
>Image Generators are Generalist Vision Learners
http://vision-banana.github.io
>Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
https://randdl.github.io/viewtoken_control
>Hallucination Early Detection in Diffusion Models
https://arxiv.org/abs/2604.20354
>Wan-Image: Pushing the Boundaries of Generative Visual Intelligence
https://arxiv.org/abs/2604.19858
>MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
https://arxiv.org/abs/2604.19902
>Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
https://arxiv.org/abs/2604.20258
>Amodal SAM: A Unified Amodal Segmentation Framework with Generalization
https://arxiv.org/abs/2604.20748
>FluSplat: Sparse-View 3D Editing without Test-Time Optimization
https://arxiv.org/abs/2604.20038
>HumanScore: Benchmarking Human Motions in Generated Videos
https://arxiv.org/abs/2604.20157
>Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
https://arxiv.org/abs/2604.20730
>Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
https://arxiv.org/abs/2604.20366
>Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
https://arxiv.org/abs/2604.20027
>X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
https://arxiv.org/abs/2604.20289
>Self-supervised pretraining for an iterative image size agnostic vision transformer
https://arxiv.org/abs/2604.20392
>Efficient INT8 Single-Image Super-Resolution via Deployment-Aware Quantization and Teacher-Guided Training
https://arxiv.org/abs/2604.20291
>From Diffusion to Flow: Efficient Motion Generation in MotionGPT3
https://arxiv.org/abs/2603.26747
>>108669037
that's basically what image 2 is doing. it's a second pass that projects the text onto the genned image. the easiest way to spot it is on clothing, the X for example, it's just sitting on her dress. it's actually almost pixel perfect with the X on the laptop.
>>108669070
Hire an artist to paint over it.
>>108669092
why can't local models do that?
>>108669088
>>108669090
thanks
>>108669107
they probably can but nobody is developing the tooling for it, at least not in a user friendly way
>>108669107
probably because they don't care, it's a parlor trick to impress indians and boomer investors. sorry to pull the curtain back. case in point, the gen uses the same X, it just has a slight skew on the dress. same with the openAI logo, it's just sitting on her shirt.
>>108669117
Put the phone away fag
>>108669093
Gay
>>108669089
There is no way it's that simple. But now that I think of it, putting tags like "masterpiece" seems to help
>>108669137
?
>>108669190
api image thread is here >>108653190
>>108669182
>masterpiece in the positives helps make outputs not look ai
Erm..... Anon? When's the last time you saw an optometrist??
>>108669070
Can we talk about this pls?
>>108669135
it's a cool trick honestly
hopefully the chinese will be able to reverse engineer it for local models
>>108669231
?
>>108669243
honestly i think a random person could figure out a better implementation in a few days, local has a lot more headroom to fuck around. there are 3d models, i assume they have some kind of texture projection.
you could probably jury-rig something from preexisting nodes: convert a masked area into a plane or 3d topology, project text or an image onto it, then lay it on top of the gen.
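The projection step in that jury-rig is just a homography: map the corners of a flat text/logo texture onto a quad the user masks on the gen. A minimal numpy sketch of that math (this is a hypothetical illustration, not an existing node; the quad coordinates are made up):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective matrix H mapping each src corner
    to its dst corner (4 point correspondences, DLT method)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null space of A (last right-singular vector) holds H's entries.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pt):
    """Apply H to a 2D point using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Map the unit-square texture onto a skewed quad on the gen,
# e.g. the front of a dress as masked by the user.
texture_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
dress_quad = [(120, 80), (310, 95), (300, 260), (130, 240)]
H = homography(texture_corners, dress_quad)
```

In a real node you would inverse-warp every pixel inside the quad through `H` and alpha-composite the result over the gen; OpenCV's `warpPerspective` does exactly this if you don't want to hand-roll the resampling.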
>I haven't checked in on /ldg/ in a while, what are they up to
>Thread gets diverged again
Still at it huh
aight, you can now use NAG on Anima
https://github.com/BigStationW/ComfyUI-NAG-Extended
https://github.com/BigStationW/ComfyUI-NAG-Extended/blob/main/workflows/NAG-Anima-ComfyUI-Workflow.json
https://civitai.com/models/2560840/anima-turbo-lora
>>108669455
>bigstationw
im sorry i dont use vibesharted code :")
What kind of hardware and software/driver combinations do you guys use to generate images and videos and what not?
>turbo lora for a 2b model
>>108669455
Thanks king. Does left also use a negative prompt tho?
>>108669476
>Does left also use a negative prompt tho?
left can't use a negative prompt, it's at cfg 1
>>108669455
https://github.com/pamparamm/ComfyUI-ppm
I've been using this for negative weights while at CFG 1.0, works great. You just have to get used to putting negative-weighted tags in the positive prompt instead of writing in the negative prompt. This has worked better for me than NAG ever did.
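For anyone unfamiliar with the `(tag:weight)` emphasis syntax that trick relies on: the weight can simply be a negative number inside the positive prompt. A toy stdlib parser (not the ppm node's actual code, just a sketch of how such a prompt decomposes):

```python
import re

# Matches "(some tag:1.2)" style emphasis; the weight may be negative.
WEIGHTED = re.compile(r"\(([^():]+):(-?\d+(?:\.\d+)?)\)")

def parse_prompt(prompt):
    """Split a prompt into (text, weight) chunks; bare text gets weight 1.0."""
    chunks, pos = [], 0
    for m in WEIGHTED.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        chunks.append((tail, 1.0))
    return chunks

# Negative-weighted tags live in the positive prompt at CFG 1.0:
chunks = parse_prompt("1girl, smile, (blurry:-1.0), (watermark:-0.8)")
```

The conditioning backend then scales each chunk's token embeddings by its weight, so a negative weight pushes the gen away from that concept without needing a CFG negative pass.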
>>108668921
>my Roll-chan made it in the OP
>>108669475
if you're using a sophisticated sampler like ClownsharKSampler, going for cfg > 1 + 50 steps can take really long (more than 2 min on my 3090)
>>108669477
Would you be so kind as to compare the non-turbo lora with regular CFG vs NAG? I'm just curious
>sophisticated sampler
>>108668271
what model did you use anon, looks clean
>>108669495
God-tier aesthetics in that series. Shame there are so few images tagged "reaverbot" on Danbooru, I want to gen some fucking bots. Guess I have to train a lora...
>>108669513
probably zimage turbo
Can someone explain why that new fancy chatgpt image thing isn't possible locally? Couldn't you just hook up something like z-image or anima to a smart LLM like Gemma with vision?
I wonder if there is a way to automate gemma 4 with its vision capabilities as an agent + whatever model + inpainting tools to approach the result of the gpt autoregressive model.
>>108669495
baker doesnt like my anime2real images sadgely :(
>>108669503
the issue is that those NAG parameters don't work for cfg > 1. it can be used, yeah, but I'm just too lazy to find the right values again. I mean, if you already have CFG, adding NAG on top of that is kinda useless imo (and it's slower)
>>108669528
Yes
turdbo looks so stale, i have no idea how anon isn't tired of that look already. it was cool when it came out but it's just a demo for ZiB
just use ZiB
>>108669449
is there a finetune for anima I didn't hear about?
since when did it do realistic?
>>108669528
>inpainting tools
replace that with an edit model like klein and you can probably do it yeah
>>108669544
ZIB can't do fine detail
How do I anima with krita?
>>108669522
alright, thanks
>>108669563
it's the same VAE, yes it does
>>108669544
>>108669563
I've seen some workflows where they use ZIB for the beginning of the image (the first ~50% of steps), then switch to ZiT to make it look good
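That handoff is just a split of the step schedule between two denoisers. A hypothetical sketch of the control flow (the denoisers below are stand-in callables, not real ZIB/ZiT inference code):

```python
def two_stage_sample(latent, base_step, turbo_step, steps=20, split=0.5):
    """Run the first `split` fraction of the schedule with base_step
    (composition/structure), then hand off to turbo_step (detail)."""
    handoff = int(steps * split)
    log = []
    for i in range(steps):
        step_fn = base_step if i < handoff else turbo_step
        latent = step_fn(latent, i)  # one denoising step at index i
        log.append("base" if i < handoff else "turbo")
    return latent, log

# Dummy denoisers that just tag the latent so the handoff is visible.
base = lambda x, i: x + ["b"]
turbo = lambda x, i: x + ["t"]
latent, log = two_stage_sample([], base, turbo, steps=10, split=0.5)
```

In ComfyUI terms this is two KSampler (Advanced) nodes sharing one schedule: the first runs steps 0 to N/2 with `return_with_leftover_noise`, the second picks up from step N/2 with the other checkpoint.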
>>108669553
Since always. https://civitai.com/models/1662740/lenovo-ultrareal?modelVersionId=2882170
This lora helps a tiny bit.
>>108669574
that's what im doing
>>108669555
issue is that edit won't be able to target specific things to enhance
>>108669528
can gemma select a part of an image?
>>108669574
>switch to ZiT to make it look good
kek, i guess if you enjoy distill slop then sure
>>108669582
>edit won't be able to target specific things to enhance
yes it can, edit can modify just one specific part of the image. that makes things easier because you only have to say "hey, add a hat to that girl's head" instead of trying to automate an inpainting process
>>108669513
>>108669522
Anima -> ZIT
>>108669528
I don't know what it is, but they did something more than just "look at this image and fix it".
Even SOTA API models don't really have super great visual reasoning.
Again, I don't know precisely what it is, but they are feeding ChatGPT more than a few hundred visual tokens.
i wish lodestones didn't have the attention span of a fruit fly
>>108669553
I use loras for photography and interior. Haven't uploaded anywhere yet.
>>108669528
>>108669629
it's probably something like this
>it makes the image -> it uses its visual encoder to see mistakes -> it makes an edit prompt -> it edits the image
a gemma 4 + klein combo could definitely do the trick
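That greentext loop is easy to sketch as an agent. Everything below (`generate`, `critique`, `edit`) is a hypothetical stub standing in for a real t2i model, a VLM critic like gemma, and an edit model like klein:

```python
def refine(prompt, generate, critique, edit, max_rounds=5):
    """gen -> VLM critique -> edit prompt -> edit model, looping until
    the critic is satisfied or the round limit is hit."""
    image = generate(prompt)
    for _ in range(max_rounds):
        problem = critique(image)     # VLM: name the worst flaw, or None
        if problem is None:
            break                     # critic is happy, stop editing
        image = edit(image, problem)  # edit model: fix just that flaw
    return image

# Stub models: the "image" is a set of flaws; the critic spots them
# one at a time and the editor removes each one.
generate = lambda p: {"warped text", "extra finger"}
critique = lambda img: next(iter(img), None)
edit = lambda img, flaw: img - {flaw}
result = refine("1girl holding a sign", generate, critique, edit)
```

The `max_rounds` cap matters in practice: with a latent-diffusion edit model, every extra round is another VAE encode/decode, so quality degrades if the critic never converges.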
Retard here, is there any reason to use Klein 9b base over distill when you do upscaling and editing? Or is it just slower without any real benefit?
>>108669622
so anime gen with anima + zit at 0.x denoise to make it realistic?
>>108669683
use base with a speed lora
>>108669503
>>108669531
give me a prompt and a negative prompt, I'll try it out
>>108669613
yeah but it's not as precise for things it doesn't know about, or background things, or basically very specific things you want to target
>>108669707
>things it doesn't know about
OpenAI probably uses tool calling to browse the internet, fetch some images, and ask the model to merge them onto the canvas
>>108669475
>he doesnt want to gen instantaneously
>>108669629
Anything can be bruteforced with enough tokens, and seeing the prices on the API side, I'm pretty sure it's feeding a whole lot of tokens to refine the image.
The result is good though, and I'd like to see that done locally with the tools we have.
https://huggingface.co/TheRemixer/ChenkinNoobRF-T5Gemma-adapter
Neat, T5Gemma adapter for Chenkin Noob!
>>108669726
Anima is 2x slower than SDXL, and training Anima loras is 2.3-2.5x slower than SDXL
>>108669653
>it uses its visual encoder to see mistakes
this is probably their secret sauce (along with using agents). I think they trained specifically on "wrong-looking text" and details, which means the model is probably very good at spotting that
>Anima is x2 slower than SDXL and for train Anima loras it's between 2.3 -2.5 slower than sdxl
loving this indian reasoning
>>108668948
It's rare, the aesthetic is kind of fried, like SD 1.5, but with better coherence. It reminds me of one of those SD 1.5 shitmixes with a lot of inpainting, regional prompter, and Photoshop.
>>108669683
In my single test case the results differed, but base did more unwanted fiddling than distilled. YMMV.
>>108669684
Pretty much, except I start with a realistic Anima gen.
>>108669856
>double eyelid zitslopgirl
NinTenLOL
>>108669742
>SDXL
*pukes*
Any way to stop the artist name from showing up with anima? I already have signature, artist name, twitter username, patreon username, and watermark in the negative prompt but it's still doing it.
>>108669731
I guess you can take your time trying to min-max llama.cpp params and see if it scales up well enough? I wouldn't be too hopeful, but worth a shot.
Maybe 3.6 works better for this, that's also worth experimenting with.
>>108669653
>>108669747
you need a very good model that doesn't use a VAE in order to do what gpt 2 is doing
don't waste your time trying to squeeze water out of a stone with these outdated latent diffusion models
>>108670003
>you need a very good model that doesnt use a vae in order to do what gpt 2 is doing
the thing is, it's obvious gpt 2 is still using a vae: when you go for very complex images it gets slopped fast and there's more and more noise and artifacts. it's probably the result of the model doing like 10 edits, at which point the vae issues start to get really amplified
>>108670022
indians think api models are magic.
>>108669980
gemma 31B has pretty ok image understanding, and is easy to stop from moralfagging over nsfw
>use pear-shaped figure tag
>turns her into a literal fucking pear
Kek
>>108668863
catbox?
>>108670109
canon btw
>>108670109
what a smug pear.
>>108670022
>there's more and more noises and artifacts
This is due to overly aggressive distillation + RL. See Ernie Turbo for a local example: it tries to "fake" detail by just putting noise on everything.
>>108670191
me on the right
>>108669856
The problem with this method is that the anatomy more or less sucks. Speaking of anatomy, what model produces the most anatomically accurate gens? I have been using Virt-A-Mate + ZIT to make my gens look realistic, but that's kind of a hassle.