Discussion and Development of Local Image and Video Models

Previous: >>108597963

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>mfw Resource news

04/14/2026

>ERNIE-Image: Text-to-image generation model built on a single-stream Diffusion Transformer
https://huggingface.co/baidu/ERNIE-Image
>Danbooru Dataset Filter: High-Speed Metadata Explorer for AI Training
https://github.com/ThetaCursed/Danbooru-Dataset-Filter
>ChatGPT will praise the mood and 'bedroom/DIY texture' of fart sounds pulled from YouTube
https://www.pcgamer.com/software/ai/chatgpt-will-praise-the-mood-and-bedroom-diy-texture-of-fart-sounds-pulled-from-youtube
>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything
>Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
https://github.com/leeruibin/hybrid-forcing
>Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
https://jinnh.github.io/E-Bridge
>FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
https://github.com/yuandaxia2001/FashionMV
>Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution
https://github.com/jiyang0315/DASP-SR.git

04/13/2026

>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors
>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations
>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io
>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD
>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt
>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>mfw Research news

04/14/2026

>EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
https://editcrafter.github.io
>VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
https://arxiv.org/abs/2604.10127
>FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
https://arxiv.org/abs/2604.10954
>AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control
https://arxiv.org/abs/2604.10454
>Continuous Adversarial Flow Models
https://arxiv.org/abs/2604.11521
>OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
https://arcomniscript.github.io
>Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
https://arxiv.org/abs/2604.10837
>Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
https://arxiv.org/abs/2604.10546
>Rethinking the Diffusion Model from a Langevin Perspective
https://arxiv.org/abs/2604.10465
>Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
https://arxiv.org/abs/2604.11177
>SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
https://arxiv.org/abs/2604.11530
>Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
https://arxiv.org/abs/2604.11496
>LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
https://arxiv.org/abs/2604.11091
>Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
https://arxiv.org/abs/2604.10383
>Omnimodal Dataset Distillation via High-order Proxy Alignment
https://arxiv.org/abs/2604.10666
>What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
https://arxiv.org/abs/2601.06165
ok ernie turbo is fucking garbage at prompt following
>>108604751
forgot prompt
>A photorealistic candid photo of a woman with long, flowing hair that transitions from icy white at the roots to vibrant cyan-blue at the tips, cascading over her shoulders and partially obscuring her face as she looks downward. She wears a form-fitting, sleeveless top with a high neckline, primarily white with bold geometric yellow trim and a large, faceted blue diamond-shaped emblem centered on the chest. The garment has a structured, armored appearance with gold-brown segmented panels along the waist and hips, suggesting a fantasy or sci-fi outfit. Her right hand rests on a smooth, light-colored surface in the foreground, fingers slightly curled. The background is an out-of-focus twilight landscape under a deep indigo sky, with a soft gradient of magenta and purple along the horizon. A faint, glowing horizontal line runs across the lower portion of the frame, possibly a railing or edge of a platform. The lighting is directional, casting soft shadows and highlights on her hair and clothing, emphasizing texture and form with natural depth and contrast. No text, speech bubbles, or tears are visible.
https://huggingface.co/baidu/ERNIE-Image
https://huggingface.co/baidu/ERNIE-Image-Turbo
https://yiyan.baidu.com/blog/posts/ernie-image
https://ernieimageprompt.com/

LOCAL IS SAVED!!
>>108604754
wait nvm im gay, fucked up a setting
can someone litterbox or gofile some nsfw gens of ernie image? the huggingface demo is too censored.
>>108604759But can it do anime loli porn?
>>108604759
>no edit
that's a shame, imagine doing edit with such a monster of a model, the prompt following is on another level, can't believe it's using a simple 3b text encoder to get that shit, and fucking ministral of all things
>>108604786
ZAMN!
>>108604759
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
https://huggingface.co/Comfy-Org/ERNIE-Image
>AttributeError: 'Ministral3_3B' object has no attribute 'generate'
thanks Comfy
>>108604759
Can it do nude?
>>108604759
Can it do shrek?
>>108604759
bruh, turbo has garbage anatomy, downloading the base model
>>108604759
buy an ad
>>108604810
have you pulled?
>>108604842
>implying the monk didn't cultivate enough to master the four immeasurables and grow two extra arms
lol?
The gen times for non-turbo on my 3060 are a bit slow, two and a half minutes for 20 steps, and it probably needs more steps, but that's not unusually slow for a model of this size.
Let's see how it holds up under further testing.
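A quick back-of-the-envelope check of those numbers (a sketch; it assumes step time scales roughly linearly with step count at fixed resolution):

```python
# Per-step cost implied by "2.5 minutes for 20 steps" on the 3060.
total_seconds = 2.5 * 60
steps = 20
per_step = total_seconds / steps
print(per_step)            # seconds per diffusion step -> 7.5

# Estimated wall time if the step count is bumped to 50:
print(per_step * 50 / 60)  # minutes -> 6.25
```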
>>108604817
>Can it do nude?
https://litter.catbox.moe/9z9qwbnxpflyqt27.jpg
>>108604751
>>108604763
What did you fuck up so I can avoid it?
>>108604861
I see you tested the base model, I hope it's the good one, I don't really like my tests on turbo so far
>>108604843
yes I'm on the latest version, seems like comfy hasn't implemented the prompt rewriting yet
https://github.com/Comfy-Org/ComfyUI/pull/13395
>Needs template before it works properly.
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_A14b_high_noise_lightx2v_4step_720p_260412.safetensors
Why are the high and low noise models close to 60gb?
>>108604759
What VAE does it use?
>>108604817
>>108604862
https://litter.catbox.moe/tz2g5anklf3bmmmt.jpg
as expected, garbage genitals lol
>>108604879
the best one, flux 2's vae
>>108604817
>>108604772
It hasn't been trained on boobs, it generates mediocre breasts. Though from my very limited testing it doesn't seem to be deliberately poisoned like Flux models are.
>>108604871
I just had a feeling that the distill will be problematic and went for the base immediately.
>>108604888
Is this turbo or base?
>>108604889
>I just had a feeling that the distill will be problematic and went for the base immediately.
good, was about time that we got a fully finetuned model that isn't distilled, no need for some NAG cope, we can directly use CFG, and we'll be able to train and make loras on it
>>108604893
turbo
>>108604872
FP32 precision, 4 bytes for every weight:
14b x 4 = 56gb
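That sizing rule generalizes to any checkpoint; a minimal sketch (the bytes-per-weight table is the standard one, and it assumes a dense, unquantized model):

```python
# Approximate on-disk size of a checkpoint: parameter count x bytes per weight.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def checkpoint_gb(params_billions: float, dtype: str) -> float:
    """Size in GB (1 GB = 1e9 bytes) for a dense model with no quantization."""
    return params_billions * 1e9 * BYTES_PER_WEIGHT[dtype] / 1e9

print(checkpoint_gb(14, "fp32"))  # 14B weights at fp32 -> 56.0
print(checkpoint_gb(14, "fp16"))  # same weights at fp16 -> 28.0
```

This is also why an fp16/bf16 release of the same distill would land around 28gb per file instead of 56gb.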
>>108604906
Huh, I haven't seen the fp32 version before.
>>108604842
>downloading the base model
I really don't like the anatomy, like this is base at 50 steps, come on
>migu
:)
>>108604940
smells more and more like a nothingburger, the realism quality is Klein tier, but ernie can't even do edits to compensate, sad
>>108604940
I am wondering if Comfy fucked something up, or did they do Chroma-tier cherry picking for the images?
>>108604922
FP32 is usually only used for training because the benefits to inference are almost non-existent.
>>108604974
50 steps turned out better.
It also seems a bit wild when it comes to adding shit to the image. First time I have seen AI add a knife to a "1girl, standing" prompt unsolicited.
>>108604959
>>108604983
Oh I think the image is so different because Control after generate is bugged with the retarded subgraph Cumfy has shipped with the template. So it ran a whole new seed.
The point about the knife stands though, same prompt.
>>108604991
>>108605000
Ernie knows only one anime style: "Nano Banana Pro"
:]
>>108605023
kek, I think I've seen enough
>>108605045
maybe turbo at 16 steps is the best it can get
ernie base with the default settings and default prompt in comfyui gave me a guy with 3 legs.. not a great start
>>108605060
Z-image turbo be like:
https://youtu.be/WO23WBji_Z0?t=10
One of the better gens I got.
Still has this Kleiny look to it.
>>108605080
something is wrong with the proportions of their bodies, looks like they're midgets, Flux Kontext style lool
>>108605064
>3 feet
>>108605080
>3 hands
lol I think I won't be downloading this
it's all right, the jews will save us
https://xcancel.com/ltx_model/status/2044108750592643279#m
This model has been trained on 3 billion images of Nano Banana Pro kek.
>>108605126
>This model has been trained on 3 billion images of Nano Banana Pro kek.
Z-Image supremacy, yeaaah! We had Qwen Edit and then the Tongyi models, but all other Chinese t2i are equally sloppy, GLM, this, whatever.
I am kinda liking things about it despite its faults.
But they probably either overcooked this thing or it needed a little bit of post-training aesthetic alignment to temper the schizo anatomy.
>>108604974
>I am wondering if Comfy fucked something up
I think the model is just not that good, in my tests it's inferior to Z-image turbo almost everywhere
It can be a great base model to train on though, but yeah, 8b is big, people prefer something smaller like 2b so that they can do Anima type of models or some shit
>>108605183
>8b is big
>>108605183
yup, same experience, back to zit for me
>>108605183
>it's inferior to Z-image turbo almost everywhere
they thought that training a model only on Nano Banana Pro's images would do the trick, all we got is that Synth-ID watermark pattern everywhere lmao, once again, synthetic data BTFO
>>108605115
oops, forgot to attach their paper
https://arxiv.org/abs/2604.11788
>>108605183
I think there are issues with finetuning Klein and ZIB for some reason.
If it responds well to training this looks salvageable. Decent text encoder + best vae + good size balance between quality and being able to run on most hardware + OK quality bar the anatomy issues + mid instruction following that can possibly be ironed out.
I hope someone besides Kekstone takes a crack at it.
>>108605208
Can't we improve realism with finetuning/lora? I know training on slop sucks but banana pro is a really high quality baseline.
>>108604759
>https://ernieimageprompt.com/
either something is wrong with ComfyUI, or those baidu fucks are straight up lying to us, I'm not getting anything even close to the images on that site
>>108605236
Chinks lying? How can it be...
I love to complain about the jpeg artifacts on Z-image turbo, but with Ernie we've arrived at a whole other level, jesus this is ugly af
>>108605262
I don't think those are jpg artifacts, probably the watermark patterns of NBP >>108605126
>>108605262
Is this Turbo? I am not really getting these on the Base.
turbo seems more slopped overall, and if there's one thing I can say base does better than Z-image turbo, it's that it seems to know more stuff, but knowing more stuff is useless if the anatomy is ass and the realism is not even close either
>comparing z turbo to ernie base
Why not compare base to base tho
>>108605278
I think you are right anon, base doesn't seem to have that much noise
>>108605321
as a ""base"" model it looks like it's destroying Z-image base, let's hope we can train it well then, both ZIB and Klein had their issues
I don't see anything Ernie is the best at: Chroma has the best kino, Z-image has the best realism and anatomy, this shit is just slop after slop
>>108605317
it's been compared here >>108605080
the ledditors are loving it though
https://www.reddit.com/r/StableDiffusion/comments/1slg4wh/we_may_have_a_new_sota_opensource_model/
>piggies love slop
STOP THE PRESSES A FROGFAG IS SPEAKING !!!
Can't the chinks do anything else than just make cheap copies of murica's products?
>>108605408
>Tezuka Rin \(katawa shoujo\) sitting on a bench
is that how you're supposed to prompt on Anima? I can't manage to get her
>>108605115distilled seedance 2.0 (ltx 4) and kazar milkers honeypot spy gf was promised to me 6 gorillion years ago but unironically.
>>108605262
>>108605278
>>108605276
i never had the artifacts problem with zit, just dont use the suggested retard samplers and instead use:
euler (/euler_a) + simple (/normal)
>>108605468
Yes for tag based prompts, but I don't think there is full consensus on how to prompt characters in natural language. Try "Tezuka Rin from Katawa Shoujo".
If all options are exhausted, try it on preview 2.
>>108605468
>Tezuka Rin from Katawa Shoujo, a girl with short messy red hair and green eyes and no arms, sitting on a wooden bench, wearing her school uniform, calm distant expression, soft afternoon light, On the left knee there's a plush of Hatsune Miku, on the right there's a plush of Kazane Teto
skill issue
https://xcancel.com/DylanTFWang/status/2043952886166761519
>Open-source tomorrow
damn, if it's not too big to run locally maybe Tencent finally cooked
big jump in real time interactable video gen
Waypoint-1.5: apache2, first person shooter focused, 1.2b, 720p, 512 frames of context, 56fps on a 5090, needs at least a 30xx
online demo: https://www.overworld.stream/
https://github.com/Overworldai/world_engine
>>108605539
Anons what's the actual use case for this world model thing?
Every single world model I see looks like "cool tech demo you play for five minutes and then never touch again".
>>108605539
forgot that link too
https://3d-models.hunyuan.tencent.com/world/
>>108605552
newfag. luddite. brown, even.
the point is to enjoy the cool new tech and tinker with it while thinking about how you can maybe use it and change it yourself now, while also thinking about how cool it will be a year from now.
for example, chaining multiple generated rooms you can traverse infinitely is a software problem and thus solvable relatively easily, while letting you get much more out of the tech.
>>108605550
>512 frames of context 56fps on 5090
So? less than 10 seconds? lol
>>108605552
desu I'd enjoy lurking in a world made out of a cool drawing image, like this shit
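Checking that math (a sketch; it assumes the 512-frame context rolls at the quoted 56 fps):

```python
# Context-window length implied by the quoted Waypoint-1.5 specs.
frames, fps = 512, 56
context_seconds = frames / fps
print(context_seconds)  # ~9.14 seconds of rolling context
```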
A very sloppy double exposure sloppa.
>>108605586
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
kijai made the loras out of the new lightning version of Wan 2.2
There are some who call me... Tim
audio in ltx 2.3 1.1 seems nicer. we wuz hogwarts:
https://litter.catbox.moe/hnjzczuml64krkjr.mp4
>>108605592
not bad, Wan 2.2 may be an ancient model, but it's still the best thing we have :')
>>108605654
that's cool, I was tired of the ultra metallic sound of ltx, if they keep improving on that shit it might end up being a genuinely good model, still a long way to go to seedance 2.0 though lol
>>>/wsg/6128285
>>108605651
>first frame + last frame
kek, I forgot how much vram wan 2.2 is asking, I think I might return to LTX just for that
>>108605627
lul, did you combine a monty python screenshot with the cat meme?
>>108605686
What do you mean by that, isn't LTX heavier on resources?
>>108605701
it uses a less heavy VAE so the kv cache usage is less punitive, good luck going for 720p on wan 2.2
>>108605723
this. i can make 720p resolution gens on ltx. literally impossible on wan-hunyuan
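The VAE point can be made concrete with a rough token-count comparison. The compression strides below are illustrative assumptions (roughly 8x spatial / 4x temporal for a Wan-style VAE versus 32x spatial / 8x temporal for a more aggressive LTX-style one), and attention cost is treated as simply proportional to the number of latent positions:

```python
# Rough latent sizes for a 720p clip under two assumed VAE compression rates.
# The strides here are illustrative assumptions, not official specs.
def latent_elements(width, height, frames, spatial_stride, temporal_stride):
    """Number of latent positions the diffusion model must attend over."""
    return (width // spatial_stride) * (height // spatial_stride) * max(1, frames // temporal_stride)

W, H, F = 1280, 720, 81  # ~5 s clip

wan_like = latent_elements(W, H, F, spatial_stride=8, temporal_stride=4)
ltx_like = latent_elements(W, H, F, spatial_stride=32, temporal_stride=8)

print(wan_like, ltx_like, wan_like / ltx_like)
# the heavier-compressing VAE leaves dozens of times fewer latent positions,
# which is why higher resolutions stay feasible on the same VRAM
```

The trade-off, of course, is that heavier compression throws away more detail per latent, so the cheaper model has to reconstruct more from less.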
>>108604726
does any of this shit run simply and reasonably well on AMD cards yet?
I have tried multiple times over the last couple of years to get a functional pipeline up and running on my 6800xt 16gb and it has never once worked
I'm no genius but I'm also not retarded
>>108605689
yeah
MLEM MLEM MLEM HECKIM CHNGUS
>>108605813
if linux, rocm + forge neo work fine
if windows, i pray for you
https://youtu.be/XUxKm40X__g?t=907
benchmarks was a mistake...