Discussion and Development of Local Image and Video Models
Previous: >>108517229
https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
Blessed thread of frenship
At the end of 2025, we received our last batch of decent new models: Flux Klein, Z-Image, and Qwen 2512. 4 months later, where are the finetunes?
>>108520023
lodes (chroma) is fine-tuning Z-Image.
>>108520023
Z-Image base released in January of this year. Previous finetroons took up to 8 months.
>mfw Research news
04/03/2026

>Modular Energy Steering for Safe Text-to-Image Generation with Foundation Models
https://arxiv.org/abs/2604.02265
>Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation
https://arxiv.org/abs/2604.01700
>Why Instruction-Based Unlearning Fails in Diffusion Models?
https://arxiv.org/abs/2604.01514
>SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing
https://arxiv.org/abs/2604.01715
>MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation
https://arxiv.org/abs/2604.01864
>Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters
https://arxiv.org/abs/2604.01888
>HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
https://arxiv.org/abs/2604.01881
>UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
https://arxiv.org/abs/2604.01479
>Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining
https://junxuan-li.github.io/lca
>Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance
https://arxiv.org/abs/2604.01848
>Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation
https://arxiv.org/abs/2604.02289
>Reinforcing Consistency in Video MLLMs with Structured Rewards
https://arxiv.org/abs/2604.01460
>Model Merging via Data-Free Covariance Estimation
https://arxiv.org/abs/2604.01329
>Steerable Visual Representations
https://arxiv.org/abs/2604.02327
>Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
https://arxiv.org/abs/2604.01989
>ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline
https://arxiv.org/abs/2604.02182
>Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
https://arxiv.org/abs/2511.18123
>>108520023
just stop posting. go for a walk or take a nap.
>>108520063
i was told that 4 months ago. still no finetunes
>>108520071
sdxl took longer than 4 months to tune
>>108520101
SDXL released July 26, 2023
NovelAI v3 (sdxl) released November 15, 2023
only 3.5 months, and they converted it to vpred too
>>108520101
what does NAI have to do with how long local tunes take
>>108520063
what do you want from a finetune? clearly all the finetunes for zit/zib, flux 2 and qwen don't count. is there some hyper-specific criteria you need met before it counts?
For the anon who was looking for help at the end of the last thread
>>108520043
I can't speak to whether or not your comfy workflow is broken, though.
>mfw (monofilament wire)
cozy breas
>>108520101
>SDXL released July 26, 2023
It's hard to believe it hasn't even been three full years. Feels like an eternity since SDXL came out.
>>108520160
and you don't blame the software itself shitting the bed for months?
>>108520004
Thank you for baking this thread, anon
>>108520014
Thank you for blessing this thread, anon
damn, things have progressed so far since SDXL. We had Dall-E 3, GPT-o, Nano Banana 2, and now per-pixel intelligence with Uni-1. Even ComfyUI and CivitAI grew up: both started out supporting only small local models, but now they feature the best API models the world has to offer. It's great coming back to our roots and remembering where things started with SD1.4, back when we were all stuck using local. Now we can gen 4k outputs with Seedream in under 10 seconds, how crazy is that?
>>108520507
Nope. All my gens with anon's chosen artist were busted too, on Forge Neo. Checkpoint didn't know their artist.
>>108520527
going to be interesting to watch it unfold over the next few years. a lot of the smaller companies, or companies that don't have brand recognition, are going to go tits up. autodesk, adobe and pixologic will all run their own models that get fully integrated into their applications. chatgpt, x and google's models are already mspaint for boomers and normies to make memes. hollywood studios will run their own proprietary models. ironically the local coomers will be the biggest winners; nothing drives tech adoption like pornography. just look at grok.
https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/older_comfy_pre_feb2026/LTX-2%20-%20I2V%20and%20T2V%20Basic%20(Custom%20Audio).json
ltx is pretty great at lip syncing with custom audio: even 20s worked
https://files.catbox.moe/20mzie.mp4
https://huggingface.co/SanDiegoDude/JoyAI-Image-Edit-Safetensors
Did anyone try that new edit model?
>>108520789
see how well ltx works for synced audio? harry potter is saved.
https://files.catbox.moe/kh3rpd.mp4
>>108520925
does chroma give that body shape with just a prompt?
is z even trainable? I haven't seen a single lora that can remotely generate something that resembles a real penis, yet klein 9b had decent loras that could do it within the first couple of weeks after its release.
>>108520904
>>108520925
>>108520935
>>108520949
>>108520955
>>108520961
I've got a friend that was asking for these images without the bikini on. Could you please provide these for scientific purposes?
>>108521097
https://civitai.com/search/models?baseModel=ZImageBase&sortBy=models_v9&query=penis
>>108520904
Too old btw
Blogpost time.
Experimenting with anima lora training again. I tried training with diffusion-pipe two months ago, but got bad results and the tool was imo quite bad. Using good old sd-scripts now and getting better results, though it's not there yet.

I am experimenting with using a regularization dataset for this style lora. And the result is... mixed? Notwithstanding the fact that it takes twice as long as not using a regularization dataset (and anima is already slower to train than sdxl), when it works, it seems to improve the results a bit. But often it makes the lora stop working: the lora decides to draw an unrelated style (seems to be a mix of stuff in the reg dataset) for some prompts (couldn't find a pattern, all have the TW). And this persists across seeds of the prompt and epochs of the lora. Very weird.

Maybe a larger regularization dataset might help. I am using only 40 images (80 in the training dataset; the reg dataset is automatically repeated so there is no mismatch, however).

I went with an in-the-middle --prior_loss_weight value of 0.66. I am skeptical that a bigger value would help, considering the problems I am facing, but maybe it performs better with lower values like 0.25-0.33?

Currently using NLP captions for training; it might perform differently with tag captions, or it might not.

The trigger word for my lora is "x style art." There is no equivalent to this in the regularization dataset captions, so maybe adding "y style art", "z style art" to those captions might help to prevent unrelated styles from showing up?
1/2
>>108521266
Maybe I am approaching this with the wrong philosophy? Regularization images, I believe, are supposed to be different members of the same "class" as whatever I am training, so that the training doesn't fry the model's knowledge, and also as general-purpose help against overfitting. Given that I am training a style lora, I grabbed artworks from different artists. Maybe something else, like images of underrepresented concepts in my dataset, would work better? Or would this backfire and make my lora completely less able to draw regularization data content in the correct style?

There is so little decent resources for this shit man. Trial-and-error shots in the dark until something works.

Also a bunch of other shit I want to test, like network_args "train_llm_adapter=True" (might fry the lora but seems worth testing) or shift values (attempted to train a few flow loras in the past, though never experimented with this in detail; currently using 3.0).

Thanks for reading my blog. Open to suggestions.
2/2
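(For reference, here is roughly how the setup above maps onto an sd-scripts dataset config. This is a minimal sketch assuming the standard sd-scripts TOML layout; whether the anima training fork keeps that layout identical is an assumption, and the paths, resolution, and batch size below are placeholders, not the anon's actual values.)

```toml
# dataset_config.toml -- style lora with a regularization subset (sketch).
# Passed to the trainer via: --dataset_config dataset_config.toml

[general]
resolution = 1024           # placeholder; match whatever the model expects
caption_extension = ".txt"  # NLP captions stored as .txt next to each image

[[datasets]]
batch_size = 2              # placeholder

  # Training images: the style being learned; captions include the
  # trigger phrase ("x style art.").
  [[datasets.subsets]]
  image_dir = "/data/style_train"   # placeholder path
  num_repeats = 1

  # Regularization images: is_reg = true marks this subset as the prior
  # set. sd-scripts repeats it as needed to match the training subset, and
  # its loss term is scaled by --prior_loss_weight (0.66 in the anon's run)
  # given on the training command line.
  [[datasets.subsets]]
  image_dir = "/data/style_reg"     # placeholder path
  is_reg = true
  num_repeats = 1
```

Since --prior_loss_weight just scales how much the reg images contribute to the loss, sweeping it down toward 0.25-0.33 is the cheapest of the knobs above to test.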
I'm so hungry for a pizza rn
>>108521272
*make my lora less able to draw regularization data content*
There are so few decent resources for this shit man
I re-wrote this part and left some Frankenstein sentences from the previous draft, my bad.
>>108521266
>>108521272
small models just suck at learning texture; they all have this overly smooth look to them. the same happened with SDXL. try the exact same thing on Qwen and it will work fine
>>108520003
ez bait
>>108521266
>>108521272
Low and slow is the name of the game.
>The trigger word for my lora is "x style art."
Why not format it the way the model itself uses artists, i.e. @artist?
>>108521266
>Currently using NLP captions for training; it might perform differently with tag captions, or it might not.
tdrussell explained how he captioned his dataset on the hf repo, so you should probably follow that
>>108521328
>>108521339
downgrade desu
>>108521266
it looks like you're frying the fuck out of it