/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108664784

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
is it over or are we back
>>
File: 1755007656454961.jpg (392 KB, 1448x1086)
>>
File: ComfyUI_10703_.png (368 KB, 1024x1024)
>>
>>108668948
why is it so brown
>>
>>108668954
ghibli niggers did this
>>
>>108668948
>GreasePT
>>
Why is civitai full of new accounts literally named "abc123abc" commenting on every single z-image lora asking for an Ernie version? For fuck's sake, just take a look at the Commodore64 lora for Ernie, it's disgusting, it makes me puke just to stare at the images.
>>
my gpu fans are starting to rattle. the end is near
>>
>>108668948
get out! >>108653190
>>
>civitai split between red boards and blue/green boards
>>
QRD on Ernie? Is it a meme or can it actually save local?
>>
>>108669029
infographic generator
>>
>>108668948
that's a lot of inpainting and many hours in gimp
>>
File: image.png (32 KB, 373x418)
>>108668972
chink shill army, nothing new
they are also shilling chink models in r/localllama right now
>>
File: 1760920978918124.png (26 KB, 119x119)
>>108668954
the room was prompted to be bathed in warm light with a dusty color palette because it looks cozy
>>108669037
facts. i really like what it did with groks coffee cup
>>
Why do ai images look like ai? I can't pin down the exact reason. How can you then make your gens look less like ai?
>>
>mfw Resource news

04/23/2026

>ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
https://shelley-golan.github.io/ParetoSlider-webpage

>DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
https://github.com/Adamlong3/DynamicRad

>Normalizing Flows with Iterative Denoising
https://github.com/apple/ml-itarflow

>LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
https://github.com/inclusionAI/LLaDA2.0-Uni

>Illustrious XL & NoobAI-XL Style Explorer
https://github.com/ThetaCursed/Illustrious-NoobAI-Style-Explorer

>AI Model & ‘MAGA’ Influencer Emily Hart Unmasked as Indian Man
https://www.yahoo.com/news/articles/ai-model-maga-influencer-emily-091027504.html

04/22/2026

>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC

>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing

>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn

>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon

>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter

>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation

>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF

>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw

>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow
>>
>>108669070
put "AI" in the negative prompt
>>
>mfw Research news

04/23/2026

>Image Generators are Generalist Vision Learners
http://vision-banana.github.io

>Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
https://randdl.github.io/viewtoken_control

>Hallucination Early Detection in Diffusion Models
https://arxiv.org/abs/2604.20354

>Wan-Image: Pushing the Boundaries of Generative Visual Intelligence
https://arxiv.org/abs/2604.19858

>MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
https://arxiv.org/abs/2604.19902

>Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
https://arxiv.org/abs/2604.20258

>Amodal SAM: A Unified Amodal Segmentation Framework with Generalization
https://arxiv.org/abs/2604.20748

>FluSplat: Sparse-View 3D Editing without Test-Time Optimization
https://arxiv.org/abs/2604.20038

>HumanScore: Benchmarking Human Motions in Generated Videos
https://arxiv.org/abs/2604.20157

>Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
https://arxiv.org/abs/2604.20730

>Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
https://arxiv.org/abs/2604.20366

>Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
https://arxiv.org/abs/2604.20027

>X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
https://arxiv.org/abs/2604.20289

>Self-supervised pretraining for an iterative image size agnostic vision transformer
https://arxiv.org/abs/2604.20392

>Efficient INT8 Single-Image Super-Resolution via Deployment-Aware Quantization and Teacher-Guided Training
https://arxiv.org/abs/2604.20291

>From Diffusion to Flow: Efficient Motion Generation in MotionGPT3
https://arxiv.org/abs/2603.26747
>>
>>108669037
that's basically what gpt image 2 is doing.
it's a second pass that projects the text onto the genned image. the easiest way to spot it is on clothing: the X, for example, is just sitting on her dress, and it's an almost pixel-perfect match with the X on the laptop.
>>
>>108669070
Hire an artist to paint over it.
>>
>>108669092
why cant local models do that?
>>
>>108669088
>>108669090
thanks
>>
>>108669107
they probably can but nobody is developing the tooling for it, at least not in a user friendly way
>>
File: Untitled-1.png (191 KB, 586x664)
>>108669107
probably because they don't care, it's a parlor trick to impress indians and boomer investors. sorry to pull the curtain back.
case in point: the gen uses the same X, it just has a slight skew on the dress. same with the openAI logo, it's just sitting on her shirt.
>>
>>108669117
Put the phone away fag
>>
>>108669093
Gay
>>108669089
There is no way it's that simple. But now that I think of it, putting tags like "masterpiece" seems to help
>>
File: image.png (44 KB, 687x567)
>>108669137
?
>>
>>108669190
api image thread is here >>108653190
>>
>>108669182
>masterpiece in the positives helps make outputs not look ai
Erm..... Anon? When's the last time you saw an optometrist??
>>
>>108669070
Can we talk about this pls?
>>
>>108669135
its a cool trick honestly
hopefully the chinese will be able to reverse engineer it for local models
>>
>>108669231
?
>>
>>108669243
honestly i think a random person could figure out a better implementation in a few days, local has a lot more headroom to fuck around. there are 3d models, i assume they have some kind of texture projection.
you could probably jury-rig something from preexisting nodes. convert a masked area into a plane or 3d topology, project text or an image onto it, then lay it on top of the gen.
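outside comfy the projection step is a few lines of plain OpenCV, roughly like this (the corner points are stand-ins for whatever your mask/depth node gives you):
[code]
# sketch: warp a "decal" (rendered text/logo) onto a quad in the gen;
# assumes you already found 4 corner points for the target surface
import cv2
import numpy as np

gen = cv2.imread("gen.png")            # base generation
decal = cv2.imread("logo.png")         # text/logo to project
h, w = decal.shape[:2]

src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# example corners on the dress -- placeholder coordinates
dst = np.float32([[412, 198], [588, 210], [575, 340], [405, 325]])

H = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(decal, H, (gen.shape[1], gen.shape[0]))

# warp a solid mask the same way, then feather it so the decal
# doesn't look pasted on
mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H,
                           (gen.shape[1], gen.shape[0]))
mask = cv2.GaussianBlur(mask, (9, 9), 0).astype(np.float32) / 255.0

out = gen * (1 - mask[..., None]) + warped * mask[..., None]
cv2.imwrite("composited.png", out.astype(np.uint8))
[/code]
then run the composite back through the sampler at low denoise so the lighting and grain match.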
>>
>I haven't checked in on /ldg/ in a while what are they up to
>Thread gets derailed again
Still at it huh
>>
File: _AnimaPreview3_00155_.jpg (382 KB, 1248x1608)
>>
File: 1748109684850279.jpg (647 KB, 2048x1359)
aight, you can now use NAG on Anima
https://github.com/BigStationW/ComfyUI-NAG-Extended
https://github.com/BigStationW/ComfyUI-NAG-Extended/blob/main/workflows/NAG-Anima-ComfyUI-Workflow.json
https://civitai.com/models/2560840/anima-turbo-lora
>>
>>108669455
>bigstationw
im sorry i dont use vibesharted code :")
>>
What kind of hardware and software/driver combinations do you guys use to generate images and videos and what not?
>>
File: _AnimaPreview3_00156_.jpg (387 KB, 1248x1608)
>>
File: 1563934765591.png (5 KB, 52x44)
>turbo lora for a 2b model
>>
>>108669455
Thanks king. Does left also use a negative prompt tho?
>>
>>108669476
>Does left also use a negative prompt tho?
left can't use a negative prompt, it's at cfg 1
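the standard cfg combine makes it obvious why:
[code]
# classifier-free guidance, per step:
#   denoised = uncond + cfg * (cond - uncond)
# at cfg = 1.0 this collapses to plain `cond`, so the negative/uncond
# branch never affects the output (samplers skip computing it entirely)
def cfg_combine(cond, uncond, cfg: float):
    return uncond + cfg * (cond - uncond)  # == cond when cfg == 1.0
[/code]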
>>
>>108669455
https://github.com/pamparamm/ComfyUI-ppm
I've been using this for negative weights while at CFG 1.0, works great, you just have to get used to the fact that you are putting negative weighted tags in the positive prompt instead of writing in the negative prompt. This has worked better for me than NAG ever did.
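usage is just the normal (tag:weight) syntax but with negative weights, in the positive prompt, something like this (going from memory, double-check against the ppm readme):
[code]
masterpiece, 1girl, sundress, (watermark:-1.0), (jpeg artifacts:-0.8)
[/code]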
>>
File: 1765410005510165.jpg (34 KB, 576x588)
>>108668921
>my Roll-chan made it in the OP
>>
>>108669475
if you're using a sophisticated sampler like ClownsharKSampler, going for cfg > 1 with 50 steps can take really long (more than 2 min on my 3090)
>>
>>108669477
Would you be so kind as to compare non turbo lora with regular CFG vs NAG? I'm just curious
>>
>sophisticated sampler
>>
>>108668271
what model did you use anon, looks clean
>>
>>108669495
God-tier aesthetics in that series. Shame there are so few images tagged "reaverbot" on Danbooru, I want to gen some fucking bots. Guess I have to train a lora...
>>
File: _AnimaPreview3_00162_.jpg (410 KB, 1248x1608)
>>108669513
probably zimage turbo
>>
Can someone explain why that new fancy chatgpt image thing isn't possible locally? Couldn't you just hook up something like z-image or anima to a smart LLM like Gemma with vision?
>>
File: file.png (18 KB, 107x115)
I wonder if there is a way to automate gemma 4 with its vision capabilities as an agent + whatever model + inpainting tools to approach the result of the gpt autoregressive model.
>>
>>108669495
baker doesnt like my anime2real images sadgely :(
>>
>>108669503
the issue is that those NAG parameters don't work for cfg > 1, it can be used yeah but I'm just too lazy to find the right values again, I mean if you already have CFG, adding NAG on top of that is kinda useless imo (and it's slower)
>>
>>108669528
Yes
>>
turdbo looks so stale i have no idea how anon isnt tired of that look already
it was cool when it came out but its just a demo for ZiB
just use ZiB
>>
>>108669449
is there a finetune for anima I didn't hear about?
since when did it do realistic?
>>
>>108669528
>inpainting tools
replace that with an edit model like klein and you can probably do it yeah
>>
>>108669544
ZIB cant do fine detail
>>
File: 1763564705331420.jpg (242 KB, 850x480)
How do I anima with krita?
>>
>>108669522
alright, thanks
>>
>>108669563
it's the same VAE, yes it does
>>
>>108669544
>>108669563
I've seen workflows where they use ZIB to do the beginning of the image (like the first 50% of steps), then switch to ZiT to make it look good
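outside comfy, diffusers exposes the same split for the SDXL base->refiner pair via denoising_end / denoising_start; sketch below uses the SDXL checkpoints because I don't know of an official two-stage ZIB/ZiT pipeline, but the idea is identical:
[code]
# two-stage sampling: model A does the first 50% of steps in latent
# space, model B finishes the rest (SDXL checkpoints as stand-ins)
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "1girl, cluttered bedroom, warm light"
latents = base(prompt=prompt, num_inference_steps=30,
               denoising_end=0.5, output_type="latent").images
image = refiner(prompt=prompt, image=latents, num_inference_steps=30,
                denoising_start=0.5).images[0]
image.save("two_stage.png")
[/code]
in comfy it's two KSampler (Advanced) nodes sharing the step range via start_at_step / end_at_step.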
>>
>>108669553
Since always. https://civitai.com/models/1662740/lenovo-ultrareal?modelVersionId=2882170 This lora helps a tiny bit.
>>
>>108669574
that's what im doing
>>
>>108669555
issue is that edit won't be able to target specific things to enhance

>>108669528
can gemma select a part of an image?
>>
>>108669574
>switch to ZiT to make it look good
kek i guess if you enjoy distill slop then sure
>>
>>108669582
>edit won't be able to target specific things to enhance
yes it can, edit can just modify one specific part of the image, that makes shit easier because you just have to say "hey, add a hat to that girl's head" instead of trying to automate an inpainting process
>>
File: 646065966145594.png (1.1 MB, 1248x1824)
>>108669513
>>108669522
Anima -> ZIT
>>
>>108669528
I don't know what it is, but they did something more than just "look at this image and fix it".
Even SOTA API models don't really have super great visual reasoning.
Again, I don't know precisely what it is, but they are feeding ChatGPT more than a few hundred visual tokens.
>>
i wish lodestones didn't have the attention span of a fruit fly
>>
File: _AnimaPreview3_00173_.jpg (383 KB, 1248x1608)
>>108669553
I use loras for photography and interior. Haven't uploaded anywhere yet.
>>
>>108669528
>>108669629
it's probably something like this
>it makes the image -> it uses its visual encoder to see mistakes -> it makes an edit prompt -> it edits the image
a gemma 4 + klein combo could definitely do the trick
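the loop itself is tiny; gen / critique / edit below are placeholders for whatever you back gemma, the t2i model and klein with (comfy API, llama.cpp server, etc.), not real library calls:
[code]
# hypothetical critique-and-edit loop
from PIL import Image

def agent_gen(prompt: str, gen, critique, edit, max_rounds: int = 4) -> Image.Image:
    img = gen(prompt)                   # t2i model drafts the image
    for _ in range(max_rounds):
        issues = critique(img, prompt)  # VLM lists mistakes, "" if clean
        if not issues:
            break
        img = edit(img, issues)         # edit model fixes only what's flagged
    return img
[/code]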
>>
Retard here, is there any reason to use Klein 9b base over distill when you do upscaling and editing? Or is it just slower without any real benefit?
>>
>>108669622
so anime gen with anima + zit at 0.x denoise to make it realistic?
>>
>>108669683
use base with speed lora
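in diffusers terms it's roughly this (both ids are placeholders for whatever base checkpoint and speed lora you actually use):
[code]
import torch
from diffusers import DiffusionPipeline

# load the full (non-distilled) base, then bolt the speed lora on top
pipe = DiffusionPipeline.from_pretrained(
    "some-org/klein-9b-base", torch_dtype=torch.bfloat16  # placeholder id
).to("cuda")
pipe.load_lora_weights("some-user/speed-lora")            # placeholder id
image = pipe("1girl, photo", num_inference_steps=8,
             guidance_scale=1.0).images[0]
[/code]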
>>
>>108669503
>>108669531
give me a prompt and a negative prompt, I'll try it out
>>
>>108669613
yeah but it's not as precise for things it doesn't know about, or background things, or basically very specific things you want to target
>>
>>108669707
>things it doesn't know about
OpenAI probably uses tool calling to browse the internet, fetch some images and ask the model to merge those images onto the canvas
>>
>>108669475
>he doesnt want to gen instantaneously
>>
>>108669629
Anything can be bruteforced with enough tokens, and seeing the prices on the api side, I'm pretty sure it's feeding a whole lot of tokens to refine the image.
The result is good though, and I'd like to see that locally done with the tools we have.
>>
File: t5g_gallery_00125_.png (911 KB, 1216x832)
https://huggingface.co/TheRemixer/ChenkinNoobRF-T5Gemma-adapter
Neat, T5gemma adapter for Chenkin Noob!

>>108669726
Anima is 2x slower than SDXL, and training Anima loras is between 2.3x and 2.5x slower than SDXL
>>
>>108669653
>it uses its visual encoder to see mistakes
this is probably their secret sauce (along with using agents), I think they trained specifically for "wrong looking texts" and details, which means the model is probably very good at spotting that
>>
>Anima is 2x slower than SDXL, and training Anima loras is between 2.3x and 2.5x slower than SDXL
loving this indian reasoning
>>
>>108668948
It’s rare, the aesthetic is kind of fried, like SD 1.5, but with better coherence. It reminds me of one of those SD 1.5 shitmixes with a lot of inpainting, regional prompter, and Photoshop.
>>
>>108669683
In my single testcase the results differed, but base did more unwanted fiddling than distilled. YMMV.
>>
File: 410146798802683.png (3.89 MB, 3456x1344)
>>108669684
Pretty much, expect I start with a realistic Anima gen.
>>
>>108669856
>double eyelid zitslopgirl
NinTenLOL
>>
File: 968433991998314.png (2.08 MB, 1536x1728)
>>
>>108669742
>SDXL
*pukes*
>>
Any way to stop the artist name from showing up with anima? I already have signature, artist name, twitter username, patreon username, and watermark in the negative prompt but it's still doing it.
>>
>>108669731
I guess you can take your time trying to min-max llama.cpp params and see if it scales up well enough? I wouldn't be too hopeful but worth a shot.
Maybe 3.6 works better for this, that's also worth experimenting with.
>>
File: 1747808016100949.jpg (259 KB, 1448x1086)
>>108669653
>>108669747
you need a very good model that doesnt use a vae in order to do what gpt 2 is doing
dont waste your time trying to squeeze water out of a stone with these outdated latent diffusion models
>>
File: absolute gpt image 2 slop.png (2.96 MB, 1254x1254)
>>108670003
>you need a very good model that doesnt use a vae in order to do what gpt 2 is doing
the thing is, it's obvious that gpt 2 is still using a vae: when you go for very complex images it gets slopped fast, with more and more noise and artifacts. it's probably the result of the model doing like 10 edits, and at that point the vae issues start getting really amplified
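easy to sanity-check the vae part: round-trip a photo through a vae a bunch of times and watch it rot. quick sketch with the sd vae (only because it's small, any latent model's vae compounds the same way):
[code]
# demo: repeated VAE encode/decode compounds reconstruction error,
# roughly what stacking many latent-space edits does to an image
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor, to_pil_image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to("cuda").eval()

ref = to_tensor(load_image("photo.png").convert("RGB")).unsqueeze(0).to("cuda")
ref = ref * 2 - 1                              # VAE expects [-1, 1]
x = ref.clone()
with torch.no_grad():
    for i in range(10):                        # ~10 "edits"
        z = vae.encode(x).latent_dist.mode()   # deterministic encode
        x = vae.decode(z).sample.clamp(-1, 1)
        print(f"round {i+1}: mean abs error {(x - ref).abs().mean().item():.4f}")
to_pil_image(((x + 1) / 2).squeeze(0).cpu()).save("round_tripped.png")
[/code]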
>>
File: woman 2026 04 23 1.png (1.51 MB, 896x1152)
>>
>>108670022
indians think api models are magic.
>>
>>108669980
gemma 31B has pretty ok image understanding, and is easy to stop from moralfagging over nsfw
>>
File: 1770711519026006.png (2.34 MB, 1024x1024)
>use pear-shaped figure tag
>turns her into a literal fucking pear
Kek
>>
>>108668863
catbox?
>>
>>108670109
canon btw
>>
>>108670109
what a smug pear.
>>
File: _AnimaPreview3_00224_.jpg (370 KB, 1160x1696)
>>
>>108670022
>there's more and more noises and artifacts
This is due to overly aggressive distillation + RL. See Ernie Turbo for a local example, it is trying to "fake" detail by just having noise on everything.
>>
>>108670191
me on the right
>>
>>108669856
The problem with this method is that the anatomy more or less sucks.

Talking about anatomy, what model produces the most anatomically accurate gens? I have been using virt a mate + ZIT to make my gens look realistic, but that's kind of a hassle.
>>
File: 921750962661488.png (2.26 MB, 1248x1824)
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.