/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108664784

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
is it over or are we back
>>
File: 1755007656454961.jpg (392 KB, 1448x1086)
>>
File: ComfyUI_10703_.png (368 KB, 1024x1024)
>>
>>108668948
why is it so brown
>>
>>108668954
ghibli niggers did this
>>
>>108668948
>GreasePT
>>
Why is civitai full of new accounts literally named "abc123abc" commenting on every single z-image lora asking for an Ernie version? For fuck's sake, just take a look at the Commodore64 lora for Ernie, it's disgusting, it makes me puke just to stare at the images.
>>
my gpu fans are starting to rattle. the end is near
>>
>>108668948
get out! >>108653190
>>
>civitai split between red boards and blue/green boards
>>
QRD on Ernie? Is it a meme or can it actually save local?
>>
>>108669029
infographic generator
>>
>>108668948
that's a lot of inpainting and many hours in gimp
>>
File: image.png (32 KB, 373x418)
>>108668972
chink shill army, nothing new
they are also shilling chink models in r/localllama right now
>>
File: 1760920978918124.png (26 KB, 119x119)
>>108668954
the room was prompted to be bathed in warm light with a dusty color palette because it looks cozy
>>108669037
facts. i really like what it did with groks coffee cup
>>
Why do ai images look like ai? I can't pin down the exact reason. How can you then make your gens look less like ai?
>>
>mfw Resource news

04/23/2026

>ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
https://shelley-golan.github.io/ParetoSlider-webpage

>DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
https://github.com/Adamlong3/DynamicRad

>Normalizing Flows with Iterative Denoising
https://github.com/apple/ml-itarflow

>LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
https://github.com/inclusionAI/LLaDA2.0-Uni

>Illustrious XL & NoobAI-XL Style Explorer
https://github.com/ThetaCursed/Illustrious-NoobAI-Style-Explorer

>AI Model & ‘MAGA’ Influencer Emily Hart Unmasked as Indian Man
https://www.yahoo.com/news/articles/ai-model-maga-influencer-emily-091027504.html

04/22/2026

>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC

>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing

>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn

>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon

>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter

>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation

>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF

>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw

>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow
>>
>>108669070
put "AI" in the negative prompt
>>
>mfw Research news

04/23/2026

>Image Generators are Generalist Vision Learners
http://vision-banana.github.io

>Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
https://randdl.github.io/viewtoken_control

>Hallucination Early Detection in Diffusion Models
https://arxiv.org/abs/2604.20354

>Wan-Image: Pushing the Boundaries of Generative Visual Intelligence
https://arxiv.org/abs/2604.19858

>MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
https://arxiv.org/abs/2604.19902

>Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
https://arxiv.org/abs/2604.20258

>Amodal SAM: A Unified Amodal Segmentation Framework with Generalization
https://arxiv.org/abs/2604.20748

>FluSplat: Sparse-View 3D Editing without Test-Time Optimization
https://arxiv.org/abs/2604.20038

>HumanScore: Benchmarking Human Motions in Generated Videos
https://arxiv.org/abs/2604.20157

>Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
https://arxiv.org/abs/2604.20730

>Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
https://arxiv.org/abs/2604.20366

>Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
https://arxiv.org/abs/2604.20027

>X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
https://arxiv.org/abs/2604.20289

>Self-supervised pretraining for an iterative image size agnostic vision transformer
https://arxiv.org/abs/2604.20392

>Efficient INT8 Single-Image Super-Resolution via Deployment-Aware Quantization and Teacher-Guided Training
https://arxiv.org/abs/2604.20291

>From Diffusion to Flow: Efficient Motion Generation in MotionGPT3
https://arxiv.org/abs/2603.26747
>>
>>108669037
that's basically what gpt image 2 is doing.
it's a second pass that projects the text onto the genned image. the easiest way to spot it is on clothing: the X, for example, is just sitting on her dress, and it's an almost pixel-perfect match with the X on the laptop.
>>
>>108669070
Hire an artist to paint over it.
>>
>>108669092
why cant local models do that?
>>
>>108669088
>>108669090
thanks
>>
>>108669107
they probably can but nobody is developing the tooling for it, at least not in a user friendly way
>>
File: Untitled-1.png (191 KB, 586x664)
>>108669107
probably because they don't care, it's a parlor trick to impress indians and boomer investors. sorry to pull the curtain back.
case in point: the gen uses the same X, it just has a slight skew on the dress. same with the openAI logo, it's just sitting on her shirt.
>>
>>108669117
Put the phone away fag
>>
>>108669093
Gay
>>108669089
There is no way it's that simple. But now that I think of it, putting tags like "masterpiece" seems to help
>>
File: image.png (44 KB, 687x567)
>>108669137
?
>>
>>108669190
api image thread is here >>108653190
>>
>>108669182
>masterpiece in the positives helps make outputs not look ai
Erm..... Anon? When's the last time you saw an optometrist??
>>
>>108669070
Can we talk about this pls?
>>
>>108669135
its a cool trick honestly
hopefully the chinese will be able to reverse engineer it for local models
>>
>>108669231
?
>>
>>108669243
honestly i think a random person could figure out a better implementation in a few days, local has a lot more headroom to fuck around. there are 3d models, i assume they have some kind of texture projection.
you could probably jury-rig something from preexisting nodes. convert a masked area into a plane or 3d topology, project text or an image onto it, then lay it on top of the gen.
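outside comfy the projection step is a few lines of plain OpenCV, roughly like this (the corner points are stand-ins for whatever your mask/depth node gives you):
[code]
# sketch: warp a "decal" (rendered text/logo) onto a quad in the gen;
# assumes you already found 4 corner points for the target surface
import cv2
import numpy as np

gen = cv2.imread("gen.png")            # base generation
decal = cv2.imread("logo.png")         # text/logo to project
h, w = decal.shape[:2]

src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# example corners on the dress -- placeholder coordinates
dst = np.float32([[412, 198], [588, 210], [575, 340], [405, 325]])

H = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(decal, H, (gen.shape[1], gen.shape[0]))

# warp a solid mask the same way, then feather it so the decal
# doesn't look pasted on
mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H,
                           (gen.shape[1], gen.shape[0]))
mask = cv2.GaussianBlur(mask, (9, 9), 0).astype(np.float32) / 255.0

out = gen * (1 - mask[..., None]) + warped * mask[..., None]
cv2.imwrite("composited.png", out.astype(np.uint8))
[/code]
then run the composite back through the sampler at low denoise so the lighting and grain match.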
>>
>I haven't checked in on /ldg/ in a while what are they up to
>Thread gets derailed again
Still at it huh
>>
File: _AnimaPreview3_00155_.jpg (382 KB, 1248x1608)
>>
File: 1748109684850279.jpg (647 KB, 2048x1359)
aight, you can now use NAG on Anima
https://github.com/BigStationW/ComfyUI-NAG-Extended
https://github.com/BigStationW/ComfyUI-NAG-Extended/blob/main/workflows/NAG-Anima-ComfyUI-Workflow.json
https://civitai.com/models/2560840/anima-turbo-lora
>>
>>108669455
>bigstationw
im sorry i dont use vibesharted code :")
>>
What kind of hardware and software/driver combinations do you guys use to generate images and videos and what not?
>>
File: _AnimaPreview3_00156_.jpg (387 KB, 1248x1608)
>>
File: 1563934765591.png (5 KB, 52x44)
>turbo lora for a 2b model
>>
>>108669455
Thanks king. Does left also use a negative prompt tho?
>>
>>108669476
>Does left also use a negative prompt tho?
left can't use a negative prompt, it's at cfg 1
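the standard cfg combine makes it obvious why:
[code]
# classifier-free guidance, per step:
#   denoised = uncond + cfg * (cond - uncond)
# at cfg = 1.0 this collapses to plain `cond`, so the negative/uncond
# branch never affects the output (samplers skip computing it entirely)
def cfg_combine(cond, uncond, cfg: float):
    return uncond + cfg * (cond - uncond)  # == cond when cfg == 1.0
[/code]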
>>
>>108669455
https://github.com/pamparamm/ComfyUI-ppm
I've been using this for negative weights while at CFG 1.0, works great, you just have to get used to the fact that you are putting negative weighted tags in the positive prompt instead of writing in the negative prompt. This has worked better for me than NAG ever did.
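usage is just the normal (tag:weight) syntax but with negative weights, in the positive prompt, something like this (going from memory, double-check against the ppm readme):
[code]
masterpiece, 1girl, sundress, (watermark:-1.0), (jpeg artifacts:-0.8)
[/code]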
>>
File: 1765410005510165.jpg (34 KB, 576x588)
>>108668921
>my Roll-chan made it in the OP
>>
>>108669475
if you're using a sophisticated sampler like ClownsharKSampler, going for cfg > 1 with 50 steps can take really long (more than 2 min on my 3090)
>>
>>108669477
Would you be so kind as to compare non turbo lora with regular CFG vs NAG? I'm just curious
>>
>sophisticated sampler
>>
>>108668271
what model did you use anon, looks clean
>>
>>108669495
God-tier aesthetics in that series. Shame there are so few images tagged "reaverbot" on Danbooru, I want to gen some fucking bots. Guess I have to train a lora...
>>
File: _AnimaPreview3_00162_.jpg (410 KB, 1248x1608)
>>108669513
probably zimage turbo
>>
Can someone explain why that new fancy chatgpt image thing isn't possible locally? Couldn't you just hook up something like z-image or anima to a smart LLM like Gemma with vision?
>>
File: file.png (18 KB, 107x115)
I wonder if there is a way to automate gemma 4 with its vision capabilities as an agent + whatever model + inpainting tools to approach the result of the gpt autoregressive model.
>>
>>108669495
baker doesnt like my anime2real images sadgely :(
>>
>>108669503
the issue is that those NAG parameters don't work for cfg > 1, it can be used yeah but I'm just too lazy to find the right values again, I mean if you already have CFG, adding NAG on top of that is kinda useless imo (and it's slower)
>>
>>108669528
Yes
>>
turdbo looks so stale i have no idea how anon isnt tired of that look already
it was cool when it came out but its just a demo for ZiB
just use ZiB
>>
>>108669449
is there a finetune for anima I didn't hear about?
since when did it do realistic?
>>
>>108669528
>inpainting tools
replace that with an edit model like klein and you can probably do it yeah
>>
>>108669544
ZIB cant do fine detail
>>
File: 1763564705331420.jpg (242 KB, 850x480)
How do I anima with krita?
>>
>>108669522
alright, thanks
>>
>>108669563
it's the same VAE, yes it does
>>
>>108669544
>>108669563
I've seen workflows where they use ZIB to do the beginning of the image (like the first 50% of steps), then switch to ZiT to make it look good
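outside comfy, diffusers exposes the same split for the SDXL base->refiner pair via denoising_end / denoising_start; sketch below uses the SDXL checkpoints because I don't know of an official two-stage ZIB/ZiT pipeline, but the idea is identical:
[code]
# two-stage sampling: model A does the first 50% of steps in latent
# space, model B finishes the rest (SDXL checkpoints as stand-ins)
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "1girl, cluttered bedroom, warm light"
latents = base(prompt=prompt, num_inference_steps=30,
               denoising_end=0.5, output_type="latent").images
image = refiner(prompt=prompt, image=latents, num_inference_steps=30,
                denoising_start=0.5).images[0]
image.save("two_stage.png")
[/code]
in comfy it's two KSampler (Advanced) nodes sharing the step range via start_at_step / end_at_step.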
>>
>>108669553
Since always. https://civitai.com/models/1662740/lenovo-ultrareal?modelVersionId=2882170 This lora helps a tiny bit.
>>
>>108669574
that's what im doing
>>
>>108669555
issue is that edit won't be able to target specific things to enhance

>>108669528
can gemma select a part of an image?
>>
>>108669574
>switch to ZiT to make it look good
kek i guess if you enjoy distill slop then sure
>>
>>108669582
>edit won't be able to target specific things to enhance
yes it can, edit can just modify one specific part of the image, that makes shit easier because you just have to say "hey, add a hat to that girl's head" instead of trying to automate an inpainting process
>>
File: 646065966145594.png (1.1 MB, 1248x1824)
>>108669513
>>108669522
Anima -> ZIT
>>
>>108669528
I don't know what it is, but they did something more than just "look at this image and fix it".
Even SOTA API models don't really have super great visual reasoning.
Again, I don't know precisely what it is, but they are feeding ChatGPT more than a few hundred visual tokens.
>>
i wish lodestones didn't have the attention span of a fruit fly
>>
File: _AnimaPreview3_00173_.jpg (383 KB, 1248x1608)
>>108669553
I use loras for photography and interior. Haven't uploaded anywhere yet.
>>
>>108669528
>>108669629
it's probably something like this
>it makes the image -> it uses its visual encoder to see mistakes -> it makes an edit prompt -> it edits the image
a gemma 4 + klein combo could definitely do the trick
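the loop itself is tiny; gen / critique / edit below are placeholders for whatever you back gemma, the t2i model and klein with (comfy API, llama.cpp server, etc.), not real library calls:
[code]
# hypothetical critique-and-edit loop
from PIL import Image

def agent_gen(prompt: str, gen, critique, edit, max_rounds: int = 4) -> Image.Image:
    img = gen(prompt)                   # t2i model drafts the image
    for _ in range(max_rounds):
        issues = critique(img, prompt)  # VLM lists mistakes, "" if clean
        if not issues:
            break
        img = edit(img, issues)         # edit model fixes only what's flagged
    return img
[/code]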
>>
Retard here, is there any reason to use Klein 9b base over distill when you do upscaling and editing? Or is it just slower without any real benefit?
>>
>>108669622
so anime gen with anima + zit at 0.x denoise to make it realistic?
>>
>>108669683
use base with speed lora
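in diffusers terms it's roughly this (both ids are placeholders for whatever base checkpoint and speed lora you actually use):
[code]
import torch
from diffusers import DiffusionPipeline

# load the full (non-distilled) base, then bolt the speed lora on top
pipe = DiffusionPipeline.from_pretrained(
    "some-org/klein-9b-base", torch_dtype=torch.bfloat16  # placeholder id
).to("cuda")
pipe.load_lora_weights("some-user/speed-lora")            # placeholder id
image = pipe("1girl, photo", num_inference_steps=8,
             guidance_scale=1.0).images[0]
[/code]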
>>
>>108669503
>>108669531
give me a prompt and a negative prompt, I'll try it out
>>
>>108669613
yeah but it's not as precise for things it doesn't know about, or background things, or basically very specific things you want to target
>>
>>108669707
>things it doesn't know about
OpenAI probably uses tool calling to browse the internet, fetch some images and ask the model to merge those images onto the canvas
>>
>>108669475
>he doesnt want to gen instantaneously
>>
>>108669629
Anything can be bruteforced with enough tokens, and seeing the prices on the api side, I'm pretty sure it's feeding a whole lot of tokens to refine the image.
The result is good though, and I'd like to see that locally done with the tools we have.
>>
File: t5g_gallery_00125_.png (911 KB, 1216x832)
https://huggingface.co/TheRemixer/ChenkinNoobRF-T5Gemma-adapter
Neat, T5gemma adapter for Chenkin Noob!

>>108669726
Anima is 2x slower than SDXL, and training Anima loras is between 2.3x and 2.5x slower than SDXL
>>
>>108669653
>it uses its visual encoder to see mistakes
this is probably their secret sauce (along with using agents), I think they trained specifically for "wrong looking texts" and details, which means the model is probably very good at spotting that
>>
>Anima is 2x slower than SDXL, and training Anima loras is between 2.3x and 2.5x slower than SDXL
loving this indian reasoning
>>
>>108668948
It’s rare, the aesthetic is kind of fried, like SD 1.5, but with better coherence. It reminds me of one of those SD 1.5 shitmixes with a lot of inpainting, regional prompter, and Photoshop.
>>
>>108669683
In my single testcase the results differed, but base did more unwanted fiddling than distilled. YMMV.
>>
File: 410146798802683.png (3.89 MB, 3456x1344)
>>108669684
Pretty much, expect I start with a realistic Anima gen.
>>
>>108669856
>double eyelid zitslopgirl
NinTenLOL
>>
File: 968433991998314.png (2.08 MB, 1536x1728)
>>
>>108669742
>SDXL
*pukes*
>>
Any way to stop the artist name from showing up with anima? I already have signature, artist name, twitter username, patreon username, and watermark in the negative prompt but it's still doing it.
>>
>>108669731
I guess you can take your time trying to min-max llama.cpp params and see if it scales up well enough? I wouldn't be too hopeful but worth a shot.
Maybe 3.6 works better for this, that's also worth experimenting with.
>>
File: 1747808016100949.jpg (259 KB, 1448x1086)
>>108669653
>>108669747
you need a very good model that doesnt use a vae in order to do what gpt 2 is doing
dont waste your time trying to squeeze water out of a stone with these outdated latent diffusion models
>>
File: absolute gpt image 2 slop.png (2.96 MB, 1254x1254)
>>108670003
>you need a very good model that doesnt use a vae in order to do what gpt 2 is doing
the thing is, it's obvious that gpt 2 is still using a vae: when you go for very complex images it gets slopped fast, with more and more noise and artifacts. it's probably the result of the model doing like 10 edits, and at that point the vae issues start getting really amplified
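easy to sanity-check the vae part: round-trip a photo through a vae a bunch of times and watch it rot. quick sketch with the sd vae (only because it's small, any latent model's vae compounds the same way):
[code]
# demo: repeated VAE encode/decode compounds reconstruction error,
# roughly what stacking many latent-space edits does to an image
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor, to_pil_image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to("cuda").eval()

ref = to_tensor(load_image("photo.png").convert("RGB")).unsqueeze(0).to("cuda")
ref = ref * 2 - 1                              # VAE expects [-1, 1]
x = ref.clone()
with torch.no_grad():
    for i in range(10):                        # ~10 "edits"
        z = vae.encode(x).latent_dist.mode()   # deterministic encode
        x = vae.decode(z).sample.clamp(-1, 1)
        print(f"round {i+1}: mean abs error {(x - ref).abs().mean().item():.4f}")
to_pil_image(((x + 1) / 2).squeeze(0).cpu()).save("round_tripped.png")
[/code]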
>>
File: woman 2026 04 23 1.png (1.51 MB, 896x1152)
>>
>>108670022
indians think api models are magic.
>>
>>108669980
gemma 31B has pretty ok image understanding, and is easy to stop from moralfagging over nsfw
>>
File: 1770711519026006.png (2.34 MB, 1024x1024)
>use pear-shaped figure tag
>turns her into a literal fucking pear
Kek
>>
>>108668863
catbox?
>>
>>108670109
canon btw
>>
>>108670109
what a smug pear.
>>
File: _AnimaPreview3_00224_.jpg (370 KB, 1160x1696)
>>
>>108670022
>there's more and more noises and artifacts
This is due to overly aggressive distillation + RL. See Ernie Turbo for a local example, it is trying to "fake" detail by just having noise on everything.
>>
>>108670191
me on the right
>>
>>108669856
The problem with this method is that the anatomy more or less sucks.

Talking about anatomy, what model produces the most anatomically accurate gens? I have been using virt a mate + ZIT to make my gens look realistic, but that's kind of a hassle.
>>
File: 921750962661488.png (2.26 MB, 1248x1824)
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.