/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108659074

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
File: 186950.jpg (1.11 MB, 1024x1536)
What happened to local?
>>
Blessed thread of frenship
>>
>>108664800
may 2024 you say
>>
>>108664800
the worst part is that Alibaba is still here, feeding the llm fags, but they seem to have abandoned us :( >>108664796
>>
Am I supposed to be upset that a company is no longer supporting local? It's happened before and every time someone else steps up.
>>
>>108664800
>>108664809
>GPT was able to deduce Alibaba's sellout plan 2 years ago, we just didn't listen
holy shit, saas is actually insanely powerful
>>
Why are the api niggers so upset at local? They mad they can't gen boobies or what?
>>
>>108664820
at this point its instigating or participating in a flame war desu
>>
>armchair jannying as if jannies give a single fuck about AI threads
>>
>>108664817
>It's happened before and every time someone else steps up.
yeah anyone crying about it is either very very new or just trolling
>>
>>108664817
>It's happened before and every time someone else steps up.
what if no one else steps up?
>>
>I am supposed to be upset she slept with another man? It's happened before
they really are localkeks after all!
>>
Are you equating a company with a woman because you have never had sex before
>>
>>108664800
sharty alert
sharty alert
sharty retard
>>
>>108664833
go on then post some nippies or vagene
>>
File: 1750235241161732.jpg (239 KB, 853x1844)
>>108664820
>They mad they can't gen boobies or what?
have you not seen the grok threads? we can get our fill of boobs whenever we want
>>
Local Diffusion?
>>
>>108664862
Then why is you so upset little nigga
>>
>>108664784
Thank you for baking this thread, anon
>>108664802
Thank you for blessing this thread, anon
>>
>mfw Resource news

04/22/2026

>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC

>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing

>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn

>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon

>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter

>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation

>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF

>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw

>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow

>TRELLIS.2-stableprojectorz: Trellis.2 optimized to fit inside 8GB gpus
https://github.com/IgorAherne/TRELLIS.2-stableprojectorz

>Fizgig — Klein 9B LoRA Studio
https://github.com/shootthesound/Fizgig

>Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
https://huggingface.co/datasets/TaobaoTmall-AlgorithmProducts/Tstars-VTON

04/21/2026

>MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Style Mapping
https://jeoyal.github.io/MegaStyle

>UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
https://github.com/Yovecent/UDM-GRPO

>Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
https://github.com/NA-HMC/NA-HMC
>>
File: 1748735612394339.jpg (326 KB, 1672x941)
>>108664868
im not upset, im happy as a clam genning these cool pics with gpt image 2
>>
>mfw Research news

04/22/2026

>Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation
https://arxiv.org/abs/2604.18215

>Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval
https://arxiv.org/abs/2604.19135

>ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
https://arxiv.org/abs/2604.19720

>Long-Text-to-Image Generation via Compositional Prompt Decomposition
https://arxiv.org/abs/2604.18258

>HP-Edit: A Human-Preference Post-Training Framework for Image Editing
https://arxiv.org/abs/2604.19406

>Geometric Decoupling: Diagnosing the Structural Instability of Latent
https://arxiv.org/abs/2604.18804

>CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
https://arxiv.org/abs/2604.19632

>AlloSR^2: Rectifying One-Step Super-Resolution to Stay Real via Allomorphic Generative Flows
https://arxiv.org/abs/2604.19238

>Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
https://arxiv.org/abs/2604.19234

>Deep sprite-based image models: An analysis
https://arxiv.org/abs/2604.19480

>LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models
https://arxiv.org/abs/2604.18803

>Hierarchically Robust Zero-shot Vision-language Models
https://arxiv.org/abs/2604.18867

>Rethinking Dataset Distillation: Hard Truths about Soft Labels
https://arxiv.org/abs/2604.18811

>Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
https://arxiv.org/abs/2604.19009

>Benign Overfitting in Adversarial Training for Vision Transformers
https://arxiv.org/abs/2604.19724

>BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation
https://arxiv.org/abs/2604.16514
>>
>>108664887
is that GW2?
>>
File: HGjBeyBbUAAoSnV.jpg (235 KB, 941x1672)
Why do people throw a fit about Grok/GPT/NBP gens? They are supported fully in ComfyUI now and integrate well into local workflows. Despite this, the same freetards continue to cry over them. This isn't linux general you boomers.
>>
>>108664901
>>108664887
>>108664862
>This isn't linux general you boomers.
it literally is, it's a local thread, if you want to spam your API garbage you're simply off topic, what's hard to understand about that? get the fuck out >>108653190
>>
imagine local living so rent free in your mind that you cant stop lurking and posting here KEK
>>
>>108664901
>>108653190
>>
File: 65236327326123.jpg (1.31 MB, 1664x2432)
1.31 MB JPG
>>
>try Images 2.0 to add some text on an image that just happens to have a shirtless man in the background
>I'm focused on the text don't even notice the man. prompt to "make it a little bigger"
>We’re so sorry, but the image we created may violate our guardrails around nudity, sexuality, or erotic content. If you think we got it wrong, please retry or edit your prompt.
local will still be needed.
>>
File: 1758771180858155.png (2.1 MB, 1086x1449)
>>108664993
i dont think gpt image is that strict, i just genned this a minute ago
try playing around with the prompt a little
>>
>>108665013
>>108653190
>>
>>108664901
is this really what your life has come down to? Doing le epic troll 8+ hours a day on /ldg/? Surely there are better ways of spending your limited time on this planet?
>>
>>108664993
>We’re so sorry, but the image we created may violate our guardrails around nudity, sexuality, or erotic content. If you think we got it wrong, please retry or edit your prompt.
>change the prompt to say make the text a little bigger
>just werks
maybe another day local will be needed
>>
>>108664901
>grok

Enjoy your 480p vertical video gens lmao
>>
File: Flux2-Klein_00101_.jpg (265 KB, 1088x1920)
>>108664993
>search for court cases
>chatgpt spitting out text
>deletes reply
>Stopped searching
>ask it the same question
>I have to time it and click stop before it deletes reply
Censorship is cancer.
>>
File: 1772551419712987.png (1.45 MB, 832x1216)
Realism lora for anima
https://civitai.com/models/1662740/lenovo-ultrareal
>>
>>108665062
huge
>>
>>108665062
It'll be so easy to pick up a fuck ton of cred by making a very simple style lora for realism with anima. Fuck I should do that.
>>
>>108665062
>Only trained on 30 pics.
No way is that enough. I sleep.
>>
>>108665089
Where does it say 30 images?
>>
are there any good frame interpolators? RIFE just blurs between frames
>>
File: file.png (160 KB, 420x330)
ai could never

>>108665062
my crystal maidens are all chubby now
>>
>>108665115
In the lora metadata.
>>
So on tdrussell's Rutkowski lora config it says 1000 epochs on a 153 image dataset.
Surely that doesn't mean anima needs 153k steps for a lora?
How many epochs did it actually go through?
>>
>>108665149
Going off epochs is for chumps. It's 2k - 4k steps depending on the style.
>>
>>108665149
Apparently I can't read the civit page
>This version corresponds to 40 epochs (120 passes over the data when considering the 3 resolutions)
120x153 steps.
>>108665161
He sets LR low so it needs more steps I think.
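Spelling out that arithmetic as a quick sketch (assumes batch size 1, and that one "pass" means one run over the full dataset at a single resolution, which is how I read the civit note):

```python
# Sanity check for the Rutkowski LoRA config numbers.
# Assumption: total steps = passes over the data * dataset size (batch size 1).
dataset_size = 153
epochs = 40
resolutions = 3

passes = epochs * resolutions        # 120 passes, matching the civit page
total_steps = passes * dataset_size  # steps at batch size 1

print(passes, total_steps)  # 120 18360
```

Which lands right at the ~18k ballpark mentioned later in the thread.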
>>
>>108665149
I'm literally overfitting with 250 steps bs4
>>
>>108665149
Anima training is for Linux, retvrun to mugen
>>
>>108665169
Just get your LoRA to 2k steps and start saving every 500, go until 4k and pick which version you think looks the best.
>>
>>108665169
>>108665188
To clarify, I mean when using his default hyperparams. It works quite well.
>>
Has anyone experimented whether anima responds nicely to timestep shifts during training btw?
>>108665188
I am skeptical you actually trained a decent anima lora with the method you preach.
I had a very bad time trying to train lora for anima with high LR and lower step counts.
All epochs looked shit.
>>
Why did tdrussell only release his training on Linux? This is some kind of Triton bad faith move...
>>
>>108665198
>>108665192
I also tried a low LR run (coincidentally similar to his, tried before he published his example lora) with 8-10k steps (I think, I don't remember too well), that also looked bad. Maybe the 18k ballpark figure is needed, I intend to try that.
>>
File: 743725727.jpg (1.05 MB, 1664x2432)
>>
>>108665198
Anima trains fine with a decent dataset, his sane defaults, and somewhere between 2k and 4k steps. Rarely do I have to go up to 4k. Often around 3k is fine. I have yet to feel the need to change any params from his example LoRA.
>>
>>108665200
He's a piece of shit. Obviously gonna put his training scripts behind Linux since he knows Linux users are less artsy and more sloppers than Windows Mac artists where it just werks.
>>
>>108665205
Just looked it up.
I tried 8k steps with 0.00004 LR. I am planning to try 0.00002 or 0.00003 with 18k or slightly below that.
Oh btw I just remembered his 18k steps is with gradient_accumulation_steps = 4
So does that loosely equal 4500 "real" steps?
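Depends on what the trainer's step counter actually counts (I'm not sure which convention diffusion-pipe uses, so this is a guess): if the logged 18k counts micro-batches, the conversion would be:

```python
# Hypothetical conversion: if the logged step counter increments once per
# micro-batch, then optimizer updates = logged_steps / accumulation.
# Some trainers log optimizer steps directly, in which case no division
# applies; check the trainer's docs/logs to see which one you're reading.
logged_steps = 18000
gradient_accumulation_steps = 4

optimizer_steps = logged_steps // gradient_accumulation_steps
print(optimizer_steps)  # 4500
```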
>>
anima is really separating the wheat from the chaff
>>
>>108665205
>>108665247
Just use his trainer and the config from his example. All I'm meaning is it just werked for me.
>>108665200
You should already be running Linux desu.
>>
File: file.png (3.68 MB, 1536x2560)
>>108665149
I did about 5k steps with 200 images for this stonetoss lora, using big russ's example config. Prior to that I did a couple with prodigy optimizer and they turned out ok too.

>>108665062
Seemed pretty bad when I tested it.

>>108665126
Have you tried InterpAny-Clearer? It's some stuff built on top of RIFE, I never really noticed blurring that much with RIFE, just awful artifacting with fast motion which InterpAny-Clearer fixed for me. It's a bit slower than RIFE I think.
>>
what are you guys using to train anima anyway?
>>
>>108665307
my computer
>>
>>108665200
keep seething wintroon
>>
>>108665310
This PC*
>>
>>108665062
He needs bigger dataset if he wants to go this dimly lit grainy shit route with Anima
>>
>>108665307
very funny anon
may your loras overfit and your gens fail
>>
remember all the retards claiming it couldnt be trained due to catastrophic forgetting
good times
>>
>>108665304
>I did about 5k steps with 200 images for this stonetoss lora, using big russ's example config. Prior to that I did a couple with prodigy optimizer and they turned out ok too.
Thanks for the reference point anon.
>>
>>108665374
don't forget the retard claiming it wouldnt be trained due to licensing
>>
File: _AnimaPreview3_00048_.jpg (368 KB, 1160x1696)
OneTrainer support FUCKING when
>>
>>108665129
aaaaiiiieee i'm not paying. fuck the police
>>
>>108665198
I know I can ask a chatbot but I'd rather ask a real anon with real experience. What are the benefits of timestep shifts? Or, what are they used for?
>>
File: GPT 2 Partner Nodes.png (696 KB, 1486x527)
What we gennin' tonight, GPTgods??
>>
doing anima3 -> zit, works pretty well for creating realistic images
but I'm struggling with anima3 -> klein9b, has anyone had any luck getting good outputs from this kind of workflow?
>>
i'm getting stray heads/body parts with the anima turbo lora, is there anything i can do to fix it? already tried editing the prompt and changing steps from 12 to 8 but the problem persists. it doesn't happen without the lora. it also doesn't happen if i use the highres aesthetic boost at a certain value (definitely doesn't on 1) but i dislike the aesthetics of turbo + highres, they remove too much detail together
>>
>>108665451
It comes from the SD3 paper. Shift values above 1 make the model spend more time on higher timesteps/sigmas, which improves composition for flow models (anything SD3 or newer). Going too high makes the image blurry/fucks up details, so you can't crank it indefinitely. This is for genning images. If you never touched it, Comfy automatically uses the value 3 for most image models; feel free to experiment with the model sampling node sometime.
I am not too well versed in the precise impact on lora training, but I believe it might help in some situations.
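The remap itself is tiny; a minimal sketch of the shift mapping as given in the SD3 paper (function and variable names are mine, not from any library):

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Timestep shift from the SD3 paper: remaps sigma in [0, 1] so that
    shift > 1 pushes sampling toward higher noise levels; shift = 1 is
    the identity. Endpoints 0 and 1 are fixed points."""
    return shift * sigma / (1 + (shift - 1) * sigma)

# With the common default of shift = 3, the schedule's midpoint moves up,
# so more of the step budget is spent at high noise (composition):
print(shift_sigma(0.5, 3.0))  # 0.75
print(shift_sigma(0.5, 1.0))  # 0.5
```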
>>
>>108665473
nevermind, i tested it and still happened with both loras at 1.0. fuck. i guess it's the turbo lora. i like the aesthetics too, i wish i could get them even without dumbing down the model
>>
>>108665477
>I am not too well versed about precise impact for lora training, but it might help under some situations I believe.
you adjust it regarding dataset complexity and training resolution

>>108665473
>>108665490
can you post image, not sure what you mean exactly
>>
File: 1776836990754323.jpg (639 KB, 1328x1640)
639 KB JPG
>>
>>108665501
So what shift value would you use for full 1024p anima lora training on circa 100 images for a decently complex style?
>>
>>108665524
No clue, I haven't done testing with anima. I'd stick to settings russel guy uses
>>
If v3 isn't the final version, what more is he going to add? Just further highres training?
If v3 is the final version, e621 injection wen?
>>
I got a 5090. What's the best model I can run?
>>
File: 999.jpg (1.94 MB, 3456x1840)
Ohh, I still have some of my old 1.5 gens on my PC
>>
>>108665644
More hopeful days


