/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108528950

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
File: _AnimaPreview2_00131_.jpg (642 KB, 1608x1160)
>>
>>108535367
kino
>>
MYTH: api models are censored
FACT: api models are less censored than local models and are in fact trained on NSFW imagery

MYTH: api models are too expensive
FACT: it's actually quite cheap to use API through ComfyUI API Nodes. the price for api has gone down in comparison to the price of hardware

MYTH: api nodes collect your data and are unsafe to use
FACT: api is safer than local because nothing is stored on your hard drive. with local models, you need to download hundreds of loras and custom nodes, any of which could be infected

MYTH: an api can pull the plug at any time, why use something like that?
FACT: everything you generate can be saved to your desktop so nothing is lost

MYTH: it's impossible to train a custom style of character with api, loras make local way better
FACT: api can learn any style or character with a single image reference, which is much faster and smarter than loras

MYTH: if i buy api credits and don't like the model, that's money wasted
FACT: ComfyUI's API node credit system lets you prompt hundreds of cutting-edge api models. the credits are shared between models so you aren't locked in to any one ecosystem

MYTH: api users are poor and from third world countries
FACT: the top hollywood productions and anime studios all use api models. api is the weapon of choice for everyone world-wide

MYTH: discussion of api models is off-topic
FACT: api models are part of the comfyui experience and are relevant to this thread. combining api models with local workflows is still local
>>
im becoming too powerful
>>
Blessed thread of frenship
>>
>thread collage has actual hand-drawn hard work from non-AI artists in it

lmao
>>
>>108535397
based anon finally switched to api
>>
File: 1764591721669627.mp4 (2.55 MB, 320x576)
>>
>thread is baked
>anon immediately seething
How does baker do it?
>>
>>108535438
By being a retard who can't tell an AI gen from a real painting I guess.
https://x.com/Nyte_Tyde/status/1909771508697964672
>>
File: 1762205263678822.mp4 (2.77 MB, 704x1152)
>>
>>108535453
No I meant this anon >>108535396
>>
File: _AnimaPreview2_00142_.jpg (564 KB, 1608x1248)
>>
File: o_00884_.jpg (848 KB, 1920x1080)
>>
File: 1768356898419593.mp4 (3.32 MB, 704x1152)
>>
File: 1753006344182365.mp4 (3.64 MB, 960x512)
>>108535562
>>
File: o_00885_.png (2.97 MB, 1920x1080)
>>108535599
nice
>>
File: 1746854201773835.mp4 (1.87 MB, 960x512)
>>108535622
>>
nice to see a 2023 ai nostalgia thread
>>
>mfw Resource news

04/05/2026

>ComfyUI-ZImage-Triton: Triton-accelerated W8A8 quantization
https://github.com/newgrit1004/ComfyUI-ZImage-Triton

>ComfyUI Assets Manager v2.4.4 update
https://github.com/MajoorWaldi/ComfyUI-Majoor-AssetsManager/releases/tag/v2.4.4

>From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI
https://blogs.nvidia.com/blog/rtx-ai-garage-open-models-google-gemma-4

>FLUX.2-klein-9B — PolarQuant Q5: 9B rectified flow transformer
https://huggingface.co/caiovicentino1/FLUX.2-klein-9B-PolarQuant-Q5

>Qwen3.5-9B-Neo-PolarQuant-Q5: 9B on any GPU with PolarQuant
https://huggingface.co/caiovicentino1/Qwen3.5-9B-Neo-PolarQuant-Q5

04/04/2026

>STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative
https://github.com/escapistmost/Storyboard-Anchored-Generation

>Regularizing Attention with Bootstrapping
https://github.com/ncchung/AttentionRegularization

>LTX2.3-Multifunctional: Functionality optimization based on LTX desktop version
https://github.com/hero8152/LTX2.3-Multifunctional

>Gemma 4 31B IT NVFP4 model is quantized with NVIDIA Model Optimizer
https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4

>AP Netflix VOID – ComfyUI Custom Nodes
https://github.com/adampolczynski/AP_Netflix_VOID

04/03/2026

>JoyAI-Image: Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation
https://github.com/jd-opensource/JoyAI-Image

>Netflix VOID: Video Object and Interaction Deletion
https://huggingface.co/netflix/void-model

>OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
https://huggingface.co/tencent/HY-OmniWeaving

>Bias mitigation in graph diffusion models
https://github.com/kunzhan/spp

>Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion
https://dedoardo.github.io/projects/control-dino

>FlowSlider: Training-Free Continuous Image Editing via Fidelity-Steering Decomposition
https://huggingface.co/spaces/dominoer/FlowSlider
>>
File: 00164-535294173.jpg (683 KB, 1664x2432)
>>
>mfw Research news

04/05/2026

>PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
https://vdkhoi20.github.io/PANDORA

>A Benchmarking Methodology to Assess Open-Source Video Large Language Models in Automatic Captioning of News Videos
https://arxiv.org/abs/2603.27662

>Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers
https://arxiv.org/abs/2603.27666

>NeedleDB: Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex Natural Language Queries
https://arxiv.org/abs/2603.27464

>Domain-Invariant Prompt Learning for Vision-Language Models
https://arxiv.org/abs/2603.28555

>MolmoPoint: Better Pointing for VLMs with Grounding Tokens
https://arxiv.org/abs/2603.28069

>AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models
https://arxiv.org/abs/2603.29410

>LivingWorld: Interactive 4D World Generation with Environmental Dynamics
https://arxiv.org/abs/2604.01641

>Efficient Inference of Large Vision Language Models
https://arxiv.org/abs/2603.27960

>Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning
https://arxiv.org/abs/2603.27866

>A Robust Low-Rank Prior Model for Structured Cartoon-Texture Image Decomposition with Heavy-Tailed Noise
https://arxiv.org/abs/2603.27579

>CDH-Bench: Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
https://arxiv.org/abs/2603.27982

>Rényi Entropy: New Token Pruning Metric for Vision Transformers
https://arxiv.org/abs/2603.27900

>HSFM: Hard-Set-Guided Feature-Space Meta-Learning for Robust Classification under Spurious Correlations
https://arxiv.org/abs/2603.29313

>LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation
https://arxiv.org/abs/2603.27693

>Jagle: Building a Large-Scale Japanese Multimodal Post-Training Dataset for Vision-Language Models
https://arxiv.org/abs/2604.02048
>>
>>108535715
thread schizo
>>
File: Image.jpg (244 KB, 719x997)
>>108535361
https://huggingface.co/circlestone-labs/Anima/discussions/112

Really interesting and thoughtful discussion about Anima's obvious issues, Qwen's attention and memory behavior, and the whole artist-tag dissolution debate.
Feels like it's time to take a step back, be a bit more realistic about this model, and figure out whether it's actually worth it.
>>
>>108535731
Anima white knights will tell you that @artist tags are outdated tech like loras.
>>
don't care still using anima
>>
File: o_00889_.png (746 KB, 1536x512)
>>
>>108535731
The only solution I can think of is for tdrussell to rebuild Anima from scratch, but make it style-agnostic and move all styles into small loras. That way he could free up capacity to focus only on characters and concepts, then extract loras from those styles so we can apply them ourselves at different weights, something like DLC in video games.
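For the extraction half of that idea, the standard mechanism (what existing LoRA-extraction scripts do) is to SVD the weight delta between the styled checkpoint and the base checkpoint and keep the top singular directions. A minimal numpy sketch; shapes and rank are invented for illustration, not Anima's actual layers:

```python
import numpy as np

def extract_lora(base_w, tuned_w, rank=16):
    """Factorize the finetune delta (tuned - base) into low-rank up @ down."""
    delta = tuned_w - base_w                        # (out, in)
    U, S, Vh = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])                      # split scale across both factors
    up = U[:, :rank] * sqrt_s                       # (out, rank)  a.k.a. "lora_up"
    down = sqrt_s[:, None] * Vh[:rank]              # (rank, in)   a.k.a. "lora_down"
    return up, down

# Toy check: a genuinely rank-8 style delta is recovered almost exactly at rank 8.
rng = np.random.default_rng(0)
base = rng.standard_normal((64, 64))
tuned = base + rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
up, down = extract_lora(base, tuned, rank=8)
err = np.abs(tuned - (base + up @ down)).max()
```

Real finetune deltas are only approximately low-rank, so the chosen rank trades file size against how much of the style survives.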
>>
File: 1754217352034806.png (303 KB, 538x574)
>>108535361
https://huggingface.co/circlestone-labs/Anima
>Any LoRA you train on a preview version should be considered a "throwaway" LoRA. There's no guarantee it will work well on the final version.

Any word on when this "final version" will be finished and uploaded? Or is that what preview-2 is supposed to be?
>>
>>108535855
THIS just like saas! we all know that models like GPT-2 Image use microloras for concepts, which is why they're able to accurately display the hands of an analog clock or fill a wine glass to the brim. They load up loras based on your prompts. I'd even bet they're really all just running Flux.1 Dev in the background
>>
File: deNE_zi_00021_.png (3.01 MB, 1663x1164)
>>
>>108535821
The same issue would probably have hit the Rouwei dev with their model that adapts the T5 text encoder to SDXL.
CLIP was a gift from God for anime models.
>>
File: 1747873913537595.png (2.27 MB, 1024x1024)
https://huggingface.co/lodestones/Zeta-Chroma/tree/main
this is so bad lmao
>>
That's some pretty good and thoughtful FUD desu. It matches my thoughts on blending multiple artists being a giant pain in the ass.

>>108535889
Sometimes I forget CLIP is from OpenAI, before they became a giant scam.
>>
File: _AnimaPreview2_00155_.jpg (727 KB, 1696x1160)
>>
>>108535855
>>108535882
SaaS models probably ran into this same issue much earlier. At some point, having the model know every artist’s style all the time became unnecessary, especially since many art styles contradict each other and just end up confusing the model.
>>
>>108535890
i see why they call it pixel space!
>>
>>108535930
> many art styles contradict each other and just end up confusing the model.
You’ve got a point. It could also be that MoE (Mixture of Experts) technology from LLMs gets applied to diffusion models, where the model doesn’t always activate all its parameters, but instead uses different ones depending on the prompt.
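The MoE idea above, sketched in code: a small router scores the experts for each token and only the top-k actually run, so most parameters stay idle on any given prompt. This is a toy top-k routing sketch, not any shipping diffusion model's implementation; all names and sizes are made up:

```python
import numpy as np

def moe_ffn(x, w_router, experts, top_k=2):
    """Route each token to its top-k experts; mix their outputs by softmax gates."""
    logits = x @ w_router                           # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate_logits = logits[t, top[t]]
        gates = np.exp(gate_logits - gate_logits.max())
        gates /= gates.sum()                        # softmax over the chosen k only
        for g, e in zip(gates, top[t]):
            out[t] += g * (x[t] @ experts[e])       # only these experts compute
    return out

rng = np.random.default_rng(1)
d, n_experts, tokens = 8, 4, 5
x = rng.standard_normal((tokens, d))
w_router = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))    # one weight matrix per expert
y = moe_ffn(x, w_router, experts, top_k=2)          # 2 of 4 experts fire per token
```

In the style-vs-character framing, the hope would be that routing lets contradictory styles live in different experts instead of fighting over the same weights.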
>>
File: mypond.png (3 MB, 1824x1248)
>>
>>108535890
Lodestone should retire.
>>
File: o_00894_.png (1.6 MB, 1152x896)
>>
>>108535890
it's not finished. why are you judging underbaked models?
>>
File: retard.png (1.73 MB, 2100x3600)
>>108536014
>it's not finished.
https://xcancel.com/LodestoneRock/status/2040745179372818437#m
>>
chroma will never be finished because >>107962458
>>
File: _AnimaPreview2_00162_.jpg (581 KB, 1072x1792)
>>
File: o_00895_.png (1.46 MB, 896x1152)
>>
>>108535890
why doesn't he try video model?
>>
>>108536051
money doesn't grow on trees
>>
File: deNE_zi_00024_.png (2.93 MB, 1663x1164)
>>
>>108536069
Neither does talent; he vibe-trains models
>>
>>108536038
no model is ever truly finished because you can always make it better.
>>
in kekstone's case, you can always make it worse!
>>
File: screenshot.1775428188.jpg (221 KB, 731x735)
SPARK Chroma is very promising even at 512 resolution. I'm looking forward to the 1024 version.
>>
>>108535731
the idea of embedding tables and removing artist strings to avoid fucking up the semantics is interesting, has any model done this before?
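The mechanics are simple either way: give each artist a learned ID vector in a lookup table and inject it as an extra conditioning token, so the artist name never passes through the tokenizer and can't collide with ordinary vocabulary subwords. A hypothetical sketch, with all names and sizes invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n_artists, dim, seq_len = 1000, 16, 7

artist_table = rng.standard_normal((n_artists, dim))  # learned row per artist ID
text_embs = rng.standard_normal((seq_len, dim))       # text-encoder output for prompt

def condition_with_artist(text_embs, artist_id):
    """Prepend the artist's dedicated embedding as an extra conditioning token."""
    artist_vec = artist_table[artist_id][None, :]     # (1, dim), no subword splitting
    return np.concatenate([artist_vec, text_embs], axis=0)

cond = condition_with_artist(text_embs, artist_id=42)  # (seq_len + 1, dim)
```

The trade-off is that a table only covers artists seen in training: there's no zero-shot generalization from the name itself, which is exactly what the string-based approach was buying.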
>>
>>108536069
not starting from scratch. he can begin with loras, then merge models, and so on. it's better than spending money on new weird image models
>>
File: image-24.png (10 KB, 1034x35)
>>108535731
Fuck...
>>
File: ComfyUI_19161.png (2.79 MB, 1500x2000)
For the guy that wanted a "lewd" Jennie, ZIT gave me exactly one (and only one!) that wasn't too horrible... scaled down further than usual to help smooth over any flaws.

Happy Easter!
https://files.catbox.moe/7kr6oy.png

>>108535890
...aaaaaaand DONE!
>>
>>108536181
I wish he made gguf versions too
>>
>>108536181
>SPARK Chroma is very promising even at 512 resolution.
can you showcase some images?
>>
>>108536200
why don't you make them yourself?
>>
quick rundown of why CLIP is at the same time outdated and outperforming other encoding methods?
>>
>>108536194
??
>>
>>108535731
Anything that isn't CLIP will have similar issues, it's not Anima-specific.


