/g/ - Technology

User Error Edition

Discussion and Development of Local Image and Video Models

Previous: >>108524999

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>mfw Resource news

04/04/2026

>STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative
https://github.com/escapistmost/Storyboard-Anchored-Generation

>Regularizing Attention with Bootstrapping
https://github.com/ncchung/AttentionRegularization

>LTX2.3-Multifunctional: Functionality optimization based on LTX desktop version
https://github.com/hero8152/LTX2.3-Multifunctional

>Gemma 4 31B IT NVFP4 model is quantized with NVIDIA Model Optimizer
https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4

>AP Netflix VOID – ComfyUI Custom Nodes
https://github.com/adampolczynski/AP_Netflix_VOID

04/03/2026

>JoyAI-Image: Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation
https://github.com/jd-opensource/JoyAI-Image

>Netflix VOID: Video Object and Interaction Deletion
https://huggingface.co/netflix/void-model

>OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
https://huggingface.co/tencent/HY-OmniWeaving

>Bias mitigation in graph diffusion models
https://github.com/kunzhan/spp

>Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion
https://dedoardo.github.io/projects/control-dino

>FlowSlider: Training-Free Continuous Image Editing via Fidelity-Steering Decomposition
https://huggingface.co/spaces/dominoer/FlowSlider

>SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers
https://github.com/deng12yx/SafeRoPE

>NearID: Identity Representation Learning via Near-identity Distractors
https://gorluxor.github.io/NearID

>Generative World Renderer
https://alaya-studio.github.io/renderer

>Universal Hypernetworks for Arbitrary Models
https://github.com/Xuanfeng-Zhou/UHN

>InTraGen: Trajectory-controlled Video Generation for Object Interactions
https://github.com/insait-institute/InTraGen

>SDXL Node Merger: A visual, node-based model merging tool for Stable Diffusion XL
https://github.com/georgebanjog/sdxl-node-merger
>>
No joke, SaaS models hoarding their weights are causing a second dark age. Just imagine how far ahead we'd be in AI if everyone actually released their stuff.
>>
>mfw Research news

04/04/2026

>PhysVid: Physics Aware Local Conditioning for Generative Video Models
https://arxiv.org/abs/2603.26285

>GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
https://nicolasvonluetzow.github.io/GaussianGPT

>From Natural Alignment to Conditional Controllability in Multimodal Dialogue
https://arxiv.org/abs/2603.29162

>RecycleLoRA: Rank-Revealing QR-Based Dual-LoRA Subspace Adaptation for Domain Generalized Semantic Segmentation
https://arxiv.org/abs/2603.28142

>LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
https://arxiv.org/abs/2603.28082

>ConceptWeaver: Weaving Disentangled Concepts with Flow
https://arxiv.org/abs/2603.28493

>IP-Bench: Benchmark for Image Protection Methods in Image-to-Video Generation Scenarios
https://arxiv.org/abs/2603.26154

>AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation
https://arxiv.org/abs/2603.28068

>Engineering Mythology: A Digital-Physical Framework for Culturally-Inspired Public Art
https://arxiv.org/abs/2603.27801

>ANVIL: Accelerator-Native Video Interpolation via Codec Motion Vector Priors
https://arxiv.org/abs/2603.26835

>Understanding Semantic Perturbations on In-Processing Generative Image Watermarks
https://arxiv.org/abs/2603.27513

>ObjectMorpher: 3D-Aware Image Editing via Deformable 3DGS Models
https://arxiv.org/abs/2603.28152

>LongCat-Next: Lexicalizing Modalities as Discrete Tokens
https://arxiv.org/abs/2603.27538

>On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
https://zhaoc5.github.io/DyMoE

>A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
https://riishin.github.io/pid-lvlm-iclr26

>Explaining CLIP Zero-shot Predictions Through Concepts
https://arxiv.org/abs/2603.28211

>WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation
https://light.princeton.edu/worldflow3d
>>
>>108528957
>doomposting about saas
Just use API Nodes

MYTH: api models are censored
FACT: api models are less censored than local models and are in fact trained on NSFW imagery

MYTH: api models are too expensive
FACT: it's actually quite cheap to use API through ComfyUI API Nodes. the price for api has gone down compared to the price of hardware

MYTH: api nodes collect your data and are unsafe to use
FACT: api is safer than local because nothing is stored on your hard drive. with local models, you need to download hundreds of loras and custom nodes, any of which could be infected

MYTH: an api can pull the plug at any time, why use something like that?
FACT: everything you generate can be saved to your desktop so nothing is lost

MYTH: it's impossible to train a custom style of character with api, loras make local way better
FACT: api can learn any style or character with a single image reference, which is much faster and smarter than loras

MYTH: if i buy api credits and don't like the model, that's money wasted
FACT: ComfyUI's API Nodes credit system allows you to prompt hundreds of cutting-edge api models. the credits are shared between models so you aren't locked into any one ecosystem

MYTH: api users are poor and from third world countries
FACT: the top hollywood productions and anime studios all use api models. api is the weapon of choice for everyone world-wide

MYTH: discussion of api models is off-topic
FACT: api models are part of the comfyui experience and are relevant to this thread. combining api models with local workflows is still local
>>
File: 1747666648534082.png (1.66 MB, 1168x1704)
>>
>of all the images in the previous thread, these were the ones baker selected

lmao
>>
>>108528964
>MYTH: an api can pull the plug at any time, why use something like that?
>FACT: everything you generate can be saved to your desktop so nothing is lost
this one is just retarded, pretty good otherwise
>>
>>108528988
it's important to free yourself from over-reliance on one specific model. we see it with the sdxl cult, the chromakeks, the mentally ill gpt-4 addicts, etc. the great thing about API models is when one shuts down, 3 better ones take its place. sora may be shutting down, but better models like seedance are emerging. this is why it's extremely important to use API nodes instead of subscribing directly to openAI. with API nodes, your credits are never lost and can be used on any model.
>>
Blessed thread of frenship
>>
has ran killed himself yet?
>>
ive had moderate success using OFTv2 for training noob loras. it doesn't overfit to the style and stays mostly coherent on unseen characters. or maybe that's just because the number of epochs is low, i usually do 3
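one thing worth sanity-checking before blaming OFT vs lora: with a small dataset, 3 epochs can mean very few optimizer steps, which underfits rather than bakes in a style. a minimal sketch of the kohya-style step math (all numbers below are hypothetical placeholders, not a recommended config):

```python
# Rough training-length math for a LoRA/OFT run.
# steps per epoch = (images * repeats) // batch_size, kohya-style.

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps for the whole run."""
    steps_per_epoch = (num_images * repeats) // batch_size
    return steps_per_epoch * epochs

# e.g. 40 images, 10 repeats, batch 4, 3 epochs -> 300 steps
print(total_steps(40, 10, 3, 4))
```

if that comes out in the low hundreds, low epoch count is doing as much work as the adapter choice.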
>>
Does anyone have configs to share for a wan or ltx lora? There are barely any resources for I2V loras.
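for lack of shared configs, here is the general shape of a video-dataset config for an I2V lora run, loosely modeled on musubi-tuner's TOML dataset files. every field name and value here is an assumption — check the trainer's own docs before using; this only shows the moving parts (resolution, frame buckets, repeats):

```python
# Writes a hypothetical I2V dataset config. Field names are assumptions
# based on musubi-tuner's documented TOML shape, not verified values.

dataset_toml = """\
[general]
resolution = [512, 512]
caption_extension = ".txt"
batch_size = 1

[[datasets]]
video_directory = "/path/to/clips"
cache_directory = "/path/to/cache"
target_frames = [1, 25, 45]
frame_extraction = "head"
num_repeats = 1
"""

with open("i2v_dataset.toml", "w") as f:
    f.write(dataset_toml)
print("wrote i2v_dataset.toml")
```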
>>
>>108529159
you won't find that here. we only fling feces around like monkeys in this general
>>
>we
>>
File: 1745517288762417.jpg (242 KB, 850x480)
>try wan 2.2 image to video tutorial workflow in comfyui
>immediately starts using SWAP and slows my system down
>VRAM only at at 5GB out of 24 (according to system monitor)
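one hedged mitigation while dynamic memory is missing on your setup: ComfyUI has launch flags that change how aggressively it caches models in RAM and how it loads weights. flag availability varies by build, so confirm with `python main.py --help` first:

```shell
# Hypothetical launch line for ComfyUI spilling into swap:
# --disable-smart-memory frees model weights from RAM more aggressively,
# --lowvram splits model loading to reduce peak memory use.
# Verify both flags exist in your build before relying on them.
CMD="python main.py --disable-smart-memory --lowvram"
echo "$CMD"
```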
>>
>>108529229
prompt?
>>
>>108529229
>starts using SWAP
Amd?
This was fixed with dynamic memory.
>>
File: 1759783423067820.png (1.76 MB, 1072x1880)
>>
>>108529183
Thought I'd at least try but it does seem unfortunate
>>
>>108529247
>Amd
Yes
>>
File: 00126-1690926911.jpg (877 KB, 1664x2432)
>>
>>108529489
ToT
>>
File: 1761993571768767.png (1.67 MB, 1256x1704)
>>
>>108529352
Joever.
I don't think Comfy has any reason to bother fixing traditional memory allocation now that the superior method is in place, so you gotta wait until (if) he implements dynamic memory for AMD.
>>
I don't know how finetuners train their models, but the real challenge is avoiding two types of narcissism:
-the ones who train for clout and overtrain their models on CivitAI slop,
-the ones who finetune based on whatever images they personally like while ignoring whether the model can actually do anything.
>>
how do i become a professional diffuser?
>>
I think I found a more objective way to judge whether a model is actually good or not, and that is by testing it with img2img without using ControlNet.
Badly or poorly trained models tend to have various blind spots when it comes to seeing, interpreting, and transforming an already existing image
>>
And that's where, after testing, I realized that WAI 16 (SDXL) and Anima are the only two anime models that can reliably convert different kinds of realistic or semi-realistic illustrations into anime without falling apart or showing blind spots.

All the other popular shit merges, including Chenkin and Noob, have noticeable blind spots, whether in scene composition, background consistency, or, worst of all, character positioning and body parts.
>>
Noob variants struggle with img2img unless the input is already a clean anime style image. They do not understand fingers, they fail at rendering pupils, and they break down as soon as they are pushed outside the type of data they were trained on. That tells me these are still immature models.
>>
Some might argue that these models were trained only on anime, and that trying to convert realistic images to anime through img2img is obviously not what they are meant for.
But that argument does not really hold up. This is exactly where Anima proves its value. It is a model trained specifically on anime, just like WAI, yet both are capable of adapting to almost any input image, regardless of style or scene. They see the image, understand it, and reinterpret it in an anime way without falling apart, while still respecting the initial input.
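one caveat for anyone reproducing this test: the img2img result depends heavily on denoise strength — too low and any model just traces the input, too high and even a good model ignores it. in diffusers-style img2img the strength parameter decides how many of the scheduled steps actually run; a sketch of that mapping (the rounding mirrors diffusers' behavior, but treat the exact formula as an assumption for your pipeline):

```python
# How denoise strength maps to actual sampling work in a typical
# diffusers-style img2img pipeline: the input image is noised partway
# up the schedule and only the remaining steps are denoised.

def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given strength."""
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.3, 0.6, 0.9):
    print(s, img2img_steps(28, s))
```

so a fair comparison between models means pinning strength, steps, and seed, not just eyeballing one gen each.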
>>
File: 1772320734907389.png (1.59 MB, 1432x1432)
>>
>>108529640
There's been progress on AMD support, though I think it's currently waiting on a couple of external bugs to be fixed for it to work right.

https://github.com/Comfy-Org/comfy-aimdo/pull/2
>>
File: 00548-913746885.jpg (793 KB, 1664x2432)
main model: NoobAI-XL vpred
refiner: WAI-realism (LOL)

i mostly gen with anima now but i kinda miss that noob style.... -_-
>>
File: ComfyUI_00080_.png (3.23 MB, 1248x1824)
>>108529944
just main pass anima and upscale with noob. You have to prompt it differently, but it can really lock in style and bring out texture.
>>
>>108529745
Do it for pleasure not for ego
>>
>>108529992
I actually do exactly that currently.
I still think Anima favors simplicity, while the last official version of Noob is somewhat overtuned and favors high detail/complexity.....
>>
>>108530101
>I still think Anima favors simplicity, while the last official version of Noob is somewhat overtuned and favors high detail/complexity.....
Can you be more specific?
>>
>>108529831
Would you be willing to share your prompt?
>>
File: 1748361619147590.png (1.8 MB, 1168x1792)
>>108530140
it was something like https://pastebin.com/tmjA5vaJ
>>
>>108530117
Well, not being able to blend artist styles in quite the same way as SDXL is a pain in the ass. I'd like to reuse my prompts from when Noob was the hottest model here, but it's just not the same and you have to rewrite them. The seamless blending of styles was a side effect of CLIP. Anima can blend styles, but it seems like one really dominates.

I think Noob still has better lighting and special effects, I use a fair amount of those. Depth of field, etc all that stuff.
>>
File: 858568782472427.jpg (1.75 MB, 1664x2432)
daily anima gen
>>
File: 1769103042479694.png (2.98 MB, 1920x1080)
>>
File: 0679485683583.jpg (2.46 MB, 1664x2432)
>>
File: 1760114763583281.png (2.55 MB, 1920x1080)
>>
File: deNS_zi_00043_.png (3.7 MB, 1663x1164)
>>
Netayumesisters.....
>>
File: 1753426776398516.png (2.82 MB, 1920x1080)
>>
File: 86479357835683568.jpg (1.39 MB, 1664x2432)
>>
This is probably a massive leap of faith, but is there any good, local, 3d model generation? One that can even generate animatable stuff such as characters, and animations for it.
>>
File: 1754896974466774.png (3.01 MB, 1920x1080)
>>108530734
not really
>>
File: 1766412509021832.png (1.14 MB, 1024x1024)
>>108528965
>>108529314
>>108529831
Cute
>>
File: 1763746277223535.png (3.16 MB, 1920x1080)
>>
>>108530777
why are you praising yourself? are you schizophrenic?
>>
File: 1771821502546281.png (1.45 MB, 1024x1024)
Is there a good tag autocomplete for comfy? Typing out character \(series\) is tedious, especially for Fate shit.

>>108530813
I'm a different anon thoughever.
>>
>>108531056
Perhaps https://github.com/newtextdoc1111/ComfyUI-Autocomplete-Plus
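if the node doesn't fit your setup, the core of tag autocomplete is simple enough to script yourself: prefix-match against a sorted danbooru tag list and escape parentheses the way ComfyUI prompts expect. a minimal sketch — the tag list here is a tiny placeholder for the usual CSV dump:

```python
import bisect

# Tiny stand-in for a danbooru tag dump (normally ~100k tags from a CSV).
TAGS = sorted([
    "artoria_pendragon_(fate)",
    "artoria_pendragon_(lancer)_(fate)",
    "ishtar_(fate)",
    "hatsune_miku",
])

def escape(tag: str) -> str:
    """Escape parens so ComfyUI doesn't parse them as attention weights."""
    return tag.replace("(", "\\(").replace(")", "\\)")

def complete(prefix: str, limit: int = 5) -> list[str]:
    """Return escaped tags starting with prefix, via binary search."""
    i = bisect.bisect_left(TAGS, prefix)
    out = []
    while i < len(TAGS) and TAGS[i].startswith(prefix) and len(out) < limit:
        out.append(escape(TAGS[i]))
        i += 1
    return out

print(complete("artoria"))
```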
>>
File: 1770267914494594.png (287 KB, 562x805)
Gemma 4 is definitely competent and based in captioning, please google, give us a local model as well :( >>108531320
>>
>>108528965
>>108529314
>>108529530
>>108529831
This style is so sick. What model and loras are you using?
>>
>>108530513
>>108530514
>>108530568
Daily anima fud then
>>
File: 1764997542240478.webm (3.68 MB, 1920x1056)
Babe wake up, they released a decent local world model
https://xcancel.com/Skywork_ai/status/2039305679966720411
>>
File: schizo.png (140 KB, 1150x312)
>>108531438
Daily Anifart fud then


