/g/ - Technology





Shift Scheduling Edition

Discussion and Development of Local Image and Video Models

Previous: >>108807440

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>mfw Resource news

05/13/2026

>AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
https://huggingface.co/nvidia/AnyFlow-Wan2.1-T2V-14B-Diffusers

>RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation
https://github.com/ShmilyQi-CN/RealDiffusion

>OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
https://zghhui.github.io/OmniNFT

>Logit-Attention Divergence: Mitigating Position Bias in Multi-Image Retrieval via Attention-Guided Calibration
https://github.com/brightXian/LAD

>Design Your Ad: Personalized Advertising Image and Text Generation with Unified Autoregressive Models
https://github.com/JD-GenX/Uni-AdGen

>Elastic Attention Cores for Scalable Vision Transformers
https://github.com/alansong1322/VECA

>LychSim: A Controllable and Interactive Simulation Framework for Vision Research
https://lychsim.github.io

>Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation
https://image2code.github.io/vision2code

>ComfyUI-ppm Implements NegPip on the Z-image series
https://github.com/BigStationW/ComfyUI-ppm#clipnegpip

>DreamX-World: A General-Purpose Interactive World Model
https://github.com/AMAP-ML/DreamX-World

>FLUX Identity Adjuster
https://github.com/Magirad/Flux_ID_Adjuster

05/12/2026

>Pixal3D: Pixel-Aligned 3D Generation from Images
https://ldyang694.github.io/projects/pixal3d

>SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
https://github.com/ShanwenTan/SWIFT

>Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
https://zju-jiyicheng.github.io/Forcing-KV-Page

>Masked Generative Transformer Is What You Need for Image Editing
https://weichow23.github.io/EditMGT

>Micro-Defects Expose Macro-Fakes
https://zbox1005.github.io/MDMF-project
>>
>mfw Research news

05/13/2026

>CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
https://yihao-meng.github.io/CausalCine

>EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation
https://arxiv.org/abs/2605.11722

>STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models
https://arxiv.org/abs/2605.11494

>UniFixer: A Universal Reference-Guided Fixer for Diffusion-Based View Synthesis
https://arxiv.org/abs/2605.12169

>Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
https://arxiv.org/abs/2605.12305

>Couple to Control: Joint Initial Noise Design in Diffusion Models
https://arxiv.org/abs/2605.11311

>MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
https://arxiv.org/abs/2605.12134

>Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm
https://yaofang-liu.github.io/V2V_Web

>One-Step Generative Modeling via Wasserstein Gradient Flows
https://arxiv.org/abs/2605.11755

>FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity
https://arxiv.org/abs/2605.11869

>UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
https://arxiv.org/abs/2605.12088

>Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models
https://arxiv.org/abs/2605.11939

>L2P: Unlocking Latent Potential for Pixel Generation
https://nju-pcalab.github.io/projects/L2P

>Principled Design of Diffusion-based Optimizers for Inverse Problems
https://arxiv.org/abs/2605.11506

>AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
https://huangrh99.github.io/AlphaGRPO

>A Mimetic Detector for Adversarial Image Perturbations
https://arxiv.org/abs/2605.11492
>>
File: 1768733169304002.jpg (1.76 MB, 1560x2280)
1.76 MB JPG
>>
>>108816682
Nice crow.
>>
>>108816682
Tell me about Crow, why does he wear the randoseru?
>>
kill ani in real life, intercept him on his way home and stab him multiple times
>>
>>108816725
What the fuck?
>>
>>108816764
Pretty sure it's a mental patient sending threats to himself
>>
Blessed thread of frenship
>>
Someone bake the artemis photos lora
>>
>>108816976
Artemis photos? Elaborate
>>
>>108817094
the moon mission
>>
>>108816976
I'm making one for Z Turbo
>>
File: ComfyUI_00137_.png (2.33 MB, 1072x1376)
2.33 MB PNG
>>
File: Juggernaut_Z_V1_00086_.jpg (746 KB, 1344x1728)
746 KB JPG
>>
File: Juggernaut_Z_V1_00090_.jpg (690 KB, 1344x1728)
690 KB JPG
>>
File: ComfyUI_15420.png (3.45 MB, 1728x1152)
3.45 MB PNG
>>108817334
>dat grill
>>
https://github.com/resemble-ai/DramaBox

Holy shit, this thing is the best TTS I've ever tried. It beats absolutely everything.

It's based on LTX-2.3, handles NSFW content perfectly, and can do just about anything you can imagine with near-perfect cross-voice cloning. Even though it was released as English-only, it handles my language perfectly.

SAM-Audio can extract nearly any voice from any series or whatever.

Now I’m going to start building my perfect dataset with emotions for a single-voice model using Qwen3-TTS as the backbone, which handles all emotions for real-time inference.

I thought I’d have to wait another 1–2 years for this moment. What a time to be alive.
>>
File: debo_ccsw_anima_00027_.png (3.15 MB, 2048x1117)
3.15 MB PNG
>>
>>108817334
Nice
>>
>>108817360
>Holy shit, this thing is the best TTS I've ever tried. It beats absolutely everything.
What are the newest other models that you tried? Fishaudio S2P, Kokoro, Chatterbox, Qwen 3.5?
>>
>>108817360
>It's based on LTX-2.3
interesting. i already knew ltx was good at voices since i used it to generate low quality talking videos just to extract the audio. but can it generate only spoken audio? i would like something that can generate sound effects as well
>>
>>108817423
None of them are expressive or can generate sound effects, ambiance, BGM, etc.
I honestly don't know why there are so many pure TTS models but so few T2A models.
>>
>>108817360
Upload some examples?
>>
>>108817496
qwen's default voices and fishaudio s2p are expressive, omnivoice IIRC also had a few tags

sure, t2a models should have appeal
>>
Big Russ is GONE and we are ALONE
>>
King Russ sits permanently on the throne in my heart
>>
anima4...
>>
File: 1773445986291252.jpg (823 KB, 1040x1520)
823 KB JPG
>>
v3 is unironically all you need
>>
File: 1760150424246545.jpg (691 KB, 1040x1520)
691 KB JPG
>>
>>108817511
There are some on their hf pages:
https://huggingface.co/ResembleAI/Dramabox
The output quality is shit but the tone is really good.
>>
>>108817360
VibeVoice is better. DramaBox has better non-speech controls but the quality is awful. Sounds like they're in a tin can, which isn't surprising since it's based on LTX.
>>
>>108818172
It's based on LTX2, not even 2.3, so no surprise it sounds really bad.
>>
File: 1771568740099485.jpg (3.28 MB, 2048x3072)
3.28 MB JPG
>>
y no flo from progressive
>>
>>108817360
There's also this one.
https://github.com/ScenemaAI/scenema-audio
>>
File: ComfyUI_00370_.png (862 KB, 896x1152)
862 KB PNG
>>
>>108818393
Kino gen idea but she looks way too plastic.
>>
anyone got a SIMPLE ltx eros i2v workflow lying around?
>>
still having fun with Anima
>>
File: 1753349501286808.png (567 KB, 832x1216)
567 KB PNG
>>
File: ComfyUI_00001_.jpg (2.83 MB, 3584x4608)
2.83 MB JPG
behold my opus
>>
>>
>>
>>
>>108811965
>>108811978
>>108812292
I just got the newest Convrot Int8 quant for Chroma and it runs at the same speed as nunchaku (half of q8).
Quality seems much better, although I won't make comparisons with bf16 (I guess I might for something faster like Anima).
Int8 inference for diffusion seems to have matured significantly recently. (Surprising that we sat on it and did nothing worthwhile during all the years int8 acceleration has been available.)
This post on plebbit brought the newest int8 technique to my attention:
https://www.reddit.com/r/StableDiffusion/comments/1tazxqz/int8_in_the_age_of_mxfp8_an_investigation_into/
Seems worth considering if you are 3000 or 2000 series.
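For anyone curious what per-channel symmetric int8 weight quant looks like in principle, here's a toy numpy sketch (illustrative only, not Convrot's actual code; function names are made up):

```python
import numpy as np

def quantize_int8(w):
    """Per-output-channel symmetric int8 quantization: each row gets its own
    scale so the largest |value| in the row maps to 127."""
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float weight from int8 values and per-row scales."""
    return q.astype(np.float32) * scale
```

The rounding error per element is bounded by half a quantization step, which is why per-channel scales beat one global scale: rows with small weights get a proportionally finer grid.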
>>
>>
>>108818610
ooh la la
>>
>>
Any suggestion on how, in ComfyUI, to quickly turn a turbo lora on/off?

I want to use Anima Turbo for quick prototyping, but it is messing up the style I like, so I want to quickly turn it off to get a quality gen.
I can bypass the LoRA loader, but I also need to change the sampler settings. A way to do it with a single switch would be great.
>>
>>108818778
People use switches all the time in workflows but I don't remember precisely how it's done either.
All I can say is that it can easily be done by pressing a single toggle. (Bypass node + change cfg and step count)
>>
>>108818818 >>108818778
the easy to understand method many use is to just put the respective nodes into one group / subgraph each and group bypass one or the other
>>
>>108818818
* my personal recommendation would be to also put the matching cfg and step count in each of the groups or subgraphs you toggle. YMMV.
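To make the coupling concrete, a toy sketch of pairing the LoRA toggle with its matching sampler settings so one switch flips everything together (values are made up, not Anima Turbo's actual recommended settings):

```python
def sampler_settings(turbo):
    """Return the LoRA bypass state together with the sampler settings that
    belong to it, so they can never get out of sync. The steps/cfg numbers
    here are placeholders, not any model's real recommendations."""
    if turbo:
        return {"lora_bypassed": False, "steps": 8, "cfg": 1.0}
    return {"lora_bypassed": True, "steps": 28, "cfg": 5.0}
```

Grouping the nodes per mode and bypassing one group is the node-graph equivalent of this one-argument switch.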
>>
>>108818731
>if you are 3000
That's-a me. Thanks for the info.
>>
>>108818778
just ask claude code/codex to build a subgraph for you inside your current workflow. You'll end up with something totally insane like this, but somehow it will work.
>>
>>108818778
if you just want on/off
>>
File: 1476296753829.png (1.38 MB, 729x1156)
1.38 MB PNG
We might actually be back.
>https://arxiv.org/html/2605.12964v1
>https://hanshengchen.com/asymflow/
>>
>>108818955
And how does switching change sampler setting?
>>
>>108818962
Cool
>>
>>108818962
that does look pretty cool also based on the sample images

but I barely get what they claim to have achieved... a better technique to relate 256x256px space to the actual latent space?
>>
>>108818962
Busy now only briefly skimmed but seems interesting.
Also interesting is that they did this on 9b with the more iffy license.
Will probably post more thoughts later after I finish reading their paper.
>>
>>108818962
hol up
>>
>>108819035
* demo gen on their demo site not yet THAT amazing, but perhaps I'm testing for the wrong capabilities
>>
>schizophrenic situation
>>
https://huggingface.co/spaces/Lakonik/AsymFLUX.2-klein
>generate one (1) image
>You have exceeded your ZeroGPU quota
>>
>>108819058
>ZeroGPU
they call it that for a reason
>>
>>108819058
seems like they released the model? https://huggingface.co/Lakonik/AsymFLUX.2-klein-9B/tree/main
>>
>>108818962
>Finetuning Latent Models into Pixel Models
>Hansheng Chen, Jan Ackermann, Minseo Kim, Gordon Wetzstein, Leonidas Guibas
so which one is lodestone
>>
>>108819069
>707mb
obviously do not download this
>>
>>108819069
That's the adapter
>>
>>108819085 >>108819076
isn't that the thing they made for use with klein 9b?
>>
>>108819071
They don't even consider what he does actual research lmao.
Never seem his stuff pop up in papers.
>>
>>108819089
Yes. There is a usage example in the model card. You load this adapter with the base flux2 klein 9b model
>>
File: asymflux.png (3.43 MB, 1425x1821)
3.43 MB PNG
>>108818962
>>
>>108819100
yes, as far as I can tell that's how you'd use asymflow for local inference right now?

i don't get the complaints >>108819085 >>108819076

i mean sure, you can wait and see how the comfyui implementation will be done but I also wouldn't be surprised if they just kept the klein9b+adapter setup
>>
>>108818731
I also made a bf16 baseline comparison.
I would say it's holding up reasonably well for quanting just a 2B model. You are going to get less divergence from the half-precision baseline with larger models, and possibly a larger speed boost if you need offloading for bf16/fp16.
You have to enable the dynamic lora option and take a 10-15% speed penalty over usual int8 speeds when using loras though. Otherwise loras have very minimal effect.
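For anyone wondering why the dynamic lora option matters: merging a small LoRA delta into int8 weights mostly rounds it away, so it has to be applied at runtime in float. A toy numpy sketch (names illustrative, not the quant's real API):

```python
import numpy as np

def lora_forward(x, q_w, scale, A, B, alpha=1.0):
    """Dynamic-LoRA forward pass: the int8 base weight is dequantized, and
    the low-rank delta alpha * B @ A is applied in float on the side instead
    of being baked into the quantized weights."""
    base = (q_w.astype(np.float32) * scale) @ x  # dequantized int8 base path
    delta = alpha * (B @ (A @ x))                # LoRA delta kept in float
    return base + delta
```

Mathematically this equals `(dequant(W) + alpha * B @ A) @ x`; the point is only that the delta never passes through the int8 grid.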
>>
my niggas
is it possible to train a zimage lora on a 6 gb, 32 gb machine?
thinking of doing the oneitis
>>
>>108819146
Should indeed be possible with offloading, paged 8bit optimizer memes, maybe fp8 or int8 weights, and gradient accumulation; any combination of these until it fits.
Whether the quality and speed are acceptable is another question.
Desu you can get a 5090 on vast for half a buck an hour.
I would rather train on that than fight with heavy VRAMlet workarounds.
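The gradient accumulation part sketched out, since it's the cheapest of those tricks: average grads over several micro-batches before each optimizer step, so peak memory tracks the micro-batch size rather than the effective batch. (`grad_fn` is a hypothetical stand-in for a real backward pass.)

```python
import numpy as np

def sgd_with_accumulation(params, grad_fn, batches, accum_steps=4, lr=0.01):
    """Plain SGD with gradient accumulation: take one optimizer step per
    accum_steps micro-batches, using the averaged gradient."""
    acc = np.zeros_like(params)
    for i, batch in enumerate(batches, start=1):
        acc += grad_fn(params, batch) / accum_steps
        if i % accum_steps == 0:
            params = params - lr * acc  # one step per accumulation window
            acc = np.zeros_like(params)
    return params
```

With accum_steps=4 and micro-batch 1 you get the gradient of an effective batch of 4 at the memory cost of batch 1, paid for in wall-clock time.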
>>
>>108819146
i think so, with offloading to RAM

maybe you even just need to change one offloading slider on onetrainer or ai-toolkit or whatever you use
>>
>>108819176
not sure you need to massively quantize, i think just taking the speed hit from offloading to system probably will do?

the quality then shouldn't be worse, it's just not going to be a fast training
>>
>>108819190
I do not know how much of a speed hit you will take for the significant offloading you are going to do.
If you are not dipping much below 10 seconds per step, that's still train-overnight-while-you-sleep territory.
I guess you can try and see what you get.
I would temper expectations though.
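Quick arithmetic behind the overnight estimate (assuming roughly 8 hours of sleep; numbers are illustrative):

```python
def overnight_steps(seconds_per_step, hours=8):
    """How many optimizer steps fit into an overnight run at a given step time."""
    return int(hours * 3600 // seconds_per_step)
```

At 10 s/step that's 2880 steps per night, which is a perfectly usable LoRA run; at 30 s/step it drops under a thousand.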
>>
>>108818984
laziest way would be chaining two ksamplers together and just bypassing the one you don't want when you toggle turbo on/off
https://files.catbox.moe/7gva2i.png
>>
>>108818731
>2000 series
*cough cough*... wake up, my ol' betsy...
>>
>>108819205
usually it's a linear type of slowdown with not too extreme a slowdown factor, which is not ideal but also not prohibitive

i'd actually recommend to try it, it's probably not a lost cause unless you have a large training data set and want to train it all at full resolution and all the other stuff that can also pump up the requirements
>>
>>108819237
I have to admit I haven't tried how much impact offloading makes for training speeds.
I will keep this in mind next time I decide to train a lora for a larger model.
>>
File: 1771177556393406.jpg (3.25 MB, 2048x3072)
3.25 MB JPG
>>
>>108819409
holy plastic
>>
File: 1760199990411198.jpg (605 KB, 1840x1328)
605 KB JPG
>>
File: hidream_o1.jpg (438 KB, 2048x2048)
438 KB JPG
not too great at prompt adherence, but i don't hate the hidream o1 randomized anime 1girls
>>
File: hidream_o1_2.jpg (233 KB, 2048x2048)
233 KB JPG
>>
>>108818776
holy kino
>>
>>108819642
>>108819658
is there some noise pattern or am I imagining things?
>>
>>108819642
They lied by colossal margins about what the model is.
It's like someone promising to buy you a brand new 5090 and then bringing a second-hand RX 6600 that needs to be repasted and have its broken fan replaced.
It's fast enough to get some curiosity from experimenting with it I suppose.
But also don't zoom in to your HiDream gens.
Once you see it you won't stop seeing it.
>>108819662
It has 32x32 patch artifacts.
>>
>>108819662
i think it either has jpeg artifacts trained in or some other flaw; not at all sure if it's just my settings or the model

>>108819666
>They lied by colossal margins about what the model is.
could be, I didn't actually hear the promises before release

but i think it's not bad for 1girl, questionable rating
>>
it's 2026 and people still use wan and illustrious
>>
ltx ooms on my 8gb gpu
>>
>>108819690
because they're nicely trained for anime 1girl, among other things.

>>108819692
probably just offload more?
>>
>>108819694
>probably just offload more?
isn't it offloading automatic already
>>
>>108819680
>i think it has either jpeg flaws trained or some other artifacts,
Pixel space diffusion is done in patches of 32x32 pixels. You need to smooth out transitions between different patches some way.
Well it seems they simply didn't bother to for this garbage.
Zeta-Chroma also has them (alongside its million other issues)
llada also has them.
GLM, despite its shit quality, is the only local pixel space model I know of that doesn't have them.
>not at all sure if it's just my settings or the model
It's the model. Well, maybe they KNOW a way to prevent them, but they didn't bother to include it in the inference code, so it's still the model.
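An illustration of why non-overlapping patches leave a visible grid while overlap plus averaging hides it: with stride equal to the patch size, every pixel comes from exactly one patch, so any per-patch bias lines up into a 32x32 lattice. A toy numpy sketch (not any model's actual unpatchify code):

```python
import numpy as np

def unpatchify_blend(patches, positions, out_hw, patch=32):
    """Reassemble patches into an image, averaging wherever they overlap.
    With no overlap each pixel has weight 1 and patch seams are exposed;
    overlapping strides average the boundaries away."""
    out = np.zeros(out_hw, dtype=np.float32)
    weight = np.zeros(out_hw, dtype=np.float32)
    for p, (y, x) in zip(patches, positions):
        out[y:y + patch, x:x + patch] += p
        weight[y:y + patch, x:x + patch] += 1.0
    return out / np.maximum(weight, 1.0)
```

If the patches are consistent crops the blend reconstructs the image exactly; the averaging only matters when adjacent patches disagree at their borders, which is exactly the artifact case.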
>>
>>108819700
maybe with some tools/workflows? idk.

but since you OOM clearly something didn't work so either reserve moar RAM on whatever automatic mode you're using or just decide manually how much is offloaded in advance of running.
>>
File: ComfyUI_00016_.jpg (2.99 MB, 3584x4608)
2.99 MB JPG
waku waku
can't wait to see how final version of Anima turns out to be


