/g/ - Technology

Discussion and Development of Local Image and Video Models

Previous: >>108629083

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
it's not over
>>
File: ComfyUI_22149.png (2.58 MB, 1920x1080)
>>108638173
>>108638184
So... faster, less resources or both?
>>
>>108639162
we don't have anything to talk about, why did you bake?
>>
>>108639228
he only cares about made up schizo drama
>>
>mfw Resource news

04/19/2026

>ZPix: Local AI image generator and editor powered by open image models.
https://github.com/SamuelTallet/ZPix

>Comfy Canvas: Local inline layer based image editor
https://github.com/Zlata-Salyukova/Comfy-Canvas

04/18/2026

>Rose: Range-Of-Slice Equilibration PyTorch optimizer
https://github.com/MatthewK78/Rose

04/17/2026

>ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling
https://yjx-research.github.io/ControlFoley

>TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
https://research.nvidia.com/labs/toronto-ai/tokengs

>MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
https://aka.ms/mm-webagent

>Qwen2D-VAE
https://huggingface.co/Anzhc/Qwen2D-VAE

>ComfyUI HY-World 2.0 — WorldMirror 3D
https://github.com/AHEKOT/ComfyUI_HYWorld2

>Anima Style Explorer: A free web tool for ComfyUI styles
https://anima.mooshieblob.com

>Stanford AI Index Report 2026
https://hai.stanford.edu/assets/files/ai_index_report_2026.pdf

04/16/2026

>Motif-Video 2B: A micro-budget text-to-video diffusion transformer from Motif Technologies
https://motiftech.io/videoshowcase

>HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
https://huggingface.co/tencent/HY-World-2.0

>ErnieTurbo_extracted_lora
https://huggingface.co/GuangyuanSD/ErnieTurbo_extracted_lora/tree/main

04/15/2026

>DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
https://huggingface.co/tencent/DisCa

>Lyra 2.0: Explorable Generative 3D Worlds
https://research.nvidia.com/labs/sil/projects/lyra2

>AniGen: Unified S3 Fields for Animatable 3D Asset Generation
https://github.com/VAST-AI-Research/AniGen

>T2I-BiasBench: A Multi-Metric Framework for Auditing Demographic and Cultural Bias in Text-to-Image Models
https://gyanendrachaubey.github.io/T2I-BiasBench
>>
File: downfall-asset-1054100487.jpg (171 KB, 1920x1088)
where am I supposed to get my nsfw resources from now that civitai went full retard, are we back to shady businessmen in back alleys?
>>
>mfw Research news

04/19/2026

>Boosting Robust AIGI Detection with LoRA-based Pairwise Training
https://arxiv.org/abs/2604.12307

>A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting
https://arxiv.org/abs/2604.13427

>Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models
https://arxiv.org/abs/2604.11240

>Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models
https://arxiv.org/abs/2604.12582

>One-shot Compositional 3D Head Avatars with Deformable Hair
https://yuansun-xjtu.github.io/CompHairHead.io

>Crowdsourcing of Real-world Image Annotation via Visual Properties
https://arxiv.org/abs/2604.14449

>Chaotic CNN for Limited Data Image Classification
https://arxiv.org/abs/2604.14645

>HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models
https://arxiv.org/abs/2604.12115

>Degradation-Consistent Paired Training for Robust AI-Generated Image Detection
https://arxiv.org/abs/2604.10102

>On The Application of Linear Attention in Multimodal Transformers
https://arxiv.org/abs/2604.10064

>Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective Merging
https://arxiv.org/abs/2604.11399

>Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
https://arxiv.org/abs/2604.14888

>Benchmarking Deflection and Hallucination in Large Vision-Language Models
https://arxiv.org/abs/2604.12033

>Why MLLMs Struggle to Determine Object Orientations
https://arxiv.org/abs/2604.13321

>Quality-Aware Calibration for AI-Generated Image Detection in the Wild
https://grip-unina.github.io/QuAD

>Reward Design for Physical Reasoning in Vision-Language Models
https://arxiv.org/abs/2604.13993

>Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers
https://arxiv.org/abs/2604.14477
>>
File: ComfyUI_temp_xnbin_00011_.png (2.83 MB, 1602x1024)
>>
>>108639162
sarah peterson status?
>>
>>108639287
Yeah, let me tell you
>Sarah Petersons BBC Holding Dildo FT15
https://civitai.red/models/466318/sarah-petersons-bbc-holding-dildo-ft15
>Sarah Petersons Black Bred Magazine cover
https://civitai.red/models/717113/sarah-petersons-black-bred-magazine-cover
>Sarah Petersons BBC Spoon FT15
https://civitai.red/models/185076/sarah-petersons-bbc-spoon-ft15
>Sarah Petersons BBC Gangbang Kneeling surrounded
https://civitai.red/models/537775/sarah-petersons-bbc-gangbang-kneeling-surrounded

Happy BBCunday ^^!
>>
>>108639287
in shambles, Indian GDP dropped by 2%
>>
File: ComfyUI_temp_xnbin_00022_.png (3.69 MB, 2756x1024)
>>
>>108639316
so based..
>>
I haven't been ITT since Z Image and Kleins dropped, what's the current meta? Are the threads still under assault by anus? Is lodestones still a retard?
>>
File: ComfyUI_temp_xnbin_00030_.png (2.47 MB, 2476x1050)
>>
>>108639162
good boy tran
>>
>>108639351
Anima shows promise for anime stuff, and became Ani's latest target. It's a little smaller than SDXL and much slower, but can do both tags and natural-language prompting. There's even a WaiAnima v1 now that noticeably improves high-res results.
>>
>>108639372
Aaahhh
>>
>>108639372
Wat prompt anon
>>
what the fuck is ERNIE
>>
>>108639351
Kekstone is training his last model on pics of his own poop with disposable camera. Sounds promising...
>>
>>108639493
another nothingburger
>>
>>108639493
the fastest milkman in the west
>>
>>108639351
Klein-9B-KV was released, which uses KV caching to speed up edit gens by a lot.
>>
>>108639496
>Kekstone is training his last model on pics of his own poop with disposable camera
sounds retarded enough to be true
>>
>>
>>108639572
wtf I want to die for Israel now??
>>
File: 1886981.png (15 KB, 709x86)
so it's over? owarida?
>>
I keep seeing some fucking crazy NSFW videos on DeviantArt with multi-shot character consistency and audio. How are people doing it? No way it's LTX-2.3
>>
>>108639493
The husband of HERNIA
>>
>>108639634
link
>>
trying image editing for the first time with klein 9b on my 8gb vram, absolute magic
>>
File: ComfyUI_temp_xnbin_00043_.png (2.54 MB, 2476x1024)
>>108639478
A character sheet multi-view photo 3x3 grid of the woman for dataset creation, white seamless background,
>>
>>108639219
very nice
>>
>>108639653
ye once you get the hang of how to prompt klein for edit it's quite good for the size / speed
>>
>>108639518
>Klein-9B-KV
is it better in other regards too or just faster
>>
>>108639856
how do I prompt Klein to give me an accurate canny filter output without changing the style?
>>
>>108639938
Worse but faster imo.
>>
File: [044861].jpg (227 KB, 1300x1300)
>>108639698
>>108639372
what model
>>
tdrussell are you here?
>>
File: 74567237272.jpg (377 KB, 1344x768)
>>
>>108639962
Is the quality even supposed to be different? The description sounds like it just avoids redundant recomputes by reusing the part that doesn't change.

https://github.com/black-forest-labs/flux2/blob/main/docs/flux2_klein_kv_cache.md
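
For what it's worth, the "reuse the part that doesn't change" idea can be sketched in a few lines. This is a toy and not the FLUX.2 code: `project`, `attend`, and `run_steps` are made-up names, the "projection" is just a scalar multiply, and it assumes the reference-image tokens stay fixed across denoising steps so their K/V only need to be computed once.

```python
# Toy illustration of KV caching for fixed reference tokens.
# All names here are hypothetical; this is NOT the FLUX.2 implementation.
import math


def project(tokens, weight):
    """Stand-in for a K or V projection: scale every feature by `weight`."""
    project.calls += 1  # count how many projections were actually computed
    return [[x * weight for x in tok] for tok in tokens]


project.calls = 0


def attend(query, keys, values):
    """Single-query softmax attention over the given keys and values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[i] for e, v in zip(exps, values)) for i in range(dim)]


def run_steps(ref_tokens, noisy_per_step, use_cache):
    """Run one attention call per denoising step.

    ref_tokens: reference-image tokens, assumed identical at every step.
    noisy_per_step: the changing (noisy) tokens, one list per step.
    """
    cache = None
    outputs = []
    for noisy in noisy_per_step:
        if use_cache and cache is not None:
            ref_k, ref_v = cache  # reuse K/V of the unchanged reference tokens
        else:
            ref_k = project(ref_tokens, 0.5)
            ref_v = project(ref_tokens, 2.0)
            cache = (ref_k, ref_v)
        # The noisy tokens change every step, so they are always re-projected.
        keys = ref_k + project(noisy, 0.5)
        values = ref_v + project(noisy, 2.0)
        outputs.append(attend(noisy[0], keys, values))
    return outputs


def demo():
    ref = [[1.0, 0.0], [0.0, 1.0]]
    steps = [[[0.3, 0.7]], [[0.2, 0.8]], [[0.1, 0.9]]]
    project.calls = 0
    plain = run_steps(ref, steps, use_cache=False)
    plain_calls = project.calls
    project.calls = 0
    cached = run_steps(ref, steps, use_cache=True)
    cached_calls = project.calls
    print("same outputs:", plain == cached)
    print("projections without cache:", plain_calls)
    print("projections with cache:", cached_calls)


demo()
```

In this toy the cached and uncached runs give the same outputs and the cached run just does fewer projections. It doesn't model how the reference tokens interact with the rest of the sequence in a real model, so it says nothing about quality in the actual release.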
>>
>>108640016
im in my ferrari sports car training v4 but whats up
>>
its up
https://www.youtube.com/watch?v=B6dq0Q5UAaE


