/g/ - Technology




Discussion and Development of Local Image, Video, and Music Models

Previous: >>109140721

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://huggingface.co/models
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Krea 2
https://huggingface.co/krea/Krea-2-Raw
https://huggingface.co/krea/Krea-2-Turbo

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
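The OP links catbox/litterbox for sharing metadata, meaning you upload the original PNG so the embedded generation data survives. As far as I know, ComfyUI's default SaveImage node writes the API-format graph and the UI workflow as JSON into PNG `tEXt` chunks keyed `prompt` and `workflow`. Here is a minimal stdlib-only sketch for pulling those out of a downloaded file. It assumes the data sits in plain `tEXt` chunks; compressed `zTXt` and international `iTXt` chunks are not handled.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return the keyword -> text pairs from a PNG's uncompressed tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated PNG chunk")
        payload = data[pos + 8:end]
        (crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the chunk type and payload, not the length field.
        if zlib.crc32(chunk_type + payload) != crc:
            raise ValueError(f"bad CRC in {chunk_type!r} chunk")
        if chunk_type == b"tEXt":
            # tEXt payload is "keyword\0text", both Latin-1 encoded per the PNG spec.
            keyword, _, text = payload.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos = end + 4
    return texts
```

Usage, with a hypothetical local file name: `json.loads(read_png_text_chunks(open("ComfyUI_00001_.png", "rb").read())["workflow"])`. The CRC check is optional, but it catches truncated or corrupted downloads instead of silently returning garbage.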

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
File: 1767725877651063.mp4 (3.59 MB, 832x1216)
>>
>mfw Resource news

06/26/2026

>OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
https://correr-zhou.github.io/OmniShow

>Adobe to Acquire Topaz Labs
https://news.adobe.com/news/2026/06/adobe-to-acquire-topaz-labs

>LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing
https://live-edit.github.io

>PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation
https://github.com/sediment1024/PhysRAG

>SAM2Matting: Generalized Image and Video Matting
https://henghuiding.com/SAM2Matting

>Unison: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation
https://github.com/FudanCVL/Unison

>ComfyUI-AppleSilicon-FP8 - a compatibility layer custom node for Apple Silicon
https://github.com/pawel-mazurkiewicz/ComfyUI-AppleSilicon-FP8

06/25/2026

>Bernini-R — GGUF (high & low noise experts)
https://huggingface.co/neuregex/Bernini-R-GGUF

>Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation
https://github.com/atinpothiraj/pqsg

>VPA-Guard: Defending and Benchmarking Image-to-Video Generation Against Visual Prompt Attacks
https://huggingface.co/datasets/CSU-JPG/VVA-Bench

>Minimalist Preprocessing Approach for Image Synthesis Detection
https://github.com/vohoaidanh/adof

06/24/2026

>Krea-2-Turbo Training Adapter
https://huggingface.co/ostris/krea2_turbo_training_adapter

>Vera: A Layered Diffusion Model for Content-Preserving Video Editing
https://vera-layered-diffusion.github.io

>Advancing WordArt-Oriented Scene Text Recognition: Datasets and Methods
https://github.com/YesianRohn/WATER

>DramaDirector: Geometry-Guided Short Drama Generation
https://github.com/iLearn-Lab/DramaDirector

>PG-MAP: Joint MAP Optimization for Inference-Time Alignment of Diffusion and Flow-Matching Models
https://github.com/sophialanlan/PG-MAP

>Safe Few-Step Generation via Velocity Editing
https://uzn36.github.io/VESFlow
>>
>mfw Research news

06/26/2026

>From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan
https://arxiv.org/abs/2606.27234

>LearniBridge: Learnable Calibration of Feature Caching for Diffusion Models Acceleration
https://arxiv.org/abs/2606.26778

>LCG: Long-Context Consistent Image Generation with Sparse Relational Attention
https://arxiv.org/abs/2606.26171

>Disco-LoRA: Disentangled Composition of Content, Style, and Motion for Multi-concept Video Customization
https://arxiv.org/abs/2606.26668

>ResilPhase: Plug-and-Play Phase Mapping for Diffusion Acceleration
https://arxiv.org/abs/2606.26769

>NaviCache: Test-Time Self-Calibration Caching for Video Generation
https://arxiv.org/abs/2606.26795

>DanceDuo: Bridging Human Movement and AI Choreography
https://arxiv.org/abs/2606.26507

>PhyEditBench: A Real-World Multi-Stage Benchmark for Physics-Aware Image Editing
https://arxiv.org/abs/2606.26551

>TMP: Tree-structured Mixed-policy Pruning for Large-scale Image Generation and Editing
https://arxiv.org/abs/2606.27089

>DanceOPD: On-Policy Generative Field Distillation
https://danceopd.github.io

>Do Image Editing Models Understand Lighting?
https://arxiv.org/abs/2606.26738

>Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
https://arxiv.org/abs/2606.26907

>Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and VLMs
https://arxiv.org/abs/2606.26566

>Safe Autoregressive Image Generation with Iterative Self-Improving Codebooks
https://arxiv.org/abs/2606.27147

>SpatialFlow-GRPO: Where Spatial Credit Drives Image Editing
https://arxiv.org/abs/2606.26872

>Ask, Solve, Generate: Self-Evolving Unified Multimodal Understanding and Generation via Self-Consistency Rewards
https://arxiv.org/abs/2606.27376

>Scaling Multi-Reference Image Generation with Dynamic Reward Optimization
https://arxiv.org/abs/2606.26947
>>
File: 5456456121.jpg (750 KB, 2279x2104)
>>109143792
>>109143828
It may have done alright in those cases, but the main issue with it is that it fucks up too often. Bad hands, etc... Also it was terrible at styles, worse than Z-Image, and obviously way worse than Krea 2. I'm not doing anything wrong here, I don't think
>>
>mfw API news

>ByteDance launches Seed Audio 1.0 Unified AI Audio Generation for Speech, Music and Ambient Sound Creation
https://fal.ai/models/bytedance/seed-audio-1.0

>Midjourney goes from generating cat images to full-body ultrasound scans
https://www.theverge.com/ai-artificial-intelligence/952011/midjourney-medical-ai-ultrasound-scan

>Alibaba releases HappyHorse 1.1 Available on Alibaba Cloud
https://www.alibabacloud.com/blog/happyhorse-gets-stronger-motion-expressiveness-higher-generation-consistency-and-enhanced-visual-quality_603293

>ByteDance's New AI Video Model Can Make 30-Second Clips From a Single Prompt
https://www.cnet.com/tech/services-and-software/bytedance-introduces-new-seedance-2-5-video-model/

>Luma Introduces Ray3.2 Model & API: Complete Creative Control for Video Generation
https://lumalabs.ai/news

>The Layout Bet — Reve 2.0
https://blog.reve.com/posts/the-layout-bet

>Introducing Gemini Omni — Google’s multimodal video creation/editing model
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/

>Nano Banana 2 and Nano Banana Pro are generally available via Gemini Enterprise Agent Platform
https://cloud.google.com/blog/products/ai-machine-learning/nano-banana-2-and-nano-banana-pro-are-generally-available

>Grok Imagine 1.5 Preview
https://x.ai/news/grok-imagine-1-5

>Seedance 2.0 in Runway API
https://docs.dev.runwayml.com/api-details/api_changelog/
>>
>>109143898
Krea 2 is good at styles except the faces. Every oil painting gen I've seen from it has an out of place Dreamshaper XL face
>>
File: 2026-06-27_krea2_09.jpg (1.71 MB, 2160x3840)
I know absolutely nothing about anime genning but I'm having fun.
>>
>>109141439
>I like to believe 4chan appeals to individuals who value open discussion of ideas above all else.
Not really.
I have a hard time getting people to actually post sources for their claims. I've been on 4chan for years and getting people to act intellectually honest is nearly impossible in some discussions. might be a /pol/ thing to some extent but i see it on other boards too.

>>109141614
I'd say it's just the barrier to entry on reddit is lower. most normies don't like being insulted over trivial things or some of the more negative aspects of 4chan.

people exaggerate how bad the janitors are on reddit. it's probably better for discussion in a lot of areas than here.

i think it also might be that tech savvy people on reddit tend to be more anti-AI than on here.
>>
>>109143928
looks good anon. keep it up. i like the colors
>>
File: 1782015577993142.jpg (292 KB, 832x1216)
>>
>>109143930
Whatever problems this site has, they're much worse on reddit. It's a cesspool of toxicity.
>>
File: brv9ky.png (832 KB, 1024x1024)
>>
File: remember noobai-vpred?.jpg (897 KB, 1840x1432)
>>
>>109143957
Classic SDXL 1.0 vibes.
>>
>>109143930
/pol/ is almost entirely schizos and government agencies talking to each other spreading misinformation on purpose. it's an entirely useless board for conversation of any kind. most people on 4chan are just retard social rejects in general, though.
>>
File: ComfyUI_krea.png (1.13 MB, 1024x1024)
Krea 2 is the first local model I used that could oneshot this without inpainting or finetuning.
>An Anime screenshot of Hatsune Miku taking a selfie. Her eyes are wide opened and move independently. Her left eye is rolling upwards, and her right eye is rolling downwards.
>>
>>109144045
But her left eye is not rolling upwards, and her right is not rolling downwards.
>>
File: 1769448958035549.jpg (3 KB, 180x51)
is this a fucking joke
>>
>>109144060
>gay man makes and runs civitai
>surprised he supports gay people
>>
>>109144058
I'll let it slide and take the W. Every model I've tried just makes them stare at the camera, kek.
>>
>>109143930
reddit is way more corporate and censored
>>
File: 1773783396603212.png (1.33 MB, 1368x768)
krea is pissing me the fuck off
>>
>>109144085
pick up a pencil then
>>
File: debo_k_00077_.png (2.81 MB, 1792x977)
>>
>>109144066
i wouldn't even be mad but he logged me out and i have no idea what my login details were
god AI is so hard
>>
>>109144085
whatll make you feel better is sperging out about it for the whole thread and calling anyone who says anything good about it at all a shill

then you could pivot into calling local a sham or whatever but only after youve done the first thing
>>
>>109144060
were you under the impression that CivitAI was run by /pol/tards or something? Why is this surprising in any way whatsoever lmao
>>
>>109144103
no he'll start talking about the VAE and crying about it all day.
>>
>>109144105
normal people exist too bruh
it's not just "gigantic faggots" and "le heckin' other side"
>>
>>109144108
i do understand why it uses qwens vae but i cant say it doesnt make me a bit sad. not sad enough to constantly harp on it tho
>>
>>109144045
I'm not committed enough to that prompt to try it out more than once on a couple of models so I'll take your word for it
>>
>>109144111
normal people don't whine about sharing spaces
>>
>>109143837
What does "kino" and "sovl" mean to you, anon?
>>
File: 1778703930724069.png (3.06 MB, 1780x1004)
>>109144091
and do what, shove it in your asshole, asshole?

>>109144103
i was hoping to get some helpful tips on how to make my gens closer to what they look like in my imagination. i'm not the vae asshole
>>
>>109144139
make the white guy charlie kirk
>>
File: 1764432404615793.png (3.81 MB, 2368x1328)
>>109144150
doesn't know him :(
>>
anons how do i change my password on civitai
>>
>>109144170
Tell me your old password and new password and I'll update it for you
>>
>>109144170
I haven't used it in months but don't you login through an email verification code?
>>
>>109144177
i have no idea lmao
>>109144178
yeah but any logical person would give the option to change the password afterwards. does he really want me to make a new account every time he decides to log me out?
>>
>>109144184
retard
>>
File: image.png (701 KB, 720x480)
>>109144060
this is just silicon wafer reflection, bro, chill
>>
>>109144224
fucking fag chips
>>
>>109144111
Heh, you clearly aren't black-pilled
>>
File: ComfyUI_00002_.png (1.79 MB, 1536x944)
>>
>>109144269
I WILL NOT EAT YOUR FROZEN MEALS GORDON
>>
File: debo_k_00081_.png (2.31 MB, 1792x977)
>>
File: 1755907850267515.png (1.34 MB, 1368x768)
is it possible to gen the exact image you imagined inside your head? has anyone ever done this?
>>
>>109144226
dont care. still not paying your subscription altman
>>
>>109144301
Yes, just write
>>
File: debo_k_00083_.png (2.6 MB, 1792x977)
>>
File: ComfyUI_Chroma_Krea_.png (1.55 MB, 1152x1152)
New experiment. Chroma 1 HD Flash for initial composition, then taken through Krea for refinement, in a hybrid Chroma-Krea workflow. Same prompts. The results aren't half bad, oneshot fixed the incoherent background on Chroma https://files.catbox.moe/oh4w8f.png
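For anyone wanting to reproduce the Chroma-to-Krea handoff: the second pass is ordinary img2img, where the denoise strength (the `denoise` value on ComfyUI's KSampler) decides how far back toward noise the first-pass image is pushed before Krea resamples it. The sketch below assumes a plain linear flow-matching schedule, sigma_i = 1 - i/N, and the rectified-flow forward process x_sigma = (1 - sigma) * x0 + sigma * eps. Chroma is Flux-derived and uses flow matching, but whether Krea 2 uses exactly this schedule is an assumption, and real samplers usually shift the sigmas. Since Krea 2 reportedly uses Qwen's VAE, the first-pass image presumably has to be decoded to pixels and re-encoded rather than passing Chroma's latent straight through.

```python
import random


def img2img_start(num_steps: int, strength: float) -> tuple[int, float]:
    """Return (steps actually sampled, starting sigma) for a denoise strength in [0, 1].

    Assumes a linear flow-matching schedule sigma_i = 1 - i / num_steps (i = 0..num_steps),
    so sigma 1.0 is pure noise and sigma 0.0 is the clean image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Only the last `steps_run` steps of the full schedule are sampled (rounded to the nearest step).
    steps_run = min(round(num_steps * strength), num_steps)
    start_index = num_steps - steps_run
    return steps_run, 1.0 - start_index / num_steps


def noise_latent(latent: list[float], sigma: float, seed: int = 0) -> list[float]:
    """Rectified-flow forward process: x_sigma = (1 - sigma) * x0 + sigma * eps, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [(1.0 - sigma) * x + sigma * rng.gauss(0.0, 1.0) for x in latent]
```

With 20 steps and a strength of 0.5, only the last 10 steps run, starting from a latent that is half image and half noise. Lower strength keeps more of Chroma's composition; higher strength hands more of it over to Krea.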
>>
New experiment. Just use Krea2.
>>
>>109144600
The chroma faggot is back
>>
File: debo_k_00087_.png (2.52 MB, 1792x977)
>>
>>109144600
>The results aren't half bad, oneshot fixed the incoherent background on Chroma
Agreed, but I enjoy the look of Chroma's more.
>>
>>109144616
The faggot is back
>>
File: Krea2-_00142_.png (2.06 MB, 1120x1792)
>>
File: 1773867751302649.png (3.73 MB, 2712x1160)
>>109144362
stop lying, it's literally impossible without bounding boxes
>>
how much more life does DiT have in it, really?
>>
>>109144622
you can spot ideogram from the fucked up eyes. good gen though.
>>
File: 1768061629625267.png (3.74 MB, 2712x1160)
113th iteration and it's NOT exactly the image inside my head
>>
>>109144600
Did something similar a while back. I used Chroma's great understanding of different cinematic and photographic styles to generate lots of ~640 res images, then ran cherry-picked ones through img2img at ~1280 res on Z-Image Turbo to get detail and nice anatomy, and trained a set on the resulting images. The results were really good as far as purely synthetic sets go.
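Reading "~640 res" and "~1280 res" as the side of an equivalent square (so roughly 4x the pixels), here is a small helper for picking the second-pass size while keeping the first pass's aspect ratio. Snapping each side to a multiple of 16 is an assumption; check what the model you are using actually requires.

```python
import math


def second_pass_size(width: int, height: int, target_res: int = 1280, multiple: int = 16) -> tuple[int, int]:
    """Scale (width, height) so the pixel count is about target_res**2, keeping the aspect ratio.

    Each side is rounded to the nearest multiple of `multiple` (16 is an assumed default).
    """
    scale = target_res / math.sqrt(width * height)
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

For example, a 640x640 first pass maps to 1280x1280, and a 512x768 one maps to 1040x1568.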
>>
File: debo_k_00089_.png (2.61 MB, 1792x977)
>>109144636
its krea2
>>
File: 1759757000222091.png (3.78 MB, 2712x1160)
what drives people to lie on the internet?
>>
>>109144301
Like generative music (i.e., Eurorack-esque), it's more about setting conditions that produce interesting or novel outcomes.

>>109144642
Nice gen.
>>
>>109144616
he's been here, he's the only one shilling krea. for someone obsessed with blurry analog "realism" he somehow always picks plastic models to shitgen with
>>
>>109144659
Ideogram is dead, go cry in a corner
>>
>>109144667
considering you're the only jeet left shilling chroma in 2026 i'd say you have a pretty bad track record for picking models
>>
File: 1767062357905962.png (3.98 MB, 2712x1160)
>>109144657
yeah, it's painful getting something precise with krea right now. the future of local ai will be like making little 3d dolls and objects in blender, or some other program like that, posing/composing them exactly how you want, then give that to comfy somehow, or take a screenshot, and put bounding boxes around them describing what they are so that an edit model can turn it into a real picture. at least for getting things from our imagination perfectly right.
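The bounding-box idea boils down to a list of labeled regions in image coordinates. Below is a minimal sketch of such a layout, normalized to the 0-1 range so it does not depend on the render resolution. The `Region` class and the output dict format are hypothetical; how a given tool or edit model actually consumes boxes (regional prompting nodes, a layout-conditioned model, etc.) varies and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical labeled box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def normalize_regions(regions: list[Region], width: int, height: int) -> list[dict]:
    """Clamp boxes to the canvas and convert them to [0, 1] coordinates."""
    out = []
    for r in regions:
        # Clamp each coordinate to the canvas, then sort so (x0, y0) is always the top-left corner.
        x0, x1 = sorted((max(0, min(r.x0, width)), max(0, min(r.x1, width))))
        y0, y1 = sorted((max(0, min(r.y0, height)), max(0, min(r.y1, height))))
        if x1 == x0 or y1 == y0:
            raise ValueError(f"region {r.label!r} has zero area after clamping")
        out.append({
            "label": r.label,
            "box": [x0 / width, y0 / height, x1 / width, y1 / height],
        })
    return out
```

Keeping the layout resolution-independent means the same boxes can be reused when the final image is rendered at a different size than the Blender screenshot.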
>>
File: civitaitrainer.png (22 KB, 957x535)
huh, CivitAI trainer has a lot of models now
>>
>>109144690
>the future of local ai will be like making little 3d dolls and objects in blender, or some other program like that, posing/composing them exactly how you want, then give that to comfy somehow, or take a screenshot, and put bounding boxes around them describing what they are so that an edit model can turn it into a real picture. at least for getting things from our imagination perfectly right.
this is already possible
>>
>>109144611
This works extremely well. It's like a finetune with no LoRA. You will quickly find small nuances Krea does not understand and can tune them.

>>109144611
Try doing picrel (unblurred soles) with just Krea. Turns out this idea works extremely well. Until a Chroma style finetune comes around this is the best way to experience a hypothetical Chroma Krea. This was the initial img https://files.catbox.moe/xegzqc.png

For all you haters who complained about Chroma, you sure aren't very creative. It also brings all of Chroma's styles to Krea, all of which just needed some further inaccessible form of tuning to look half decent.


