/g/ - Technology



File: collage.jpg (2.53 MB, 5000x2637)
2.53 MB JPG
Discussion and Development of Local Image, Video, and Music Models

Previous: >>109099286

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>109101107
>Maintain Thread Quality
https://rentry.org/LDG_vital_info
>>
Gaming time
>>
>inb4 n*gbo
>>
File: Boogu-Image.jpg (76 KB, 1024x1024)
76 KB JPG
>>
What is the best oriental female lora for ZiT?
>>
>mfw Resource news

06/20/2026

>One Node · FLUX.2 [klein]
https://github.com/yanokusnir-ai/one-node-flux-2-klein

06/19/2026

>FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
https://github.com/Blue2Giant/FreeStyle

>JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
https://siang1105.github.io/JanusMesh.github.io

>Linear Recurrent Unit with Semantic Modulation for Image Super-Resolution
https://github.com/MingyuChoi-run/LSM

>LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation
https://github.com/KevinZ0217/LEAP

>StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
https://hf.co/datasets/shaghayegh/stylistic-bias-dataset

>musubi-tuner adds support for ideogram 4 lora training
https://github.com/kohya-ss/musubi-tuner/blob/dev/docs/ideogram4.md

>KupkaProd Music Video Pipeline
https://github.com/Matticusnicholas/KupkaProd-Music-Video-Pipeline

>Midjourney goes from generating cat images to full-body ultrasound scans
https://www.theverge.com/ai-artificial-intelligence/952011/midjourney-medical-ai-ultrasound-scan

>TeleStyle V2: Beyond Content-Preserving Style Transfer with Self-Distillation and Distribution-Matching-Distillation
https://github.com/Tele-AI/TeleStyleV2

06/18/2026

>UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation
https://lzhangbj.github.io/projects/unitemp

>Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs
https://github.com/1Pansy/VideoCFR

>Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
https://hustvl.github.io/Moebius

>From Bounding Boxes to Visual Reasoning: An On-Policy Data Annotation Tool for Vision-Language Models
https://github.com/WnQinm/Annotator

>Boogu-Image-0.1-Edit GGUF
https://huggingface.co/realrebelai/Boogu-Image-Edit_GGUFs
>>
>>109101121
Interesting image. Gummyworms
>>
>>109101107
Desu collages are better when they are like 10 images max.
>>
>>109101121
Help! I was reincarnated as a slime!
>>
>mfw Research news

06/20/2026

>Is AI ruining our skills? Early results are in — and they’re not good
https://www.nature.com/articles/d41586-026-01947-1

>Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection
https://arxiv.org/abs/2606.16742

>TriFlow: Generating Artist-Like 3D Mesh Topology via Nearest-Vertex Vector Fields
https://arxiv.org/abs/2606.20131

>Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation
https://arxiv.org/abs/2606.19961

>Timestep Rescheduling in Diffusion Inversion
https://arxiv.org/abs/2606.15389

>Human-in-the-Loop Atlas-Based 3D Asset Segmentation for Interactive Content Workflows
https://arxiv.org/abs/2606.17824

>SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction
https://spatialwalk.github.io/SpatialAvatar-0

>ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL
https://arxiv.org/abs/2606.19103

>HiRo: A Compact Four-Directional Hierarchical Reservoir Token-Mixer for Efficient Image Classification
https://arxiv.org/abs/2606.15151
>>
>>109101137
>>109101147
ALERT! Bot spamming possibly infected links! Take care, anon!
>>
>>109101137
>>109101147
Fuck off debo
>>
>>109101113
>please don't look at the things I've done over the years
>I'm so lonely
lol suffer
>>
File: Flux2-Klein_00020_.png (2.24 MB, 896x1152)
2.24 MB PNG
Big Asp 3 apparently finished main training.
Only RL left.
The outputs are a bit rough honestly.
But given this guy's track record I am going to trust the plan and give him the benefit of the doubt.
>>
>>109101147
>>109101137
Can you please stop spamming this every thread?
You don't even add new content you spam the same shit every thread
>>
>>109101137
>>109101147
thanks!
>>
>>109101170
Can you please stop spamming this every thread?
You don't even add new schizo content you spam the same schizo shit every thread
>>
File: Flux2-Klein_00024_.png (1.82 MB, 896x1152)
1.82 MB PNG
>>
File: 1776179177433697.png (612 KB, 1200x630)
612 KB PNG
>>109101137
>>109101147
>>
>It's a homebrew agentic program that uses URL/search/image context and kling/heun/google/grok apis and stitches the result. You can do the same in a google workspace or with claude's MCP functionality.
>I should clarify; Heun is an implicit sampler method (unlike euler which is explicit) meaning it can be used to generate partial image results without knowing what the subjects are and then be merged with other methods. So it's very useful for generating virtual 3d spaces and then populating them with objects/characters.
>Most flux models use it. I just have the agent scrub git repos and huggingface for public flux apis with that method
an anon in another board said this the other day, what the hell is he talking about
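
For reference, Heun's method as it is usually defined is an explicit two-stage scheme (an Euler predictor followed by an averaged corrector), not an implicit one. A minimal sketch on a toy ODE, assuming nothing about any particular diffusion UI; the function names and the test equation dy/dt = -y are made up for illustration:

```python
import math
from typing import Callable

Deriv = Callable[[float, float], float]


def euler_step(f: Deriv, t: float, y: float, h: float) -> float:
    """One explicit Euler step: follow the slope at the current point."""
    return y + h * f(t, y)


def heun_step(f: Deriv, t: float, y: float, h: float) -> float:
    """One Heun step: Euler predictor, then average the two slopes."""
    k1 = f(t, y)               # slope at the start of the step
    y_pred = y + h * k1        # explicit Euler prediction
    k2 = f(t + h, y_pred)      # slope at the predicted end point
    return y + 0.5 * h * (k1 + k2)


def integrate(step, f: Deriv, y0: float, t0: float, t1: float, n: int) -> float:
    """Apply `step` n times with a fixed step size from t0 to t1."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t: float, y: float) -> float:
    """Toy ODE dy/dt = -y, exact solution y(t) = exp(-t)."""
    return -y


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
heun_err = abs(integrate(heun_step, decay, 1.0, 0.0, 1.0, 10) - exact)
print(f"euler error: {euler_err:.5f}, heun error: {heun_err:.5f}")
```

Both steps only use values that are already known, so neither needs to solve an equation per step. Heun costs two derivative evaluations per step instead of one and is more accurate for the same step count.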
>>
>>109101183
Looks a lot like Wardour Street in London.
>>
>>109101189
Why are you posting this in every thread?
>>
>>109101196
so you can answer
>>
>>109101202
No, this is another fucking ritual post. If nobody is responding, actually wait to post it. Why are you being annoying for the sake of being annoying?
>>
File: 00503-3170671165.png (1.73 MB, 768x1280)
1.73 MB PNG
>>
File: 00510-1183324257.png (1.65 MB, 896x1152)
1.65 MB PNG
>>
>>
>>109101343
Women should look like this
>>
File: ComfyUI_00851_.png (614 KB, 830x703)
614 KB PNG
>>109101107
>two of my gens made it into the general collage

:D

Also I forgot how good Anima's prompt adherence truly is, even with 2 loras attached:


>Wappah \(Artist\), @Wappah01, correct proportions, correct anatomy, 1girl, futanari, solo, light-brown skin, curly hair, pink gradient background, standing, dark-green turtleneck, bottomless, cum drip, viewed from below, half-erect penis, speech bubble saying "ugh why do i feel so hot?" and "and why are you still here?"


https://civitai.red/models/447945/wappah-style-and-characters-anima-or-ill-or-pony
>>
>>109101170
imagine this guys family die
imagine his sorrow
mh..... i get rock hard just imagining it
>>
File: ComfyUI_00853_.png (674 KB, 832x1216)
674 KB PNG
>>
>>
https://www.reddit.com/r/StableDiffusion/comments/1ub4jpk/ltx_director_20_update_a_free_open_source/

neat node for ltx 2.3
>>
File: debo_csa_fia_00043_.png (2.46 MB, 1792x977)
2.46 MB PNG
>>
>>109101383
big if true.
>>
>>109101383
that looks interesting but every time i try video gen i leave disappointed
>>
>>109101362
>AI_Art_Factory
Can you please buy an ad?
>>
>>109101362
>Also I forgot how good Anima's prompt adherence truly is, even with 2 loras attached:
You're friends with Rusell or something? You always show up to do damage control. Just so you know, prompt control and LoRAs depend more on the people making them than on the model itself. And statistically, Anima tends to lose prompt adherence once you stack more than one LoRA. Stop acting like that's some special strength of the model.
>>
>>
File: ComfyUI_temp_lacrb_00020_.png (3.22 MB, 1272x1808)
3.22 MB PNG
>>
>>109101427
rent free
>>
>>109101387
Little kid, debo gen alright
>>
File: ComfyUI_00859_.png (1.09 MB, 832x1216)
1.09 MB PNG
>>109101382
>>
>>109101454
that apartment smells like Norwegian fish farm
>>
File: ComfyUI_00737_.png (1.34 MB, 832x1216)
1.34 MB PNG
>>109101427
>You always show up to do damage control.
for what/who? Please learn to be happy and just post gens. No one cares about this dumb drama (you) keep fanning the flames of. see >>109101113
>>
File: ComfyUI_00859_.png (844 KB, 960x576)
844 KB PNG
>>
File: ComfyUI_temp_lacrb_00031_.png (1.97 MB, 1016x1664)
1.97 MB PNG
>>
File: ComfyUI_00858_.png (782 KB, 576x704)
782 KB PNG
>>
Has Anima been finetuned properly yet?
Did they implement the GRPO lora training thing?
>>
>>109101536
finetuned for what?
>>
I have been sitting with this feeling for a while and I think it's worth saying out loud:

I don't feel fulfilled by Anima the way I used to with Illustrious, I genuinely feel alienated from my own genning tools
>>
>>109101454
>>109101435
>>109101343
Even the earlier Chroma 1 epochs looked better than that
>>
>>109101544
Non-anime art
>>
>>109101545
Babbys first Marx theorist arrived
>>
>>109101536
>Has Anima been finetuned properly yet?
turns out the catastrophic forgetting is real and every attempt turned to dogshit
>>
>>109101536
>GRPO lora training thing?
what?

>>109101561
there's a realism tune on civitai iirc
>>
>>109101545
With Anima it's all cold, the gens don't feel like mine anymore. Nothing does. My prompt gets bloated by an LLM, I lose any control over composition, and I just wait for an output. It doesn't feel like creating anymore. I'm basically an API endpoint for myself: I submit a request and wait for a response.
I hate Anima, I hate Anima so much. There is something deeper about what got lost in this shift.
>>
>>109101560
proof?
>>
File: debo_csa_fia_00048_.png (2.35 MB, 1792x977)
2.35 MB PNG
>>
Ideogram4 can't even generate a loaf of bread.
>>
Before Anima I would spend hours on a single gen, and when it was done, I actually felt like I put a piece of myself into that AI generated image.
Anima is sterile, type a prompt, some LLM bloats it into soup, the machine spits out a result, repeat.
Am I supposed to feel happy about this? No, not at all, I hate this, I hate it with all my might.
>>
>>109101623
Kys, it's now or never
>>
>>109101561

>>109101506
>>109101454
>>109101435
>>109101382
>>109101343
all anima
https://civitai.red/models/2409949/sam-anima-realistic?modelVersionId=3017757
>>
>>109101586
https://github.com/yifan123/flow_grpo
>there's a realism tune on civitai iirc
I meant Western art.
>>
>>109101454
Catbox anon
>>
>>109101545
>>109101595
are you using underscores? don't.
are you using turbo lora? don't, you'll have less control. but it's good for quick gens.
are you using enough steps? you'll get gibberish otherwise.
don't use latent upscale, use higher base res with higher steps.
use the right sampler & scheduler.
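
The underscore point can be handled before the prompt ever reaches the UI. A minimal sketch, assuming a comma-separated booru-style tag prompt; `normalize_tags` is a hypothetical helper, not part of any UI or trainer:

```python
def normalize_tags(prompt: str) -> str:
    """Replace underscores with spaces in a comma-separated tag prompt.

    Assumption: tags are separated by commas and never need a literal
    underscore. Empty entries from stray commas are dropped.
    """
    tags = [tag.strip() for tag in prompt.split(",")]
    return ", ".join(tag.replace("_", " ") for tag in tags if tag)


print(normalize_tags("1girl, curly_hair,  pink_gradient_background,"))
```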
>>
>>109101638
https://files.catbox.moe/nugd6r.png
sorry for the mess.
>>
>>109101629
why link 2.0 instead of 2.1
>>
Reflect please, instead of counting what we gained with this new model, think about what we stopped doing, or what was quietly taken from us under the guise of progress.
>>
>>109101644
He's just trolling, it's safe to ignore.
>>
>>109101651
Because you will notice 2.1 is only turbo.
>>
File: clip_Double_00002.mp4 (1.33 MB, 960x512)
1.33 MB MP4
Oh shit! A skeever
>>
File: 1299308529291.png (10 KB, 432x494)
10 KB PNG
>>109101663
>>
>Was it really so important that the machine understood prose?
You lost CLIP.
>Was it really so important that the machine understood relationships between objects?
You lost regional prompter.
>Was it really so important that the machine got it right on the first pass?
You lost hiresfix.
>Was it really so important that the machine gave you five fingers?
You lost inpainting.
>Was it really so important that the machine stopped melting eyes?
You lost adetailer.

You traded your tools for cosmetic fixes you could have just done yourself and you thanked tdrussell for it.
>>
>>109101699
You didn't lose any of these things you fucking incompetent.
>>
>>109101699
imagine watching your family die a gruesome violent graphic death
i get rock hard just thinking of how it would traumatize you
>>
i trained a lora with anima trainflow and it works well but i think it would need some more time in the oven. this is the first time making a lora so i'm not super experienced, is there any way to take an existing checkpoint and use that as a base to continue training with more steps? I have the dataset and everything I used to train it with
>>
File: 1760792807697108.png (30 KB, 804x447)
30 KB PNG
>>109101818
>anima trainflow
>is there any way to take an existing checkpoint and use that as a base to continue training with more steps
yes. stop using this dogshit.
https://github.com/67372a/LoRA_Easy_Training_Scripts
>>
>>109101728
img2img kinda sucks with anima tbdesu
>>
>>109101818
>>109101857
--resume in sd-scripts will attempt to load optimizer states. If you actually finished the previous run that's useless. You want the --network_weights option if you want to do more training than previously planned.
No idea how that GUI wrapper handles that.
>>
>>109101908
>If you actually finished the previous run that's useless
why? it loads optimizer states, not lr scheduler states
>>
from docs/train_network_advanced.md:
>* `--network_weights="<weight file>"`: Starts training by loading pre-trained LoRA weights. Used for fine-tuning or resuming training. The difference from `--resume` is that this option only loads LoRA module weights, while `--resume` also restores Optimizer state, step count, etc.
>>
>>109101946
--network_weights is for when you don't have optimizer states from the previous run, and resuming training without optimizer states is worse. It doesn't mean you should only use --network_weights when you want to do more training.
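
The two options being argued about can be sketched like this. It only builds the extra flags; the helper name, mode names, and paths are hypothetical. It assumes `--resume` points at a state directory saved by the earlier run, and the rest of the training command stays whatever you already use:

```python
from typing import List


def continuation_args(mode: str, path: str) -> List[str]:
    """Return the extra sd-scripts flag for continuing a LoRA run.

    mode="weights_only": load only the LoRA module weights from a file
                         (no optimizer state, step count starts over).
    mode="full_state":   restore a saved training state, which per the
                         quoted docs includes optimizer state and step count.
    """
    if mode == "weights_only":
        return [f"--network_weights={path}"]
    if mode == "full_state":
        return [f"--resume={path}"]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical paths, for illustration only.
print(continuation_args("weights_only", "output/my_lora.safetensors"))
print(continuation_args("full_state", "output/my_lora-state"))
```

Whether the GUI wrappers mentioned above expose either flag is not something this sketch covers.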
>>
Are non-explicit furry gens accepted around here? I don't like the /trash/ thread very much desu
>>
>>109101964
no
>>
>>109101974
Shame.
>>
The creator of anima should've created a model with more parameters.
>>
>>109101964
Officially? No. Unofficially... depends....
>>
>>109098840
> ACEStep XL with a LoRA goes from being Suno tier to being better than Udio.
Lol, you again.

>Junmin Gong praised the release of SA3 on his X
You even know their chink names.
>>
File: Untitled.jpg (332 KB, 704x1024)
332 KB JPG
>>
with fire
>>
>>109102024
>Junmin Gong
nta, but he is very important, because he released Ace Step 1.5 XL SFT, which I use every day.
>>
>>109101964
Of course you are welcome to post anything.
>>
>>109101168
oh that's that one hot chick that did the snow white movie last year right?
>>
>>
>>
https://files.catbox.moe/ueazag.mp3

I'm changing style, but I like this output, so sharing. ace step 1.5 xl sft.
>>
Just trained an SA3 DoRA using default SAI recommended settings.

Damn... It's just like the target music, holy shit. I honestly can't believe my ears, a few secs on my GPU yields some of the greatest music in the world. Wtf. It learned the style and composition perfectly with short captions. I had to check my dataset like 200 times to make sure it wasn't overbaked or anything like that. I'm mindblown by this thing.

This is the first model that is "truly dangerous" to the music industry in the sense that it puts real musicians and artists at risk. This thing is too good, I'm surprised SAI released it. AI art is just like a toy compared to this. A single HQ song is much more valuable than an obviously slopped image, only analogy is like if we got something capable of making lossless HQ videos indistinguishable from the real deal from start to finish.

You no longer need to learn music theory, sound design, and software navigation as a beginner. And you certainly no longer need 3 to 6 years of consistent practice to produce radio-ready EDM tracks as resources online say. The entire process from ideation, arrangement, composition, mixing, mastering can be thrown out the window. We're in interesting times.

>>109102024
I'm the one who shared samples from previous threads, which btw I found on AceStep discord which is the only place discussing it. Guess how I found out about SA3? Through Junmin Gong. We both have him to thank.
>>
>>109102024
anyway, has anyone released a teen pop lora?
>>
>>109102108
Also, I may have been wrong about needing to be extra precise. This DoRA I trained was with lazy captions, doesn't get better than that.
>>
Your idea about spamming these threads non-stop is delusional.
>>
>>109102108
does sa3 use a 5hz lm that can be turned off? I don't like "computer grid" music.
>>
I'm an unc.
>>
>>109102108
I don't think AI changes anything. If you are not a musician you don't understand what I mean and therefore...
>>
>>109102118
>5hz lm
The model doesn't need an LM like ACEStep does, though there's an optional prompt enhancer model (which I haven't touched and had no need to). I think ComfyUI bakes this prompt enhancer in, if that's what you mean.
>>
File: debo_csa_fia_00058_.png (2.15 MB, 1792x977)
2.15 MB PNG
butt
>>
>>109102140
It's too square. ofc computer music is always that way, it aligns to the digital grid.
>>
>>109102138
t. failed musician
Get with the times grandpa
https://github.com/gantasmo/theDAW

I no longer need to spend thousands of dollars on plugins.
>>
>>109102154
You sure are insecure. I don't think a hobbyist needs to spend thousands of dollars on plugins.
>>
File: 1777250193751541.png (253 KB, 1000x714)
253 KB PNG
how do I learn how to make ComfyUI workflows?
I'm shoot brainlet
>>
>>109102163
I have done this too, but if the wf is decent your real task is git gud at prompting.


