Discussion and Development of Local Image and Video Models
Previous: >>108756500
https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
cocky go boing boing
>>108757904
progress report anon?
>mfw Resource news

05/05/2026
>Decision Boundary-aware Generation for Long-tailed Learning
https://github.com/keepdigitalabc-svg/DBG
>Motion-Aware Caching for Efficient Autoregressive Video Generation
https://github.com/ywlq/MotionCache
>SIFT-VTON: Geometric Correspondence Supervision on Cross-Attention for Virtual Try-On
https://github.com/takesukeDS/SIFT-VTON
>Linear-Time Global Visual Modeling without Explicit Attention
https://github.com/LeapLabTHU/WeightFormer
>Local Dream 2.4.3 - SDXL support, tag autocomplete and more
https://github.com/xororz/local-dream/releases/tag/v2.4.3
>Sora’s downfall signals broader problems with AI’s creative utility
https://theconversation.com/soras-downfall-signals-broader-problems-with-ais-creative-utility-280013

05/04/2026
>UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
https://houyuanchen111.github.io/UniVidX.github.io
>BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
https://github.com/MaxRondelli/BlenderRAG
>It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
https://akoepke.github.io/divgen/index.html
>Walkyrie 1.3B — Text-to-Image
https://huggingface.co/kpsss34/Walkyrie-1.3B-v1.0
>Caption Creator: Fast and portable tool for image captions and tags
https://github.com/Merserk/Caption-Creator
>VulkanForge: Vulkan-based LLM inference engine in Rust
https://github.com/maeddesg/vulkanforge
>FastSDCPU release v1.0.0-beta.301
https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.301
>Deepbooru TagWalker
https://github.com/Elliezrah/deepbooru-tagwalker

05/02/2026
>Sulphur 2: An uncensored video generation model based on LTX 2.3
https://huggingface.co/SulphurAI/Sulphur-2-base

05/01/2026
>Representation Fréchet Loss for Visual Generation
https://github.com/Jiawei-Yang/FD-loss
>Caption Generator Pro
https://github.com/CoolGenius-123/Caption-Generator-Pro
>mfw Research news

05/05/2026
>TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks
https://arxiv.org/abs/2605.01761
>SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models
https://arxiv.org/abs/2605.01653
>Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation
https://arxiv.org/abs/2605.01113
>ScribbleEdit: Synthetic Data for Image Editing with Scribbles and Text
https://arxiv.org/abs/2605.01135
>AttnRouter: Per-Category Attention Routing for Training-Free Image Editing on MMDiT
https://arxiv.org/abs/2605.01480
>SwiftPie: Lightning-fast Subject-driven Image Personalization via One step Diffusion
https://arxiv.org/abs/2605.01510
>CSGuard: Toward Forgery-Resistant Watermarking in Diffusion Models via Compressed Sensing Constraint
https://arxiv.org/abs/2605.01479
>MOC-3D: Manifold-Order Consistency for Text-to-3D Generation
https://arxiv.org/abs/2605.01743
>VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
https://yukinonooo.github.io/VAnimProject
>Skipping the Zeros in Diffusion Models for Sparse Data Generation
https://arxiv.org/abs/2605.01817
>Unifying Deep Stochastic Processes for Image Enhancement
https://arxiv.org/abs/2605.01568
>MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models
https://arxiv.org/abs/2605.01520
>GEASS: Training-Free Caption Steering for Hallucination Mitigation in Vision-Language Models
https://arxiv.org/abs/2605.01733
>>108763579
>>108763585
thanks!
Russell abandoned us
>>108763550
You dropped >>108758016
>>108763536
>old 1.2
meaning? Anyways, I don't speak Japanese either, but this hobby introduced me to the music of those two particular artists and I like them. I always follow along with romaji translations, but now I have some motivation to actually learn Japanese.
>the vocals are heartfelt. anyway, what's the theme of this one?
Theme is whatever Claude gave me a few days ago to test the LoRA kek. Here's the lyrics, followed by romaji and an English translation (given to the model entirely in standard Japanese, which either the merge or the LoRA has improved).
https://files.catbox.moe/9ors79.txt
In that case it skipped a line or two, but in my testing that's actually very rare now compared to how often it happens in the Turbo model.
big russ... please, come back...
20 seconds on a 4gb gpu but at what cost
>>108762944
>>108763702
why is she outside the building
>>108763636
>>108763689
Let him cook
>Curious to see if Sogna artist is alive.
>The artist is alive
>Is a woman.
Didn't expect that.
>>108763636
tdrusted
formerly trusted
>>108763667
sorry, I meant the old ace step 1_3_5.
>>108763350
>>108763365
tencent was involved in songbloom. https://github.com/tencent-ailab/SongBloom
It could be kind of nice. xmas gen:
https://files.catbox.moe/pxu1ia.mp3
The reference audio was iirc Mariah Carey.
The lyrics are "
problem: it basically was really unpredictable. Prompting was a wishlist.
I wouldn't want to go back lol. BUT, as you can hear, they're not jokers.
>>108763782
>The lyrics are "
based on God Rest Ye Merry Gentlemen
>>108763767
pretty style
>>108763530
>>Sulfur can't do pussy
It's a base model, LoRA trainers will improve this; also, Sulphur 2.5 is waiting for funding
warning! a diss track:
https://files.catbox.moe/21bzys.mp3
>>108763893
>>108763906
I didn't want anyone to take it personally lol
>>108763893
fuck you nigger faggot
ayo I is gittin oppress
waiting for the API schizo to see this level of detail; this gen is 1 year old btw.
>>108763971
(thumbs up)
>>108763827
anyone else firing up the kino factory?
>>108763550
>https://huggingface.co/circlestone-labs/Anima
>405k downloads last month
woaw
>>108763782
They made a slightly better one actually, it was called https://github.com/tencent-ailab/SongGeneration
It still left so much to be desired and was basically ACEStep 1.0 tier. ACEStep had a 1.5 version which was much better than that, and then it was iterated on and 1.5 XL is even better than standard 1.5. We are very fortunate to have ACEStep. Local audio models were in a very sad and dire state beforehand.
>>108764079
neat. Still getting updates.
>>108764079
have you tried running your generations through a mastering model? i remember someone suggesting that in a past thread. never tried it
>>108764115
>a mastering model
is there one that actually works? I have 5 tracks that need cleaning
>>108764115
All the generations I posted yesterday (Fate Gear, Zutomayo, Miku) are mastered, actually. I do it like second nature before sharing gens, because Turbo is just slightly worse at capturing details. I'm the one who recommended it after finding out about Matchering 2 thru Discord. Some songs are better than others for mastering, so I just rotate around an album and it usually only takes 1-2 tries. The base merge does not need as much mastering as Turbo anymore though, as in the initial output is not as noisy and the voice is crisp, but that only means it sounds way better when mastered, so I go ahead and do it anyways.
Nofap day 10, honestly, keep remembering that motorcyclist's cleavage.
>>108764147
>I have 5 tracks that need cleaning
Matchering 2
https://github.com/sergree/matchering
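If you'd rather script Matchering 2 than click through a GUI, a minimal sketch looks like this. The file names are hypothetical; `mg.process()` and the `mg.pcm16()` result format follow the project's README, and the helper function is just an illustration for deriving output names.

```python
# Sketch: master a local audio gen against a well-mastered reference track
# using Matchering 2 (https://github.com/sergree/matchering).
# File names below are hypothetical examples.

def mastered_name(path: str) -> str:
    """Derive an output file name, e.g. 'gen.wav' -> 'gen_mastered.wav'."""
    stem, dot, ext = path.rpartition(".")
    return f"{stem}_mastered.{ext}" if dot else f"{path}_mastered"

def master(target: str, reference: str) -> str:
    """Match target's loudness/EQ/peaks to the reference, write 16-bit PCM."""
    import matchering as mg  # pip install matchering
    out = mastered_name(target)
    mg.process(target=target, reference=reference, results=[mg.pcm16(out)])
    return out

# Usage (needs real audio files on disk):
# master("miku_gen.wav", "well_mastered_reference.wav")
```

As the posts above note, the reference matters more than the settings: pick a clearly-mastered track in the same genre, and expect to retry with a different reference a couple of times.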
>>108764170
>Matchering 2
I've tried it, actually, just wasn't super happy with the results. Perhaps I used a bad reference song.
>>108764154
can you post a before and after?
>>108764177
I recommend trying Web Audio Mastering, but Matchering 2 with a good source can sound much better than that. There are some songs with poor mastering that lowers the volume a ton; finding a good one is just research, find a song that sounds clear. Googling for well mastered songs in the genre you're targeting may help. Look thru playlists. Tip: movie scores tend to be well mastered due to insane budgets. LLMs like Gemini can give ideas as well. You'd be surprised. I don't search anymore because I have good songs and albums that apply to everything I have (just downloading albums from a few target artists is enough).
>>108764014
kino
I bet that motorcycle zoomer wants a gen x nofap boyfriend.
>>108764198
nofap is not supposed to turn you gay. rebuke the devil
>>108764203
>:(
he said show my penis to drivers on the highway
>>108764225
he is trying to use you as a grim reaper
>>108764237
lol
>>108763721
is this local? reminds me of picrel, which claimed to be local too
>>108764186
Turbo gens since I'm AFK; these are from weeks back (before I figured out the DiT-only trick, so gens might be more boring than what the Turbo model can do).
https://desuarchive.org/g/thread/108694497/#108695746
Top is mastered, bottom is pre-mastered.
Pre-master
https://vocaroo.com/1a4VSBjqkuSX
Master
https://vocaroo.com/1b0F41rAgXqR
Depending on the gen, it doesn't get rid of every issue, but it can make it go from unacceptable on decent headphones to okay sounding.
>>108764192
Do you use 32-bit float, or just normal settings?
>>108764274
oh, i thought it would fix the metallic sound. sounds like some basic stereo manipulation. someone needs a model that actually rebuilds the full song to make it HD, kind of like asking image editing models to make something higher quality
>>108764283
16-bit WAV files. Default settings on UVR, I use this desktop app:
>https://ultimatevocalremover.com/
>>108764300
you can, it often works. i think it's easier on wan though.
usually it's a matter of time and offloading to RAM if your GPU's VRAM isn't so large
>>108764274
what do you mean by 'mastered'? What are you using?
What do you guys actually use these generated images for? I liked the process of getting image and video generation working on my system, but now I'm stumped; I really have no use for any of this. Just generating random images is a bit meh.
>>108764270
Which nigga claimed that was local?
(These are Nano Banana + editing in GIMP.)
>>108764398
making kinos for my amusement
35
>>108764495
it was in adt
adt for local
fuck you nigger
>>108764578
adt is for anime
>>108764586
for anime generated locally
do you see any mentioning of nai or other cloud shit
kys
https://files.catbox.moe/x5z448.mp3
The text:
https://webstersdictionary1828.com/Dictionary/joy
Needed cfg scale to be 13.
>>108764398
similar to above. for memory stuff.
>>108764617
fyi, if I had fed that into udio, at least at the time I left, it would have never worked. You had to really slave to get uneven verses working.
>>108764672
i.e. udio, at least in the past, wants to talk instead of sing.
>>108764734
Hmmm nyo
>>108764597
Fuck you, /adt/ is for anime diffusion in general. Here is for local realism and western art
anyone else making ltx kinos?
>>108764781
I am glitching out with Sulphur. What are your steps/cfg/distill LoRA strength?
>>108764781
it can do 40 seconds?
>>108764781
upload i wanna see
>>108764757
nigger
>do you see any mentioning of nai or other cloud shit (in the op)
fuck you
>>108764787
i am using the default settings on wan2gp. it seems to be 8 steps, and i don't know what the rest of the settings are since they're hidden from the UI
>>108764790
it can go as far as you want. i'm just being meticulous with small extensions
>>108764808
i will once i reach the bridge after the chorus
>>108764734
>>108764511
>>108763567
I captioned the dataset late at night, said fuck it, and started training. Just woke up.
It's done training, but preliminary tests seem to show disappointing results. Maybe earlier steps are better, or it needs a different prompting meta than I am using. (Cope, I know, but I need to test it more.)
I honestly didn't expect to get it right first try, but still.
I will make a training help blogpost later.
>>108764781
i was trying sulphur, it still seems quite tricky compared to wan tho
>>108763550
Do you know how to configure ComfyUI Z-Image to work with the Hammerai website? I have the portable version for Windows. My problem is that it can't find the models.
>>108765059
the prompting for ltx seems to be very sequential. you have to write it out in the order in which things should be seen. if you mention that she has boots at the end of the prompt, the camera will aim down to look at her boots only at the end of the video
hello? retards?
>>108766065
retard here how can i help
>>108766094
where are you guys
>>108766065
hello? is this thing on? am I all alone? is anyone there?
>>108766102
>>108766065
anima posters moved to anime generals
>>108766239
they’re discussing loras and stuff in /hgg/
Give it a rest pal no one believes you
So here is an interesting thing about Anima's @ keyword for styles.
Anima understands on a fundamental level that "@" is associated with styles, to the point that it will hallucinate watermarks and patreon links on occasion if you feed it a nonsensical style tag.
(This was with @real photo and my lora disabled, for testing comparisons.)
You don't see anything similar in sdxl.
When a watermark or link is hallucinated, it is almost always gibberish and not strongly related to anything in your prompt.
>>108766300
what was your captioning like, and how many steps was that?
i was curious to see realism trained as an @, but that doesn't look promising.
i've received warnings for racism (what) in the past, but look at the attitude of the Chinese. they're now ignoring the users who made them famous. that's not a respectful attitude, is it?
wat is weeb labs?
After playing around with the Spark.Chroma 1024 model, my conclusions are as follows:
1 - This seems like a sidegrade compared to 512 and a downgrade compared to preview
2 - It is better at following some specific prompt details, like film grain
3 - It is worse at replicating non-photographic styles (unable to do American comic books, worse at black and white manga, for two examples)
4 - It seems worse at generating faces in a crowd, something I didn't feel with the 512 model
5 - The preview model is still the best overall, making a good compromise and being better at non-photographic styles
6 - Chroma is still the best for NSFW work, being the only one that is able to generate correct male genitalia (not just on the man that is penetrating, but also on males in a crowd or by themselves) and blood/gore (being able to render carcasses, body interiors, blood drips etc. much more coherently than other models)
https://files.catbox.moe/5js4so.png
https://files.catbox.moe/c8bcl1.png
https://files.catbox.moe/1h72q7.png
https://files.catbox.moe/j94wjf.png
https://files.catbox.moe/ao3ew5.png
>>108766338
I already posted a bit about the captioning last thread, but here is an example caption:
>@real photo. A young White woman in her early 20s with vibrant, wavy red hair leans back against the thick, gnarled roots of a large tree in an autumnal forest. She gazes thoughtfully upward and away from the camera, her face framed by her bright hair and accented by dark, plum-colored lipstick. She wears a black long-sleeved turtleneck sweater and white pleated trousers, with her arms crossed comfortably over her chest. Small, white flower-shaped earrings are visible on her ears, and her fair skin has a soft, rosy glow on her cheeks. The massive, grey tree roots cradle her body, while the ground behind her is covered in a thick layer of fallen orange and brown leaves. The background trees and the leaf-covered slope are softly blurred, creating a shallow depth of field that keeps the focus entirely on her.
I had 5 epochs with 1300 images, so 6500 steps. Fuck it, might as well turn this into a help-me post.
Batch size 2, 1024p, AdamW, LR 0.00003, cosine, WD 0.001, betas=0.9,0.99, rank 96, alpha 48, dropout 0.05, sigmoid_scale 1.3, max_grad_norm 1.0.
Any ideas how to proceed? As embarrassing as it is, here is the same image with the lora. There are better images and worse ones.
comfy will pay you minimum wage to make workflows
https://docs.google.com/forms/d/e/1FAIpQLSdCrLN2UBKjeqz30__wMjTAXoVqhypTTq1Gl08y2nvKaEf98A/viewform
>>108766637
Posting a slightly better example to feel less shame.
https://files.catbox.moe/oak48j.png
>>108766637
desu i just use the diffusion-pipe defaults
https://files.catbox.moe/qqx56k.png
https://files.catbox.moe/0kv33c.png
>>108766681
I mean, I don't think this deviates strongly from the settings in the rutkowski lora.
https://files.catbox.moe/rlv0j1.png
>spend a few hours carefully curating more data for my lora
>train next version
>it's worse than the previous version
https://files.catbox.moe/38yj18.png
>>108766782
many such cases
>>108766637
6500 steps doesn't seem like nearly enough with 1300 images. 1300 is kind of a huge amount of images for a lora with a single concept.
Just try baking it for longer. If you have a pretty diverse dataset, you might want to take it all the way to like 20000 steps; just make sure you're saving checkpoints often.
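Worth noting when comparing numbers in these posts: "steps" can mean images seen (5 epochs x 1300 images = 6500) or optimizer steps (which divide by batch size). A quick sketch of the arithmetic, assuming no gradient accumulation:

```python
import math

def optimizer_steps(num_images: int, epochs: int, batch_size: int) -> int:
    # One optimizer step consumes `batch_size` images.
    return epochs * math.ceil(num_images / batch_size)

def epochs_for_steps(num_images: int, target_steps: int, batch_size: int) -> int:
    # Epochs needed to reach roughly `target_steps` optimizer steps.
    return math.ceil(target_steps / math.ceil(num_images / batch_size))

print(optimizer_steps(1300, 5, 2))       # 3250 optimizer steps for 5 epochs
print(epochs_for_steps(1300, 20000, 2))  # ~31 epochs to hit 20000 steps
```

So at batch size 2, the run above was 3250 optimizer steps (6500 images seen), and reaching ~20000 optimizer steps would take roughly 31 epochs.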
>>108766637
what the fuck is the obsession with making realistic loras for an anime model? makes no sense, it looks like shit
what the fuck is the obsession with making an anime model out of cosmos, a realistic model? makes no sense!
>>108766786
Are the tall dudes invisible to the bottom dudes?
Also, the second dude from the left has an arm overlapping the middle cape.
>>108766814
6500 steps takes close to 8 hours on my GPU.
I guess I can train for a day after taking so long to curate the dataset.
But I wonder if I can pull off faster convergence with a higher LR without frying it?
>>108766880
Troll post, but I want to use its character and NSFW knowledge for realism.
>i-it's a troll post if i get told the truth
>>108766880
it's a small model, it has great prompt adherence, it's easy to train, nsfw out of the box, it has a lot of concepts trained into it that negate the need for specific character/style loras, and it's fun to fuck around with.
plus a lot of people are looking for something to replace legacy models; anima will be that model, so may as well get a jump on it now.
>>108766786
i wish we had z-anima, the parameter count for anima is just too small to do scenes like this in it.
>>108766880The real question you should be asking is why hasn't anyone made a 3DPD porn model even though it has the most original and high quality data on the internet. It's been years and not even a hint of one being made. Very sad for pig enjoyers.
>>108766924
preview 3. no sense in wasting compute until the model is done.
>>108766910
>anima will be that model
almost every lora for anima has a barren sample section; nobody is using the model, and the people that do use it are making the exact same slop they did with sdxl, so creativity has peaked
>>108766907
What model do you use when you want to see a photo of a fictional character taking multiple cocks into her asshole?
>inb4 Chroma
>>108766971
>taking multiple cocks
even in your fantasies, you watch.
If I use a distilled model that runs at 8 steps, does that mean the image has reached full convergence at step 8? If so, why does the image still change when I increase the step count? As an example, with ZIT or Ernie around 10% of the image still changes between 8 and, let's say, 20 steps, and it also produces some kind of patchy texture; why? With other models like Chroma Flash this is even more pronounced; again, why? Also, all these models should in theory work best with ODE samplers and a simple noise scheduler, so why do SDE samplers seem to work well?