Discussion and Development of Local Image and Video Models

Previous: >>108563476

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
Blessed thread of frenship
>>108569547
>>108566619
>>108569190
>>108568213
Don't you feel like a piece of shit posting anime in a general where everybody pretends and nobody cares about it? Don’t you have a bit of remorse for being part of this big farce?
>>108569578
>nobody
don't talk on behalf of everyone, freak
>mfw Resource news

04/09/2026

>MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation
https://github.com/AMAP-ML/mar-grpo

>HybridScorer: Score, sort, and cut large sets down fast with GPU-accelerated AI review
https://github.com/vangel76/HybridScorer

04/08/2026

>OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models
https://github.com/ControlGenAI/OrthoFuse

>MIRAGE: Benchmarking and Aligning Multi-Instance Image Editing
https://github.com/ZiqianLiu666/MIRAGE

>Few-Shot Semantic Segmentation Meets SAM3
https://github.com/WongKinYiu/FSS-SAM3

>PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
https://github.com/davidpicard/pom

>RS Nodes for ComfyUI: Comprehensive custom node pack focused on LTXV audio-video generation, LoRA training and post-processing
https://github.com/richservo/rs-nodes

>FLUX.2 Small Decoder: Distilled VAE decoder for faster decoding and lower VRAM usage
https://huggingface.co/black-forest-labs/FLUX.2-small-decoder

>Nvidia snaps up AI chip packaging capacity as TSMC expands in U.S.
https://www.cnbc.com/2026/04/08/tsmc-nvidia-advanced-packaging-intel.html

04/07/2026

>Anima preview3 released
https://huggingface.co/circlestone-labs/Anima#preview3

>FrameFusion Image Interpolation: Compact image interpolation model for generating in-between frames
https://github.com/BurguerJohn/FrameFusion-Model

>An Inside Look at OpenAI and Anthropic’s Finances Ahead of Their IPOs
https://www.wsj.com/tech/ai/openai-anthropic-ipo-finances-04b3cfb9

>PrismML debuts energy-sipping 1-bit LLM in bid to free AI from the cloud
https://www.theregister.com/2026/04/04/prismml_1bit_llm

>ComfyUI Hires Fix Ultra - All in One
https://github.com/ThetaCursed/ComfyUI-HiresFix-Ultra-AllInOne

>ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity
https://github.com/hwang-cs-ime/ATSS
>mfw Research news

04/08/2026

>GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos
https://onethousandwu.com/GenLCA-Page

>Grounded Forcing: Bridging Time-Independent Semantics and Proximal Dynamics in Autoregressive Video Synthesis
https://arxiv.org/abs/2604.06939

>Evolution of Video Generative Foundations
https://arxiv.org/abs/2604.06339

>VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis
https://arxiv.org/abs/2604.07210

>Controllable Generative Video Compression
https://arxiv.org/abs/2604.06655

>Not all tokens contribute equally to diffusion learning
https://arxiv.org/abs/2604.07026

>FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
https://arxiv.org/abs/2604.06757

>Holistic Optimal Label Selection for Robust Prompt Learning under Partial Labels
https://arxiv.org/abs/2604.06614

>Towards Robust Content Watermarking Against Removal and Forgery Attacks
https://arxiv.org/abs/2604.06662

>PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
https://arxiv.org/abs/2604.07230

>Noise Constrained Diffusion (NC-Diffusion) Framework for High Fidelity Image Compression
https://arxiv.org/abs/2604.06568

>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything

>Visual prompting reimagined: The power of the Activation Prompts
https://arxiv.org/abs/2604.06440

>MoRight: Motion Control Done Right
https://research.nvidia.com/labs/sil/projects/moright

>Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
https://arxiv.org/abs/2604.06832

>DesigNet: Learning to Draw Vector Graphics as Designers Do
https://arxiv.org/abs/2604.06494

>FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
https://arxiv.org/abs/2604.06916

>When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
https://arxiv.org/abs/2604.06422
MYTH: api models are censored
FACT: api models are less censored than local models and are in fact trained on NSFW imagery

MYTH: api models are too expensive
FACT: it's actually quite cheap to use API through ComfyUI API Nodes. the price for api has gone down in comparison to the price of hardware

MYTH: api nodes collect your data and are unsafe to use
FACT: api is safer than local because nothing is stored on your hard drive. with local models, you need to download hundreds of loras and custom nodes, any of which could be infected

MYTH: an api can pull the plug at any time, why use something like that?
FACT: everything you generate can be saved to your desktop so nothing is lost

MYTH: it's impossible to train a custom style or character with api, loras make local way better
FACT: api can learn any style or character with a single image reference, which is much faster and smarter than loras

MYTH: if i buy api credits and don't like the model, that's money wasted
FACT: comfyUI's API nodes credit system allows you to prompt hundreds of cutting-edge api models. the credits share between models so you aren't locked in to any one ecosystem

MYTH: api users are poor and from third world countries
FACT: the top hollywood productions and anime studios all use api models. api is the weapon of choice for everyone world-wide

MYTH: discussion of api models is off-topic
FACT: api models are part of the comfyui experience and are relevant to this thread. combining api models with local workflows is still local
>>108569589>>108569593>>108569597fuck off faggot
>>108569597MYTH: you are not a cuntFACT:
>>108569597
i know it's just a shitpost but
>api models are less censored than local models
always gives a good chuckle
>>108569680you can do AI porn with grok, and the quality is miles ahead what local can dohttps://www.reddit.com/r/Grok_Porn/
>>108569709maybe for low-tier gooners but i have pristine taste anon
>>108569715
fair enough, but I don't like where this is going, it's obvious that civitai is trying to separate themselves from NSFW, at some point they'll completely remove the porn loras, the writing is on the wall
https://civitai.com/articles/28369
>>108569715he says, while using wan 2.2
>>108569709>you can do AI porn with grokyou could make ai porn with grok until jeets ruined it.
>>108569734>until jeets ruined it.many such cases...
>>108569724
api-non-frens are so far behind they don't even realize it
acting like someone playing minecraft with 2k*2k textures and bragging about graphics
https://youtu.be/i_S615aKLfI
>>108569709
>slow motion
>slopped to hell and back
Lol, that is just Wan tier slop. If they had anything close to Seedream 2 but uncensored there's no way they would allow NSFW with it.
>>108569762really nigga?
>>108569709you can tell it's better than wan because it doesnt turn into airbrushed vaseline plastic after the first frame kek
>>108569773
>If they had anything close to Seedream 2 but uncensored there's no way they would allow NSFW with it.
dude, I don't think the world is ready for the day we'll get a local model as good as Seedance 2.0... it's gonna be great
>>108569778It starts out as plastic though.
>>108569784by the time that happens (2050), the rest of us will be living in a full neurolinked metaverse world with API nodes, but you can keep jerking it to outdated videos
>>108569784
>it's gonna be great
Do you seriously think we're gonna give you something this good gweilo?
>>108569778And I prompted for the vaseline.
>>108569798it's 2026, have you not noticed the api pattern yet? >hey here is our new model, look at how great it is! and then 3 weeks later they cripple it and hope most people won't notice(they won't because most of their user base is brown) and then sit back and count money while people burn credits trying to gen the same slop they genned on day one.
>>108569825
>and then 3 weeks later they cripple it
Seedance didn't wait 3 weeks before crippling it, they crippled it before they deployed their API to the rest of the world lmao, at least Sora had the decency to be cool to play around with at the very beginning, I know the bar is low as fuck but it is what it is
wait, all these api outputs are crippled? yet they're still better than local?? oh nononono api has been holding back this whole time, just imagine how far ahead they REALLY are. it's so over
give it a rest lilbro
>>108569845I don't get this meme. No one is saying local is ahead of cloud right now
>>108569845
>all these api outputs are crippled
>they're still better than local
grim, even Mike Tyson with one leg could destroy me, so yeah, a crippled API service is still better than what local shit is producing (and I hope local will step up its game one day, and no, finetuning SDXL for the 14th billionth time won't do it)
>still crying about SDXL 3 years later
>>108569723Why are civittards such entitled little shits? If Visa is cutting off credit card payments, your business is done. You go under, you cease to exist. WTF is Civit supposed to do against that? I'm not defending any of the other bullshit about the platform but for this war with payment processors it seems like they found the least bad option.
Local falling behind means there is no reason to waste money on the current overpriced hardware shortage. if Nvidia releases the 6000 series you'll have no reason to buy it because even if the compute power per dollar was insane, there are no good models to fully take advantage of it anyway.API cucking local and withholding even outdated video models like wan 2.5 is saving you money. do you know how much money you'd be wasting on this hobby if local had all the good models to choose from? do you know how much debt you'd go into if you could run SORA locally? these companies are saving you from yourself by not making these models open source, and doing the right thing by destroying them instead. You're welcome.
>>108569914we're not angry at civitai because they got cucked by visa, we know they can't do anything against (((them))), what we don't like is the gaslighting, they're not honest at all about what's really goin on, people just don't like being lied to, shocker I know
>>108569578
Based, until tdrusell stops ignoring our anime threads, we will continue to protest! >:^(
>>108569723that poster's a whiny bitch
>>108569916unironically this. models like wan 2.5, seedance 2, seedream etc don't fit on local hardware, and quantcoping is just sad. anima is 2b parameters yet it's slower than sdxl which is bigger. and these api models are easily 16b+ minimum, with video ones easily reaching 100b. cumfart cried and threw a tantrum over hunyuan releasing a model too big for localpoors to run, so now all of china realized that local doesn't want these models anyway because they're too big. comfyorg unironically saved local from having to buy H200s
is the fud posting going as planned
>>108569974BASED! API nodes saved local from debt. Plus we can still use these models in ComfyUI anyway through the Partner Nodes program
>>108569974
>cumfart cried and threw a tantrum over hunyuan releasing a model too big for localpoors to run, so now all of china realized that local doesn't want these models anyway because they're too big.
based comfy, no one will care if they can't even run it in the first place
>>108569989
This. SDXL hasn't even been fully explored, we're still discovering new ways to use 'old' tools. Keep that bloated useless WAN crap on the API.
>>108569974
>seedance 2
i forgot the global release was today. any API chads gen some kino?
anima won. Local won. The gay cuck by the name of ani lost
So... are we gonna get some images with these API-glazing shitposts, or is this guy a fucking poor-ass promptlet?Thrill me with your 10kW gens, you faggot.
>>108570015
Tried to gen some cute anime 1girl farts and got filtered. It's pretty useless.
>>108570015
>any API chads gen some kino?
they're gonna kill the golden goose by censoring it like that, what's the point of making such an incredible model if you don't allow people to make fun things with it? I will never understand this
Why tdrusell ignores:
/hdg/
/udg/
/edg/
/adt/
/vtai/
The pokemon one
/hgg/
The /d/ generals where they explore extreme fetishes and tags
Why he invest his time here?
https://civitai.com/models/2383017/anima-cat-tower
>massive changes to Anima's default style (albeit, slopped to high hell)
>improvements to anatomy
>improvements to consistency
>same or better character knowledge
I thought Anima was untrainable and forgot all its base knowledge if you so much as sneezed on the weights
>greatest SOTA API model of all time releases to the deafening sound of crickets
>>108570040SAAAAR WHY HE IGNORE IT!@!!!!!!!!!!!
>>108570040jealous that we got his attention but you don't?
>>108570045Anon is going to reply to this calling cattower slop, which it is, but it still proves that training more than simple LoRAs works well. It was always a farce pushed by retards using SDXL hyperparams.
>>108570040Comfy will never love you
>>>108570040And >>>/jp/2huai?
>>108570015
>any API chads gen some kino?
so that anon only reposts stuff from twitter and reddit unfortunately
>>108570050
Because not only is it censored but it's fucking expensive. Every API service locks it behind one of their highest sub tiers so no one but Youtube grifters are using it.
>>108570050
too censored to be useful, people tasted Sora 2 at its best, hard to go back to something more cucked
>>108569606
Acestep.cpp is insane. It does not consume all my VRAM all at once, only fills it up when I run it, so I can run comfyUI in conjunction with it. Plus, it's ultra fast. Unlike every other iteration of ACEStep UIs, it also allows seamless switching between XL Turbo and XL SFT.XL Turbo 80s Jap groove genhttps://vocaroo.com/1hpzg5IVZxPeThe prompt is everything, it makes a huge difference in output quality, so like image gen it makes sense to try different ways and styles to prompt same thing, and remove tokens if something sounds off.
>>108570280but can i run it with 4gigs vrams?
>>108570280what does a music prompt even look like? >80's style jap groove, 120bpm, "lyrics"i've never even looked at music genning, is it all prompt or is it more like a DAW with a prompt?
>>108570316
>what does a music prompt even look like?
Depends completely on the model and what kind of language it was trained on, just like image models. Udio worked extremely well with rateyourmusic tags because that's what it was trained on (until it started giving you fucking moderation errors every fucking time if you copy pasted the tags from an album you like).
https://ace-step.github.io/ace-step-v1.5.github.io/#XLDemos
Judging by their example prompts it sounds like it was trained on natural language, but I intend to test RYM tags just in case.
>>108570280
>Acestep.cpp is insane.
Works better than ComfyUI?
>>108570023just imagine 10kW jennies
>>108570289
>but can i run it with 4gigs vrams?
You should be able to, Q4 is below 4GB in size (and as long as the total GB is less than your VRAM it should all fit).
https://www.serveurperso.com/temp/acestep.cpp-win64/models/
>>108570316
There's two separate prompts, a caption and a lyric portion. The LM turns them into codes that the model understands, which then outputs the codes for the song and translated to either mp3/FLAC/WAV. In this case for the caption I use
>A groovy 80s synth-pop track featuring sultry female vocals, blending English and Japanese lyrics with flirtatious call-and-response delivery. The timbre pulses with a funky slapped bassline, shimmering arpeggiated synths, gated reverb snare drums, and electric piano stabs. The emotion is playful liberation, infectious joy, and cheeky rebellion. Human sounds include syncopated finger snaps, ecstatic "Ha!" shouts from both vocalists, and layered harmonies during the chorus.
I use LLMs to enhance the caption (could be done right thru acestep cpp itself, and it can also be done with Grok/Gemini).
Like with API, you can technically just lazy prompt it straight thru the UI with the built in prompt enhancer, though I like flexibility outside of that.
Lyrics were https://files.catbox.moe/8xof5r.txt
They can be provided in a variety of ways, but I always adhere to ACEStep's instructions for them. Like image gen, there's things that can be modified like BPM, duration, keyscale, which adjust speed and style of the song, as well as CFG which adjusts prompt adherence and creativity between gens.
>>108570356
>Works better than ComfyUI?
I don't think the Comfy ACEStep implementation has ever been without issues, dev on this seems to have completely halted. It has more features which I will test soon, two separate cover modes with cover-nosfq apparently being highest quality, and back when I used the first ACEStep 1.5 on Comfy, it was quite slow when a generation had some kind of change to the caption, so I think this is even better.
ltx 2.3 distilled is pretty fun (and fast)
https://files.catbox.moe/i2t7bx.mp4
>>108570201can you make her look real tho
>>108570547with gpt image 2 launching soon, yes
https://civitai.com/models/1277670/janku-trained-chenkin-and-noobai-rouwei-illustrious-xl?modelVersionId=2786084
I still think illustrious is best for animu more or less. this one has the regular illustrious style but also the deeper colors of base noobAI.
>>108570672
>>108570445
>I don't think the Comfy ACEStep implementation has ever been without issues,
I'm trying it right now. You weren't kidding, this shit is jank. For some reason the "thinking" step is using my CPU instead of GPU so it's slow as FUCK. Thankfully, it seems that you can skip that step. But in order to do so I have to use a different set of nodes. This shit is weird and jank and confusing and now I'm considering trying the .cpp setup like you said.
>>108569916
Local is still thriving, just not on video yet. This is obviously because video is the most prohibitive thing to train the way ClosedAI and Bytedance have done it, but one can hope some AI lab makes a breakthrough with so many (including BFL) thrown at the problem.
>>108570672
>>108570678
As someone who uses only base Noob models and their derivatives, I can assure you that many use the Noob name for marketing without understanding what the model does when merging. Also, most 4chan gens aren't from base Noob but from the WAI/Janku branch. Few people know how to prompt these models properly, and the results you're imagining likely aren't from Noob.
Wow, gothic metal music now sounds absolutely insane out of the box
https://vocaroo.com/1Q4Llaeb3gi2
>>108570733
Yep, absolutely is for some reason. The .cpp is counter-intuitively (due to using ggml) the fastest, most lightweight and cleanest version of ACEStep, because no good python UI exists for ACEStep. Plus .cpp is compatible with every feature plus more, and a good approach would be to just port the .cpp straight into ComfyUI as a custom node.
>>108570817
I had a previous version saved, downloaded the latest version, seems decent.
but I also have base noob 1.0 cause it's good to have something without any merges or whatever.
>>108570847
And this is not surprising, seems like every python UI that's not Comfy for ACEStep is vibecoded, including the actual official UI, and actual devs like lllyasviel or Auto1111 are not available to work on proper UIs. As for Comfy, I'm guessing it's just not too compatible with the current architecture.
>>108570672>>108570678>>108570857I judge a model based on how well it can do toilet sitting+undies down. you'd be surprised at how hard it is for many models to get right.
>>108570857
>>108570875
yeah, but in general illustrious/noob based models do great, we've come a long way since pony which needed tons of tinkering to get okay anatomy.
>still no happysamefacehorse
it's over
forbidden technique
>>108570847
>https://vocaroo.com/1Q4Llaeb3gi2
Sounds like it keeps changing its mind whether a woman or a man is singing at the part where it gets loud, kek. Also, the quiet parts remind me of Let Us Cling Together by Queen.
>>108570918trust the plan saaaaris 48gb seedance level local tomorrow in comfyui will be optimized FAST
>>108569773
Crazy part is that seedance 2.0 in the testing phase had full on nudity with no issues