Discussion and Development of Local Image, Video, and Music Models
Previous: >>109028009
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>Wan
https://github.com/Wan-Video/Wan2.2
>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon

gm saars

>>109034950
Is it possible to fuse the power of data center models with local freedom? Perhaps ComfyCloud is the answer. It's clear that everyone has abandoned local computing, GPUs are intentionally stagnant with pitiful consumer amounts of VRAM. It's clear "chinese ram" is also a meme, and even if it wasn't they'd sell out to datacenters just like they sold local out to API. With local compute completely dead, a hybrid approach might be the solution.

first test pretty good, it doesn't take the source video background like WAN animate does

>

>>109035016
the face looks very weird as she gets closer. it's like the focal length isn't changing

https://github.com/Comfy-Org/ComfyUI/pull/14373

Is kijai a member of comfyorg? oh of course not, because he actually develops for local models.
maybe he should start kijaiUI instead of doing free labor for comfy's api adware.

>>109035016
What is this testing

oddly high levels of seetheposting today for some reason not sure why desu

>>109035016
FYI the workflow from the pull request defaults to 65 frames on the first segment and 81 on the second for some reason. If you change the first segment to be 81 as well, you get an extra second of video.
In theory it's possible to dupe the extend section indefinitely to gen whatever length videos you want, but I'm too stupid to figure out how to do that.

>>109035038
To make deepfakes for pedo socialite class.

>>109035034
>Model releases
>It's unusable dogshit on native comfy nodes
>KJ nodes make it actually work
A tale as old as time

idk, I posted this also in lmg, I'm not sure where to put music lol
Ace Step 1.5 XL SFT
https://files.catbox.moe/n5tow1.mp3

>>109035043
>>109035027
Link prev images frames and video frames offset. Then combine video up top.
Note: according to KJ, 5 repeated frames are used as the anchor during extensions, so calculate accordingly.

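A rough sketch of the frame math from the posts above. It assumes 16 fps (implied by 81 - 65 = 16 frames being "an extra second") and that each extension segment spends 5 of its frames repeating the anchor, so it only adds its length minus 5 new frames. That overlap handling is one reading of the note, not something confirmed by the workflow, and the function names are made up for illustration.

```python
def total_frames(segment_lengths, overlap=5):
    """Unique frames after chaining segments.

    Assumption: every segment after the first repeats `overlap` frames
    of the previous segment as its anchor, so those are not counted twice.
    """
    if not segment_lengths:
        return 0
    first, *extensions = segment_lengths
    return first + sum(length - overlap for length in extensions)


def duration_seconds(segment_lengths, overlap=5, fps=16):
    """Clip length in seconds; fps=16 is an assumption, not a workflow value."""
    return total_frames(segment_lengths, overlap) / fps


if __name__ == "__main__":
    # Default workflow (65 + 81), the 81 + 81 variant, and one extra extension.
    for segments in ([65, 81], [81, 81], [81, 81, 81]):
        print(segments, total_frames(segments), duration_seconds(segments))
```

Under these assumptions, going from 65 to 81 frames on the first segment adds exactly 16 frames, and every duplicated extend section adds another 76.
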
>>109035027
How's it compare to wan 2.2 animate practically speaking?

>mfw Resource news
06/11/2026
>i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
https://zlab-princeton.github.io/i1
>AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory
https://github.com/xuhang07/AnchorEdit
>Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
https://github.com/elmma/mllm-reroute
>ComfyUI-BerniniStudio
https://github.com/CCpt5/ComfyUI-BerniniStudio
>Ideoprompt: plain English to Ideogram 4 structured JSON prompt
https://github.com/cocktailpeanut/ideoprompt
>Orion4D FXMax for ComfyUI
https://github.com/orion4d/Orion4D_FXMax
>JoyAI-Echo — GGUF (for low-VRAM ComfyUI)
https://huggingface.co/realrebelai/JoyAI-Echo_GGUF
06/10/2026
>EvoQuality: Self-Evolving VLM for Image Quality Assessment
https://huggingface.co/ByteDance/EvoQuality
>ComfyTV: Turn ComfyUI into a TapNow / LibTV-style canvas app
https://github.com/jtydhr88/ComfyTV
>PathRelax: Parallel-Path Relaxed Speculative Jacobi Decoding for Accelerating Auto-Regressive Text-to-Image Generation
https://github.com/Haodong-Lei-Ray/PathSpec
>SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models
https://github.com/nagara214/SSR-Merge
>SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
https://teal024.github.io/SCAIL-2
>IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
https://github.com/Row11n/IDEAL
>Image to Prompt: Web app to turn an image into Ideogram 4 JSON prompt
https://github.com/cocktailpeanut/image-to-prompt
>Simple Diffusion XS (sdxs-2b alpha version)
https://huggingface.co/AiArtLab/sdxs-2b
>Bernini-R: Repackaged model files for ComfyUI
https://huggingface.co/Comfy-Org/Bernini-R
06/09/2026
>SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
https://teal024.github.io/SCAIL-2
>BLM-SGAN
https://github.com/haidy-maher/BLM-SGAN-Text-to-Image-Generation

>mfw Research news
06/11/2026
>A Comprehensive Ecosystem for Open-Domain Customized Video Generation
https://arxiv.org/abs/2606.11783
>ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
https://arxiv.org/abs/2606.11670
>SpecLoR: Spectral Lookahead Rectification for Motion-Coherent Text-to-Video Generation
https://arxiv.org/abs/2606.11969
>Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
https://arxiv.org/abs/2606.11838
>FitVTON: Fit-aware Virtual Try-On via Body-Garment Size Control
https://zenoning.github.io/FitVTON
>ISAP-3D: Identity-Slot Aligned Part-Aware 3D Generation
https://arxiv.org/abs/2606.12099
>VOID: Defeating Unauthorized Mimicry in Latent Diffusion Models
https://arxiv.org/abs/2606.12263
>MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models
https://arxiv.org/abs/2606.11792
>A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting
https://arxiv.org/abs/2606.11390
>InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
https://arxiv.org/abs/2606.12195

>>109035100
Wan22 Animate couldn't do this Lapwing video test. The video cut to another shot abruptly multiple times. The OpenPose preprocessor cannot guess half a body, nor can it preprocess eyes or emotions. So overall, Wan21_SCAIL-2 seems more capable out of the box.

>>109035139
>.gif (3.97 MB, 320x222)
wat

Wonder how bernini stands up against scail 2. Scail 2 would allow using wan 2.1 loras? Heard bernini is somewhat wan 2.2 related so those loras may work?

>>109035118
Does it need a sam 3 node or is it sort of automatic/baked in?

>>109035155
It performs better with SAM3 it seems. Without it, the character is hallucinated or prompted in.

>>109035147
i have to keep jannies on their toes

>>109032022
>>109034612
>nano shills
The fact you still to this day have to put
>"DONT MAKE AN IMAGE GIVE ME A PROMPT"
At the end of every gemini input, is truly a sign of how shot AI devs and engineers are.

>>109035168
why are there nano banana shills? I have nano banana and grok imagine, just the non-decoy tier. But they're just not local.

nice moves

>>109035235
oops wrong one

Do negative prompts work on ideogram?

>>109035118
Seek the elden ring, become the elden lord.
Note: The reference aspect ratio should match the video, that seems to improve accuracy.

cozy breasd

shameless repost for myself
>>109034840
>>109034927
ok thanks
i am making a significant amount of progress
my main gripe rn is that although my base img is quite crisp, i cant keep that detail
when i look at clips posted on civitai, so many of them are so crisp and clear with the motion while keeping the detail i feel like i am missing something

>>109034986

>>109035332
Depends on what you use. My experience is with Wan2.2. Increasing steps (4 high/4 low) improves clarity. Increasing resolution also does that, but you need high VRAM. If you meant smoothness of motion, then you need to increase FPS or use interpolation nodes to generate extra frames.

>>109035355
im using ltx eros at the moment and im comparing my stuff to the stuff i see on the related eros civitai pages, so i know its capable of it
i just wish i could pull workflows from peoples videos on civit like you can of images

spare some ram for little old me?

SCAIL is giving me kino, but it seems like it's a 50/50 whether it keeps the background from the original video or hallucinates an entirely new one

>>109035438
>>109035451
Videochads are eating good.

>>109035451
neck twist

>>109035451
try that masking thing anon posted in last thread

it can mimic facial expression (and tongue) way better

>>109035494
do this but with bill gates

way better hair rendering, but worse boobs jiggle

wan animate

>>109035544
what's the original from?

Previously

>>109035486
I don't think it's a masking issue, since SCAIL automatically masks both the video and input image
Like with this, the first time I tried it I gave it a dogshit low res image and it just made up a new person, but it kept the background of the original video. I try again with a high res image and the background gets deleted.
(ignore the last ~second of weirdness, that's just because I'm bad at math)

>>109035561
We're so back.
wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
Usage:
60/128 GB Sys Ram
13.6/31.5 GB VRAM

>>109035571
do this but with jeff bezos

>>109035572
That's really good. Workflow?

>>109035524

>>109035494
what about two people interacting?

>>109035596
Need to git pull this request.
https://github.com/Comfy-Org/ComfyUI/pull/14373

>>109035611
Sweet thanks. KJgod delivers again

>>109035451
how u get the source video background into the output video?

>>109035685
From my usage, it seems a white background for the reference is ignored and you get the video's background. A black/gray bg generates a new background.

>>109035572
can it also reasonably do multiple character/object references or just one?

So will these ideogram bbox be possible to be implemented into klein workflows?

Scail looks really good. I was skeptical but it looks genuinely good

>>109035702
Nta but in my experience with wanimate it’s one character per pass. You can do one character in the scene then a second one on a second pass

Why do you read the Comfy pull requests like they are the news or some Reddit post tier content? How empty is your life? Pathetic...

>>109035708
No but Klein is an edit model so you can just mask areas and tell it to put stuff there

How to know which epoch is better? I can't make up my mind by just the outputs
What prompts should I use to check?

>>109035600
>>109035725
It can identify multiple people; the theoretical limit is 64. The quality of their interactions will be dependent on how much that specific interaction exists in the training data.
>>109035685
There are two modes, one where the video animates the input image, and one where the character(s) in the input image replace those in the video. For some reason, it's a coin toss whether the latter keeps the video background or hallucinates a new one. Consider
>>109035572
where it's partly kept it, and
>>109035571
where it's gone completely.

>>109035734
Satan in positive prompt to check which epoch has the most protective effects.

>>109035734
I took a half year break because I ran into this as well
there is no way out
just accept that you are now FUCKED

>>109035745
>It can identify multiple people; the theoretical limit is 64. The quality of their interactions will be dependent on how much that specific interaction exists in the training data.
nice. maybe my storage space won't suffer as much then. guess I'll see when I can test both bernini and scail-2.

>>109035734
>I can't make up my mind
I suspect this problem arises frequently in other areas of your life in addition to your epoch predicament.

this seems to do the trick. replace character in source video

>>109035803
okay how do you know?

>>109035734
vibecode a blind test. If you can't tell, why do you care?

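One way the blind test suggested above could look, as a minimal sketch. It assumes you already rendered the same prompts (same seed and settings) for every epoch and have the paths in a dict; the dict layout, function names, and file names are all hypothetical. Showing the images and collecting picks is left out; this only handles the shuffling and the vote count.

```python
import random
from collections import Counter


def build_trials(outputs_by_epoch, seed=None):
    """outputs_by_epoch: {epoch_name: {prompt: image_path}} (assumed layout).

    Returns one trial per prompt that every epoch has an output for.
    Candidates are shuffled so position never reveals the epoch.
    """
    if not outputs_by_epoch:
        return []
    rng = random.Random(seed)
    shared_prompts = set.intersection(
        *(set(images) for images in outputs_by_epoch.values())
    )
    trials = []
    for prompt in sorted(shared_prompts):
        candidates = [
            (epoch, images[prompt]) for epoch, images in outputs_by_epoch.items()
        ]
        rng.shuffle(candidates)
        trials.append({"prompt": prompt, "candidates": candidates})
    rng.shuffle(trials)
    return trials


def tally(trials, picks):
    """picks[i] is the index of the candidate preferred in trials[i]."""
    votes = Counter()
    for trial, pick in zip(trials, picks):
        epoch, _path = trial["candidates"][pick]
        votes[epoch] += 1
    return votes
```

Only look at the image paths while picking, never the epoch names. If the final tally is close to an even split, the epochs are probably not meaningfully different for those prompts.
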
>>109035823
the lack of jiggling leaves a lot to be desired but the consistency maintained when it changes angle is impressive

>>109035014
>Is it possible to fuse the power of data center models with local freedom?
what kind of retarded question is this? local models are that fusion, it's a miracle that they even exist in the first place
>GPUs are intentionally stagnant with pitiful consumer amounts of VRAM
6 years ago, large amounts of VRAM were almost universally considered a price-hiking strategy for almost any consumer application except semi-professional video production, and VRAM upselling over the years has caused a continuous decline in software quality and resulted in catastrophically worse performance of operating systems, basically all software, the entire video game industry, because what was at the time uselessly high memory bandwidth was relied on to "forgive" bad programming
we are now still probably well within the early adoption phase of local AI and 90% of consumers continue to not give a shit about it at all, which is apparently fertile soil for you to crumple up reality into this little cum napkin you've created after watching a gamers nexus video
>a hybrid approach might be the solution
a solution to fucking what? if it's not local, then it's not fucking local, there is no "hybrid" between local and remote, because that would be fucking remote

>>109035734
Diverse prompts help, anatomic positions besides "1girl, standing" help, text helps, different styles help.
Let's say you are training a lora from the photos of a 1girl. I would use something like this:
1) Just 1girl, no elaborate conditioning, see how it swings when given reins
2) 1girl, lying on sand, from above, tongue out, black bikini
3) 1girl, jumping, on air, flying kick
4) 1girl, closed eyes, smile, holding a sign that says: "IS MY LORA FRIED NOW?"
5) 1girl, painting, Renaissance painting, outdoors, forest

it also replaced the old man in the background. how to retain the old man?

>>109035896
Use white background on reference character.

>>109035760
After a bunch of testing, the segmentation tool can identify and isolate objects, but SCAIL can't replace them, it can only replace people.
It can handle replacing multiple people in one pass, but it seems to struggle a bit as you add more and more people

how do you organize all your gens?

>>109035974
badly

>>109035974
i just do %date
when i feel like i have too much bloat, i go back and delete all the gacha rolls, usually just keeping 1-2 of each kind of gen i did
i have a separate area of folders where i specifically copy outputs in that i want to keep for reference reasons

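For gens that are already sitting in one flat output folder, a small script can do the date sorting after the fact. This is not the poster's method (they use a date token when saving); it is a sketch that assumes file modification time is a good enough stand-in for generation date, and the function name and folder layout are made up.

```python
import shutil
from datetime import datetime
from pathlib import Path


def sort_by_date(output_dir, dry_run=True):
    """Move each file in output_dir into a YYYY-MM-DD subfolder.

    The date comes from the file's modification time (local time).
    Returns the list of (source, destination) moves; with dry_run=True
    nothing is moved, so the plan can be checked first.
    """
    output_dir = Path(output_dir)
    moves = []
    # sorted() reads the whole listing up front, before any folders are created.
    for path in sorted(output_dir.iterdir()):
        if not path.is_file():
            continue  # skip existing date folders and other directories
        day = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m-%d")
        dest = output_dir / day / path.name
        moves.append((path, dest))
        if not dry_run:
            dest.parent.mkdir(exist_ok=True)
            shutil.move(str(path), str(dest))
    return moves
```

Run it once with the default `dry_run=True` and read the returned list before letting it move anything. Keepers can still be copied by hand into a separate reference folder as described above.
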
>>109036007
ok this is getting pretty good, can you put this asian bitch in very out of place scenes, like have her replace a girl in a game of thrones clip

>>109036007
Would be good “imagine if we imported these ppl instead” meme gifs

>>109036017
if you can supply the clip

7 is a prime
as are 9, 11 and 13

>>109035945
Tried sequential replacement, still lost some data on the first char. Still. Wan2.1 has high potential. I'm done testing for the night.

>>109036046
I think I need to segment video number two to feed to video 3 to clamp down the masked area. Gonna try that later.

>>109036046
curious, what's the missing data?

>>109036059
Her yellow dress is gone on second pass.

>>109035945 >>109036046
hey, that's not bad at all! thank you very much for trying this

>>109036069
I see, thx

>>109036150
damn that looks good

>>109035572
> wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
Only one checkpoint? No high, no low?

>>109036150
fix movies
fix tv shows

>>109036215
Are you just using the workflow from the pull request?
I can't get it to reliably keep the background from the original video, and I've confirmed it's not the 'replacement mode' toggle, if I turn that off it keeps the background from the image.
https://files.catbox.moe/zmqr82.png

>>109036229
no idea, it just works when I switch replacement mode = true

>>109036229
I can’t believe you would edit Ronald McDonald out of video instead of editing him in

>>109036215
kino

Is there a turbo lora for Z Image where the outputs aren't completely slopped

maybe I'm just a schizo but is the reason civitai hasn't implemented an ideogram4 filter category, like they do for other models, because of the license and they might, very soon, delete the loras that have been posted?

>>109036249

>>109036339
i saw this in the playplace

>>109036291
>Turbo lora
??? Just use the turbo model.

is it me or is every wan svi workflow trash?
Can't get nearly the quality I get with regular wan22 i2v

question
is the rtx upscale node in a bunch of the video workflows actually totally independent, like could i make a new workflow with just that and plug in any video and have it upscale it?

>>109036558
yes

>>109036558
yes plus use the rtx node from this node pack: deno-custom-nodes

>>109036589
why?

>>109036530
> gosling and stone

I've been using seedvr2 upscaler for images and videos for months now. Is RTX upscaling better?

>>109036600
or don't

>>109036608
no. it's faster. i don't upscale videos.

american hours: brown sperging and images
european hours: wealthy posters and video gens

>>109036673
im from australia
is it peak posting quality right now?

>>109036673
I don't think wealthy europeans are on 4chan at this hour. They are working.

>>109036694
wealthy people don't work

I'm learning over here.

>>109036706
most of them do. and hard.

>>109036709
>tz tx
what did she mean by this?

>>109036731
was meant to be "i can ltx now"
text is for tomorrow, tonight, its more ass

>>109036865
Text is not one of LTX's strong points.

>>109034986
5

>>109036865
oooh yeah

>>109036879
i remember making a video of a person wearing a shirt that has words on it, and it seemed to work correctly when i frame injected a flat image of the shirt as well as wrote in the prompt what the shirt said. i should try more text experiments

>>109036985
Yeah but I mean SDXL can do text if you want it to. But the competence is different.

>>109036608
hey bruh, can you do an anima lora version? ideogram is slow like cdprojekt

things are looking exciting frens
What are my chances of running ID4 and SCAIL on a 6 GB GPU?

>>109035852
Sauce for original video?

Can you make SCAIL work off of vague animations? Like if I make some basic 3D model do janky animations, could SCAIL translate it into more natural movements?

>>109037127
I've not messed around with anima besides one of the early preview versions. How well does it take to 3DCG?

>>109037141

>>109037132
doubt it. you probably need at least 12 gigs.

scrolling through civitai for the past hour and can't find a single artist style that I like

>>109037177
grim
wish I wasn't poor

recommended tool for designing json prompts and bounding with ideogram4? doesn't have to be integrated with workflow, standalone/web is fine too

>>109037185
there are people shilling them on reddit. just go pick one

>>109037185
Since you're not a baby dick vramlet just fucking use a local llm

https://civitai.com/models/1662740/lenovo-ultrareal?modelVersionId=3025161

>>109037238
https://civitai.red/models/2688234/realism-engine-ideogram-4

>>109037298
shouldn't it be chocolate milk?

>>109037307
It's fresh out the tap.

not gonna lie, being able to oneshot comic pages locally (even if the styles are horrendous) is a gamechanger

>>109037383
>even if the styles are horrendous
The fix for this is as simple as finding a hentai artist you like and throwing the pages into the training script.

>>109037132
I think nf4 Ideogram might work.
I wouldn't bother with video gen on that GPU though.

>>109037383
What model are you using for this?