/g/ - Technology
Discussion and Development of Local Image, Video, and Music Models

Previous: >>109028009

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
gm saars
>>
>>109034950
Is it possible to fuse the power of data center models with local freedom? Perhaps ComfyCloud is the answer. It's clear that everyone has abandoned local computing; GPUs are intentionally stagnant, with pitiful consumer amounts of VRAM. It's clear "chinese ram" is also a meme, and even if it wasn't, they'd sell out to datacenters just like they sold local out to API. With local compute completely dead, a hybrid approach might be the solution.
>>
File: Wan21_SCAIL2_00001.mp4 (2.87 MB, 1024x896)
first test
pretty good, it doesn't take the source video background like WAN animate does
>>
>
>>
>>109035016
the face looks very weird as she gets closer. it's like the focal length isn't changing
>>
File: Wan21_SCAIL2_00012.mp4 (3.75 MB, 1248x352)
https://github.com/Comfy-Org/ComfyUI/pull/14373
>>
Is kijai a member of comfyorg? oh of course not, because he actually develops for local models.
maybe he should start kijaiUI instead of doing free labor for comfy's api adware.
>>
>>109035016
What is this testing
>>
oddly high levels of seetheposting today for some reason not sure why desu
>>
>>109035016
FYI the workflow from the pull request defaults to 65 frames on the first segment and 81 on the second for some reason. If you change the first segment to be 81 as well, you get an extra second of video.
In theory it's possible to dupe the extend section indefinitely to gen whatever length videos you want, but I'm too stupid to figure out how to do that.
>>
>>109035038
To make deepfakes for pedo socialite class.
>>
>>109035034
>Model releases
>It's unusable dogshit on native comfy nodes
>KJ nodes make it actually work
A tale as old as time
>>
idk, I posted this also in lmg, I'm not sure where to put music lol


Ace Step 1.5 XL SFT

https://files.catbox.moe/n5tow1.mp3
>>
File: wan21scail2 daisy chain.jpg (269 KB, 1987x1224)
>>109035043
>>109035027

Link prev images frames and video frames offset. Then combine video up top.

Note: according to KJ, 5 repeated frames are used as an anchor during extensions, so calculate accordingly.
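The frame arithmetic discussed above can be sketched as below. This is a minimal sketch under assumptions the posts don't confirm: that each extension regenerates the last 5 frames of the previous segment as its anchor (so those 5 are dropped when concatenating), that Wan 2.1-family segments use frame counts of the form 4k+1, and that output runs at 16 fps. `plan_chain` and `Segment` are hypothetical helpers, not ComfyUI or KJ node APIs.

```python
from dataclasses import dataclass

FPS = 16     # assumed output frame rate for Wan 2.1-based workflows
ANCHOR = 5   # frames repeated as an anchor at each extension (per KJ, as quoted in the thread)


@dataclass
class Segment:
    index: int
    length: int     # frames generated by this segment
    keep_from: int  # first frame of this segment that is new (not a repeated anchor)
    start: int      # where that first new frame lands in the combined video


def plan_chain(lengths, anchor=ANCHOR):
    """Return (segments, total_frames) for a daisy-chained render.

    Segment 0 is kept whole; every later segment drops its first `anchor`
    frames because (by assumption) they duplicate the tail of the previous one.
    """
    segments, pos = [], 0
    for i, n in enumerate(lengths):
        if (n - 1) % 4 != 0:
            raise ValueError(f"segment {i}: {n} frames is not of the form 4k+1")
        keep_from = 0 if i == 0 else anchor
        if n <= keep_from:
            raise ValueError(f"segment {i}: {n} frames leaves nothing after the anchor")
        segments.append(Segment(i, n, keep_from, pos))
        pos += n - keep_from
    return segments, pos


def seconds(frames, fps=FPS):
    return frames / fps


if __name__ == "__main__":
    for lengths in ([65, 81], [81, 81], [81, 81, 81, 81]):
        segs, total = plan_chain(lengths)
        offsets = [(s.keep_from, s.start) for s in segs]
        print(lengths, "->", total, "frames,", f"{seconds(total):.2f}s", offsets)
```

Under these assumptions, bumping the first segment from 65 to 81 frames adds 16 frames, i.e. exactly one second at 16 fps, which lines up with the observation about the PR workflow's defaults.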
>>
>>109035027
How does it compare to wan 2.2 animate, practically speaking?
>>
>mfw Resource news

06/11/2026

>i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models
https://zlab-princeton.github.io/i1

>AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory
https://github.com/xuhang07/AnchorEdit

>Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
https://github.com/elmma/mllm-reroute

>ComfyUI-BerniniStudio
https://github.com/CCpt5/ComfyUI-BerniniStudio

>Ideoprompt: plain English to Ideogram 4 structured JSON prompt
https://github.com/cocktailpeanut/ideoprompt

>Orion4D FXMax for ComfyUI
https://github.com/orion4d/Orion4D_FXMax

>JoyAI-Echo — GGUF (for low-VRAM ComfyUI)
https://huggingface.co/realrebelai/JoyAI-Echo_GGUF

06/10/2026

>EvoQuality: Self-Evolving VLM for Image Quality Assessment
https://huggingface.co/ByteDance/EvoQuality

>ComfyTV: Turn ComfyUI into a TapNow / LibTV-style canvas app
https://github.com/jtydhr88/ComfyTV

>PathRelax: Parallel-Path Relaxed Speculative Jacobi Decoding for Accelerating Auto-Regressive Text-to-Image Generation
https://github.com/Haodong-Lei-Ray/PathSpec

>SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models
https://github.com/nagara214/SSR-Merge

>SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
https://teal024.github.io/SCAIL-2

>IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder
https://github.com/Row11n/IDEAL

>Image to Prompt: Web app to turn an image into Ideogram 4 JSON prompt
https://github.com/cocktailpeanut/image-to-prompt

>Simple Diffusion XS (sdxs-2b alpha version)
https://huggingface.co/AiArtLab/sdxs-2b

>Bernini-R: Repackaged model files for ComfyUI
https://huggingface.co/Comfy-Org/Bernini-R

06/09/2026

>SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
https://teal024.github.io/SCAIL-2

>BLM-SGAN
https://github.com/haidy-maher/BLM-SGAN-Text-to-Image-Generation
>>
>mfw Research news

06/11/2026

>A Comprehensive Ecosystem for Open-Domain Customized Video Generation
https://arxiv.org/abs/2606.11783

>ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
https://arxiv.org/abs/2606.11670

>SpecLoR: Spectral Lookahead Rectification for Motion-Coherent Text-to-Video Generation
https://arxiv.org/abs/2606.11969

>Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
https://arxiv.org/abs/2606.11838

>FitVTON: Fit-aware Virtual Try-On via Body-Garment Size Control
https://zenoning.github.io/FitVTON

>ISAP-3D: Identity-Slot Aligned Part-Aware 3D Generation
https://arxiv.org/abs/2606.12099

>VOID: Defeating Unauthorized Mimicry in Latent Diffusion Models
https://arxiv.org/abs/2606.12263

>MultiToP: Learning to Patch Visual Tokens to Mitigate Hallucinations in Video Large Multimodal Models
https://arxiv.org/abs/2606.11792

>A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting
https://arxiv.org/abs/2606.11390

>InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
https://arxiv.org/abs/2606.12195
>>
File: WanimateCollage_00013.mp4 (2.41 MB, 1130x896)
>>109035100
Wan22 Animate couldn't handle this Lapwing video test. The video cut to another shot abruptly multiple times, and the OpenPose preprocessor can't guess half a body, nor can it preprocess eyes or emotions. So overall, Wan21_SCAIL-2 seems more capable out of the box.
>>
File: 456475.gif (3.97 MB, 320x222)
>>
>>109035139
>.gif (3.97 MB, 320x222)
wat
>>
Wonder how bernini stands up against scail 2. Scail 2 would allow using wan 2.1 loras? Heard bernini is somewhat wan 2.2 related so those loras may work?
>>
>>109035118
Does it need a sam 3 node or is it sort of automatic/baked in?
>>
>>109035155

It performs better with SAM3 it seems. Without it, the character is hallucinated or prompted in.
>>
File: 4775455.webm (3.86 MB, 420x291)
>>109035147
i have to keep jannies on their toes
>>
>>109032022
>>109034612
>nano shills

The fact you still to this day have to put
>"DONT MAKE AN IMAGE GIVE ME A PROMPT"

At the end of every gemini input is truly a sign of how shot AI devs and engineers are.
>>
>>109035168
why are there nano banana shills? I have nano banana and grok imagine, just the non-decoy tier. But they're just not local.
>>
File: 753466.webm (3.72 MB, 420x291)
nice moves
>>
File: 56854.webm (3.17 MB, 420x291)
>>109035235
oops wrong one
>>
File: 4575376.gif (3.51 MB, 320x222)
>>
Do negative prompts work on ideogram?
>>
File: Wan21_SCAIL2_00018.mp4 (3.64 MB, 2016x672)
>>109035118
Seek the elden ring, become the elden lord.

Note: The reference aspect ratio should match the video, that seems to improve accuracy.
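If the reference image doesn't already match the video's aspect ratio, one way to act on this note is to center-crop it first. A minimal sketch, assuming a center crop is acceptable for your reference (it may cut off parts of the character); `center_crop_box` is a hypothetical helper, and its `(left, upper, right, lower)` return value follows the box convention Pillow's `Image.crop` uses.

```python
from fractions import Fraction


def center_crop_box(img_w, img_h, vid_w, vid_h):
    """Largest centered crop of an img_w x img_h image with the video's aspect ratio.

    Returns (left, upper, right, lower) in pixels.
    """
    target = Fraction(vid_w, vid_h)
    if Fraction(img_w, img_h) > target:
        # Reference is wider than the video: keep full height, trim the sides.
        new_w = int(img_h * target)
        left = (img_w - new_w) // 2
        return (left, 0, left + new_w, img_h)
    # Reference is taller than (or equal to) the video: keep full width, trim top/bottom.
    new_h = int(img_w / target)
    top = (img_h - new_h) // 2
    return (0, top, img_w, top + new_h)


if __name__ == "__main__":
    # e.g. a 1920x1080 reference for a 1024x896 video
    print(center_crop_box(1920, 1080, 1024, 896))
```

Exact fractions avoid float rounding drift when computing the crop size; the result is truncated to whole pixels.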
>>
cozy bread
>>
shameless repost for myself

>>109034840
>>109034927
ok thanks
i am making a significant amount of progress
my main gripe rn is that although my base img is quite crisp, i cant keep that detail
when i look at clips posted on civitai, so many of them are so crisp and clear with the motion while keeping the detail i feel like i am missing something
>>
File: web_img_freq_export.png (251 KB, 512x512)
>>109034986
>>
>>109035332

Depends on what you use. My experience is with Wan2.2. Increasing steps (4 high/4 low) improves clarity. Increasing resolution also does that, but you need high VRAM. If you meant smoothness of motion, then you need to increase FPS or use interpolation nodes to generate extra frames.
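For the interpolation route, the frame-count arithmetic is worth keeping in mind. A minimal sketch, assuming an interpolator that inserts k-1 new frames between each consecutive pair and doesn't pad the ends (check what your node actually does); multiplying the frame rate by the same k keeps the clip's duration unchanged. `interpolate_counts` is a hypothetical helper.

```python
def interpolate_counts(frames, fps, factor):
    """Frame count, frame rate and time span after interpolating by `factor`.

    Assumes factor - 1 frames are inserted between each consecutive pair
    of input frames, so n frames become (n - 1) * factor + 1.
    """
    if frames < 2 or factor < 1:
        raise ValueError("need at least 2 frames and factor >= 1")
    out_frames = (frames - 1) * factor + 1
    out_fps = fps * factor
    # Time span covered from first to last frame; interpolation preserves it.
    span = (out_frames - 1) / out_fps
    return out_frames, out_fps, span


if __name__ == "__main__":
    # an 81-frame, 16 fps clip interpolated 2x
    print(interpolate_counts(81, 16, 2))
```

Interpolation only adds in-between frames; it does not add detail, so it addresses smoothness rather than clarity.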
>>
>>109035355
im using ltx eros at the moment and im comparing my stuff to the stuff i see on the related eros civitai pages, so i know it's capable of it. i just wish i could pull workflows from people's videos on civit like you can from images
>>
spare some ram for little old me?
>>
File: Wan21_SCAIL2_00012.mp4 (2.74 MB, 1726x2048)
>>
File: Wan21_SCAIL2_00022.mp4 (2.53 MB, 1024x896)
SCAIL is giving me kino, but it seems like it's a 50/50 whether it keeps the background from the original video or hallucinates an entirely new one
>>
>>109035438
>>109035451
Videochads are eating good.
>>
>>109035451
neck twist
>>
>>109035451
try that masking thing anon posted in last thread
>>
File: Wan21_SCAIL2_00016.mp4 (3.47 MB, 1786x2048)
it can mimic facial expression (and tongue) way better
>>
File: 1772009263891184.png (303 KB, 448x640)
>>
File: 1765765141177754.png (116 KB, 256x384)
>>
>>109035494
do this but with bill gates
>>
File: 1753368550566162.png (78 KB, 384x256)
>>
File: 1767648464164387.png (132 KB, 384x256)
>>
File: Wan21 Scail2 00018(1).mp4 (3.37 MB, 2044x2048)
way better hair rendering, but worse boob jiggle
>>
File: Wanimate_00181.mp4 (2.82 MB, 932x1280)
wan animate
>>
>>109035544
what's the original from?
>>
File: test7.webm (3.41 MB, 1890x992)
Previously
>>
File: Wan21_SCAIL2_00024.webm (3.79 MB, 1024x896)
>>109035486
I don't think it's a masking issue, since SCAIL automatically masks both the video and input image
Like with this, the first time I tried it I gave it a dogshit low res image and it just made up a new person, but it kept the background of the original video. I tried again with a high res image and the background got deleted.
(ignore the last ~second of weirdness, that's just because I'm bad at math)
>>
File: Wan21_SCAIL2_00022.mp4 (2.58 MB, 1636x996)
>>109035561
We're so back.

wan2.1_14B_SCAIL_2_fp8_scaled.safetensors

Usage:
60/128 GB Sys Ram
13.6/31.5GB VRAM
>>
File: 1777691152730786.png (3.8 MB, 1536x1792)
>>
>>109035571
do this but with jeff bezos
>>
>>109035572
That's really good. Workflow?
>>
File: 1767036842434299.gif (198 KB, 384x256)
>>109035524
>>
>>109035494
what about two people interacting?
>>
>>109035596
Need to git pull this request.

https://github.com/Comfy-Org/ComfyUI/pull/14373
>>
>>109035611
Sweet thanks. KJgod delivers again
>>
>>109035451
how u get the source video background into the output video?
>>
>>109035685

From my usage, it seems a white background on the reference is ignored and you get the video's background. A black/gray background generates a new one.
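If the white-background observation holds for you, a reference cut out with transparency can be flattened onto pure white before it goes in. The sketch below just shows the standard "over" compositing math on raw straight-alpha RGBA tuples so it has no dependencies; in practice Pillow's `Image.alpha_composite` onto a white `RGBA` canvas does the same job. This rests on one anon's observation, not documented SCAIL behavior, and `flatten_on_white` is a hypothetical helper.

```python
WHITE = 255


def flatten_on_white(pixels):
    """Composite straight-alpha RGBA pixels over an opaque white background.

    pixels: iterable of (r, g, b, a) tuples, 0-255 per channel.
    Returns a list of (r, g, b) tuples.
    """
    out = []
    for r, g, b, a in pixels:
        alpha = a / 255
        # "over" operator with an opaque background: c * alpha + bg * (1 - alpha)
        out.append(tuple(round(c * alpha + WHITE * (1 - alpha)) for c in (r, g, b)))
    return out


if __name__ == "__main__":
    print(flatten_on_white([(200, 30, 30, 255), (0, 0, 0, 0), (0, 0, 0, 128)]))
```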
>>
>>109035572
can it also reasonably do multiple character/object references or just one?
>>
So will it be possible to implement these ideogram bboxes in klein workflows?
>>
Scail looks really good. I was skeptical but it looks genuinely good
>>
>>109035702
Nta but in my experience with wanimate it’s one character per pass. You can do one character in the scene then a second one on a second pass
>>
Why do you read the Comfy pull requests like they are the news or some Reddit post tier content? How empty is your life? Pathetic...
>>
>>109035708
No but Klein is an edit model so you can just mask areas and tell it to put stuff there
>>
File: 1752961253494863.jpg (50 KB, 970x816)
How to know which epoch is better? I can't make up my mind by just the outputs
What prompts should I use to check?
>>
File: Wan21_SCAIL2_00033.mp4 (2.8 MB, 1792x1110)
>>109035600
>>109035725
It can identify multiple people; the theoretical limit is 64. The quality of their interactions will be dependent on how much that specific interaction exists in the training data.

>>109035685
There are two modes, one where the video animates the input image, and one where the character(s) in the input image replace those in the video. For some reason, it's a coin toss whether the latter keeps the video background or hallucinates a new one. Consider
>>109035572
where it's partly kept it, and
>>109035571
where it's gone completely.
>>
>>109035734
Satan in positive prompt to check which epoch has the most protective effects.
>>
>>109035734
I took a half year break because I ran into this as well
there is no way out
just accept that you are now FUCKED
>>
>>109035745
>It can identify multiple people; the theoretical limit is 64. The quality of their interactions will be dependent on how much that specific interaction exists in the training data.
nice. maybe my storage space won't suffer as much then. guess I'll see when I can test both bernini and scail-2.
>>
>>109035734
>I can't make up my mind
I suspect this problem arises frequently in other areas of your life in addition to your epoch predicament.
>>
this seems to do the trick. replace character in source video
>>
>>109035803
okay how do you know?
>>
>>109035734
vibecode a blind test. If you can't tell, why do you care?
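Taken literally, the core of a blind test is small: pair up gens from two epochs made with the same prompt and seed, randomize which side each appears on, record picks, and check whether your preference beats a coin flip. A minimal sketch with hypothetical helper and file names; the exact two-sided binomial sign test is swapped in here as one reasonable way to judge "can you tell", not something from the thread.

```python
import random
from math import comb


def make_blind_trials(pairs, seed=None):
    """pairs: list of (img_epoch_a, img_epoch_b), same prompt and seed within each pair.

    Returns (trials, key): each trial is a (left, right) tuple in random order,
    and key[i] says which side holds epoch A ("left" or "right").
    """
    rng = random.Random(seed)
    trials, key = [], []
    for a, b in pairs:
        if rng.random() < 0.5:
            trials.append((a, b))
            key.append("left")
        else:
            trials.append((b, a))
            key.append("right")
    return trials, key


def sign_test(picks, key):
    """Count how often epoch A was picked; return (hits, n, two-sided p-value)."""
    n = len(key)
    hits = sum(p == k for p, k in zip(picks, key))
    upper = sum(comb(n, i) for i in range(hits, n + 1)) / 2 ** n  # P(X >= hits)
    lower = sum(comb(n, i) for i in range(0, hits + 1)) / 2 ** n  # P(X <= hits)
    return hits, n, min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    pairs = [(f"epoch04_{i}.png", f"epoch08_{i}.png") for i in range(10)]  # hypothetical files
    trials, key = make_blind_trials(pairs, seed=1)
    # Show each (left, right) pair as images without file names and collect "left"/"right" picks.
    picks = list(key)  # stand-in: a viewer who always picks epoch A
    print(sign_test(picks, key))
```

A large p-value means your picks are indistinguishable from coin flips, which is the "if you can't tell, why do you care" case.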
>>
File: Wan21_SCAIL2_00036.mp4 (1.34 MB, 1500x690)
>>
>>109035823
the lack of jiggling leaves a lot to be desired but the consistency maintained when it changes angle is impressive
>>
>>109035014
>Is it possible to fuse the power of data center models with local freedom?
what kind of retarded question is this? local models are that fusion, it's a miracle that they even exist in the first place
>GPUs are intentionally stagnant with pitiful consumer amounts of VRAM
6 years ago, large amounts of VRAM were almost universally considered a price-hiking strategy for almost any consumer application except semi-professional video production, and VRAM upselling over the years has caused a continuous decline in software quality and resulted in catastrophically worse performance of operating systems, basically all software, the entire video game industry, because what was at the time uselessly high memory bandwidth was relied on to "forgive" bad programming
we are now still probably well within the early adoption phase of local AI and 90% of consumers continue to not give a shit about it at all, which is apparently fertile soil for you to crumple up reality into this little cum napkin you've created after watching a gamers nexus video
>a hybrid approach might be the solution
a solution to fucking what? if it's not local, then it's not fucking local, there is no "hybrid" between local and remote, because that would be fucking remote
>>
>>109035734
Diverse prompts help: poses other than "1girl, standing", text, and different styles all help.
Let's say you are training a lora from the photos of a 1girl. I would use something like this:
1) Just 1girl, no elaborate conditioning, see how it swings when given reins
2) 1girl, lying on sand, from above, tongue out, black bikini
3) 1girl, jumping, on air, flying kick
4) 1girl, closed eyes, smile, holding a sign that says: "IS MY LORA FRIED NOW?"
5) 1girl, painting, Renaissance painting, outdoors, forest
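To turn that list into an actual comparison, hold everything but the LoRA file fixed: same prompts, same seeds, one gen per epoch. A minimal sketch that only builds the job list; the epoch filenames and the `build_grid` helper are hypothetical, and how you feed the jobs to ComfyUI or another frontend is up to you.

```python
from itertools import product

PROMPTS = [
    "1girl",
    "1girl, lying on sand, from above, tongue out, black bikini",
    "1girl, jumping, on air, flying kick",
    '1girl, closed eyes, smile, holding a sign that says: "IS MY LORA FRIED NOW?"',
    "1girl, painting, Renaissance painting, outdoors, forest",
]


def build_grid(epoch_files, prompts=PROMPTS, seeds=(1, 2, 3)):
    """One job per (prompt, seed, epoch); within each run of jobs only the LoRA file varies."""
    return [
        {"prompt": p, "seed": s, "lora": f}
        for p, s, f in product(prompts, seeds, epoch_files)
    ]


if __name__ == "__main__":
    epochs = [f"my_lora-{e:06d}.safetensors" for e in (4, 6, 8, 10)]  # hypothetical names
    jobs = build_grid(epochs)
    print(len(jobs), "jobs")
    for job in jobs[: len(epochs)]:
        print(job)
```

Because the epoch loop is innermost, consecutive outputs form a row that differs only by epoch, which is the easiest layout to eyeball side by side.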
>>
File: Wan21_SCAIL2_00038.mp4 (2.72 MB, 1282x1500)
>>
File: Wan21_SCAIL2_00040.mp4 (2.71 MB, 1500x1330)
it also replaced the old man in the background. how to retain the old man?
>>
>>109035896

Use a white background on the reference character.
>>
File: Wan21_SCAIL2_00051.mp4 (3.33 MB, 1792x960)
>>109035760
After a bunch of testing, the segmentation tool can identify and isolate objects, but SCAIL can't replace them, it can only replace people.
It can handle replacing multiple people in one pass, but it seems to struggle a bit as you add more and more people
>>
how do you organize all your gens?
>>
>>109035974
badly
>>
File: Wan21_SCAIL2_00046.mp4 (3.45 MB, 1280x1500)
>>
>>109035974
i just do %date
when i feel like i have too much bloat, i go back and delete all the gacha rolls, usually just keeping 1-2 of each kind of gen i did
i have a separate area of folders where i specifically copy outputs that i want to keep for reference
>>
>>109036007
ok this is getting pretty good, can you put this asian bitch in very out of place scenes, like have her replace a girl in a game of thrones clip
>>
>>109036007
Would be good “imagine if we imported these ppl instead” meme gifs
>>
>>109036017
if you can supply the clip
>>
7 is a prime
as are 9, 11 and 13
>>
File: Wan21_SCAIL2_00027.mp4 (644 KB, 1548x636)
>>109035945
Tried sequential replacement, still lost some data on the first char. Still, Wan2.1 has high potential. I'm done testing for the night.
>>
>>109036046
I think I need to segment video number two to feed to video 3 to clamp down the masked area. Gonna try that later.
>>
>>109036046
curious, what's the missing data?
>>
File: 0.png (1.43 MB, 404x924)
>>109036059
Her yellow dress is gone on second pass.
>>
>>109035945 >>109036046
hey, that's not bad at all! thank you very much for trying this
>>
>>109036069
I see, thx
>>
File: Wan21_SCAIL2_00054.mp4 (1.46 MB, 1500x452)
>>
>>109036150
damn that looks good
>>
>>109035572
> wan2.1_14B_SCAIL_2_fp8_scaled.safetensors
Only one checkpoint? No high, no low?
>>
>>109036150
fix movies
fix tv shows
>>
File: Wan21_SCAIL2_00059.mp4 (2.2 MB, 1800x684)
>>
File: Wan21_SCAIL2_00083.mp4 (3.05 MB, 1792x1408)
>>109036215
Are you just using the workflow from the pull request?
I can't get it to reliably keep the background from the original video, and I've confirmed it's not the 'replacement mode' toggle, if I turn that off it keeps the background from the image.
https://files.catbox.moe/zmqr82.png
>>
>>109036229
no idea, it just works when I switch replacement mode = true
>>
>>109036229
I can’t believe you would edit Ronald McDonald out of video instead of editing him in
>>
>>109036215
kino
>>
Is there a turbo lora for Z Image where the outputs aren't completely slopped
>>
maybe I'm just a schizo but is the reason civitai hasn't implemented an ideogram4 filter category, like they do for other models, because of the license and they might, very soon, delete the loras that have been posted?
>>
File: Wan21_SCAIL2_00089.mp4 (2.79 MB, 592x1040)
>>109036249
>>
>>109036339
i saw this in the playplace
>>
>>109036291
>Turbo lora
??? Just use the turbo model.
>>
is it me or is every wan svi workflow trash?
Can't get nearly the quality I get with regular wan22 i2v
>>
File: Wan21_SCAIL2_00079.mp4 (2.68 MB, 1800x1144)
>>
question
is the rtx upscale node in a bunch of the video workflows actually totally independent? like could i make a new workflow with just that, plug in any video, and have it upscale it?
>>
>>109036558
yes
>>
>>109036558
yes plus use the rtx node from this node pack: deno-custom-nodes
>>
>>109036589
why?
>>
>>109036530
> gosling and stone
>>
I've been using seedvr2 upscaler for images and videos for months now. Is RTX upscaling better?
>>
>>109036600
or don't
>>
>>109036608
no. it's faster. i don't upscale videos.
>>
american hours: brown sperging and images
european hours: wealthy posters and video gens
>>
>>109036673
im from australia
is it peak posting quality right now?
>>
>>109036673
I don't think wealthy europeans are on 4chan at this hour. They are working.
>>
>>109036694
wealthy people don't work
>>
File: %Date 00001.mp4 (1.87 MB, 720x1080)
I'm learning over here.
>>
>>109036706
most of them do. and hard.
>>
>>109036709
>tz tx
what did she mean by this?
>>
File: Ltx 00002.mp4 (1.52 MB, 720x1080)
>>109036731
was meant to be "i can ltx now"

text is for tomorrow; tonight, it's more ass
>>
>>109036865
Text is not one of LTX's strong points.
>>
>>109034986
5
>>
>>109036865
oooh yeah
>>
File: 354686.webm (3.85 MB, 420x291)
>>109036879
i remember making a video of a person wearing a shirt that has words on it, and it seemed to work correctly when i frame injected a flat image of the shirt as well as wrote in the prompt what the shirt said. i should try more text experiments


