/g/ - Technology




File: collage.jpg (2.79 MB, 2759x5441)
2.79 MB JPG
Discussion and Development of Local Image, Video, and Music Models

Previous: >>109041690

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
can Klein 9B work with anime also can it do nsfw edits aka give a fat titty anime bitch a nipple ring?
Is the performance between quants really that big?
>>
>mfw Resource news

06/13/2026

>PRXPixel (text-to-image, pixel space)
https://huggingface.co/Photoroom/prxpixel-t2i

>SCAIL Auto Extend
https://github.com/Brobert-in-aus/scail-auto-extend

>MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives
https://nvlabs.github.io/motionbricks

>dyfuzor-web: turns an Excalidraw scene into an Ideogram-4 structured JSON
https://github.com/karolrybak/dyfuzor-web

>sageattention-autotune: Autotuned block sizes and other QoL improvements
https://github.com/woct0rdho/sageattention-autotune

06/12/2026

>ComfyUI-Flux2Klein-Enhancer: Conditioning enhancement and reference latent control
https://github.com/capitan01R/ComfyUI-Flux2Klein-Enhancer

>InterleaveThinker: Reinforcing Agentic Interleaved Generation
https://zhengdian1.github.io/InterleaveThinker-proj

>Experimental Anima LLLite Regional Controlnet
https://huggingface.co/Sen-sou/Anima-LLLite-Regional-Controlnet

>World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
https://haoz19.github.io/world-tracing-page

>VietFashion: Benchmarking Sketch-Text Composed Image Retrieval for Cultural Outfits
https://hng0303.github.io/VietFashion

>Modality Forcing for Scalable Spatial Generation
https://modality-forcing.github.io

>VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
https://videomdm.github.io

>EvTexture++: Event-Driven Texture Enhancement for Video Super-Resolution
https://github.com/DachunKai/EvTexture

>Budget-Constrained Step-Level Diffusion Caching
https://github.com/Westlake-AGI-Lab/BudCache

>ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation
https://github.com/Snowball0823/ECA

>i1-3B: A Simple and Fully Open Recipe for Strong Text-to-Image Models
https://huggingface.co/zlab-princeton/i1-3B
>>
>mfw Research news

06/13/2026

>MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
https://orange-3dv-team.github.io/MoVerse

>Learning to Solve Generative ODEs Beyond the Linear Span
https://arxiv.org/abs/2606.08672

>Echo-Memory: A Controlled Study of Memory in Action World Models
https://arxiv.org/abs/2606.09803

>Beyond Consistency: Preserving Temporal Structure in Zero-Shot Video Editing
https://arxiv.org/abs/2606.08780

>BioVid: Autoregressive Video Generation with Biological Behavior Semantic Comprehension
https://arxiv.org/abs/2606.08674

>CSFlow: Aligning Flow Matching with Human Contrast Sensitivity
https://arxiv.org/abs/2606.08833

>MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training
https://arxiv.org/abs/2606.08788

>Where the Score Lives: A Wavelet View of Diffusion
https://arxiv.org/abs/2606.08309

>MOFA-VTON: More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On
https://arxiv.org/abs/2606.11148

>Next Forcing: Causal World Modeling with Multi-Chunk Prediction
https://gangweix.github.io/next-forcing

>MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models
https://arxiv.org/abs/2606.06853

>Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency
https://arxiv.org/abs/2606.10620

>Rethinking 3D Shape Generation: Diffusion over Superquadrics
https://arxiv.org/abs/2606.08957

>Vision-Language Asymmetry in Bistable Image Captioning
https://arxiv.org/abs/2606.08031

>Do Vision-Language Models See or Guess? Measuring and Reducing Textual-Prior Reliance with a Phrasing-Controlled Benchmark
https://arxiv.org/abs/2606.10400

>RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT
https://arxiv.org/abs/2606.08156

>A Unifying View of Attention Sinks: Two Algorithms, Two Solutions
https://arxiv.org/abs/2606.08105
>>
>visited tiktok for the first time in years looking for dance videos
>99% of the results were just kling motion slop
uhhh... wtf
anyone know a good place to find reference videos?
>>
File: debo_ccg_fia_00011_.png (2.31 MB, 1792x977)
2.31 MB PNG
>>
>>109047336
>>109047344
Can you please stop spamming this, you have been caught linking anons malware multiple times which is shown in the OP.
>>
File: ComfyUI_temp_ujfqp_00002_.jpg (1.57 MB, 3072x1792)
1.57 MB JPG
All I want for christmas is a model that knows how guns and shooting works.
>>
>>109047363
2 more years
trust
>>
File: anima1_00040_.jpg (360 KB, 1152x1648)
360 KB JPG
>>
Remember to put aliasing in the negative prompt
>>
>>109047336
>>109047344
Fuck off thread schizo
>>
File: dance_gothic2.webm (2.87 MB, 960x832)
2.87 MB
2.87 MB WEBM
I wish there was a LTX Scail
or a Wan22 at least
>>
Using anima and can't get rid of dark skin on pov hands etc. I can't self insert onto this. How fix? Preferably without negatives for turbo, or I guess I'll figure out negpip finally
>>
>>109047477
Even negpip can't fix this lol
>>
>>109047358
>>109047456
Agreed, this spam has no purpose
>>
>>109047470
should I cum to this /g/?
>>
File: Wan21_SCAIL2_00138.mp4 (3.87 MB, 576x1024)
3.87 MB
3.87 MB MP4
>>109047470
just use v2v if you're so desperate for those models
>>
>>109047456
>>109047488
Why is he posting this here again?
I thought he only posted this in his containment thread?
>>
>>109047436
pleasu understand, work in progress
>>
File: 144755_00001-audio.webm (2.85 MB, 480x832)
2.85 MB
2.85 MB WEBM
>>109047495
idk, should you?
>>
File: anima1_00102_.jpg (600 KB, 1152x1648)
600 KB JPG
>>
>>109047484
There's no way of prompting around it? It's just too baked into the data? I'm messing around with negpip now and I can get them grey which is better I suppose...

Could I get around it with a lora? If so, how many images would I need? I assume fully synthetic data is bad but I guess I can just grab booru images and their captions and just edit the image to match what I need?
>>
>n*gbo is lonely again in his containment general
Fuck off
>>
>>109047470
>>109047509
cant remove the e-girl cringe even with AI
>>
>>109047550
I did
thanks anon
>>
>>109047592
Must be a sad life to be this obsessed with some illusionary nemesis.
>>
almost forgot

maintain thread quality
https://rentry.org/LDG_vital_info
>>
File: Wan21_SCAIL2_00146.mp4 (2.78 MB, 576x1024)
2.78 MB
2.78 MB MP4
>>
File: 152745_00002-audio.webm (3.83 MB, 960x832)
3.83 MB
3.83 MB WEBM
>>109047687
based
>>
>spent hours tinkering and perfecting a style with anima
>noob nails it on the first try
Is Anima seriously meant to be the replacement for SDXL?
>>
>>109047805
what do I need to do stuff like this? man I feel like shit changed in the span of 2 weeks and I'm fully out of the loop.
can you point me in the right direction pls
>>
>>109047831
update comfy and use this workflow

https://github.com/user-attachments/files/28759255/Wan21_SCAIL2_Testing.2.json

https://huggingface.co/Comfy-Org/SCAIL-2/tree/main/diffusion_models
>>
>>109047764
How is this illusionary when the anon spreads malware?
>>
File: anima1_00135_.jpg (402 KB, 1152x1648)
402 KB JPG
>>
>>109047867
2B and a C
>>
>>109047470
left mogs
>>
>>109047823
You are unable to tell the difference between low channel and high channel VAEs?
>>
>>109047939
vae is sloppers copel, if the style is shit, the vae is shit
>>
>>109047940
>no I cannot tell the difference
Okay.
>>
File: Wan21_SCAIL2_00065.webm (3.95 MB, 1064x720)
3.95 MB
3.95 MB WEBM
>>
>>109047509
>THREE FINGERS
>>
>>109047956
gen her kissing her wife
>>
File: 152745_00002-audio2.webm (3.84 MB, 480x832)
3.84 MB
3.84 MB WEBM
>>109047867
>>
File: 155041_00002.webm (3.82 MB, 960x832)
3.82 MB
3.82 MB WEBM
>>
File: anima1_00151_.jpg (514 KB, 1152x1648)
514 KB JPG
>>109048005
smooth movement
>>
File: Wan21_SCAIL2_00015.mp4 (2.76 MB, 592x1056)
2.76 MB
2.76 MB MP4
why is my shit so fucked can someone help me?
im using the mxfp8 with default workflow
>>
>>109047867
>>109048005
weirding me out how her waist looks like a cylindrical lego piece on top of a pelvis block
>>
I am worried that this hobby has taken over my life. I don't do anything other than genning cute 1girls. All day every day.
>>
>>109048095
go outside and look at the 1girls irl
>>
>>109048095
Same. I mean I do other things as well but I'm starting to wonder if I'm addicted to genning. If my computer is turned on I'm genning all the time.
>>
>>109048100
but then he'll be charged with looking at 1girls
>>
i dream of my 1girls
>>
>>109048109
anon's from the UK?
>>
File: randy4676549.png (1.57 MB, 1079x1079)
1.57 MB PNG
>>109048118
>>
File: Wan21_SCAIL2_00164.mp4 (1.24 MB, 640x848)
1.24 MB
1.24 MB MP4
>>109048095
Tell me about it, I have to meet this girl tonight and I'd rather stay at home, smoke some weed, gen some 1girls and then use SCAIL to generate coom videos
>>
>>109048055
increase the fps on the save node

or use my workflow
it picks the fps automatically

https://github.com/user-attachments/files/28759255/Wan21_SCAIL2_Testing.2.json
>>
>>109048187
>I'd rather stay at home, smoke some weed, gen some 1girls and then use SCAIL to generate coom videos
Do it. That's what all the cool anons do.
>>
File: anima1_00187_.jpg (439 KB, 1152x1648)
439 KB JPG
>>
>>
>>109047313
Is Earnie any good? Is it worth downloading and trying?
>>
File: Wan21_SCAIL2_00167.mp4 (1.93 MB, 640x1136)
1.93 MB
1.93 MB MP4
>>
>>109047564
It's not anima, you lie anon
>>
>>109048139
This guy sucked
>>
File: 1772734956793255.jpg (1.5 MB, 1248x1824)
1.5 MB JPG
>>
>>109048264
Woman appears left on chair, like horror movie
>>
File: a8subn.png (1.93 MB, 1024x1024)
1.93 MB PNG
>>
File: Wan21_SCAIL2_00173.mp4 (1.95 MB, 640x1136)
1.95 MB
1.95 MB MP4
>>
>>109048401
I have been waiting months for anon to animate that gen.
>>
File: 170314_00001-audio.webm (3.8 MB, 480x832)
3.8 MB
3.8 MB WEBM
the towel is floating
>>
File: G7mhbQ-XoAAu3kP.jpg (288 KB, 1494x2046)
288 KB JPG
>>
File: 171222_00001.webm (3.82 MB, 480x832)
3.82 MB
3.82 MB WEBM
>>
>>109048512
very original fellas
>>
File: Wan21_SCAIL2_00023.mp4 (1.1 MB, 592x1056)
1.1 MB
1.1 MB MP4
>>109048189
i swapped to the fp8 model and seemed to have fixed it
thanks for the help
>>
https://files.catbox.moe/qxl7gi.mp4

made some progress
>>
>>109048637
Anon, this is pdf territory.
>>
>>109048663
that's an actual commercial that aired on television though
>>
Why does it seem like no one cares about 3D stuff? Like I haven't seen anyone try to implement Kimodo in Comfyui except some seemingly broken node pack I found on google. I feel like Kimodo would be really useful when paired with SCAIL2
>>
>>109048677
2d = soul
3d = soulless
>>
>>109048688
retard
>>
File: 1.png (1.33 MB, 768x926)
1.33 MB PNG
Having a skill issue.
How do I prompt ideogram to get real life pokemans instead of whatever the fuck this is?
>>
File: dePC_ch50a_00038_.png (2.04 MB, 1843x1152)
2.04 MB PNG
>>109048705
now I'm tempted to revive my pokemon wildcard workflow
>>
>>109048637
shit taste, but good work. poor guy. he's gonna catch std at a young age kek
>>
>>109048759
Looks like shit
>>109048705
post the prompt and we can help you
>>
File: scail.mp4 (2.98 MB, 846x634)
2.98 MB
2.98 MB MP4
>>
File: dePC_ch50a_00057_.png (2.61 MB, 1843x1152)
2.61 MB PNG
>>109048797
no one on this board has ever been able to top my random pokemon
I know it
you know it
everyone knows it
>>
Computer generate Hebi Nyoubou from Fate/Grand Order doing a rimjob invitation
>>
>>109048835
Ok what the fuck is that thing
>>
File: 2.png (958 KB, 593x757)
958 KB PNG
>>109048797
>post the prompt and we can help you
Normal amateur photography prompt. That pic was just "vaporeon". "Real life creature or animal" created even worse monstrosities.
Now I found that "highly detailed 3D render of a Vaporeon with fur simulation" looks decent I guess.

>>109048759
gib me
>>
File: dePC_ch50a_00059_.png (2.6 MB, 1843x1152)
2.6 MB PNG
>>109048858
gen 25 pokeman
ground/poison, maybe?
>>
>>109048874
>Normal amateur photography prompt. That pic was just "vaporeon". "Real life creature or animal" created even worse monstrosities.
This isn't useful information to help you, now you're entertaining the malware spreader.
please go to /sdg/ if you keep being obtuse like this.
>>
RIP, this thread fucking sucks now with the lame ass deepfake posters.

Thankfully theres alternatives.
>>
>>109048969
thanks for your contribution, nogen
>>
File: q_y0ibii.png (956 KB, 1536x1024)
956 KB PNG
>>
>>109047844
thanks bro
also does this run on a 4090 and 64gb ram?
or would I need more?
>>
>>109047353
/wsg/
>>
File: debo_ccg_fia_00008_.png (1.92 MB, 1792x977)
1.92 MB PNG
>>
>>109049207
fuck off
>>
File: 1774517655331364.png (289 KB, 873x409)
289 KB PNG
I've asked every fucking online AI to composit this image with the happy merchant image (basically put it in the Polaroid photo's frame) and every one of them has refused to do it because it violates guardrails.

is there a way to do it locally with any of the tools?
>>
>>109049288
yeah, use GIMP
>>
feels like we're stuck in 2023. plastic skin shitgens, unironic wan2.2 posting, tardbo sdxl-tier slop. local is dead
>>
>>109049304
Be the change you want to see.
>>
>>109049294
GIMP is incredibly hard for this sort of stuff. I was hoping some AI could do it. I know ChatGPT can do this sort of shit but it refuses in this case.
>>
Anima desu :(
>>
>>109049304
Actually, you are wrong. AI has advanced by great strides over the past few years. As a pioneer of such new technologies, it may appear to be moving at a slower pace since you are more up-to-date with such news in comparison to the average person.
>>
>>109049364
it takes like 2 seconds to shoop this dude, you dont even need AI for that.
>>
>>109049364
yet edits like that have been done millions of times by hobbyists
>>
Downloading girls I know from insta and facebook and undressing them turns out to be quite some fun.
>>
>>109049383
>Downloading girls I know
I dont know any 3DPDs at all and thus have nobody to undress.
>>
>>109049372
actually it's more like API innovates while local stagnates
>>
>>109048360
By CoD rules, he has the higher K/D ratio so he pwns u n00b
>>
File: debo_ccg_fia_00009_.png (1.03 MB, 1792x977)
1.03 MB PNG
>>109049366
>>
>>109049391
Might be a skill issue, I guess. Try touching grass first.
>>
>>109049399
I understand you lack the vocabulary to properly speak your mind but do not despair! Open weights are advancing at a steady pace, if a few steps behind paid API models. Companies and open source communities alike benefit from the valuable research from open weight models, so it is highly unlikely to truly "die".
>>
>>109049399
>API innovates
Or maybe they have access to huge models? Also what I saw from GPT didn't blow my mind actually. Lots of slop too.
>>
Kijai deleted the Bermini models? What happened?
>>
>>109049453
Also it's Bernini. Don't know why I kept spelling it with an M
>>
File: q_elsby0.png (1.46 MB, 1536x1024)
1.46 MB PNG
>>
>>109049288
klein or qwen should be able to do this trivially
>>
Catjack desu :(
>>
Do you need regional prompting with anima?
I'm skeptical that it can pull it off with natural language
>>
>>109049569
Thanks so much anon!
>>
>>109047313
Anyone with 7900xtx? Can't run Ltx2.3 I2V. Getting stuck or OOM.
>>
is scail only 1girl or can it do more?
>>
File: debo_ccg_fia_00013_.png (2.21 MB, 1792x977)
2.21 MB PNG
>>109049698
anima isn't the worst with composition instruction but it'll need help if you wanna do precise/intricate stuff. theres a regional conditioning node as well as a regional cn you could play with

https://github.com/Sen-sou/Comfyui-Anima-Regional-Conditioning
https://huggingface.co/Sen-sou/Anima-LLLite-Regional-Controlnet
>>
>>109049884
Sorry but with your track record of spreading malware I will ignore your suggestions.
>>
>>109049884
Go back to your containment general debo
You're not welcome here
>>
>>109049884
catbox?
>>
>>109049930
Haven't been here in months why is he posting here?
Wasn't he happy in his own thread?
>>
>>109049884
thread schizo
>>
File: debo_ccg_fia_00017_.png (1.62 MB, 1792x977)
1.62 MB PNG
>>109049938
this mixed model mostly ignores anima artist tags, sadly
https://files.catbox.moe/2apo25.png
>>
>>109049968
thanks
>>
File: file.png (321 KB, 1747x930)
321 KB PNG
Got sick of the image segmentation not working properly so I've vibe coded a node which lets you draw points/bounding boxes around your inputs so you can specify exactly which image character maps to which video character.
>>
>>109047805
kek'd
>>
>>109047873
looks more like 2EE to me
>>
File: clip_Single_00002.mp4 (3.1 MB, 1056x592)
3.1 MB
3.1 MB MP4
>>109050133
and of course after perfectly masking out each character in both the input image and video it ignores their mask colours and just matches them left to right
>>
>>109048811
fffff
>>
>>109049288
yes there is
>>
>>109050172
how?
>>
>>109050208
carefully
>>
File: Jenny Happy Wolf.webm (3.9 MB, 720x1280)
3.9 MB
3.9 MB WEBM
>install old 2080 XC Ultra that was just sitting in storage
>6.25 slots of GPU
>4090 Strix OC just a compute node now
>no pesky config in BIOS/Windows
>both use the right amount of PCIE lanes automagically (had to move the 4090 to the lower slot so they'd both fit - you can fit a whole sheet of paper between them!)
>mrw everything just werks
Damn, multi-GPU sure is a lot better today than the last time I used it (8800GTs). Basically zero setup now.
>>
File: debo_ccg_fia_00019_.png (1.57 MB, 1792x977)
1.57 MB PNG
>>
File: 23463456.webm (3.81 MB, 420x291)
3.81 MB
3.81 MB WEBM
>>109050307
i took out my dual gpu setup because the powerful card on top was dying of suffocation. i should try again with a riser cable so the bottom card can hang down
>>
>>109050307
horse face
>>
File: ComfyUI_01093.jpg (3.26 MB, 1500x2000)
3.26 MB JPG
>>109050320
My 4090 is about 4" closer to the two 140mm fans on the bottom of my case, so it falls back to ~32C about 30 seconds after finishing its workload. The fans almost never turned on on the 2080 doing Windows stuff, so I'm not that worried about it having air to breathe.

>>109050400
Horses wish they were that cute!
>>
File: 014459CUI_00001_.png (1.26 MB, 1152x1536)
1.26 MB PNG
>>
File: 1754996652572591.webm (512 KB, 1280x1408)
512 KB
512 KB WEBM
>>
https://files.catbox.moe/n9qo78.mp4
>>
>>109050468
wtf is this real?
>>
File: kino alert.gif (44 KB, 220x220)
44 KB GIF
>>109050449
>>
>>109050484
Yes. Teleporter was vibe coded by fable.
>>
>>109047908
left is brown
>>
>>109050492
meant to reply to >>109050468
>>
>>109050435
How did you get both?
>>
>>109050504
By prompting for both.
>>
File: 020630CUI_00002_.png (1.25 MB, 1536x1152)
1.25 MB PNG
>>109050435
https://animadex.net/?mode=characters
>>
>>109050523
oops, misquoted >>109050504
>>
File: 324643275.webm (3.88 MB, 420x291)
3.88 MB
3.88 MB WEBM
>>
>>109049698
>I'm skeptical that it can pull it off with natural language
Are you unable to try it out for yourself?
>>
>>109050519
Didn't think it actually works this way in Anima with unrelated characters from different fandoms, I'm slightly behind the curve.
>>109050529
Cool shit, thank you.
>>
File: debo_ccg_fia_00024_.png (1.22 MB, 1792x977)
1.22 MB PNG
>>
File: 022308CUI_00001_.png (1.14 MB, 1536x1152)
1.14 MB PNG
>>109050561
Just one thing. If you use character loras, there's a chance it will leak onto the non-lora character.
>>
>>109050492
meant for >>109050402
>>
>>109050492
meant for >>109050545
>>
How is Wan2_Bernini? Mogged by Wan21_SCAIL-2? Everyone seems to switch to scail2 immediately.
>>
https://files.catbox.moe/b34hy8.mp4
>>
File: 1755883482694685.mp4 (3.09 MB, 592x1040)
3.09 MB
3.09 MB MP4
>>
File: 24432735.gif (2.14 MB, 320x222)
2.14 MB GIF
kenji is getting handsy
>>
>>109050654
meant for >>109050660
>>
File: 3456345678.webm (3.92 MB, 420x291)
3.92 MB
3.92 MB WEBM
brat
>>
>>
>>109050156
ah, that sucks

so you'd even have to use the perfect mask to rearrange the characters left to right on a temp image?
>>
>>109050156
It freaks me out to think about how many of the people who made original animation died in a fire.
>>
File: clip_Double_00014.webm (3.91 MB, 1136x1280)
3.91 MB
3.91 MB WEBM
https://github.com/Brobert-in-aus/scail-auto-extend
My SCAIL-2 auto extend now has an improved SAM3 identity tracker which lets you draw bounding boxes around each subject.
It's not a perfect solution, the model only takes the masks as a suggestion, which is why it didn't order the characters as they were masked in this gen:
>>109050133
>>109050156
The main benefit of this new node is that you can force every character present to be identified and improve the chances they are accurately tracked through the video.

Technical breakdown if anyone gives a shit.
SCAIL-2 has only one reference latent, reference_latents[-1]. This means all characters live as spatial regions inside a single composited frame. Their appearance is encoded by where they are in that frame.
The colour signal is an additive embedding: x = x + patch_embedding_mask(ref_mask_latents) and scail_x = scail_x + patch_embedding_mask(sam_latents), competing against RoPE positional encoding inside full attention. When the reference composite is a horizontal row that mirrors the driving row, same-x reference tokens are positionally "closest," so position dominates the additive colour nudge. And you can't sidestep it with true per-identity references, because the model only ever consumes reference_latents[-1].
The model also requires all characters to be present in the first frame of the video. SAM3 will happily mask out late arrivals, but the model won't, again due to the single reference latent which insists there are more characters and will force them into the frame.
This also bleeds through to jump cuts. If there are two characters and they've swapped places after the jump cut (e.g. the camera jumps to the other side of them), then the spatial relationship in the reference latent will override the mask and cause a character mixup (as you can see in the attached).

>>109050921
Yep, arrange the input image characters left to right matching their intended replacements.
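
A minimal sketch of the left-to-right matching described above, in case it helps when planning a composite. It is not code from the extension or from SCAIL-2; the box format, the function names and the subject names are all hypothetical. It only predicts which reference character each driving subject is likely to receive if matching really is by horizontal order.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels. Hypothetical box format for this sketch.
Box = Tuple[float, float, float, float]


def x_center(box: Box) -> float:
    """Horizontal centre of a bounding box."""
    return (box[0] + box[2]) / 2.0


def left_to_right(boxes: Dict[str, Box]) -> List[str]:
    """Return subject names sorted by horizontal centre, leftmost first."""
    return sorted(boxes, key=lambda name: x_center(boxes[name]))


def predicted_pairing(ref_boxes: Dict[str, Box],
                      driving_boxes: Dict[str, Box]) -> Dict[str, str]:
    """Pair each driving subject with the reference character of the same
    left-to-right rank (assumption: the model ignores mask colour)."""
    if len(ref_boxes) != len(driving_boxes):
        raise ValueError("need the same number of reference and driving subjects")
    return dict(zip(left_to_right(driving_boxes), left_to_right(ref_boxes)))


# Hypothetical example: char_b stands on the right of the reference image,
# so it lands on whichever driving subject is furthest right.
reference = {"char_b": (700, 0, 900, 500), "char_a": (100, 0, 300, 500)}
driving = {"dancer_1": (50, 0, 250, 600), "dancer_2": (600, 0, 800, 600)}
pairing = predicted_pairing(reference, driving)
print(pairing)
```

If the predicted pairing is not the one you want, the fix suggested in the post is to rearrange the reference image rather than the masks.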
>>
>>109050962
Nice work anon. Seems like you can just stop genning at the cut away frame and retrack with a second group. Not an insurmountable flaw. I am glad we got Wan21SCAIL2
>>
>>109050865
I'd love a cigarette right now
>>
File: debo_ccg_fia_00026_.png (2.16 MB, 1792x977)
2.16 MB PNG
>>
>>109050962
>The model also requires all characters to be present in the first frame of the video. SAM3 will happily mask out late arrivals, but the model won't, again due to the single reference latent which insists there are more characters and will force them into the frame.
>This also bleeds through to jump cuts.
I was actually wondering about this. Good to know.

I recall bernini was better at this in terms of actually tracking identities? Haven't tried it yet (and I don't really understand the model papers, just the empirical experimentation which of course is slow).
>>
>>109051020
I did some digging through the model card and there's (very experimental) support for multiple reference images.
In theory, if each character+mask is in its own reference image then the spatial relationship goes away and the model is forced to rely on colour.
Will require some fuckery around having either multiple input images or automatically separating by mask (IIRC the SAM3 nodes already natively support this so might not be too painful).
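
Rough sketch of the "separate by mask" step, assuming the tracker hands back a single integer label map (0 for background, 1..N for identities). That format is an assumption for illustration, not the SAM3 node output, and split_by_identity is a made-up helper name.

```python
from typing import Dict, List

Mask = List[List[int]]  # rows of integer labels; hypothetical format


def split_by_identity(label_mask: Mask, background: int = 0) -> Dict[int, Mask]:
    """Turn one label map into one binary mask per identity."""
    ids = sorted({v for row in label_mask for v in row if v != background})
    return {
        i: [[1 if v == i else 0 for v in row] for row in label_mask]
        for i in ids
    }


# Tiny 2x3 example with two identities.
combined = [
    [0, 1, 1],
    [2, 2, 0],
]
per_identity = split_by_identity(combined)
print(per_identity)
```

Each binary mask could then be used to cut its character out into a separate reference image, which is the part that would still need the experimental multi-reference support to work.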

>>109051061
Bernini is next on my list once I'm bored of SCAIL, optimising it is tickling my 'tism so probably going to be a little while though.
>>
>>109051061
Bernini does not have extension built in, which kind of makes it worse than SCAIL2. Kijai didn't include an extension function, so I assume it was not easily extendable.
>>
>>109050865
south korea is such a dirty shitty place
>>
>>109051061
I tried bernini, sometimes it outright didn't replace the subject. It simply rendered the same video. Huge waste of time.
>>
File: Wan21_SCAIL2_00220.mp4 (1.63 MB, 736x1344)
1.63 MB
1.63 MB MP4
>>
>>109051075 >>109051099
Maybe SCAIL-2 is overall better and perhaps it'll be easier to just work cut by cut? I guess I'll have to try at some point later.
>>
where scail2 workflow
>>
>>109051141
I think you want the extension and workflow linked as git repo in >>109050962
>>
File: clip_Double_00021.webm (3.94 MB, 1024x2016)
3.94 MB
3.94 MB WEBM
The hair error at the end is unfortunate, and caused by the male dancer being momentarily entirely occluded by the female dancer, but the footwork is flawless with no body horror, which I don't think any other method can do.
>>
>>109050962
>>109051149
is this sota?
>>
>>109051162
it's good for character replacement.

IDK if it's SOTA for it, never mind other tasks
>>
>>109051162
fuck no, not even close. api surpassed this by miles
>>
>>109051159
the arms also aren't tracked entirely exactly but yes, it's great

hair appearing on teto is a funny quirk
>>
File: Wan21_SCAIL2_00225.mp4 (1.69 MB, 1700x1004)
1.69 MB
1.69 MB MP4
>prompt: she lip bites in the end
>>
>>109051204
Proof?
>>
>>109051159
u need to increase source video FPS to get better result, but that also means shorter video
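
The tradeoff is just frames divided by fps. Quick sketch, assuming a fixed frame budget per gen; the 81-frame and 16/24 fps numbers are only illustrative and clip_seconds is a made-up helper.

```python
def clip_seconds(frame_count: int, fps: float) -> float:
    """Length of a clip in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


# Same frame budget, higher source fps, shorter clip.
print(clip_seconds(81, 16))
print(clip_seconds(81, 24))
```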
>>
>>109051226
im using infinvideo
>>
File: 3456735675.webm (3.99 MB, 420x291)
3.99 MB
3.99 MB WEBM
>>
>>109051099
Try the r2v workflow here
https://github.com/amao2001/ganloss-latent-space/tree/main/workflow/2026-06-10%20bernini
The set/get nodes were breaking the outputs for some reason so I got rid of all of those and just connected them manually.
>>
>>109051250
so benis better than scail?
>>
>>109051268
Doesn't look better than scail from my testing but they're both used differently. Bernini is used to edit videos or for image references while scail is for direct video motion transfer.
>>
File: Wan21_SCAIL2_00209_crop.mp4 (2.43 MB, 1700x766)
2.43 MB
2.43 MB MP4
Sometimes it captures facial expression perfectly, sometimes it doesn't
>>
File: debo_ccg_fia_00032_.png (1.32 MB, 1792x977)
1.32 MB PNG
>>
>>109051268
Tried both. SCAIL is the better character tracker. Bernini takes too long and maxes out at 81 frames. It uses a dual sampler for the High and Low models.

SCAIL2 is based on wan2.1 single sampler and can extend indefinitely.

Maybe you can use Bernini to edit videos, but any major flaw you can just regenerate a new video anyway.
>>
ain't nobody told me if scail can do goon yet
>>
>>109051315
dude, 75% of the videos in these threads now are whores bouncing their poopers
>>
>>109051333
that's not even softcore
>>
>>109051333
>whores bouncing their poopers
vanilla fag
>>
File: clip_Double_00023.webm (3.94 MB, 1760x1168)
3.94 MB
3.94 MB WEBM
Bumping up resolution (unsurprisingly) boosts quality of the gen.
>>109051226
I'll try boosting the framerate, but wan is trained on 16 fps so it might get weird.
>>109051315
It can do goon exactly as well as base WAN 2.1, though I haven't tried adding any loras beyond lightx2v
>>
>>109051366
i dont see his reflection
>>
>>109051366
damn that's smooth
>>
>>109051268
Wrong question.
Bernini is better than VACE.
>>
>>109051366
has no one mentioned the compression like artifacts?
>>
>>109051380
what does the vp have to do with bernini?
>>
File: vampirecover.jpg (108 KB, 1280x720)
108 KB JPG
>>109051375
>>
>>109051366
>>109051290
>>109051216
>>109051159
>>109051102
>>109050962
>>109050734
so what's the use case for this?
>>
>>109051425
goonsex
>>
File: 1755010220514480.jpg (661 KB, 1408x2112)
661 KB JPG
>>
>>109051428
I mean besides goon material for jeets
>>
File: 00228(2).mp4 (3.89 MB, 1760x1168)
3.89 MB
3.89 MB MP4
>>109051366
why don't people interpolate their video? adds only few sec extra gen time
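
For anyone unsure what 2x interpolation does to a clip, here is a naive sketch that inserts one linearly blended frame between each pair of neighbours. This is a stand-in for illustration only: interpolation nodes in actual workflows use learned models rather than a plain blend, and frames here are flat lists of floats instead of real images.

```python
from typing import List

Frame = List[float]  # flattened pixel values; hypothetical stand-in for an image


def interpolate_2x(frames: List[Frame]) -> List[Frame]:
    """Insert the average of each neighbouring pair between them."""
    if not frames:
        return []
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    out.append(frames[-1])
    return out


# n input frames become 2n - 1 output frames, so playing them back at
# double the frame rate keeps roughly the same clip length.
doubled = interpolate_2x([[0.0, 0.0], [2.0, 4.0]])
print(doubled)
```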
>>
>>109051438
what do you use diffusion for? show us
>>
>>109051439
workflow for interpolation?
>>
>>109051425
idk, it is a cool demonstration
neither economic feasibility nor practicality is my concern
>>
File: Wan21_SCAIL2_00179.mp4 (402 KB, 1000x374)
402 KB
402 KB MP4
>>109051425
for fun
>>
>>109051446
>>
>>109051475
thats an image
>>
File: debo_ccg_fia_00033_.png (2.23 MB, 1792x977)
2.23 MB PNG
>>
>>109051479
r u retarded
>>
>>109051443
book cover and character portraits for my webnovel series
>>109051464
>>109051453
don't you want to make money with this?
>>
>>109051496
benchod
>>
>>109051475
based
>>109051479
cringe
>>
>>109051425
Probably the best way to insert a character into a scene, but you need a reference video so it kind of sucks. Maybe if you use it on 3D animated videos you could create cinematic scenes with it.
>>
>>109051501
>making money
i'd personally be ashamed
quality is getting really close to there but not really 'there' yet
>>
>>109051519
Quality has been usable since wan 2.2. But no one here is an artist so they don't know how to actually use any of this outside of gooning.
>>
>>109051501
>my webnovel series
post 1 page
>>
File: 324625676.webm (3.89 MB, 420x291)
3.89 MB
3.89 MB WEBM
trolled!!!!!!!!!
>>
>>109051529
i am an artist and idk how i can use those as-is, really
besides placeholders or brainstorming where those enable me to easily check and dispose many ideas in a clearer form until the vision seems alright
>>
>>109051541
asuka got me actin' like a migrant in berlin
>>
File: scail2.mp4 (3.56 MB, 1056x592)
3.56 MB
3.56 MB MP4
>>109050962
>My SCAIL-2 auto extend now has an improved SAM3 identity tracker which lets you draw bounding boxes around each subject.
great feature, ty!

the actual UI to define the bboxes doesn't seem to quite work right here (e.g. right click to delete a bbox, after some other UI interactions) but could anyhow be better if copied from the one in kj's ideogram prompt builder if that's feasible. tho having the additional point instead of a bbox is very nice.

also good that the v2 workflow has frame interpolation and defaults to saving in a dated subfolder
>>
File: finalconcpt1.jpg (129 KB, 773x1024)
>>109051535
>>
>>109051545
>i am an artist and idk how i can use those as-is, really
I've had 10+ projects. Used wan + conversion for simple idle animations a few times. The rest is image generation/editing
>>
>>109051566
please tell me you're memeing
>>
what happened to tdrussell?
>>
>>109051582
https://huggingface.co/weeblabs
>>
File: clip_Single_00023.mp4 (593 KB, 976x640)
>>109051439
I was running it before, turned it off since I'm experimenting and just want the gens done as fast as possible.
>>109051549
thanks for the feedback, I'm deep in testing around exactly how the model handles masking and will push an update with those fixes once it's all properly working (if it works)
>>
File: 1767231125681354.jpg (2.8 MB, 3201x3317)
>>
>>109051608
another bit: purely usability wise it'd also be nice to have the identity tracker node include the object detection thresholds on the reference/driving images for text-based conditionings too (or split it out to the text conditioning and have inputs for one reference and one driving conditioning?)

you'd be able to use the text-based functionality from the v1 workflow too without rewiring nodes
>>
When are we getting realistic anima?
>>
>anima
outdated slopware. after using ideogram, anima feels like sdxl in comparison. even the basic ideogram nsfw loras on civitai are more powerful than any chromakek or boorubrown slop.
>>
>>109051625
* or just allow feeding it as driving_track_data / ref_track_data like in v1

>>109051632
people already started making loras and finetunes for that, with some success
>>
>>109051632
if you cant into training then wait for anon to bless you with an upload
>>
>>109051654
hi anon can you bless me?
>>
File: 235263765.webm (3.94 MB, 420x291)
based sperg out
>>
>>109051632
>he doesn't know
base model + turbo lora + negpip + klein 9b edit as a refiner, already does near-perfect realism even for nsfw
>>
>>109051714
>negpip
qrd
>>
>>109051720
you can put (some bullshit:-1) in the positive prompt and it acts like a negative, so it can be used on turbo cfg1 models. Even on normal undistilled models it's stronger and better than an actual negative.
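rough sketch of just the prompt syntax side, in case it helps. this only splits a prompt into weighted chunks and sorts them by sign; it is NOT the extension's code, and how the node actually applies a negative weight inside the conditioning isn't shown. function names are made up for illustration, and nested parentheses aren't handled.

```python
import re

# Matches "(some text:1.2)" or "(some text:-1)". Nested parentheses are not handled.
WEIGHTED = re.compile(r"\(([^():]+):(-?\d+(?:\.\d+)?)\)")


def split_weighted_prompt(prompt):
    """Return (text, weight) pairs; text outside parentheses gets weight 1.0."""
    parts = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            parts.append((plain, 1.0))
        parts.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts


def partition_by_sign(parts):
    """Split chunks into the ones that attract (>= 0) and the ones that repel (< 0)."""
    positive = [(t, w) for t, w in parts if w >= 0]
    negative = [(t, w) for t, w in parts if w < 0]
    return positive, negative


if __name__ == "__main__":
    chunks = split_weighted_prompt("photo of a cat, (blurry:-1), (sharp focus:1.2)")
    print(partition_by_sign(chunks))
```

the point is that everything stays in one positive prompt, so no separate negative pass is needed, which is why it still works at cfg1.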
>>
>>109051733
This one? https://github.com/BigStationW/ComfyUI-ppm
>>
Need to update the inpainting logic and add more of what >>109051714
is saying for better composition. I have an inpainting bug where I can't detect clothes with florence2, so I need to fix that. I also need better adetailer logic to identify characters for more facial refinement, but... I could just do an inpaint pass and ignore adetailer altogether, honestly
>>
>>109051740
yea
>>
>>109051750
thanks mane
>>
>>109051748
what is it written in? also are you using like qwen 3.6 for the chat?
pretty neat
>>
File: clip_Single_00032.mp4 (327 KB, 1216x512)
>>109051685
>>
>>109051777
This is all in typescript/python
Also I'm using gemma-4-26B-A4B-it for speed, I might use a smaller model once I decide to use heavier models but for testing models this is enough once jailbroken.
The dense gemma models are trivial to jailbreak so they won't fuss if I tell these two to start licking clits or some shit. I want to use 31B but I think I'm hitting diminishing returns for these types of tasks.
>>
File: Wan21 Scail2 00231.mp4 (3.57 MB, 1498x1500)
>>
>>109046565
kys fren
>>
>>109051878
you are brown
>>
It seems that Wan2.1 and Wan2.2 LoRAs may work with SCAIL2 to some degree. At least the output isn't broken.
>>
>>109051885
proof?
>>
File: debo_ccg_fia_00036_.png (1.78 MB, 1792x977)
>>
File: clip_Single_00041.mp4 (1.52 MB, 1056x592)
>>109051648
>>109051625
Updated the Identity Tracker and pushed a V3 workflow, now with improved character segmentation.
If auto_detect is true, then it's allowed to add more masks beyond what bboxes exist (up to the model limit of 6) using the text conditioning.
That only applies to the reference image, the driving video is capped at the number of masks on the reference image (you can't have five image characters replacing four video characters, for example).
Creating masks via text conditioning is always optional, if it can't find any objects to mask (or all maskable objects are already inside bounding boxes) it won't create any.
>>
File: 3256556.webm (3.83 MB, 420x291)
>she took the forks
rip kenji
>>
File: ComfyUI_01342.jpg (3.18 MB, 1500x2000)
>over 1300 gens without issue
>launch Comfy
>subgraph cannot be connected to anything else
>nothing has changed and everything is still connected the same way it was since I last closed it
>unpack and repack it
>rewire everything back up exactly as it was before
>it works
Man... I don't need the random skill-check, Comfy. Fix your shit!
>>
>>109051425
Paid shill behavior. Real anons don't run around doing free promo for some model.
>>
>>109052135
>1300 gens without issue
Of Jenny?
>>
>>109052135
My rgthree nodes break like 20 times a day, no conflicts mentioned. It doesn't even appear as broken when inside subgraphs either.
>>
>>109052135
>subgraph
Slopper
>>
File: lodestones-zeta-chroma.jpg (583 KB, 1024x1024)
Looks like lodestones is back trying to make another pixel t2i fork from chroma called zeta. I suspect more money is being burnt in real time.
>>
>>109048034
How are you doing this on Anima? Is it a specific 2B cosplay lora or a realism finetune?
>>
File: My_WF.png (1.47 MB, 1600x1200)
>>109052146
Mostly, but a lot of it is just testing. Still hunting for that sweet spot where the LoRAs in use aren't blowing up the latent space (it's a lot of different scaling with all the varying steps, samplers & schedulers involved).

>>109052167
>It doesn't even appear as broken when inside subgraphs either
Yeah, it sucks that there's almost no indication of what's broken and why. I had to go through everything with a fine-tooth comb when I was building my workflow out.

>>109052172
I wanted to better group the settings I frequently touch, sorry for trying to be efficient!
>>
>>109052135
I have had this happen before. Next time it happens, try bypassing the entire subgraph, then unbypassing. No promises, but for whatever reason cycling it like that fixes it for me ~80% of the time without having to renoodle.
>>
>>109052270
I'll definitely keep that in mind, thanks!
>>
Seen people drool over Ideogram 4 in a bunch of places, but I only trust you anons. What is the catch?
>>
>>109052337
nothing but shills here bucko
use anima
>>
>>109052337
> What is the catch?
Requires a good video card.
>>
>>109052392
how good? Will 4 GB not work?
>>
>>109047844
Where to get the scail mask nodes?
>SCAIL2ColoredMask
>>
>>109052337
the catch is that local is dead and you should buy claude fable
>b-b-but Fable got shut down
then ai is dead ig
>>
>>109052397
nightly comfyui
>>
File: clip_Single_00052.mp4 (2.79 MB, 1056x592)
>>109052337
The time spent setting up the json, even with a prompt generating node and/or the nodes which let you drag and resize the bounding boxes, makes it functionally a very slow model.
You can't just set up your prompt and batch gen, you need to edit each prompt.
If you're willing to do that it's great, and indications so far are that it's very easy to train so plenty of loras will be arriving, but it's slow even if you're not a promptlet, and if you are it's giga-slow.

Also I found an error in my workflow (the Scail-2 Autoextender one), the image sizes need to be divisible by 32 or you get the top pixels of the image wrapped around to the bottom, like in the attached, so anyone using it should grab the updated version.
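If you'd rather guard against it yourself, here's a minimal sketch of snapping a size down to the nearest multiple of 32 before it reaches the sampler. The helper names are made up for illustration and this isn't the node code from the workflow; rounding down (instead of to nearest) is just my choice so the frame never gets bigger than what you asked for.

```python
def snap_down(value, multiple=32):
    """Round value down to the nearest multiple (at least one full multiple)."""
    if value < multiple:
        raise ValueError(f"{value} is smaller than {multiple}")
    return (value // multiple) * multiple


def snap_size(width, height, multiple=32):
    """Return a (width, height) pair where both sides are divisible by multiple."""
    return snap_down(width, multiple), snap_down(height, multiple)


if __name__ == "__main__":
    # 1056 is already divisible by 32 (33 * 32); 592 is not (18.5 * 32).
    print(snap_size(1056, 592))
```

For the attached clip's 1056x592 that gives 1056x576, since only the height was off.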
>>
ldg approved artist tags?
>>
File: kyouki_comfyui_00001_.png (973 KB, 1152x896)
>>
File: 14-03-2026.jpg (951 KB, 1098x1605)
>>
File: ninchan_comfyui_00001_.png (796 KB, 1152x896)
>>
gov is starting to use png info scrapers so they can scan for bad words used in prompts if they get ur pc
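for anyone wondering what "png info" even is: it's the text chunks stored inside the PNG, which is where ComfyUI's default save node normally puts the prompt and workflow JSON. a minimal stdlib sketch of reading them, so you can see what your own files carry. the function name is made up, only uncompressed tEXt chunks are handled, and CRCs aren't verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in the PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt payload is "keyword\0text", encoded as Latin-1.
            key, _, value = body.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length
        if chunk_type == b"IEND":
            break
    return found
```

usage would be something like read_png_text_chunks(open("some_gen.png", "rb").read()); anything that shows up there travels with the file wherever you upload it, unless the host strips it.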
>>
File: Ideogram__00124_.png (2.57 MB, 1552x1552)
>>
File: file.png (11 KB, 966x66)
>>
File: 262464765.webm (3.93 MB, 420x291)
crazy hamburger!!!
https://files.catbox.moe/diilw1.mp4
>>
File: zeta.jpg (517 KB, 2048x1024)
>>109052236
Left is from 6 weeks ago. It definitely produces less outright body horror. I guess we will see whether it converges to anything useful. Two more months.
>>
>>109052245
Realism lora I've been working on, does everything
>>
File: 1780467357311093.jpg (464 KB, 1472x1136)
>>
File: 1768757731188956.jpg (438 KB, 1472x1136)
>>
>>109052544
sovl vs soulless
>>
>>109052392
Alright, so it's slow. Seen ballparks of about 2min on 4090s for 2MP.
>>109052410
JSON prompts look really ass, that's for sure. But then again, it's probably worth it instead of Klein 9b base, since that's what I've used up until now.
>>
>>109049884
why do you use rtx upscale 3x then rtx upscale 2x, then downscale?
>>
File: anima1_00397_.jpg (425 KB, 1152x1632)
>>
>>109052615
this is why i got bullied on the way to school
>>
File: harepore_comfyui_00001_.png (877 KB, 1152x896)
>>
File: 3456455.webm (3.82 MB, 420x291)
>>
File: 093952CUI_00001_.png (2.06 MB, 1536x1152)
>>
File: 24625434.webm (3.96 MB, 420x291)
https://files.catbox.moe/yk01xe.mp4
>>
>>109052254
>I wanted to better group the settings I frequently touch, sorry for trying to be efficient!
Using subgraphs is an implicit concession saying that the node design is flawed, that some nodes are bloated and try to do too much, while others are overly narrow or hyper specific and there is no coherent underlying logic governing how they were organized
>>
>>109052478
id4?
>>
>32 notifications on civitai
>>
>>109052569
more like shit vs piss
>>
>>109052846




All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.