/g/ - Technology


Thread archived.


T-two VAES Edition

Discussion of Free and Open Source Diffusion Models

Previous: >>108433569

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>108440192
thank you for early-baking and including off-topic links in the op, stupid nigger
>>
Blessed thread of frenship
>>
why does everything use a web ui? i thought you wanted to conserve your memory. everyone should use a real graphics api like tkinter
>>
>>108440233
then DON'T USE ONE
>>
>>108440233
because instead of maintaining a version for mac, linux, windows, android, ios, etc., you can just focus on one that works on all platforms.

I think web UI has its uses, but professional-grade programs really need dedicated desktop apps like Blender/Maya/Unreal.
>>
>>108440233
I am skeptical that a few dozen megabytes for the UI will matter when we are dealing with models sized in gigabytes.
>>
>>108440269
tkinter is cross platform
>>108440272
it's the browser itself that is the problem. it keeps caching things as you use the web ui, and eventually you have a whole gigabyte of vram reserved by firefox
>>
Why is nobody talking about the big CUDA/Torch Python dependency update released today in ComfyUI? EasyUse, Crystools, inpaint nodes (Krita), RES4LYF, all broken now.
>>
>>108440355
I don't see it
https://github.com/Comfy-Org/ComfyUI/commits/master
>>
>he updooted
>>
>>108440355
I am not seeing whatever commit you are referring to.
>>
File: wat.jpg (50 KB, 1053x172)
Is civitai expecting 2.5?
>>
File: 678.jpg (607 KB, 1024x1024)
>>
>>108440447
that's been there since it was announced. i'm not sure why they added an API model.
>>
>>108440447
What does this prove? They have other API models like Sora already.
>>108440503
They are getting a small cut from people running API gens there.
>>
File: file.jpg (120 KB, 1128x974)
>>108440509
there is no sora
>>
>>108440509
sure, but it should be removed from the models section because you cannot make loras with it.
>>
>>108440522
Look at it under search.
>>108440528
Yes but have you considered the fact that it's a slop coded bloated shithole? There are a billion shoulds when it comes to civit.
>>
>>108440540
I dunno, I can download TB's of models at full speed so i can't complain too much. it's definitely more user friendly than using huggingface.
>>
File: 1748387700580187.png (792 KB, 1337x738)
botiful
>>
>>108440564
im so sick of everything being early access now and requiring buzz.
>>
do people still use "hires fix" (latent upscale) for minor resolution increases or is all straight to dedicated upscalers now?
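For anyone unfamiliar with what "hires fix" actually does under the hood, here is a minimal sketch of the idea in numpy (illustrative only; the function name and numbers are made up, and a real UI would hand the result back to the sampler for a second low-denoise pass): generate small, upscale the latent, re-noise it proportionally to the denoise strength.

```python
import numpy as np

def hires_fix_latents(latents, scale=2, denoise=0.55, seed=0):
    """Toy 'hires fix': nearest-neighbor latent upscale, then re-noise
    for a second img2img-style pass at the larger size."""
    # integer-factor nearest-neighbor upscale of the latent grid
    up = latents.repeat(scale, axis=-2).repeat(scale, axis=-1)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(up.shape)
    # blend in noise proportional to denoise strength, like img2img
    return (1 - denoise) * up + denoise * noise

lat = np.zeros((1, 4, 64, 64))      # 4-channel SD-style latent batch
out = hires_fix_latents(lat)
print(out.shape)                    # (1, 4, 128, 128)
```

The denoise strength controls how much the second pass is allowed to repaint; a dedicated pixel-space upscaler skips the re-noising step entirely and only sharpens the decoded image.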
>>
>>108440575
just give me the buzz sar and the bob is yours
>>
>>108440375
>>108440390
Update Comfy,
then update the Python dependencies.
Good luck
>>
>>108440578
For anime, latentchads win; for everything else, pixeltroons.
>>
>>108440355
I don't update torch or python dependencies unless I absolutely need to.
>>
we really should have more anime gens
anime website
>>
>>108440680
it's 2036
>>
>>108440687
and we still make loras
>>
>>108440680
There are already a million threads for that.
This general is dedicated to local tech and its discussion.
If you want to post anime gens, post them as a screenshot of your workflow and make the workflow the central topic, with your anime gen as a byproduct, not the main focus.
>>
>>108440680
By all means anon show us how it's done
>>
>>108440726
Ultra based. Another artist tag comparison or WAI vs Noob slop thread and I'm out. I'm here for technical knowledge to improve my workflow, not subjective taste arguments or AI slop gallery. Anons who can't grasp this are missing the point and are literally troons.
>>
>>108440726
disregard this; it's nonsense.
>>
>>108440603
Why should my dependencies change when requirements.txt hasn't changed?
It's more likely some custom node fucked you over.
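One quick way to check whether a custom node actually moved your pins is to diff installed versions against requirements.txt. A sketch using only stdlib importlib.metadata (the helper name and the example requirement are made up):

```python
from importlib import metadata

def check_pins(requirements):
    """Compare installed package versions against 'name==version' pins.

    Returns {name: (pinned, installed)} for every mismatch; installed
    is None when the package isn't present at all.
    """
    mismatches = {}
    for line in requirements:
        name, _, pinned = line.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

# a deliberately bogus pin to show the output shape
print(check_pins(["surely-not-installed-pkg-1234==1.0"]))
# {'surely-not-installed-pkg-1234': ('1.0', None)}
```

Run it against the lines of ComfyUI's requirements.txt; anything a node pack silently upgraded or downgraded shows up as a mismatch.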
>>
>>108440564
you think that's bad, you should check out seaart and tensor - levels of jeetism previously thought impossible
>>
a masterpiece
https://civitai.com/images/124726637
>>
>>108440680
Yeah anon, and what do we do with all these dedicated or semidedicated anime threads?
>>>/b/degen
>>>/aco/sdg
>>>/d/ddg
>>>/g/sdg
>>>/h/hdg
>>>/trash/sdg
>>>/trash/slop
>>>/trash/bwg
>>>/u/udg
>>>/vt/vtai
>>>/wsg/aicg
>>>/vp/napt
>>>/g/adt
>>>/jp/2huai
>>>/bant/bant
>>>/r9k/aiwg
>>>/tg/slop
Do you want to be the next /adt/? And be the Ethiopia or Haiti of anime generals?
17 dedicated or semi-dedicated anime diffusion generals right now.
There are probably more generals than anons.
>>
File: 03153.png (556 KB, 768x768)
>>
>>108440803
Holy fuck, with /de3/ that's 18.
I think the worst move anyone could make right now is opening yet another AI diffusion general.
>>
>>108440818
And the workflow ULTRAFAGGOT?
>>
>>108440803
/vg/aicg and /g/aicg have great anime genners, better than the dedicated diffusion genners desu.
>>
File: 1765431472832420.png (3.34 MB, 4176x1111)
>>108435196
>from side, exhausted, armpit focus, reclining
>>108435796
>I think I want to create a UI wrapper for Comfy
its fun and easy with vibecoding. you should do it
>>108440233
anon should just vibecode his own
>>
>>108440803
/vg/aids
>>
>>108440803
i clicked a random one (>>>/g/sdg/) and that doesn't seem to be anime
why do you hate anime anon?
>>
File: 0159.png (467 KB, 768x768)
>>108440838
the prompt is literally just archer \(fate\) (and this in the negs: sketch, score_1, score_2, worst quality, lowres, text, signature, twitter username, patreon username, fanbox username, instagram username, censored, bar censor, mosaic censoring); the real secret is the tranima lora I trained on microslop fluent emoji
>You cannot visit litterbox.catbox.moe right now because the website uses HSTS. Network errors and attacks are usually temporary, so this page will probably work later.
-ACK
https://files.catbox.moe/lrd2kk.safetensors
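Side note on the `archer \(fate\)` syntax above: most SD frontends treat bare parentheses as attention-weighting syntax, so booru tags that contain literal parens have to be escaped. A trivial helper (hypothetical name, plain Python):

```python
def escape_booru_tag(tag: str) -> str:
    # parentheses are attention/emphasis syntax in A1111- and
    # Comfy-style prompt parsers, so literal ones in booru tags
    # must be backslash-escaped to be read as plain text
    return tag.replace("(", r"\(").replace(")", r"\)")

print(escape_booru_tag("archer (fate)"))  # archer \(fate\)
```

Unescaped, `archer (fate)` is parsed as the token "archer" plus an emphasized "fate", which is why "character from series name" sometimes outperforms a broken tag.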
>>
>>108440803
Crazy how none of these are as good as ldg
>>
>>108440494
>>>/g/dalle
>>
>>108440924
The day Debo commits to genning anime with all his experience, or Lumi they're gonna literally btfo all these anime genners running around acting like hotshots.
>>
>>108440818
>6 fingers
score tags were a mistake
>>
>>108440970
always were
>>
>>108440803
>>>>/jp/2huai/
This one is very interesting.
>>
File: 6366715770.jpg (215 KB, 1536x768)
>>108440970
the lora makes a lot of bizarre stuff, but thats a single image, not much to conclude from that, here are some other tests with the lora being resized
>>
>>108440988
his abs looks like buttcheeks.
>>
File: 0232776.png (405 KB, 768x768)
>>108441010
kek
>>
File: 80.jpg (687 KB, 2304x1536)
>>
So what happens now?
>>
anima full release
>>
>>108441058
stopped caring. noobRF-F2vae is king. feels good having yet another 2 weeks model before the previous botched pony (anima) model is even finished
>>
>>108441121
>noobRF-F2vae is king
never was, never will
>sdxl
>clip TE
>>
>>108441121
kekd
>>
1girl, cowboy shot
>>
>>108440603
meds
>>
>go see anima discussion page
>someone was prompting with "character from series name" instead of "character \(series name\)"
uh.... lol
>>
File: 73768377.png (162 KB, 460x460)
what's taking so long for this faggot to add controlnet support to forge neo? Is there another alternative to forge neo/wan2gp that's not comfyui but has decent support? I just want to be able to use controlnet with qwen image 2512.
>>
>>108441459
i dont know why so many seem to think "can use NLP" means "requires NLP"
>>
>>108441494
>>108441459
in most cases it works either way desu
>>
File: 1755039621856851.png (2.61 MB, 1704x1168)
>>
File: deCW_zi_00006_.png (2.58 MB, 1920x1033)
>>108440965
i doubt it. real anime genners have the deep anime knowledge to elevate their gens. I'd know, I'm an anime knower
>>
File: 1752987448108839.png (2.48 MB, 1168x1704)
>>
File: anima.jpg (162 KB, 1024x1024)
>>108441459
as the other anons already said, this tends to work fine either way, and it generally should with current text encoders
>>
File: 1753936706610093.png (2.92 MB, 1880x1072)
>>
>>108440575
>im so sick of everything being early access now and requiring buzz.
I vibecoded a Tampermonkey script that deletes early access shit from search results, at least I won't waste my time looking at that jeet shit lol
>>
File: 1751872199360399.png (3.98 MB, 2048x1128)
>>108442148
I just use ublock element picker, same for the ad cards
>>
>>108442151
>I just use ublock element picker
I can't anymore, chrome is on Manifest V3 now and they killed ublock :(
>>
File: z-image_00652_.png (1.17 MB, 1280x720)
>>
>>108442256
so saturated and plastic from Z-image base... now I believe Z-image turbo was a fluke, Alibaba doesn't know how to reproduce the magic again
>>
>>108441478
Your only options are to take what Haoming decides he wants to give you on the whims of Chinese Culture or bend over and sit your ass on the spikes of uncomfy and install malware nodes.
>>
>>108442178
>chrome
become a firefox chad. hopefully they dont follow v3 spec too closely...
>>
>>108442268
35 stars status?
>>
>>108442256
it perfectly captured his rotting hand
>>
>>108442259
is that the only z image output youve ever seen
>>
we're so far behind API video models it's laughable >>>/wsg/6115324
>>
>>108442304
Notice how I didn't say "use Ani's malware" anywhere in that post?
>>
>>108442178
literally use any other browser. even edge will do. nothing chad about using broken, bloated and slow piece of shit like firefox obviously but even that is better than using pure chrome.
>>
>>108442360
*posts insanely broken LTX gen with messed up sound*
you were saying? with local you can do anything you want, you are only limited by your skill
>>
>>108442383
kek
>>
>>108442383
it's going to get much worse now that AI is on Hollywood's radar. every closed model will get the seedance treatment.
>>
>>108442477
The so called ""based China"" bent the knee in front of Jewllywood, sad.
>>
ltx is based doe. making tons of kinos right now
>>
>>108442487
yup, what's the future? closed models all trained on generated videos with no recognizable celebrity or character likeness.
and Hollywood is already using AI, so all the major studios will have their own proprietary models in the near future, which will make them even more litigious.
>>
File: chinese ai guy tweet.png (23 KB, 583x179)
>One hour later no one posts the Chinaman insider's announcement
Dead troon general
>>
>>108442636
i expect it to be the same old slop with purged copyright/nsfw/anime
>>
>>108442636
>literally NEW THING SOON without any preview
wow so excited!!!!!!!!!!!!!
>>
>>108442636
>only an image model
meh, the future is unified image/edit model like Klein or Qwen Image 2.0
>>
>>108442655
yeah... I can't get excited by those announcements anymore, it's always the same thing: incremental improvement + soulless boring outputs void of any cool IP or styles
>>
>>108442636
I really hope they'll ditch the VAE at this point, the future is pixel-only. come on guys, show us that you're at least pretending to improve the architecture
>>
>>108442680
gotta convert the latent to pixels somehow.
>>
>>108442636
Z illustrious animetune
>>
>>108442655
Zit had some copyrighted stuff in it.
NSFW being purged is given though.
>>108442668
The post says nothing about edit capability or the lack of it.
>>108442680
Calm down lodestone.
>>108442700
They ought to do something with the dataset they got from the Noob team, no?
>>
File: end of march.png (252 KB, 1483x510)
>>108442636
it's gonna be Qwen Image 2.0
>>
>>108442705
>NSFW being purged is given though.
not really, that 18b russian model and HunyuanVideo had no problem making genitals and even some sexual positions, at some point you have some balls or you don't, and so far only Tencent and some random Spootniks have those
>>
>>108442707
that would cause quite the meltie, fingers crossed.
>>
File: 1751696208977079.png (2.04 MB, 1233x1682)
>>108442692
>gotta convert the latent to pixels somehow.
no, you can go full pixel, lodestone showed us the way. if they want to be serious about edit models they have to go for lossless edits, and only pixel space can do that
https://xcancel.com/LodestoneRock/status/2034117781776699784#m
>>
>>108442723
>showed us the way
By making not one but two schizobake failbake pixel-space meme models that never converged into anything meaningful during training and output blurry broken garbage full of artifacts and errors?
>>108442713
I think NSFW is possible when it comes to anime models but I have no expectation that any company is releasing NSFW realism model anytime soon.
>>
File: ComfyUI_17607.png (3.31 MB, 1500x2000)
>>108442636
There's nothing there aside from a vague "announcement"... forgive me if I don't cream my pants.
>>
>>108442723
what's the catch?
>>
>>108442744
>By making not one but two schizobake failbake pixel-space meme models that never converged into anything meaningful during training and output blurry broken garbage full of artifacts and errors?
yes? is this the first time you've heard about research? look at the output of diffusion models back when they were invented in 2014, it was absolutely terrible. you always start with something atrocious and then improve on it. lodestone showed us that it is possible to make something with it, now I expect rich-ass companies to follow up on that. we can't stick with VAEs forever, especially for edits; this shit is lossy and makes the image more and more compressed with every successive edit
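The "lossy edits" point is easy to demonstrate with a toy: treat a 2x2 block-average downsample plus nearest-neighbor upsample as a stand-in for a VAE encode/decode (a real VAE is learned and far better, but equally non-invertible), and watch reconstruction error versus the original grow as roundtrips stack. Alternating the block grid offset keeps the toy codec non-idempotent, like a real VAE.

```python
import numpy as np

def lossy_roundtrip(img, offset):
    """Toy lossy 'encode/decode': 2x2 block average down, nearest up.

    The grid offset alternates between calls so repeated roundtrips
    keep losing information, like repeated VAE encode/decode cycles.
    """
    shifted = np.roll(img, offset, axis=(0, 1))
    h, w = shifted.shape
    small = shifted.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    up = small.repeat(2, axis=0).repeat(2, axis=1)
    return np.roll(up, -offset, axis=(0, 1))

rng = np.random.default_rng(0)
original = rng.random((64, 64))
img, errs = original, []
for i in range(4):                    # four successive "edits"
    img = lossy_roundtrip(img, offset=i % 2)
    errs.append(float(np.abs(img - original).mean()))
print(errs)                           # mean abs error keeps climbing
```

A pixel-space model sidesteps this entirely: nothing is ever projected into a lossy latent, so an untouched region can pass through an edit bit-exact.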
>>
>>108442748
>what's the catch?
it's a new technology so for the moment not a lot of people are good at it
>>
>>108442723
wouldn't you basically need to invent a new diffusion process? latent space is blurry math slop that gets decoded into pixels.
>>
>>108442764
no, i mean the technical drawbacks. having a full representation of the image run through the model can't be cheap
>>
File: 1750415583196012.png (368 KB, 694x854)
>>108442766
>wouldn't you basically need to invent a new diffusion process?
you basically remove the latent conversion during the training process, and replace it with full pixel-space training
https://github.com/LTH14/JiT
>>
>>108442746
>There's nothing there aside from a vague "announcement"... forgive me if I don't cream my pants.
yeah, usually when a model is good, he always spams images, like when he did on Z-image turbo
>>
File: Zeta_Chroma.jpg (125 KB, 720x1280)
>>
File: Zeta_Chroma.jpg (153 KB, 1280x1280)
>>
File: 1_00004_.jpg (2.5 MB, 2616x3556)
>>108442707
is it gonna be good or slop?
>>108442705
>Zit had some copyrighted stuff in it.
really?
like what? is there a list somewhere of the people, characters and stuff that z image turbo/base knows?
>>
>>108442723
congrats anon, you summoned the pixel schizo >>108442789
>>108442794
>>
>>108442758
having it pass through a vae sucks, but i think the bigger problem is that the model has to spit out both your edit request and the rest of the image unchanged, which i bet they struggle with
>>
>>108442789
>>108442794
can you try some realistic images anon, I wanna know how good it is at that
>>
>>108442776
how does it remove latent conversion? the work needs to be done in latent space and then converted into an image. does this remove the noise/denoise process?
calling "encode/decode" "embed/predict" doesn't really change how current diffusion works.
not being argumentative mind you; i generally believe the vae is the biggest bottleneck of this technology, but it seems like a necessary evil.
>>
>>108442819
>the work needs to be done in latent space and then converted into an image.
not anymore, now they just cut the image into small patches and train on those directly; look at the paper, it's really clever shit
>does this remove the noise/denoise process?
no, that process is independent of whether you use VAEs or raw pixels
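For the curious, the "cut the image into small pieces" part is standard ViT-style patchification, here applied to raw pixels instead of VAE latents. A minimal numpy sketch (not the paper's actual code) shows it is pure bookkeeping, and perfectly lossless, unlike a VAE roundtrip:

```python
import numpy as np

def patchify(img, p):
    """Split an HxWxC image into (H/p * W/p) flattened p*p*C patch tokens."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)        # (h/p, w/p, p, p, c)
    return x.reshape(-1, p * p * c)       # token sequence

def unpatchify(tokens, h, w, c, p):
    """Exact inverse of patchify: rebuild the HxWxC image."""
    x = tokens.reshape(h // p, w // p, p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
tok = patchify(img, p=8)
print(tok.shape)                          # (16, 192)
assert np.array_equal(unpatchify(tok, 32, 32, 3, 8), img)  # lossless
```

The patch tokens feed the transformer directly; there is no learned encoder/decoder pair anywhere in the path, which is the whole appeal of the pixel-space approach.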
>>
>>108442758
I am not saying that it wouldn't be beneficial to move on to a better technology without the current drawbacks. I am disputing that lodestone proved that it (it as in good image generation, not a body horror slop factory) can be done with his methods, as his models are in a completely atrocious state and the training loop plateaued. I am skeptical that "just throw more compute at it with corporate bucks" will result in good models.
>>108442785
If it's already available on the API like Qwen 2 maybe he doesn't want to do that because people would recognize the model.
Yes, copium, I know.
>>108442789
>blurry
>heavy jpeg compression like artifacts
>fingers molten together
>eyes looking at different directions
>Nonsensical background that makes the character look like a giant
>Literally worse than SD 1.5
Imagine defending or even funding this shit.
>>
File: Zeta_Chroma.jpg (256 KB, 1280x1280)
>>108442816
not yet all that great, it's in the training materials tho
>>
File: Zeta_Chroma.jpg (484 KB, 1536x1536)
>>
>>108442851
that's rough, but I'm not seeing the patch seams anymore, looks like he improved the method. I'm sure this shit has a lot of potential, and if it can reach VAE level we'll be eating good, because it seems like it's "3-4x faster" >>108442723
>>
>>108442827
>I am disputing that lodestone proved that it, it as in good image generation, not body horror slop factory, can be done with his methods
to be fair, what Lodestone is doing is asking a model that went through millions of hours of VAE-latent training to forget all of that and then learn the pixel-only way; it would look better if the model was trained with the pixel-only method from scratch
>>
File: Zeta_Chroma.jpg (231 KB, 1536x1536)
>>108442855
it is rough, yes
>>
>>108442863
really should have done a small model and prove that it can look good before shitting away 100k+ on fat ass ones
>>
>>108442636
>New new image model
I guess he wanted to say "local" instead of saying "new" a second time right? lol
>>
File: ComfyUI_04855_.png (1.03 MB, 832x1216)
>>108442795
None that I know of. It can do some Nintendo stuff like the Animal Crossing style, Mario, recognizes well-known capeshit characters, etc. It can do "ok" likenesses of a handful of celebrities too, though I don't have a list for that either.
Pic related is some months old zit slop. Prompt only says "Asuka from Evangelion" about the character and no additional description is given.
>>
>>108442873
how good is it at small details? can it properly form a hand at a distance?
>>
File: 1768581790811853.jpg (1.84 MB, 2560x3072)
>>108442878
>really should have done a small model and prove that it can look good
the paper did that though
https://arxiv.org/abs/2511.13720
>>
>>108442890
yeah thats nice but no model weights
>>
File: Zeta_Chroma.jpg (199 KB, 1536x1536)
>>108442888
hands certainly aren't always correct (as usual, heh), but generally speaking i do think it's pretty good at learning fairly small details
>>
>>108442890
oh nevermind found it
>>
https://xcancel.com/bdsqlsz/status/2036377318655205725#m
ok, who's trying this to see if it works and makes shit faster?
>>
>>108442914
>i do think it's pretty good at learning fairly small details
I think it's gonna be its strength, since it can see all the pixels now, and not just a compressed alternative space
>>
>>108442878
Why doesn't he use something small like SD 1.5 as a proof of concept for these schizo experiments before sinking insane money into failbakes with larger models?
A small model + max 512p training should make the proofs of concept cheap enough before sinking a fortune into the real deal if he wants to.
I am sure there are new and small enough alternatives if he thinks the architecture is too old as well.
>>108442923
Unlikely to bring instantly noticeable improvements. I will check it out in a few months when everything supports torch2.11
>>
>>108442955
Also FA-4 doesn't work on consumer cards anyway.
>>
File: Zeta_Chroma.jpg (394 KB, 1536x1536)
>>108442926
I didn't think qwen was doing badly at this either, for example, but yes, I think these pixel-space models (even the ones with some optimization tricks; it's obvious that full pixel-space models would be great at it) should be strong at this too
>>
File: Zeta_Chroma.jpg (293 KB, 1536x1536)
>>108442955
he ran many smaller experiments, but you can't actually find out everything you need to know from smaller parameter counts and resolutions

and then AFAIK he ran his training on like 8 GPUs. not 8 server racks, 8 GPUs. super cheap by model-training standards. surely you haven't missed how much ClosedAI wants to burn in the next two years or so?
>>
>>108442636
why surprised, it's just a random twitter account
>>
>>108443056
>random
that dude gets frequently invited by Alibaba to their new model announcements though
https://xcancel.com/bdsqlsz/status/2035619960589042159#m
>>
>>108443056
its not random. that person has insider access and what they post is legit, even if vague.

that being said, i do not care for qwen image 2.
>>
Making WAN 2.6 local would be the win we need. LTXV is just trash and not worth using.
>>
>>108443080
>i do not care for qwen image 2.
why? for once it's a pretty small model (7b)
>>
>>108443092
whats wrong? post an example
>>
>>108440192
>>
>>108443092
I enjoy ltx2.x but I'd be happy with more video models
seems like any new wan release is dead though, won't happen
>>
>>108443112
didn't they just announce a new wan and double down on their commitment to open source?
>>
File: 1751263075442551.png (629 KB, 1212x1020)
>>108443112
>seems like any new wan release is dead though, won't happen
but Modelscope (Alibaba's site) said there will be new local wan models!
https://xcancel.com/ModelScope2022/status/2035652120729563290#m
>>
>>108443119
>>108443118
all I see is that their latest video model is closed weights, whatever they say
>>
>>108443112
We'll never see a wan flagship model again, but they could open their old ones. It's a shitty practice but I wouldn't turn it down
>>
>>108443128
oh I'd be perfectly fine if they open source the old version each time they release a new model, it's a good compromise
>>
>>108443128
>It's a shitty practice but I wouldn't turn it down
same, I don't mind that they keep their best toys for themselves and make money from it, but their old models are now deprecated so I don't see any reason to not release it locally and get some good boi points
>>
>>108443127
closed models are shit.
people who are willing to pay for it need flexibility. they will end up using open source models and spending their money on training.
most of the people who use closed models are jeets who just bounce around different models using free credits.
>>
>>108443154
>closed models are shit.
Seedance 2.0 is goated lol
>>>/wsg/6115340
>>>/wsg/6115333
>>
>>108443167
sure it is, but it's a little too closed, isn't it?
maybe in a few months, or next year, they will release it.
they just need to remove half of the training and strengthen the guardrails lol
>>
>>108443167
none of them can do porn so who cares. SFW shit will never be relevant outside dumb memes.
>>
File: HEGxvlsaYAAzCnF.jpg (455 KB, 1616x2024)
>>
File: 1757289830107373.mp4 (551 KB, 576x1024)
We, AI bros, are saving lives!
>>
Does z image do image edits? if not what's a good alternative?
>>
File: HEI9L-rawAAhaKG.jpg (207 KB, 1755x2048)
Can this style be replicated without loras in any current image model?
>>
>>108443265
no, not even the best API models know how to copy styles
>>
>>108443265
yes
>>
Babe, wake up, a new video model got released (it's a 15b model)
https://huggingface.co/GAIR/daVinci-MagiHuman
https://github.com/GAIR-NLP/daVinci-MagiHuman/

https://files.catbox.moe/quibth.mp4
>>
>>108443287
>slighty bigger than Wan
>Apache 2.0
>has sound
are we saved from the jews?
>>
>>108443287
closed source can't stop racking up the Ls
>>
>>108443190
did they say how much memory it takes? i don't think anyone here can run it
>>
>>108443315
the only people who can run it are the chinks that had access to it before they got cucked by paramount. sd 2.0 is dead in the water.
even if they release it, the second someone gens "ryan gosling doing kung fu" they will get hit with another cease and desist.
>>
File: 42567878.png (1.24 MB, 727x596)
>>108443287
>>
File: 1765708721193054.png (173 KB, 1974x1435)
>>108443287
>the model has only been trained on 256p, then they use an upscaler
nothingburger
>>
Why even use local models anymore when Nanobanana+Grok exist now?
>>
File: whyyy.png (703 KB, 1280x719)
>>108443349
Wtf? this is so fucking retarded.
>>
>>108443358
what's wrong? the example video looks good
>>
>>108443366
they're just moving their mouth anon, there's a reason why they went for those ultra static shots
>>
Why do all the cfg_pp samplers break on anima? Only euler_cfg_pp works >:(
>>
>>108443349
and I thought I would have a good day ahead of me, that's a shame...
>>
>>108443371
retard
>>
>>108443380
why are you being rude to him
>>
>>108443370
do you really need more?
>>
File: kek.png (223 KB, 400x400)
>>108443349
>the model has only been trained on 256p
they're learning from the bests
>>
>>108443287
>2s to generate 5s video
HOLY
comfy support when??????
>>
>>108443370
There are dance videos on their github and yeah they don't look the best.
>>
>>108443287
niiic-
>>108443349
aaaaand it's garbage
>>
File: looks bad.png (2.08 MB, 2931x1570)
>>108443287
https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman
retarded ass demo, that's not what I wanted lol
https://files.catbox.moe/vmghcd.mp4
>>
>>108443394
*from the beasts
ftfy
>>
File: file.png (1.55 MB, 1800x1800)
>>108443287
>>108443349
ugh... I believed we would be saved for a second
>>
>>108443402
>prompt rewriter
>>
>>108443417
you can't get rid of that shit on the demo unfortunately
>>
>>108443402
lmfaoooooooo what a useless fucking tool
>>
>>108443419
edit the script then
>>
File: FUCK.gif (2.03 MB, 374x317)
>>108443349
2026 is such a shit year for local AI, nothing good is happening
>>
File: gewqge.png (96 KB, 639x607)
why are you complaining? it has the best audio generation i've heard so far
>>
>>108443456
>it has the best audio generation i've heard so far
audio isn't everything; try making videos at 256p with Wan or LTX and see how putrid the results are
>>
>>108443467
it looks pretty good, and it should be easy as fuck to train.
>>
>>108443349
This is such a waste of money, are investors fucking retarded? Why would you give money to such a project in the first place? Are they expecting us to finish the job and train the model on high resolutions or something?
>>
File: file.png (61 KB, 982x256)
>>108443402
amazing prompt enhancement sirs
>>
>>108443473
>the jeet gets impressed by ultra static shot of people just opening their mouths
retard
>>
>>108443478
plz understand saar, our LLM has also been trained on 256 tokens
>>
>>108443480
ok sweetie, back to your seedance cope.
>>
>>108443478
Chuu~ ;)
>>
>>108443349
>downscale your image input to the point all the details are destroyed
>ask the model to make anything decent on a resolution even youtube circa 2006 would laugh at
>upscale with an AI and hallucinate the details
WHAT A GREAT IDEA
>>
>>108442636
I am dooming but it will probably be that chn-noob garbage.
>>
>>108443349
when will the upscale meme fucking end? even the LTX fags tried to force this shit on us, but they weren't retarded enough to skip training the model at high resolution, and they give us the choice to make videos at any resolution we want
>>
>>108443349
>>108443508
I'm tired boss...
>>
File: kino-alert.gif (577 KB, 498x498)
>>108443287
https://files.catbox.moe/0ioe8h.mp4
>single shot 2 minute generation
>12gb card
>only degrades at the end
kino alert
>>
File: MARGE.png (35 KB, 600x800)
>>108443512
Oh someone on the xitter replies claims it's this:
https://github.com/huggingface/diffusers/pull/13317
Probably a nothingburger, but a 2B-active, 17B-total-parameter MoE looks interesting architecturally at least.
How would that work though? Different active params at different steps/sigmas? Based on conditioning?
>>
>>108443530
why is it so noisy? lol
>>
>>108443535
>but 2B active 17B parameter model MoE looks interesting
that looks intriguing, yeah, but MoE models are known to be worse than dense models of the same total size. I don't see why they didn't go for a dense 17b model, we can run that; a lot of people run Qwen Image for example
>>
>>108443530
The audio wasn't generated, was it? If it wasn't, what's the track? If it was, I really liked it.
>>
>>108443530
>kino alert
their hands are picassoesque, even HunyuanVideo had better hands lmao
>>
>>108443538
it's called kino sparkles
>>
>>108443566
no wonder local is dead. you would not be satisfied with the riches of heaven
>>
File: 1770605396099604.png (362 KB, 1144x1315)
>>108443535
>it has CFG
cool, sick and tired of distilled shit, let me have my negative prompts
>>
File: sure anon.png (2.11 MB, 1916x945)
>>108443582
>why are you not hyped by a model worse than a 2024 model??? why can't you stop asking questions and consoom product and then get excited for next product?? shiny toy = good!
jeez I wonder anon
>>
>>108443563
it was generated. i believe it is noisy because the audio and frames are part of the same latent space
>>
>>108443589
it's a free open source model, no one is going to force you to use it.
>>
>>108443595
>it was generated.
wait, really? this is fucking good the fuck? it could be used as a music model lmao
>>
>>108443530
it's t2v right? what was the resolution you went for? did you try for high resolution? I guess that won't work right?
>>
>>108443530
I can't see the video wtf
>>
>>108443595
God fucking damn. The video is fine, even if noisy, but I'm much more impressed by the audio. Reminds me of Siouxsie and the Banshees.
>>
>>108443530
this video model can make music as good as suno or udio? what the actual fuck lmaooo
>>
>>108443613
>it's t2v right
yes. you can't choose the resolution, you have to upscale it afterwards
>>
>>108443617
here anon >>>/wsg/6115434
>>
>>108443634
what was the prompt?
>>
File: 1770646622385904.png (385 KB, 625x472)
>>108443530
the video is garbage but the music, damn, unironically this is now the best local music model kek
>>
>>108441811
It's not that hard, at least on 4chan: you need an anime insider wildcard with some characters and artists. The quality of anime gens here is the lowest tier, basically on par with cloud services like Civitai, Tensor, or SeaArt.
>>
>>108443555
They are meant to balance quality and speed. A 2B-active/17B-total model has lower quality than a 17B dense one, but still higher than a 2B one. And as long as you can fit it into your (V)RAM, the speed is comparable to the 2B one.
That is to say, unlike Qwen, you can run it fast without a step-distill lora, retaining things like CFG and negative prompts.
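The tradeoff described above (a router activates only a small slice of the total parameters per token, so compute scales with the active count while quality benefits from the full pool) can be sketched roughly like this. The class, dimensions, and top-1 routing are illustrative assumptions, not the actual model's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyMoELayer:
    """Toy mixture-of-experts layer: a router picks top-k experts per token,
    so only a fraction of the total expert parameters runs each forward pass."""
    def __init__(self, dim, n_experts, top_k=1):
        self.router = rng.standard_normal((dim, n_experts))
        self.experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
        self.top_k = top_k

    def forward(self, x):
        # x: (dim,) — a single token embedding for simplicity
        logits = x @ self.router
        chosen = np.argsort(logits)[-self.top_k:]   # indices of the active experts
        weights = np.exp(logits[chosen] - logits[chosen].max())
        weights /= weights.sum()                    # softmax over the chosen experts only
        # only the chosen expert matrices are multiplied; the rest stay idle
        out = sum(w * (x @ self.experts[i]) for w, i in zip(weights, chosen))
        return out, chosen

layer = ToyMoELayer(dim=8, n_experts=8, top_k=1)
out, chosen = layer.forward(rng.standard_normal(8))
# with top_k=1 of 8 experts, ~1/8 of the expert parameters were active
```

With 8 equal-size experts and top-1 routing, each forward pass touches one expert matrix instead of eight, which is why inference cost tracks the active parameter count rather than the total.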
>>
>>108443634
>you can't choose the resolution
what did you use? the model can readily spit out 2 min of video just like that? seems sus
>>
I am still convinced that Chroma is the peak of local models and we've been going downhill from there.
>>
>>108443530
did you make a music with udio and then did some audio to video? or is the audio purely from the model itself?
>>
>>108443530
I call it bullshit, the audio quality cannot be that good, with just some simple speech you can already hear issues
https://streamable.com/vrikck
>>
>108443695
Don't you have dog penises to gen, kekstone?
>>
>>108443287
Can it make coom?
>>
>>108443530
>>108443713
>I call it bullshit
that video model uses Stable Audio to make the sound, so yeah... unless we missed the part that Stable Audio is udio tier, then it's probably a fake
>>
>>108443720
>Can it make coom?
as someone who's old enough to have coomed to TV channels with pixelated censorship, I would have trouble cooming to a 256p video lol >>108443349
>>
>>108443530
Is it possible to continue audio or create new audio for an existing video like with LTX?
>>
File: S U S.png (85 KB, 168x300)
>>108443634
>it's t2v right
>yes
the model can only do i2v though
>>
Coomslop style? Yeah. Flat colors, no shading, white background, and the colors will probably wash out and bleed.
>>
File: plz.png (420 KB, 629x500)
>>108443530
who believes this?
>>
>>
File: 1766751773544196.png (68 KB, 220x195)
https://xcancel.com/Ali_TongyiLab/status/2036376985187000385#m
sounds like ass, give us Z-image edit already!
>>
>>108443530
lol, even seedance 2.0 doesn't have that sound quality >>>/wsg/6115441
>>
>>108443818
I can't even test the demo because it requires money. I'm curious how it compares to MMAudio.
>>
>>108440680
Why are you so hellbent on killing /adt/ Julien? What's even the point of /adt/ if you are posting anime here too? At least let us have something for ourselves
>>
>>108443530
seems kind of bad..
https://files.catbox.moe/nuyuti.mp4
>>
>>108443902
nevermind, finally worked. sounds like distorted shit. tried two different simple prompts. both garbage. im glad i tried the demo before wasting time downloading it.
>>
>>108443948
looks like she has the down syndrome lol
>>
does any trainer support training anima text embeddings? has anyone tried?
>>
Reminder sub 1GB model will be the norm by the end of the year.
>>
>>108444071
any facts at all to back up that incredible assertion?
>>
>>108443695
please spoonfeed me a competent chroma workflow, I just want to goon and all of these new models are letting me down :(
>>
>>108444067
You should ask in e/edg/ they are lora makers, if you want more male centered loras in /h/
>>
>>108444139
sounds gay
>>
>>108444139
>if you want more male centered loras in /h/
lol
>>
>>108444118
don't go chroma, it's a mess. (or spend the next three months trying to stabilize the output with a limp dick in your hand.) HERE'S YOUR DANGERHAIR TRANNY/FATSO
>>
>>108444169
would
>>
>>108443939
rent free
>>
File: 1750753041985568.png (3.92 MB, 1328x1640)
>muh namefags
end your life~ :)
>>
>>108443794
https://files.catbox.moe/xgy2mr.png
>>
File: screenshot.1774362009.jpg (283 KB, 1441x558)
>>108444118
SPARK.Chroma_preview + t5xxl_fp16(chroma type)
35 steps
res_2s/beta57
Chroma GOONTUNE lora
Reakaaka's enhancer lora
pussy/anus/penis/hand/skin bbox detailers
8 step hires fix
>>
>>108444067
Not that I know of
>>
>>
>>
>>108444223
I though Chroma was so good that it didn't need things like LoRAs?
>>
Just in case.
>>108443846
>https://github.com/BerriAI/litellm/issues/24512
>hope nobody here was using this
>>
>>108443939
Julien is the retard trying to blame /adt/ for all the shitposting ITT, though (since they also rejected xir dogshit wrapper)
>>
>>108444271
>llm
wrong thread anon
>>
>>108444264
Who told you that? Any model benefits from loras. It helps reduce body horror. Using Chroma again made me realize how fucking stiff ZIT/Klein are. Can't do shit with those models.
>>
>>108444284
>>108444185
>>
>>108444245
https://files.catbox.moe/igkikm.png
Yeah chief, I'm gonna need some info or a catbox on these O-rod gens.
>>108444223
I made the goontune lora and I always felt like people were much better at using it than I was LOL. That workflow looks amazing; I'm having the hardest time building something in coomfy with detailers from scratch.
>>
>>108444293
I know, but still. It's used as a dependency for other projects and there's crossover between the threads. I just wanted to make /ldg/ anons aware if they weren't.
>>
>>108444302
yeah, buddy, so rent free it only took you less than 2 minutes to reply
you are such a raped retard
>>
>>108444318
meds
>>
>>108444300
>Who told you that?
Chroma users.
>>
>>108444315
fair enough
https://www.youtube.com/watch?v=_3X2tRIYHdE
>>
>>108444223
35 steps with a slow af res_2s sampler and you still need a 2nd pass on everything, come on man gimme a break. (and if those two images below are supposed to be the hot shit this workflow can output, lol. at least tinker with the eye inpaint part, they both have a serious case of strabismus)
>>
>>108444200
cum is like a giant soap in her mouth
>>
chroma has a big problem with anal sex. it seems it tends to associate it with gay sex and gives the girl a penis 30% of the time regardless of the prompt or negatives. really annoying because the gen is still good so I have to use klein9b to remove it.
>>
>>108444352
>https://files.catbox.moe/xgy2mr.png
The man was keeping up on his zinc intake
>>
>>108444362
geg
>>
>>108444362
>chroma has a big problem with anal sex. it seems it tends to associate it with gay sex
that's because that dog fucker has probably put 90% of gay furry shit on his dataset
>>
>>108444394
and thats a good thing
>>
>>108444400
faggot
>>
>>108442178
Ublock Origin Lite has an element picker nowadays. Not sure if it's fully equivalent to the old one, but having it is better than not.
>>
File: _AnimaPreview_01632_.jpg (160 KB, 896x1152)
>>
Are we locally diffusing? Anon?
>>
i'm diffoooosing
>>
File: _AnimaPreview_01736_.jpg (346 KB, 1216x832)
>>
>>108444362
>chroma has a big problem with anal sex. it seems it tends to associate it with gay sex
why would you fuck an asshole when a nice pussy is right there? it is gay
>>
File: your license sir.png (523 KB, 896x1152)
>>108444400
>>
>>108444329
>Chroma users.
We did not.
>>
>>108444441
>>108444528
jak lora?
>>108444530
>Initial tightness
>I do not want to give pleasure to a w-man
>Taboo thing is exciting
>Symbolic value as a sign of advanced relationship
>No odds of preggers
>>
>>108444528
0.o
>>
>>108444575
>I do not want to give pleasure to a w-man
do you think they don't enjoy that? kek
>>
>>108444575
>jak lora?
no, anima knows jaks OOTB
>>
File: _AnimaPreview_01692_.jpg (264 KB, 896x1152)
264 KB
264 KB JPG
>>108444594
>no, anima knows jaks OOTB
nah, had to train lora
>>
>>108443650
>>108443749
>>108443764
>>108443818
You could've easily generated these reaction images. Faggot.
>>
Fresh when ready
>>108444685
>>108444685
>>108444685
>>
>>108444674
it's more fun to add reaction images and see you having a meltie desu


