Discussion of Free and Open Source Text-to-Image/Video Models and UI

Prev: >>106703056

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/tdrussell/diffusion-pipe

>WanX
https://comfyanonymous.github.io/ComfyUI_examples/wan22/
https://github.com/Wan-Video

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
Training: https://rentry.org/mvu52t46

>Neta Lumina
https://huggingface.co/neta-art/Neta-Lumina
https://civitai.com/models/1790792?modelVersionId=2203741
https://neta-lumina-style.tz03.xyz/

>Illustrious
1girl and Beyond: https://rentry.org/comfyui_guide_1girl
Tag Explorer: https://tagexplorer.github.io/

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage

>Neighbours
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
>>
File: 1755413118895148.png (75 KB, 745x368)
>>106706459
>>style change is harder in new qie+
>that's what happens when you want to save a model with finetunes; at some point you're trying too hard and the model starts to lose some of its concepts. that's why pretraining is always the most important part: if the base model is too weak, it's already over
chat is it true?
>>
>>106706502
2nd for MORE MOTOKO!
>>
File: 1757735398482959.png (2.51 MB, 1328x1552)
>>106706512
you get a free lain instead
>>106706509
yes, i'm back to shitposting with old QIE desu
>>
>>106706509
yes. go a few threads back in the archive and you'll see anons confirming it with gens.
tl;dr: it understands some concepts/objects now but lost art understanding.
>>
>>106706509
it's all right, Tencent will save us with their own edit model that'll be released in a month
https://youtu.be/DJiMZM5kXFc?t=18
>>
Reminder to use the v2 of the lightning lora for qwen edit. It retains the original image's quality and style better overall.
>>
File: 1727468334517808.mp4 (3.99 MB, 1920x1080)
https://xcancel.com/Alibaba_Wan/status/1971485743194484880#m
lmao, Wan 2.5 can edit images, and I'm sure that one is way better than the gigaslopped QIE shit
>>
>>106706583
it fucks up the edit capabilities thoughever bait
>>106706608
100% api only, sad
>>
again
anon, do you know these models and can you share a comfy workflow?
https://huggingface.co/ShinoharaHare/Waifu-Inpaint-XL
https://huggingface.co/ShinoharaHare/Waifu-Colorize-XL
>>
Do I have to run sage attention nodes for wan 2.2 workflows or can I just use a command-line flag?
>>
>>106706625
it's illustrious-based so just check the 1girl guide in the op. you're welcome retard
>>
>thoughever
put your trip back on

>>106706608
why call it a video model at that point.
>>
>>106706633
for wan 2.2 just the command-line flag is enough
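e.g. (assuming a recent ComfyUI build, the flag names may differ on older versions):

[code]
# exact attention (lossless), needs flash-attn installed:
python main.py --use-flash-attention

# global approximate INT8 attention, a bit faster:
python main.py --use-sage-attention
[/code]

iirc the KJNodes "Patch Sage Attention" node does the same thing per-workflow if you'd rather switch on the fly like the other anon.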
>>
>>106706633
I find nodes to be better, so I can switch on the fly
>>
>>106706633
keep sage off. it's not worth the 5% speed increase.

>>106706635
they aren't simple models like that, retard.
>>
>>106706623
>100% api only
only for preview
>>
>>106706644
Great.

>>106706646
>>106706654
Why switch? Isn't it purely performance?
>>
>>106706608
look at the 39 sec mark, there's zero zoom-in issue, the image stays the same and only the cat changes, kek they're really fucking with us and they're only releasing their failed scraps locally
>>
>>106706658
no, flash attention is lossless, sage attention is NOT lossless (it quantizes parts of the attention computation to INT8, so outputs shift slightly)
>>
>>106706656
>being that high on copium
>>
>>106706658
anecdotal and i don't care if you believe me but turning it off had a positive impact on video gens.

barely matters for sdxl but qwen also requires it to be off so i just keep it off all the time now.
>>
>he fell for the "it's going to be cloud only" bot brigade
>>
File: 1755170109268766.jpg (395 KB, 1428x1922)
https://xcancel.com/bdsqlsz/status/1971448657011728480#m
>4x qwen image
it means an 80b model, and it's consistent with what that chink is saying in that video (80b)
https://youtu.be/DJiMZM5kXFc?t=204
it's over...
>>
File: 1740381669519705.png (209 KB, 640x639)
>>106706693
so you're telling me they need an 80b to output this slop? lmao, China has lost the plot, instead of going for a higher-quality dataset untainted by synthetic slop, they went the API route, MOAR LAYERS
>>
>>106706635
ty it was helpful
but call it maximized laziness :>
>>
>gm
>>
>>106706662
>>106706676
Now I have to try it.

I just tried running q8 in high noise and fp16 in low noise. Didn't see much of a difference. Strange.
>>
>>106706668
https://youtu.be/IhH7gDDPC4w?t=50m58s
>>
>>106706763
at no point did he say that they'll release the complete model
>then we will complete next version, wan 2.5 without the preview
he said "complete" not "release"
>>
File: lmao.png (609 KB, 640x640)
>>106706693
>80b for this
https://www.reddit.com/r/StableDiffusion/comments/1nqm5l0/images_from_the_huge_apple_model_allegedly/
>>
so anon, what kind of stuff have you been making with the edit models?
surely only wholesome memes that are family friendly, right?

> makes everyone pregnant
ANON STOP WHAT ARE YOU DOING
>>
>>106706608
>Alibaba have their own good edit model and won't release it
>>106706693
>Tencent went for LayersMaxxing and their shit still looks like pure slop
lol it's so over dude
>>
>>106706801
Hello, /adt/ repost
Impressed by the new NetaYume v3, in my opinion it's on par with current SaaS models in the anime field. Same prompt, but for copyright reasons I used character traits instead of names as shown in pic related.

I would like to do this with a Chroma anime checkpoint. Does anyone have a ready workflow to import and test?

My Workflow: https://files.catbox.moe/84cdwx.png
>>
>>106706853
Making aliexpress-tier pics for online stores
>>
>>106706484
>CumshillUI still in the OP
>>
>>106706891
that sounds tedious. i hope you're getting that bag anon.
>>
File: 10000 dollars!.png (856 KB, 1977x1442)
>>106706693
So you need an RTX PRO 6000 (96 GB) to run this shit on Q8? kek, what's the point of releasing it at all?
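napkin math, weights only (GGUF-ish effective bits per weight, rough; text encoder and activations add several GB on top):

[code]
params = 80e9
for name, bits in [("bf16", 16), ("Q8", 8.5), ("Q4", 4.5), ("Q2", 2.6)]:
    print(f"{name}: ~{params * bits / 8 / 1024**3:.0f} GiB")
# bf16: ~149 GiB | Q8: ~79 GiB | Q4: ~42 GiB | Q2: ~24 GiB
[/code]

so yes, Q8 weights alone just about fill the 96 GB card, and even Q2 barely squeezes onto a 24 GB one.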
>>
>>106706880
sooooooo not local?
>>
>>106706693
Will this be the biggest local image/video model ever? If I remember correctly, the biggest one before that was step video (30b).
>>
Nvidia is capable of making their GPUs' VRAM plug'n'play with upgradeable and affordable VRAM. But they won't do it because they have a monopoly and wouldn't profit as much from it.
>>
File: 1752767022929126.png (2.18 MB, 1728x1344)
>>
>>106706788
Well, if this can be effectively quantized without losing too much quality and it trains well, it could still see good adoption, but those are really big IFs

Even with good quality quantization now being available, Qwen adoption has clearly been hampered by its size and slow generation
>>
File: file.png (1.76 MB, 896x1152)
>120s to generate an image
how do fluxlets do it? it seems a terribly inefficient way to iterate.
>>
>>106706919
calling this a gaming gpu is insane work.
>>
>>106706943
>Well, if this can be effectively quantized without losing too much quality
if you want to run this on a 24gb vram card, you'll have to do a Q2 quant, and Q2 is unusable
>>
>>106706920
Neta Lumina is local; it's good news for local anime models.
>>
>>106706944
this qwen?
>>
>>106706938
>Nvidia is capable of making their GPUs' VRAM plug'n'play with upgradeable and affordable VRAM.
the speed is important too, even if your 3090 had 100gb of vram and could run this, it would still be slow as fuck since it has to compute all 80b parameters
>>
>>106706938
Well, despite being three years into the AI boom, their competitors are still sitting with their thumbs up their asses = no competition

...
>>
>>106706944
it's so noisy, like the denoising process hasn't been finished, what model is this?
>>
>>106706880
those SaaS models are slopped at anime. please do a comparison between neta, noob, and novelAI, which is the only good API anime model
>>
Bros.. the 4090 won't fit at all. Even with the different mounting types.

Would running it via one of those external boxes be worth it?
>>
>>106706978
>their competitors are still sitting with their thumbs up their asses = no competition
you don't betray your family anon
>>
>>106706989
would you. same thing
>>
>>106707003
I wouldn't yeah
>>
>>106706987
Did you buy the card without checking if your mobo/case has enough clearance?
You could use a riser and place the card on top of your PC, other than that I don't know.
>>
can someone make an Erika Kirk lora?
>>
>>106706987
just get a cheap case which fits
>>
Post gens fags.

And I can't right now I'm at work.
>>
>>106707047
Make me bitch!
>>
>>106707047
/sdg/ is that way nogen
>>
>>106706526
Cyber-lain
Fishnets go with everything
>>
>>106706693
I would legit buy an expensive card if this giant model was at the level of Seedream, but it's not the case at all, it's still the same slopped shit you see on your regular model, the fuck are they doing?
>>
>>106707018
No, I upgraded to a 5090 and didn't plan on using the 4090 as well. But then the topic was brought up and I got interested.

>>106707024
I have one of the largest cases, the evo xl. It might fit if I stop using push/pull on the AIO. But that'd be fully diy.
>>
>>106707096
>blurrydream
Just put a grain filter on top of all your images, boom, you got yourself blurrydream at home.
>>
>>106707096
those guys have insane compute and they're wasting it on moar layers and moar synthetic slop, it's so sad when you think about it
>>
>>106706982
What you say is valid, but the thing is, Noob and Illustrious are both based on tags. How can I fairly compare a prose prompt on Noob and Illustrious?
>>
>>106706908
>>106706891
This, why not just take pictures of the actual product?
>>
Moar layers has yet to be debunked tho
>>
>>106707128
it's 4x the size of Qwen Image, and do you seriously believe the image looks 4x better? >>106706788
>>
>>106707143
You do understand image models are judged on things other than aesthetics, right?
>>
>>106707143
>it's 4x the size of Qwen Image
and 6.66 times the size of Flux dev, the devil is with us dude
>>
>>106706987
just get a riser retard
>>
>>106707161
go on anon, show us how those images are objectively better than what we can do on Qwen Image? >>106706788
>>
>>106707161
who fucking cares about anything besides aesthetics? its whole purpose is to make pictures, if the pictures it makes look shit what's the point?
>>
>>106706772
???
>get asked "hey why is this model closed when everything's been open from you guys"
>response "we've done big changes to the model so in the meantime we'll give you guys a preview model for input/feedback that we can use to iron shit out"

if it wasn't going to be open he would've just said some bullshit like "model too big" instead of explaining in engrish the purpose of the preview
>>
>>106707179
You can fix aesthetics, you can't fix dogshit prompt following or anatomy. Seriously, do we have some sort of influx of retards in the image gen sphere recently?
>>
>>106706788
>81.3 times bigger than SD1.5
>22.8 times bigger than SDXL
>6.66 times bigger than Flux
>4.7 times bigger than HunyuanImage 2.1
>4 times bigger than Qwen Image
>>
>>106707175
I would if it was released. All I'm saying is judging STRICTLY on aesthetics is idiotic. And I'm an aesthetics fag, trust.
>>
>>106707121
noob does have some NL capability. Or, just do a comparison that's mostly tags, or leave noob out
>>
>>106707179
B-but the green ball sits next to the blue box on top of the yellow rectangle, also the text is correct!
>>
>>106707195
>I would if it was released.
there's plenty of images already publicly available, just look at them and explain to the class how much superior they are compared to the smaller models
https://www.reddit.com/r/StableDiffusion/comments/1nqm5l0/images_from_the_huge_apple_model_allegedly/
>>
File: file.png (1.52 MB, 896x1152)
>>106706969
>>106706979
Flux,
https://civitai.com/models/1961797/srpo-refine-quantized-fp16-forge-compatible?modelVersionId=2220553

and a stack of loras, but these being the most prominent

https://huggingface.co/Alissonerdx/flux.1-dev-SRPO-LoRas/blob/main/srpo_128_base_R%26Q_model_fp16.safetensors

https://civitai.com/models/1253380/phone-quality-style?modelVersionId=1413027

I was soliciting hints about better workflows available out there, I'm still in the honeymoon phase of trying things out
>>
>>106707193
>you can't fix dogshit prompt following or anatomy
yes you can, it's called inpainting and manual work. are you incompetent? how do you fix a dogshit looking gen?
>>
>>106707210
>it's called inpainting
if you want to inpaint, just use an SD1.5 model bro, you don't need an 80b model to get anatomical errors
>>
>>106707206
It doesn't matter until anon can run his usual autistic tests. Every base model newer than XL is slopped but you don't see anon posting SD1.5 do you?
>>
>>106707210
That's literally what a LoRA is for like dude wtf?
>>
>>106706025
>>106706046
Sadly I don't think it's possible with the nodes we have, but I don't see why it wouldn't be feasible.
>>
>>106707223
>you don't see anon posting SD1.5 do you?
/sdg/ exists for that no?
>>
>>106707223
*newer than SD1.5
>>
>>106707223
again, what's the point of an 80b model if it doesn't offer something better than Qwen Image? those images look exactly the same as regular Qwen Image output, what's the point?
>>
>>106707193
>Seriously, do we have some sort of influx of retards in the image gen sphere recently?
They are either new, retarded, or being purposefully obtuse. I can't tell which desu.
>>
>>106707227
making a lora for an 80b model is going to be quite pricey
>>
What's up, /ldg/!

Last week was an absolute whirlwind. Thanks to a happy little accident, we did our first-ever YouTube livestream!

That means the raw, unfiltered VOD was up instantly. But for those who want the polished version, we just dropped a brand-new edited cut today.

Get ready to level up, because today we're diving deep into the art of compositing. We'll be breaking down killer techniques and workflows for SDXL, Flux, and even bleeding-edge models like Nano B.

>Now, I need your help deciding the future:

Want more raw, unfiltered livestreams? Reply with <3

Prefer the tight, info packed edited videos? Reply with :^)

Can't wait to see you there!

https://youtu.be/jmIbIIA9Qmc
>>
>>106707193
>You can fix aesthetics
did anyone fix Flux aesthetics? it's been more than a year and we're still waiting lol
>>
>>106707174
I don't think you understand the sizes at play.
But yes, the riser method is needed if they are to fit at all. I'd have to do it for both and loop it all around and diy a mount for both of them.
>>
>>106707248
Is this your first time seeing something new pop up on the jeeterboard? You're sperging out like the GAE is taking away your GPU. Just chill until it's out.
>>
>>106707227
no one is gonna run an 80b model, if you want to fit that on a 3090, the best you'll be able to do is Q2, do you know what Q2 looks like?
>>
File: 1755972308492242.png (476 KB, 890x594)
>>106707263
>>
>>106707265
you're the one sperging out about how "stacking more layers makes shit automatically better bro, still waiting for the debunk bro", you are so fucking retarded >>106707128
>>
>>106707273
moebros, ramtorchbros what did this ramlet mean by this?
>>
>>106707287
>he wants to calculate 80b parameters with ram offloading
it's gonna take an hour to make a single image with our current gpus lmao
>>
>>106707284
Sure, I'm the one sperging kek. Keep telling yourself that
>>
>>106707127
Product in a cool setting or looking shinier sells better than an actual picture of the product.
Sad but true.
>>
>>106707293
>blud doesnt know what MoE models are
how new are you?
>>
>>106707300
it's not a MoE model though
>>
>>106707300
where's the moe model?
>>
>>106707276
Yes nigger, I know what a riser is. We're talking 2x 4slot gpus, not one.
>>
>>106707300
who said it's a MoE model anon?
>>
I got a natural riser with all the bouncing boobs gens I made last days
>>
I wonder how long 1 step with a 80b image model would take
>>
>>106706788
I don't understand them, they created SRPO and they're not using it on that giant model? fucking why?
>>
yall mind if i up and wildly speculate thoughever
>>
File: dmmg_0072.png (1.4 MB, 832x1216)
>>106707209
this is 60s on a 3090 with the fp8 model. drop your prompt and i'll make you a workflow.

your setup sucks man, mine is pure spaghetti right now (controlnets), but even the default workflow can do some good stuff. Why are you using SRPO?
>>
Someone still believes that bigger params = better model? Damn.
>>
>>106707317
>>106707307
>>106707306
my mistake, i didn't follow the conversation 7 replies back being about a specific model and just replied to the general statement i saw:
>no one is gonna run an 80b model
>>
File: 1754524140432055.png (2.71 MB, 1728x1344)
>>
File: 1741020717067493.png (448 KB, 972x1653)
>>106706693
you know what? now is the time to pray that what this furry fuck said about "seemless" offloading is true lol
https://xcancel.com/LodestoneRock/status/1968976389807161515#m
>>
Thread of poorfags with minuscule compute
>>
when will temu release a model
>>
>>106707256
buy an ad faggot
>>
>>106707273
>do you know what Q2 looks like?
Bigger models quantize better, even Q1 might be fine. We'll just have to wait and see (though the full precision results don't exactly inspire interest).
>>
>>106707256
fuck OFF
>>
>>106707407
>We'll just have to wait and see
No... That's too rational. We MUST sperg out right here right now.
>>
SDXL = clay
Flux and higher = metals
the average consumer = stuck before the copper age :(

pls Nvidia
>>
>>106707423
>No... That's too rational.
Oh yes, the rationality that consists of looking at images from an 80b model, noticing that they are not much better than those from a 20b model (and even seem more slopped), and continuing to be enthusiastic about it, Chang, please.
>>
>>106707256
This is why you need to aggressively tell faggots like ani to fuck off.
>>
File: file.png (252 KB, 391x815)
oh yeah it's qwentime
>>
>>106707447
anthro her
>>
File: WAN2.2_00064.mp4 (2.24 MB, 960x544)
>>
>>106707443
Oh shit you have an advanced copy of the model? Leak it anon!
>>
>>106707464
>Don't look at the images bro, they don't mean anything bro, they're just the outputs of the image model after all, and you should never draw conclusions about the quality of an image model by looking at images.
(You)
>>
So after trying native wan context I think it's just busted for I2V. Their example workflow for sliding context shows the frame count on the context node being 81 and the total frame count at some number in the hundreds, but trying to do that with the wan image to video node gives tensor size errors. Has anyone gotten wan sliding context to work with i2v WITHOUT setting the frame count on the context nodes equal to the total number of frames?
>>
>continues sperging
>>
File: 00001-2225156179.jpg (795 KB, 2048x2480)
>>
File: it's so over.png (91 KB, 277x182)
>>106706693
You can press generate now, and by the time your first image is finished, WAN2.5 will have become open-source.
>>
>>106707351
appreciate the honesty, I'm honestly just throwing stuff at it to see if something works.

here's your slag prompt, probably using the wrong syntax
https://pastebin.com/raw/chw9aZPv
>>
File: comfyui____0009.png (1.75 MB, 896x1216)
>>106707525
>https://pastebin.com/raw/chw9aZPv
i gotchu senpai, gimme ten minutes
>>
File: ahahah.png (1.78 MB, 1396x1603)
>>106706693
can you feel the power of an 80b model anon? those are next gen images that's for sure 1!!1!1!
>>
File: file.png (862 KB, 1808x692)
>>106707455
didn't expect it to work
>>
I still have no idea how to use Qwen Edit
>>
>>106707564
they turned flux into an 80b model
>>
that anon is totally not sperging out guys, can't you tell?
>>
>>106707584
you don't have to, it's so slopped and nano banana destroys everything on the edit space
>>
>>106707589
FUCK YOU!!
>>
If nobody is aware of it the faggot is often arguing with himself typically when anon posting so ignore all of it and don't take the bait
>>
size is all that matters.
>>
>>106707265
>sperging
>>106707297
>sperging
>>106707499
>sperging
>>106707589
>sperging
that's a bot right?
>>
>>106707584
so sorry for your loss.

>>106707564
oh my god it even has flux chin.
flux is a curse that keeps on giving.
>>
>>106707605
I'm forced to distill my shit because it won't fit in hers. Sadge :(
>>
>>106707564
still looks weird.
>>
there aren't enough newfags here to fall for your antics KEK
>>
File: comfyui____0012.png (1.55 MB, 896x1152)
>>106707525
an output is attached. i'll check back in an hour or two if you have questions about anything in there. you'll have to attach your own lora nodes since this is just straight flux

workflow: https://pastebin.com/JgZEs7QQ
>>
>>106707623
I am not that guy but I take all bait, it's more fun that way
>>
>>106707128
It's an objective fact that most of the Flux layers do nothing, which means the model is not fully saturated. What happens when you start stacking layers is that the model can learn to skip them.
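you can probe this yourself btw, minimal sketch (the model/blocks handles are hypothetical, any DiT that exposes a block list works the same way):

[code]
import torch

@torch.no_grad()
def probe_layers(model, blocks, x, t, ctx):
    # skip one block at a time and measure how far the output drifts;
    # near-zero drift means the block is close to an identity mapping
    base = model(x, t, ctx)
    for i, blk in enumerate(blocks):
        orig = blk.forward
        blk.forward = lambda h, *a, **kw: h  # identity passthrough
        drift = (model(x, t, ctx) - base).abs().mean().item()
        blk.forward = orig
        print(f"block {i}: mean abs drift {drift:.4g}")
[/code]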
>>
>muh aesthetics
why is this an argument lmao
if an SD 1.5 model popped up and had prompt and concept understanding on par with gpt/nano we'd all be on it and just lora and upscale
>>
>>106707638
Thanks for making a good point instead of malding like the other guy
>>
>>106707645
>I love slop
Says no one, you're a Tencent employee doing damage control.
>>
File: file.png (1.1 MB, 1340x758)
>>106707575
qwen is horny
>>
>>106707645
good piggy
>>
>>106707648
>malding
you're not saying "sperging" anymore? I wonder why lmao >>106707607
>>
>>106707631
Man, that chin.. just whyy
>>
>>106707645
Morons with a "the grass is always greener on the other side" syndrome
>>
Remember his thread is dead, he can only necrobump it, and he craves interaction with this thread. He can't outwardly show himself because he knows he will be kicked out, so now he needs to anon post all day doing the same antics he did for years.
There is no need for brand or model wars, he's just mad that he's priced out and can't afford a new card because he never worked a day in his life. Just stop replying and you'll notice it will be just him replying to himself.
>>
>>106707660
it's so slopped it looks like a low res painting
>>
>>106707564
that bottom left image looks like qwen anime style
hilarious if they spent so much money on training an 80b model just to train on slop
>>
>>106707645
>>muh aesthetics
>why is this an argument lmao
can't tell if this is bait or retardation
>>
>>106707679
because the couch was a shit lowres image. i can try n tidy it up a bit. and slopped is the wrong term retard. but i get what you're saying.
>>
File: 00106-3827431113.png (584 KB, 512x640)
>>
>>106707699
>slopped is the wrong term
it is the right term you low IQ degenerate, look at the face of the girl, completely plastic and smooth
>>
Why are we arguing over this stupid shit, hone your fucking craft
>>
> anons insulting InvokeAI.
Sincere question: why are you doing this to Invoke and not to Comfy if they share the same business model?
>>
I'm tired of the pointless speculation over models no one ITT has access to is all. It happens every time.
>>
>>106707645
You can train an SD 1.5 tier model (~600m transformer model) with a 5090. Probably could do it with less than $1000 renting an H100.
>>
>>106707712
wow it's nearly like it's a blurry lowres mess.
post your gens anon, we're waiting to see how it's done properly.
you must kill yourself right now to death.
>>
>>106707723
Oh yeah, Tencent has always delivered good sovl model shit after all, why should we be wary of them now??
>>
>>106707723
and EVERY time the doom posters get proven right
>>
>>106707723
One retard rustles the cage and the rest of them jump in. There is zero reason to argue over shit you can't touch; it's like arguing over how good the pussy would feel over a Nun
>>
>>106707631
cheers, I'll give it a spin. Already looks quite a bit more elaborate than my adapted chicken scratch.
>>
>>106707731
Like when they said Qwen would end up like hidream? lmao
>>
>>106707729
>kill yourself right now to death
pleonasm
>>
>>106707731
And despite all this free time they have yet to create a diverse, well-captioned dataset.
>>
>>106707735
>like arguing over how good the pussy would feel over a Nun
fucking kekd
>>
File: that's right.png (89 KB, 618x640)
>>106707729
>post your gens anon, we're waiting to see how it's done properly.
I won't, unlike you I recognize the models we are currently using are slop machines, I'm not releasing anything until we get something good enough
>>
>>106707741
>Like when they said Qwen would end up like hidream?
who said that? Alibaba has been a highly trusted company since they released the Wan series
>>
>>106707741
qwen may be usable for what you want to use it for, but that does not mean it's not gigaslopped
>>
>>106707750
ah, skill issue. got it.
stop posting here lil bro, you're wasting space.
>>
>>106707731
>and EVERY time the doom posters get proven right
he's out of line but he's right
>>106707764
>skill issue
that's an actual skill issue -> >>106707660
>>
REMINDER:

most anons in this thread run sub-10gb cards
>>
the shittier the gen the angrier they get
>>
This thread reminds me of the time i morphed kate bush into a tiger on my pc in the mid '90s, it had MFM drives that needed a kick in the morning to spin up.
>>
File: 00012-1862212030.png (1.13 MB, 1024x1240)
>>106707750
Why are you attacking him when it's clear he's testing stuff?
I post shit I'm testing all the time, this is part of the journey. if there's context I'm missing please show me
>>
>>106707774
once HunyuanImage 3.0 is released, every guy that doesn't have a 96gb vram card will officially be called a vramlet
>>
>>106707774
i'm going to jerk off knowing i have a medium 32gb dick.

goodbye you fucking losers. stay mad.
>>
>>106707759
Here and rweddit during release. They called it flavor of the month and said we would return to the mighty flix/krea lmao. I did return those models to my recycle bin, that's for sure.
>>
>>106707789
>They called it flavor of the month and said we would return to the mighty flix/krea lmao.
desu, only Qwen Image Edit is worth a damn, and I stopped using it after the novelty wore off, it's just too slopped
>>
>>106701867
>>106707784
>96gb vram card
Lower your voice when you speak to me, you're brand new to the local meta
>>
>>106707741
trvth nvke
>>
>>106707735
You do know the images we're currently seeing are supposed to be high quality cherry picked images? This is Tencent telling you "look at what our model can do best!", they probably made 20 tries and chose the best one for each, does that scream "it's gonna be good" to you?
>>
File: 00107-2593381260.png (987 KB, 888x1008)
>>
>>106707741
>doom posters 18484141 - cope posters 1
doom posters sissies, how are we gonna cope with our only loss?
>>
>>106707784
Their example images don't justify the requirements. At that size I'd expect a model that produces extremely complex perfect scenes. Like a full Peanuts comic strip page.
>>
dis nigga never seen researcher gens before
>>
File: ComfyUI_00157_.png (3.28 MB, 1280x1920)
>>
>>106707833
In all fairness, 90% of model makers make shit tier gens when showing off model ability; still, people have a right to explore and see if they can get anything useful. You don't have to use the model anon
>>
>>106707461
nice
>>
>>106707837
I would have liked to hear that music.
>>
Qwen Image is good enough. I'm done getting hyped for new model releases, we should just focus on Qwen finetunes / controlnets / loras etc
>>
>>106707774
no i run an exactly 10gb card
>>
>>106707887
Small models are woefully underexplored. What the community should waste time on is a proper pretrained small model ready for finetuning on any mid-sized dataset.
>>
>>106707887
nah, I want something smaller, and without the VAE shit so that the edit doesn't introduce pixel compression
>>
>he doesn't seedmaxx
>>
>>106707887
false, qwen image is bloated and slopped.we should focus on building our own non-bloated, non-slopped model at 1/4th the size or less.
>>
slopped.we
>>
>>106707887
bruh, Qwen Image is barely better than Flux, and flux is almost twice as small
>>
>>106707918
Lol this is the biggest load of BS ever, Flux doesn't even compare
>>
>>106707887
i'm eagerly awaiting your finetunes
>>
>>106707900
Like lumina?
>>
>>106707913
we will never make our own model
>>
File: 00108-2628670652.png (1.03 MB, 888x1008)
>>
Based Koff.
>>
>>106706693
There's probably hundreds of ways to improve the model by adding novel training techniques or new architectures (or going for a serious unslopped dataset) but nahh, those mfs went for the "just stack more layers bro" meme, seriously...
>>
>>106707935
anthro her
>>
>>
>>106707929
lumina could possibly work but we need to incorporate more optimizations like EQ VAE and TREAD going forward. and lumina has pretty lame base styles

>>106707934
yes we will, look at this
https://huggingface.co/KBlueLeaf/HDM-xut-340M-anime
>>
>>106707955
>340M
no thanks, we already have SD1.5
>>
>>106707929
Lumina is shit because it uses an opinionated, likely censored text encoder. But yes, a ~2B model, maybe with something like the Qwen 0.6 text encoder, though T5 XXL is still king for being verifiably uncensored.
>>
anyone created voices? I am trying alltalk, it uses short voice samples to clone voices. i tried one sample with a latina accent, but alltalk makes her speak british. do different accents need different models?
>>
I'm starting to think Tencent is just incompetent, and HunyuanVideo was accidentally a kinda good model.

HunyuanVideo i2v was terrible, and dramatically changed the first frame. They hastily changed the implementation and released an updated model and just said "lol jk change your implementations and use this one instead" but it also had problems.

HunyuanImage 2.1 uses a VAE with too high of a compression ratio, which also caused problems with LTX and Wan 2.2 5b. They slapped on a refiner after the fact to cope, and also say you need to use their special snowflake guidance method. Refiner failed with SDXL, nobody will run it, it will fail here too. The model also has a bad license, is slopped as hell, and just worse across the board compared to Qwen.

Now HunyuanImage 3 is fucking 80b parameters, literally DoA, nobody can run it, not even an RTX 6000 Pro, and it's even more slopped and just looks like ass for how large it purportedly is.

If you've ever tried to read their training or inference code, it's a fucking mess. They never released the text encoder HunyuanVideo was actually trained with, same with HunyuanImage 2.1. Technically, we've all been using wrong text embeddings the whole time.

They have no idea what they're doing.
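For scale on the VAE point, here's how many pixels each latent position has to encode per frame (the 16x/32x factors are as reported, from memory):

[code]
for name, f in [("SD/SDXL VAE", 8), ("Wan 2.2 5b VAE", 16), ("HunyuanImage 2.1 VAE", 32)]:
    print(f"{name} ({f}x): {f * f} pixels per latent position")
# 8x -> 64 | 16x -> 256 | 32x -> 1024
[/code]

Every doubling of the spatial factor quadruples what each latent value has to reconstruct, which is why fine detail turns to mush and they bolt a refiner on top to cope.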
>>
>>106707971
vibevoice has been all the rage recently as far as open models go
>>
File: disappointed.gif (485 KB, 220x220)
>>106707955
>cheapest
nigga
>>
>>106707980
yeah for the stunt MS pulled
>>
>>106707969
Neta Lumina can do porn fine
>>
>>106707976
>They have no idea what they're doing.
yep, Alibaba is the one chinese company that could save us (if they one day learn that synthetic data is poison)
>>
>used to have the problem of not getting enough motion in my i2v
>now have the problem of too much motion

AAAAAA
>>
>>106707900
>>106707907
>>106707913
>>106707918
>>106707925
all vramlets btw
>>
what do we need saving from doe?
>>
>>106707961
read the paper, it's a proof of concept. if he can do that with ~$600 of compute, the community can EASILY train their own real base models.

Instead of the furry blowing >$150,000 on finetuning fucking FLUX SCHNELL, we could have had a SOTA community-funded fast model by now if we went the route of HDM. ultimately it's inevitable though.
>>
>>106708001
It really just seems vramlets are getting mindbroken day after day. Why doesn't the LLM field have this kinda bitching?
>>
>>106708009
>read the paper, it's a proof of concept. if he can do that with ~$600 of compute, the community can EASILY train their own real base models.
I wish Tencent had read that paper instead of going for gazillions of parameters lol
>>
>>106707969
>likely censored text encoder
never had an issue with this, it does what i tell it to just fine
>>
>>106708014
>Why doesn't the LLM field have this kinda bitching?
are you joking or something? when deepseek got released, the shitstorm was so intense they had to create a new general just for this specific model and appease the "giant models can't be considered local" group
>>
>>106708001
Yes anon, I like models that can be fully finetuned on consumer hardware.
>>
>>106708001
I use Qwen image all the time (fp8 scaled), it's still fucking bloated and slopped. LoRAs help deslop it though
>>
Bigma status?
>>
>>106708014
>Why doesn't the LLM field have this kinda bitching?
lol
>>
>>106708025
Do we have a model like that? Most XL models were tuned on a small to big cluster
>>
>>106707677
Ranfaggot is getting desperate for attention
>>
>>106707998
I'd rather have the second problem actually, I can slow down the video, while speeding it up reduces the total time..
>>
>>106708043
No because the people with compute are retards. For example, Chroma should've been trained from scratch as a 4B model.
>>
>>106708043
nta but you can do a full finetune of sdxl with a 24gb card iirc
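napkin math on why it's borderline (assuming the ~2.6B parameter SDXL UNet, bf16 weights/grads and fp32 Adam moments; activations come on top):

[code]
params = 2.6e9
weights, grads = params * 2, params * 2   # bf16
adam = params * 8                         # two fp32 moments per weight
print(f"~{(weights + grads + adam) / 1024**3:.0f} GiB")  # ~29 GiB before activations
[/code]

hence the usual tricks: 8-bit adam (moments drop to ~2 bytes/param), gradient checkpointing, paged/fused optimizers. with those it does squeeze onto 24gb.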
>>
>>106707976
>I'm starting to think Tencent is just incompetent, and HunyuanVideo was accidentally a kinda good model.
yeah, they seem to have learned nothing, they went down the right path and instead of keeping those solid foundations they went for something completely new and broken, that's not how you improve in this field at all
>>
>106707887
Autistic compulsion forces him to say the line again
>reiterates insult thrown at him
More wheelchairs it is then
>>
>>106708014
>Why doesn't the LLM field have this kinda bitching?
https://www.youtube.com/watch?v=H47ow4_Cmk0
>>
>>106708059
>>106708061
You can tune, sure, but will it be worthwhile? Is there any evidence that doing something like this has shown results? Only the big tunes are usable as far as I can see.
>>
>>106707976
>HunyuanImage 2.1 uses a VAE with too high of a compression ratio, which also caused problems with LTX and Wan 2.2 5b.
wan 2.2 5b VAE is worse than the 14b one right?
>>
I only want models I can finetune with a TNT2
>>
>>106708024
I mean to be fair that one is 670 billion parameters lol
>>
>>106708076
Are you stupid or something? I said he should've made a 4B model from scratch, which would've had sufficient expressive capacity as a base model while also being easy to train for other community members. And yes, finetuning is easier than making a base model as you can have a much more constrained dataset.
>>
File: 00109-3001379579.png (585 KB, 1008x888)
>>
>>106708076
>Is there any evidence that doing something like this has shown results
only for smallscale stuff like lora extracts, people use clusters because it's waaay faster. training on a few million images with consumer hardware is too slow
>>
>>106708009
what would a $150,000 base model get us
like comparable to what model
>>
>>106708095
>Will it be worthwhile?
>Evidence, shown results
Here just added some keywords to help you understand, I get hard to english in India.
>>
>>106708106
>what would a $150,000 base model get us
a small useless shit, unfortunately we'll always be dependent on giant companies like Alibaba and Tencent
>>
>>106708107
you get hard to english in india?
>>
>>106708109
grim.
>>
>>106708097
>>106707945
>>106707935
>>106707837
>>106707709
go back to sdg with your slop.
>>
>>106708107
Okay, you are retarded. The premise is small models are woefully underexplored. We, however, can use our brains when considering say the 340m HDM model, the 600m Pixart Sigma model, and extrapolate to the overly bloated Flux model. And we can ask a simple question: is Flux being 12B parameters 20 times better than Sigma? The answer is obviously no. Which means we can make the hypothesis that a model bigger than 600m and smaller than 12B could be quite good, especially considering SDXL which is decent despite using a shitty text encoder, shitty VAE, and shitty architecture. So we can make an educated guess that a properly trained 2B model would be better than SDXL. We can also make an educated guess that a 4B model would be much better than the 2B model.
>>
>>106707988
stunt aside, i have been using it with great success. i like making asmr voices (slower, more whispery) and while alltalk is very good at what it does vibevoice does it better.
>>
>>106708149
>Hey everyone spend your money on this shit I "think" will work over stuff we "know" works
Lol good luck with the tune bro
>>
File: kek.png (1.62 MB, 1459x1492)
>>106707564
>wait anon you don't have a 96gb VRAM card to render us? AHAHAHAHAH
>>
>>106708174
imagine standing outside that photobooth trying to get a passport picture quickly taken on the way to an important meeting and all you hear from it is autistic and retarded onions voices doing onions laughs
>>
>>106708165
>"know what works"
>spent $150k on a distilled model that underwent cope brain surgery
It's actually funny how ignorant you are about everything. Chroma wasn't "what works". And we do know what does work because people have done it multiple times. Any DiT model with text conditioning trained starting at 256px and progressing to 1K and 2K. HDM is literally a dick around project and proved without a doubt that the process is fucking simple.
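The progressive-resolution part in sketch form (schedule numbers are illustrative, not from any particular paper):

[code]
# most steps at low res where batches are cheap, high res only at the end
schedule = [(256, 0.70), (512, 0.20), (1024, 0.08), (2048, 0.02)]
total_steps = 500_000
for res, frac in schedule:
    print(f"{int(total_steps * frac)} steps at {res}px")
    # inner loop: sample batches bucketed at `res`, take optimizer steps
[/code]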
>>
>>106708188
why is s. oy censored. what.
>>
>>106708190
As I said good luck on that amazing small model that blows everyone else's out of the water. Looking forward to it champ
>>
File: 00110-2556412822.png (959 KB, 1008x888)
>>106708144
>>
File: file.png (102 KB, 230x222)
>>106708032
>Bigma
I went on a 500m mlp test phase. It'll keep training forever though.
>>
>>106708188
what's a onions voice?
>>
>>106708202
he wanted to say "s.oy" but 4chan censors that word and replaces it with "onions"
>>
>>106708199
What is it with zoomers just lying about what other people said? What is wrong with you, seriously.
>>
quick question, is the social media hate towards AI currently big enough to hamper this field?
>>
>>106708214
no
>>
File: 6747473.jpg (217 KB, 784x611)
Ugh, anons, can anyone help me?
>>
>>106708214
people will take whatever they can, especially now when legislation is hazy around copyright for learning material
If anything was gonna happen I'd expect it to be around that
>>
>generates qwen images and edits, makes them into videos
heh nice.
>goes back to generating with sdxl

anyone else do this? there's just so many more tools and shit available for sdxl. ip-adapter is just pure bliss.
>>
>>106708218
ok good
>>
>>106708228
it says it failed to extract an archive, delete it and see if redownloading fixes the error

your disk isn't full and the folder isn't write protected or missing permissions for the user this process runs as, right?
>>
>>106708214
Why? AI is about efficiency (saving money), which means the people with money will invest in it because it has obvious utility. All social media does is make people better at hiding AI use. I already personally use AI for my everyday work: LLMs as my code slave and dev duck, and image models for things like product hero images.
>>
>>106708232
>goes back to generating with sdxl
This whole gen is compromised. It's just NVIDIA and Comfy glowies telling everyone to buy more hardware and I bet none of the real anons here have more than 12GB of VRAM themselves.
>>
>>106708214
No. If the powers that be wanted to hurt AI you would be hearing "think of the children" type arguments, instead it's just kvetching artists
>>
>>106708241
i feel like if it was 100% accepted by everyone we'd have more tools idk
>>
>cumfart ooms after every gen again
>can't line up 4 wan gens anymore. again.

why is this software so cursed?
>>
>>106708232
I deleted all my XL models except one for when I just want a quick inpaint.
>>
File: file.png (9 KB, 316x186)
>>106708250
speak for yourself
>>
>>106708250
i'm that anon and i have 32gb vram but i still just like going back to sdxl for ease of use.
making wildcards for illust/noob/sdxl is just so much easier.
i'm tired of writing entire chapters just to get a decent gen with these new models.
>>
>>106708228
looks to me like you have unstable internet
>>
>>106708259
I don't get your logic. What tools? People don't work for free. Who are these people who should be making tools for you to use for free?
>>
>>106708273
>i'm tired of writing entire chapters just to get a decent gen with these new models.
and then there's doing upscaling/2pass with flux, which either doesn't exist, doesn't work right, or whatever does exist is a huge clusterfuck of spaghetti nodes. vs i get better gens just niggering with sdxl a little bit.
someone needs to make a proper realism model for illustrious, seeing as we can't rely on the chinese because they only like sameface gray alien girls and the americans like BOGGED negroid physiognomy. who's left?
..the french?
>>
>>106708267
>barely enough to run HunyuanImage 3.0 on Q4
vramlet
>>
File: 00111-1226643539.png (776 KB, 1008x888)
>>
>>106708291
i tried merging lustify and biglust with some illust models which works somewhat decently but in general yeah, realism models are all biased towards shit unless you use loras.
>>
>>106708297
You can get the Hunyuan Image 3.0 experience by quadrupling the layers on Flux with Identity pass through.
>>
>>106708303
wow he's literally me

>>106708305
even a few pony loras somehow manage to add realistic lighting, the future may be in loras. who knows, might try it myself since i have the hardware.

>>106708267
lmao the 4090 user got called a vramlet get owned >>106708297
>>
When ready

>>106708328
>>106708328
>>106708328
>>
>>106707161
>things other than aesthetics
It's literally all synthetic checkboxes in benchmarks. Qwen also had funny charts shown at release with BIG NUMBAHS but in reality it's a plastic model.
>>
>>106706484
Sauce on CWC vid? Looks hilarious



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.