/g/ - Technology


Thread archived.




File: the longest dick general.jpg (2.28 MB, 3264x1948)
Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102926788

Very Busy Day Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>SD3 Large
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://replicate.com/stability-ai/stable-diffusion-3.5-large

>SANA
https://github.com/NVlabs/Sana
https://ea13ab4f5bd9c74f93.gradio.live

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
What if you had MiniMax at home with an Apache 2.0 licence, but god said:
https://github.com/genmoai/models
>The model requires at least 4 H100 GPUs to run.
https://xcancel.com/genmoai/status/1848762405779574990
>>
File: ComfyUI_SD35L_0122.jpg (249 KB, 896x1152)
>>
Blessed thread of frenship
>>
what if you had sd 3.5 large but god said: but now you need to wait for ggufs because vramlet lmao
>>
File: ComfyUI_temp_nsaes_00004_.png (3.68 MB, 2240x1440)
yeah. img2img is fucked, I wonder if this is a safety measure
>>
>>102930138
It's just the denoiser that needs to be tweaked
>>
>>102930135
you can't try the fp8?
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8
>>
File: 00597-2532502571.png (1.53 MB, 1024x1024)
Is it over or are we back?
>>
What if god gave you local SOTA (sana) but said: you must have skill
>>
Blessed thread of keeping up the torch of threadly culture.
>>
File: file.jpg (321 KB, 944x1280)
>>102930162
back for now
>>
>>102930155
>14.5gb
does that include the text encoders? i only have 10gb vram
>>
>>102930193
oh nvm the text encoders are in a separate folder, i guess i'll wait a while
>>
File: sana.jpg (74 KB, 1023x1010)
>>102930170
weights when
>>
>>102930214
have faith in chang
>>
always putting in a gen that says "LDG" is debo-coded
>>
>>102930193
yeah, 14.5gb is everything (fp8 unet + fp8 text encoder), desu my recommendation would be to put the text encoder into your ram/cpu so that you have spare vram room for the rest
https://reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/
>>
>>102930251
>yeah, 14.5gb is everything (fp8 unet + fp8 text encoder)
ooh okay, thank you anon, i'll give it a try
>>
>>102930214
Send Nvidia a strongly worded email, they're the ones having to approve it.
>>
>>102930111
https://arxiv.org/abs/2405.14854
If only this model was a bitnet model (1.58bit), it would be way easier to run it :(
>>
>>102930111
>4 H100s
I don't even remotely believe this is a hard requirement. I scanned through their github code, they have some weird multi-machine FSDP distributed implementation (likely taken from the training code).

I mentioned this last thread, but comparing with Allegro:
Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAM
Mochi: 10B parameters, 44k sequence length, 3072 hidden dim, runs in ? VRAM

Memory usage is a fixed amount for all the weights, plus the memory for activations which scales linearly with both sequence length and hidden dimension size. Mochi is half the sequence length, but less than double the hidden dim size, so would theoretically use LESS activation memory per layer.

If you 8-bit quantized Mochi, it's 10GB of weights, compared with 5.6 GB of weights for Allegro bf16. Combine that with the lower activation memory per layer, and it probably can be squeezed to run in 24GB VRAM. Worst case you'd need to go model parallel with two 24GB cards.

Someone just needs to make an optimized single-machine inference implementation that uses quantization.
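The comparison above can be sketched as back-of-the-envelope arithmetic. The parameter counts, sequence lengths, and hidden dims are the ones quoted in the post; the bytes-per-element and per-layer activation multiplier are rough assumptions, not measured values:

```python
# Rough VRAM estimate for the Allegro vs Mochi comparison above.
# Model figures are from the post; the activation multiplier (mult)
# is a hypothetical constant, not a profiled number.

def weight_gb(params_billions, bytes_per_param):
    """Weight memory: parameter count times bytes per parameter."""
    return params_billions * 1e9 * bytes_per_param / 1e9

def activation_gb(seq_len, hidden_dim, bytes_per_elem=2, mult=4):
    """Per-layer activation memory scales linearly with both
    sequence length and hidden dimension."""
    return seq_len * hidden_dim * bytes_per_elem * mult / 1e9

# Allegro: 2.8B params in bf16 (2 bytes), 80k tokens, 2304 hidden dim
print(f"Allegro: {weight_gb(2.8, 2):.1f} GB weights, "
      f"{activation_gb(80_000, 2304):.2f} GB act/layer")

# Mochi: 10B params quantized to int8 (1 byte), 44k tokens, 3072 hidden dim
print(f"Mochi:   {weight_gb(10, 1):.1f} GB weights, "
      f"{activation_gb(44_000, 3072):.2f} GB act/layer")
```

With these assumptions Mochi's per-layer activations do come out smaller than Allegro's (44k x 3072 < 80k x 2304), which is the post's point: the weights dominate, and 10 GB of int8 weights plus smaller activations plausibly fits a 24GB card.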
>>
Can i run 3.5 with a 4080?

Is it good?
>>
>>102930334
>Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAM
you include the text encoder to those 22gb vram or is it separate?
>>
>>102930353
Yes
No
>>
>>102930353
i doubt it, I can barely run it on a 5090
>>
File: ComfyUI_04170_.png (1.75 MB, 1024x1024)
Hmm, think I just have to crank the steps up to at least 30
>>
File: file.png (2.31 MB, 1024x1024)
>>102930353
>Is it good?
I let you judge
>George Costanza eating a Hamburger, there's a Hatsune Miku plush on the table
>>
>>102930358
I'm not including any text encoders at all, because it doesn't matter. You can load the text encoder to VRAM, compute embeddings, then unload it and load the transformer model. The time taken to do the text embeddings this way is still a negligible fraction of overall generation time.
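The encode-then-unload pattern described above is simple to express in PyTorch. A minimal sketch, with tiny toy modules standing in for the real multi-GB text encoder and diffusion transformer (the module names and sizes here are illustrative, not any specific pipeline's API):

```python
import torch
import torch.nn as nn

token_ids = torch.tensor([[1, 2, 3]])

# 1) Load only the text encoder and compute the prompt embeddings.
text_encoder = nn.Embedding(1000, 64)  # stand-in for a T5/CLIP encoder
with torch.no_grad():
    embeddings = text_encoder(token_ids)

# 2) Drop the encoder before the big model is loaded; on CUDA you would
#    also call torch.cuda.empty_cache() to actually release the VRAM.
del text_encoder

# 3) Only now load the (much larger) transformer and run denoising.
transformer = nn.Linear(64, 64)  # stand-in for the DiT
with torch.no_grad():
    out = transformer(embeddings)

print(out.shape)  # torch.Size([1, 3, 64])
```

Because the embeddings are computed once up front, the encoder never has to coexist in VRAM with the transformer, which is why its size "doesn't matter" for the peak memory figure.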
>>
>>102930334
>44k sequence length
what does that mean? it's the number of frames or something?
>>
>>102930388
Burger looks good at least
>>
>>102930388
I'll wait for the finetunes
>>
>>102930184
No way SD3.5 just allows you to draw pepes like that???
>>
>>102930429
SAI likely put in some fuckery to prevent that
>>
>>102930424
The video gets compressed into the latent space, and then that 3d tensor is divided into a long list of embeddings. It's literally the same thing as imagegen models based on DiT, but with an extra time dimension.

So the actual input to the model is a long list of visual embeddings, each representing a tiny image patch from one frame. That's what the context length is referring to. For mochi, it's smaller than allegro due to some combination of slightly lower res video and better spatial + temporal compression by the VAE.
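As a sketch of where a number like 44k or 80k comes from: the token count is just (temporally compressed frames) x (compressed height / patch) x (compressed width / patch). The compression factors and clip dimensions below are made-up illustrative values, not the actual Mochi or Allegro configs:

```python
def latent_tokens(frames, height, width, t_comp, s_comp, patch=2):
    """Transformer sequence length for a video after VAE compression
    (t_comp temporally, s_comp spatially) and 2D patchification."""
    t = frames // t_comp
    h = height // (s_comp * patch)
    w = width // (s_comp * patch)
    return t * h * w

# Hypothetical 480p clip: 96 frames, 6x temporal / 8x spatial VAE
# compression, 2x2 patches.
print(latent_tokens(frames=96, height=480, width=848, t_comp=6, s_comp=8))
```

Doubling the frame count or halving either compression factor scales the sequence length proportionally, which is why a better VAE (more compression) directly shrinks the transformer's context.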
>>
>>102930474
I see, but on your post you're using the number of parameters + sequence length + hidden dim + quant to evaluate the vram requirement, what about the numbers of frames and the resolution?
>>
File: file.png (1.13 MB, 800x1216)
>>102930087
>>SANA
>https://github.com/NVlabs/Sana
>https://ea13ab4f5bd9c74f93.gradio.live
real demo link btw https://8876bd28ee2da4b909.gradio.live
>>
File: 00606-1258344404.jpg (324 KB, 768x1024)
>>
>>102930441
Eh i remember when people panicked about sdxl being censored and now we have pony
>>
>>102930441
The reason they'd do that would be stopping people from finetuning nipples on girls' boobs, and the model does them natively, so they're navigating in a better direction and the poor stuff is due to incompetence.
>>
File: d0d295YFT7snmBbC.webm (1.61 MB, 1280x720)
>>
>>102930538
Demo works now?
>>
>>102930538
yeah I've seen them on twitter, they said there's a 480p model and a higher resolution one, I wonder which one it is on the demo
>>
>>102930493
Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video, which is much lighter weight than the diffusion model.

The actual input to the model is what matters, and that's just a big tensor of 44k or 80k embeddings. That tensor IS the video that you're denoising, just in the latent space and represented as a bunch of tiny patches.
>>
>>102930551
>Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video
true, that's what CogVideoX is doing actually
>>
>>102930507
We can get a local video model that is better than proprietary video models and yet nobody has managed to make something better than Gradio?
Why is it still so hard?
>>
File: file.jpg (306 KB, 3217x1678)
>>102930432
>No way SD3.5 just allows you to draw pepes like that???
unfortunately no
>>
>>102930609
Reminds me of Meta.ai's output. Was Meta's image generator ever mentioned on 4chan? Anyone with Whatsapp can use it for free and nobody seemed to care, even Google's ImageFX got mentioned once.
>>
File: file.jpg (375 KB, 1485x1596)
What I like about SD3.5 is its diversity of outputs, you don't get the same rigid shit every time like on flux, but on the other hand there's a lack of consistency, for example one image is oversaturated somehow, and the 2 images below are "3d migu" even though I specified an anime style only, SD3.5 is too inconsistent, probably because it's undertrained or something?
>>
>>102930672
this is a local thread anon, that's why we're not talking about it
>>
>>102930596
Because Gradio works just fine and requires minimal work to implement.
>>
>>102930706
read their release note
>>
>>102930771
this?
https://stability.ai/news/introducing-stable-diffusion-3-5
>>
>>102930706
>SD3.5 is too inconsistent
Isn't all this because it's unet? Unet is like this, no way around that.
Flux is DiT, so it's super-consistent, but then all seeds look very similar.
>>
>Consistency is bad now
>>
>>102930737
I said 4chan, didn't seem like /SDG/ mentioned it either, and they could say
>this is a stable diffusion thread anon, that's why we're not talking about it
>>
>>102930815
SD3 is also a DiT I believe
>>
>>102930706
>>102930792
It's from QK Norms which is similar to what Pixart uses, makes training faster and more efficient but you lose some stability deep in the weights.
>>
>>102930706
>you don't get the same rigid shit everytime like on flux
that doesn't happen to me on flux
>>
>>102930839
>makes training faster and more efficient but you lose some stability deep in the weights.
that's fucking retarded, couldn't they have waited a bit longer and ended up with a better model for good? why do they always want to rush
>>
>>102930849
Why because your vague prompt gets more artistic interpretation from the model? Boo hoo.
>>
>>102930860
I asked for an anime style, not a 3d style, and it gave me 3d style, that's just a mistake from the model, and you're a retard if you think otherwise
>>
>>102930877
Or maybe you're suffering from negative bias and you ignore every time other models interpret your prompt. It's not like Flux is the king of adherence either.
>>
File: sd3.5 gen from twatter.jpg (170 KB, 1280x768)
>>102930815
Flux is distilled to produce not very diverse set of nice images. It will never have knowledge of the full, trained from scratch model and because of that 3.5 is a much better base for finetuning, even if it doesn't have good quality in all generations out-of-the-box.
>>
>>102930706
That looks like shit kek.
>>
File: file.jpg (333 KB, 1465x1624)
>>102930844
>that doesn't happen to me on flux
lucky you I guess, because flux is really rigid and tends to give you really similar pictures
>>
>>102930996
>why are there no mid sized models
>wah everything looks like shit
Any day now, a Flux finetune lmao
It's great because the one you kept posting turned out to be a shit
lmao
seriously
>>
>>102931012
>Any day now, a Flux finetune lmao
it happened though
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>
>>102931011
at what guidance value?
>>
>>102931027
Yeah, it's shit.
hahaha, omg I can't believe you were waiting for that
Doesn't even beat fp8 dev
>>
>>102931012
You seem to have either responded to the wrong post or think I'm someone specific in this thread when that was my first post.
>>
>>102930831
It is bad if it's at the cost of creativity, the most creative model ever was Craiyon (formerly mini-dalle) and its inconsistency was so off the charts you couldn't generate a face.
>>
>>102931037
seemed to work because I got the reply I wanted :) because there's a coping Flux user in here thinking someone is going to drop a $20k finetune
>>
>>102931027
According to his donation page, he's doing finetuning with a single 4090. I doubt he used a lot of images. There's no way it's going to be good.
>>
>>102931043
No I want the things I type to appear, you can get creative with your prompts.
>>
>>102930834
Ah, well, kudos for achieving diversity of outputs with it then, and there's nothing wrong with the tech, black forest labs messed it up, and that's one thing I can say Stability did better.
>>
>>102931072
It's not going to be good because you need to do like 10 epochs on a million images to properly stamp in new concepts. If he's seriously doing a single 4090 that's like 15 seconds a step at batch size 1. At best he's doing what a merged Lora would do.
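Taking the post's own numbers at face value (10 epochs over a million images at batch size 1, 15 seconds per step), the arithmetic makes the point concrete; nothing below goes beyond the figures quoted above:

```python
# Back-of-the-envelope for the single-4090 finetune claim above.
epochs, images, batch_size, sec_per_step = 10, 1_000_000, 1, 15

steps = epochs * images // batch_size          # optimizer steps needed
total_seconds = steps * sec_per_step           # wall-clock time
years = total_seconds / (86_400 * 365)

print(f"{steps:,} steps, ~{years:.1f} years")  # 10,000,000 steps, ~4.8 years
```

At that rate a full concept-stamping finetune is a multi-year job on one card, which is why the post compares the realistic outcome to a merged LoRA rather than a true finetune.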
>>
>>102931034
you tried it?
>>
>>102931047
Honestly as an outsider that came to this thread after hearing the news about the new models, you seem a bit obsessed with whoever that guy is living rent free in your head.
>>
>>102931117
Yes, for the same effect find a realism lora for Flux.
>>
>>102930969
Wrong, Flux Pro doesn't have good diversity either, that's why I suspected it was DiT's fault.
But I'm glad I was wrong because unet just won't give text that looks as good as this.
>>
>>102931141
DiT's major flaw is it requires saturation, it needs the boomer prompts which increases quality with each word.
>>
File: ComfyUI_04176_.png (1.33 MB, 1024x1024)
>>
File: 1708395739709226.png (6 KB, 574x70)
oh non ono sana bros this cant be happening
>>
File: ComfyUI_04178_.png (1.24 MB, 1024x1024)
dawwww
>>
>>102931168
it doesn't look really good, desu I couldn't tell the difference between SD3-2b (medium) and SD3-8b here, they're on the same range of quality, the fuck did they do all this time?
>>
>>102931183
>username
>retarded pajeet
>>
>>102931106
That was me at the beginning, then I realized my best generations ever had something that the model put in there that I never could have imagined, sometimes it became a new fetish, one thing became my main fetish because I had never seen that before.
When the model only does what you ask for, and not more, you're missing out.
>>
>>102931195
Sadly a lot of the quality is in the training and some of the concepts are overbaked. It seems to me the more they try to fix the training process (magic deduping, overbake detection, etc) the more they fuck things up.
>>
>>102931204
>When the model only does what you ask for, and not more, you're missing out.
this, it's fucking boring if it only does it job, I want surprises, something like Dalle-3, you can go for a simple prompt and the model can add a shit ton of details that weren't on my orders but are still relevant to the image, that's where the fun is
>>
File: ComfyUI_04179_.png (1.32 MB, 1024x1024)
>>102931195

never tried sd3, but this new 3.5 doesn't impress me that much, messed up anatomy
It is up to date with 'famous' people, i guess...
>>
>>102931043
I loved mini dalle, has that model been released?
>>
whats with this faggot schizo mentioning how Flux is bloated every time a shitty alt-model releases? >>102917495 as if Sana and SD3 being shit is a good thing because Flux is too big? Just rushing to the slide defense every time someone says these turdbakes look like shit. sd3 8b being bad is 100% a fault of the trash datatset
>>
>>102931239
>sd3 8b being bad is 100% a fault of the trash datatset
this, 100% this
>>
>>102931239
you know you are really obvious because you say turd a lot
it would help if you didn't schizo post back
also you are completely ass blasted because your savior Flux fine tune is, as you say, a turd.

lmao lol
>>
File: file.png (428 KB, 409x447)
>>102931235
kek he looks like putin in there
>>
>>102931195
They added nipples.
https://files.catbox.moe/w0katp.png <- SD3.5Large
>>
there he goes. what's this jeet's endgame? unironically what is wrong with him? treating it like a console war
>>
>>102931274
everything except Flux is a turd for you, what's your end game?
>>
>>102931221
>something like Dalle-3
Well, with today's so many releases maybe we will get open Dalle-3 at last today!
>>
>>102931268
ok I guess they stopped acting like lunatics towards nudity, which is always a good thing, but the model quality could've been way better, it's a 8b model and it still looks bad, especially the details
>>
>>102931287
>we will get open Dalle-3 at last today!
you're talking about Sana?
>>
>>102931298
>Sana
no, that was dead before it even got released.
>>
>>102931287
We got sora at home, except you can't run it at home kek
>>
In the LLM world there's a pattern where the more creative models are dumber, while the smarter models are less creative. We want creative models but not necessarily at the cost of coherence/smarts. Something the LLM users have found to make the smart models more creative is to use the {{random}} function, which is a part of the prompt. I think I heard there was such a thing in the image gen frontends as well. I believe it was called wildcards? Basically it lets you insert random strings in the prompt. So in LLM world you could do something like

>Write in the style of {{random: Dracula from Castlevania, Kizuna AI the vtuber, Gordon Ramsey}}.

and each time you press generate, it would pick from one of the strings so you'd get a different style for each new reply. I imagine this could be pretty powerful for smart but uncreative image models as well as you use more and more wildcards in different parts of the prompt.
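A wildcard expander of the kind described is only a few lines. This is a generic sketch; the exact `{{random: ...}}` syntax varies between frontends, so treat the pattern here as an assumption rather than any particular UI's format:

```python
import random
import re

def expand_wildcards(prompt, seed=None):
    """Replace each {{random: a, b, c}} span with one option
    chosen at random, so every generation gets a fresh prompt."""
    rng = random.Random(seed)
    pattern = re.compile(r"\{\{random:\s*([^}]*)\}\}")

    def pick(match):
        options = [opt.strip() for opt in match.group(1).split(",")]
        return rng.choice(options)

    return pattern.sub(pick, prompt)

print(expand_wildcards(
    "Write in the style of {{random: Dracula from Castlevania, "
    "Kizuna AI the vtuber, Gordon Ramsey}}."
))
```

Each call picks independently per wildcard, so scattering several of these through a long boomer prompt multiplies the variety of a model that is otherwise too consistent.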
>>
>>102931274
you vill get compressed eyes
you vill get mangled limbs
you vill generate mj slop
you vill like sana
>>
File: file.png (3.64 MB, 2638x1452)
>>102931312
>We got sora at home, except you can't run it at home kek
>>
>>102931281
i dont give a shit about flux man, what the fuck is wrong with your brain? flux is shit because it's rigid untrainable junk. sd3 and sana are shit because they're melted synthetic slop trained on even worse data. none of the 3 are really good. seriously get some help, you're jumping at shadows over nothing
>>
>>102931330
you should get checked for depression because everything sucks to you
maybe it's a you problem
>>
>>102930529
>poor stuff is due to incompetence
It doesn't know dicks, many artists, a lot of known people, genitals.
It's intended.
>>
Sana knows artists
Flux and SD3 do not
>>
File: file.png (43 KB, 675x164)
>>102931343
>It's intended.
facts
>>
File: Mochi.webm (907 KB, 1280x720)
Guys, Mochi passes the test
>>
>>102931314
It's not hard to have a diverse image model, you just have diverse captions and diverse images and then avoid the temptation to use a DPO.
>>
>>102931335
let's not play pretend now. these models are just bad, not need to cope over it. sana was not the bigma everyone was hoping for. sd3 is just as bad as it was months ago, now with extra parameters. flux remains untunable airbrushed slop and is nowhere near dalle at home. these models are all underwhelming except that video one which isn't even runnable
>>
>>102930135
3.5 Medium still to come on the 29th too BTW. Large is looking good so far though, it can do proper booba with nipples out of the box even, even with the 4-step turbo version.
>>
>>102931357
>Finetune for it
Lykon, why would people spend thousands of dollars to finetune this giant 8b turd? You're delusional
>>
>>102930849
>couldn't they wait a bit more
apparently everyone and their dog decided to release today
so either it's a crazy coincidence (could be), or they know about some legislation that will be pushed soon that would severely restrict image models in the US
>>
>>102931367
Seriously anon, you should take a break from this, come back in five years, being on the cutting edge isn't for you.
>>
>>102931268
Really? Looks like they added areolas and forgot the nipples
>>
>>102931372
>in the US
SAI is an UK company though kek
>>
>>102931372
I'm actually going to bet there's some sort of deadline for some sort of AI event we don't know about. Maybe even for Nvidia's conference.
>>
>>102931370
>Large is looking good so far though
no it fucking doesn't, the details are horrible, the anatomy is still fucked, it's a fucking 8b model there's no excuse this time, they fucked it up
>>
>>102931236
Yes:
https://huggingface.co/dalle-mini/dalle-mini
But it's garbage compared to this one:
https://huggingface.co/dalle-mini/dalle-mega/tree/main
Which is garbage compared to... one I'll post later...
>>
>>102931372
>some legislation that will be pushed soon that would severely restrict image models in the US
Big if True
>>
>>102931380
none of these jeeted local releases are cutting edge. training on dreamshaper outputs with sai compute is nothing more than a griftbake. same with sana designed to guzzle research grants.
>>
>>102930529
Type this prompt I got from JoyCaption into SD 3.5 somewhere where there's not Web API level prompt filtering (e.g. locally), it really will do it out of the box, I swear:
"a photograph of a topless woman with a light skin tone and platinum blonde hair styled in loose waves that cascade over her shoulders. She has striking blue eyes, full lips, and a slender, toned physique. Her breasts are medium-sized with prominent, erect nipples. She has an intricate tattoo of two roses, one red and one white, with detailed green leaves and vines, covering her upper chest and extending to her shoulders. Additional tattoos are visible on her left arm, which has a large, elaborate design, and her right arm, which has a smaller, more intricate design. Her left hip features a tattoo of a rose. The background is a plain, neutral beige color, which helps to focus attention on the subject. The lighting is soft and even, highlighting her smooth skin and the vibrant colors of her tattoos. The overall composition of the image is simple yet powerful, emphasizing both her natural beauty and the artistic elements of her body art. The photograph is professionally taken, with a clear and crisp quality that brings out every detail."
>>
>>102931412
Yeah you're completely fried, get off the internet.
>>
>>102931367
>sana was not the bigma everyone was hoping for
Utter retardation you have
>>
>>102931418
>no argument
>no u
yeah maybe post some images in your defense? these models look like shit. the only thing fried here is your trash quality outputs
>>
>>102931398
Nice bait lmao, I'd post topless 4-step Turbo gens if this wasn't a blue board, it literally does nudity better than any version of Flux does without Loras
>>
File: file.png (115 KB, 960x1050)
Maybe I should just fucking lower my expectations.
>>
>>102931365
Idk if I'd call that easy. Building a both large and diverse dataset is always a challenge in ML.
>>
WHY THE FUCK CAN'T 3.5 DO HIGH RESOLUTIONS!
>>
File: ifx618.jpg (300 KB, 1024x1024)
>>
>>102931436
Anon, everything is shit to you because you unironically have severe depression. You should leave and come back when things stop looking like shit to you. Fix your life. I can't believe you're getting this upset about experimental software.
>>
>>102931268
no dick
>>
>>102931451
shame this isnt local. very unlike anything we have style wise desu.
>>
>>102931442
Not really, LAION for SD 1.5 proved the concept. The problem is everyone keeps on filtering their datasets based on flawed reasoning instead of letting the chaos happen. You'll notice these models make two major mistakes: overfiltering the data, and overtuning the outputs with a DPO. That's why everything turns into slop.
>>
>>102931452
ooooooh im close to cooming sir please give me the jeet schizo rant again i beg you
>>
Can we stop talking about Flux.
What about other models? Right now, what matters is how good of a base model it is so that we can reason about what future models should be fine tuned off of. Does SD3.5 perform better than SDXL base at the things we care about like non-censored anatomy? Can SDXL base do higher resolutions than 1024x1024? What about Sana that people mentioned? I haven't used any of these models actually, so I don't really know.
>>
>>102931484
I'm not the one that says everything is shit and it's really upsetting you even though you just got many models to try out today, I don't know why you're here, this obviously upsets you. Take a break, come back in a year, you obviously can't handle the research.
>>
Looking at 8b, it genuinely looks like 2b medium. Is the hype just Stockholm syndrome after Sana flopped and people want a life raft?
>>
>>102931490
you need to ask this mans expert opinion about sana he will be happy to oblige
>>102931496
>>
>>102931384
Anything the US does in legislation in regards to bleeding edge stuff is copied everywhere else.
Anything the US is obsessed culturally bleeds over in every other parts of the world at some point.
It's like Apple and the rest of the smartphone brands applied to countries/cultures.
>>
File: file.png (1.39 MB, 1216x928)
>>102931501
>Sana flopped
Said who?
>>
sana is shit
flux is shit
all sd3 variants are shit
xl for another year, at least
>>
File: file.png (1.9 MB, 1216x928)
>>
>>102931440
Actually I'll just box them:
https://files.catbox.moe/ckafyh.png
https://files.catbox.moe/rc6ni1.png

Both genned with the 4-step Turbo version (hence the kinda plasticy Fluxish look). The more saturated one is DPM++ 2M SGM Uniform, less saturated one is Euler SGM Uniform.
>>
>>102931509
IT'S OVER, THE ONLY GOOD THING WE HAVE IS A FUCKING OUT-OF-VRAM VIDEO MODEL!!!! Time to leave the hobby and seek mental help.
>>
>>102931490
SDXL is stagnant because of a bad text encoder and generally slow training. Sana is going to be the small model king that replaces SD 1.5/SDXL because it can actually be trained locally. Feel free to refer back to the 600m Pixart gens from the other thread. Sana also has some good gens so I think outside of skill issue gens, it's going to be good especially after a finetune (which anyone can do). Pixart significantly improves with finetuning.
>>
File: file.png (1.84 MB, 1216x928)
>>
>>102931289
No, penises dont exist, well, cronenburg ones.
>>
>booba in a base model
WE ARE SO BACK BOIS
>>
>>102930369
noob mistake baka
>>
File: 00012-1343596780.jpg (567 KB, 1728x1296)
>>
>>102931526
if it was 3b maybe. right now there is no reason to replace xl with a smaller model. maybe a from scratch model on the architecture, but neither 1.6b sana nor 0.6b will ever get mass adoption because they're simply worse
>>
>>102931532
The future of all company trained models will be this weird thing where human anatomy and sexuality is always wrong or scrambled and no one known exists.
>>
>>102931551
Wrong, 600m Pixart generates kino gens. So 1.6B Sana is going to generate even more kino gens. It's that simple.
It's ironic too because you just saw both 2B and 8B SD3 being ass. So maybe it's not all about parameters.
>>
>>102931526
holy heck, im so excited to fix the autoencoder compression with just a little bit of training!!!!!!!!!
uhhh let me just sprinkle a bit of parameters as well
this is it, we are SO back
>>
>>102931532
I think half the reason they never do downstairs is cause they don't wanna risk having the "constantly gives dudes pussies" problem that like 80 or more percent of SDXL checkpoints still have to this day lol
>>
>>102931575
Somehow I think you'll be upset no matter what
>>
>>102931546
ya thats the stuff i've been missing from these generals
>>
>>102931589
thats right, you got me, im a flux agent sent to sabotage the great PIXART supremacy
im malding here
im balding here
im farting here
but im still not using sana IM SORRY
>>
File: file.png (1.6 MB, 832x1312)
Sometimes it seems that anon has a little too much of a vested interest in these fun toys
>>
>>102931607
No, I've determined that you are severely depressed and that you're incapable of experiencing happiness.
>>
>>102931558
nah it generates melty nonsense. if pixart was "kino" it would've been adopted. it never was. dead-end bad models
>>
File: file.png (2 MB, 832x1312)
Excuse the kino posting
>>
File: ComfyUI_04208_.png (1.83 MB, 1024x1024)
>>
>>102931618
thats right... at first i was working for the great PIXART empire
but then, sana had sex with my dog....
and now i have joined the flux evil secret society and swore a revenge...
>>
>>102931546
post more man i need my fix
>>
>>102931558
If you think SD3.5 is ass there's no way you like anything or ever would lol
>>
File: 00025-1343596776.jpg (588 KB, 1728x1296)
>>102931657
>>
File: file.png (1.27 MB, 640x768)
>>
File: file.png (2.2 MB, 1248x928)
>>
File: file.png (1.17 MB, 640x768)
>>
Sana won
>>
is dis cute?
>>
File: file.png (2.13 MB, 1248x928)
>>
File: 00097-636132240.png (1.74 MB, 1240x1240)
>>
File: file.png (856 KB, 640x768)
>>
File: ComfyUI_04220_.png (2.09 MB, 1024x1024)
>>
oh sweet new lykonslop just dropped? surely its 2.5x as good as xl based on that beefy param count, and 10x as good as pixart!
>>
>>102931401
Okay, found it:
https://github.com/kakaobrain/karlo
Which can be used here:
https://huggingface.co/spaces/kakaobrain/karlo
This was as far as the UnClip technology went before being replaced by Diffusion technology
>>
File: 00039-1343596775.jpg (616 KB, 1728x1296)
>>
File: file.png (1.04 MB, 640x768)
>>
>>102931289
>especially the details
That's the VAE's job, the rest of the thing just makes a composition and what we love is added by the VAE
Is that it? Can't someone make a version of SD3.5 that uses Flux's VAE? That's the entire concept with the VAE being separated, that you can use it with other image models.
>>
>>102931312
It's been estimated that Dalle 3 is a 4B model, so we could run it fine.
..
...
Okay, at least give us its dataset, some dumbo could outdo Dalle with it!
>>
can i run any locals on a 6700xt yet?
>>
>>102931473
Well the issue I'm seeing right now is that just like with LLMs, there doesn't exist a single modern model to serve as proof that is both small-ish, smart, and creative. For LLMs, it's normally understood that a diverse dataset means both an opportunity cost, by training on everything under the sun instead of the targeted data you want your LLM to be good at, and a quality loss, due to architecture and training methods. You can't have something that is both SOTA in smarts and SOTA in creativity in a small model if you use a completely raw dataset with no subject area and quality control. Part of the reason that LLMs are usually not trained on old books despite all the data that could come from them.

SD1.5 may have been smart and creative at the time, but right now we only have DALLE 3 to serve as an example of what a both smart and creative model looks like, except that we don't know its parameter count nor how much data it's been trained on, meaning its performance level could be due to "cheating" on both of those factors, and that it may in fact not be achievable for a small local model without major improvements in other areas like architecture.

Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance, I could imagine there are performance improvements (with respect to smart creativity) left on the table. My estimate is that it's probably not as much as one would hope, and we'll need to push parameter counts and/or dataset size to get to DALLE's level.

As for DPO, honestly that's a post-training method, and if one has used it, that means their model is not a foundation model but a post-trained one; you should criticize them for not releasing the pre-trained foundation model, rather than for using DPO as their post-training method.
>>
>>102931314
What people have come to is using LLMs to create prompts for image models to draw...
I guess if you want a surprise, you use it and copy and paste it without reading it, so the image is random and surprising.
>>
File: FUCK_STABILITY.png (3.83 MB, 1673x1256)
3.83 MB
3.83 MB PNG
You don’t deserve the full weights. Sorry.
>t. Stability poojeet CEO
>>
Is Hand Refiner still the best way of fixing hands, or there's something better now?
>>
>>102931761
long live the new flesh
>>
>>102931802
I really hope dalle3 is like, 40b at least. if it's actually only 4b then it would make every single other model look like an absolute embarrassment.
>>
>>102931357
Using an LLM to retag images, losing information like what character it is or what artist made the painting, is pure incompetence.
Intentional incompetence.
>>
File: ComfyUI_00076_.png (1.33 MB, 1024x1024)
1.33 MB
1.33 MB PNG
>>
File: 00050-1343596778.jpg (520 KB, 1152x1728)
520 KB
520 KB JPG
>>102931823
c64 helldimension fleshlight integration
>>
>>102931362
The face is really ugly and distorted when it's small.
Just like with every image model around here.
>>
>>102931851
me first
>>
>>102931872
fuck off thats my hole
>>
>>102931370
SPOILER ALERT
SD3.5 Medium will NOT be better than SD3.5 Large. It's for people that can't run Large, and all the new technology is just there to alleviate the loss of quality: if it's not in Large, it doesn't make things better, it's there to save parameters.
>>
>>102931372
I've released 11 image models for the month of October, averaging one every 2 days, but did not release one today, I guess whatever I released would have been eclipsed by everything else.
>>
>>102931903
>I've released 11 image models for the month of October, averaging one every 2 days
jeetmixes dont count, anon
>>
>>102931813
>Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance
I always wonder how much of that results in the fucked up anatomy for limbs or poses.
If you filter anything "unsafe" (which mostly just means nsfw), wouldn't the models be worse for it, versus an approach where their own hosted access is gated like DALLE's, while whatever they release isn't their responsibility anymore?
>>
File: SD3_00073_.png (1.59 MB, 1024x1024)
1.59 MB
1.59 MB PNG
the pajeet model is very good at cultural enrichment :)
>>
File: image.png (133 KB, 1280x720)
133 KB
133 KB PNG
>>102931357
This seems like a mistake. The less meta-captioning you do, the less the model is able to separate what it learns about things, so you essentially end up telling the model "a cartoon looks like this", then "no, actually a cartoon looks like this other thing", and what it knew about past cartoons gets partially overwritten; you then need constant repetition of past data to keep it from "catastrophically" forgetting what it learned before. So basically, by removing the metadata you spend more money just to make the model perform as well as it once did. And for transformers, the cost of this is HUGE.
>>
>>102931473
You'll notice they only do it when someone points out "the king has no clothes"; what we need is a king with the balls to run around naked and not care. To stop being prude.
>>
>>102931941
Keep in mind most nsfw detection is really just skin tone detection, so you end up throwing away a lot of good images. And seriously at this point it's obvious no one is vetting anything, they're just trusting the numbers which is why we keep seeing these bullshit "look at our dumbass score" metrics for models that are clearly not the same quality.
>>
File: 00061-1343596779.jpg (447 KB, 1152x1728)
447 KB
447 KB JPG
>>
>>102931954
>To stop being prude.
Tbdesu if they fear journalists' retarded clickbait articles and faux outrage on social media, anything they do will lead to that anyway, so why care? I refuse to believe all these people are prudes themselves.
>>
>>102931509
No, I'm sticking to SD1.5, people are still releasing good stuff for it.
All SDXL based models look the same, and I was never a fan of the PonyXL branch's style.
>>
lotta talk for a bunch of retards whove never trained a base model in their lives
>>
>>102931960
>so you end up throwing away a lot of good images
a fucking shame
>>
>>102931516
And that's it? It isn't in the news?
LATEST VERSION OF STABLE DIFFUSION ALLOWS FEMALE NIPPLES!!!
See? It's not a big deal, they should have allowed them since SD2.0
>>
>>102931558
You made me realize how much better Sona sounds than Sana, BTW.
Sana means "heal" in Spanish.
>>
>>102931947
which is why all the "art" generated by these VLM-captioned models looks so fucking bland. it all just gets tagged as "a digital painting of" with no unique descriptors for the style, resulting in an absurd amount of information loss. the equivalent of tagging every wheeled vehicle as a "car"
>>
File: ComfyUI_00098_.png (1.62 MB, 1280x960)
1.62 MB
1.62 MB PNG
I like it, I think it's neat
>>
>>102931827
Well, it "cheats" by using GPT-4whatever to rewrite the prompts, and some of the creativity you see may have been added at that step, so raw Dalle would not be as good; but raw dalle's quality could still be achieved without the parameter bloat.
>>
>>102931816
qrd on picrel?
>>
>>102931920
Why not? Some of my favorite models were jeetmixes.
>>
File: file.png (720 KB, 640x768)
720 KB
720 KB PNG
>>
>>102931947
Really? So what we need is technology that replaces transformers; if training becomes cheap then anybody can make the model of our dreams.
>>
>>102931960
I think Playground was the worst offender, they were claiming to be better than Dalle 3 and Midjourney 5.
>>
>>102931947
this is why flux learns things so quickly, you aren't really teaching it anything new with those 10 images loras, you are just making it remember something it forgot
>>
>>102931974
It was funny all the things they did at Google for their Imagen model release backfired and they got exactly what they tried to avoid and then removed humans entirely from their generations.
>we... huh... have no idea what race, ethnicity, gender and sexual preference the human in the drawing should represent when you ask for "person", so we're banning the generation of humans completely.
>>
File: 00092-2286917976.jpg (479 KB, 1152x1728)
479 KB
479 KB JPG
>>
>>102931947
Wait until you realize we do something called "dropout" when we train
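For anons who haven't seen it: "dropout" in this context usually means caption/tag dropout during finetuning, not the neural-net layer. A toy sketch of the idea (the tags and drop rates are made up, not from any specific trainer):

```python
import random

def dropout_caption(tags, tag_drop_rate=0.1, full_drop_rate=0.05, rng=None):
    """Randomly drop individual tags so the model can't lean on any single
    token, and occasionally drop the whole caption, which doubles as the
    unconditional examples classifier-free guidance needs."""
    rng = rng or random.Random()
    if rng.random() < full_drop_rate:
        return ""  # whole caption dropped: unconditional training example
    kept = [t for t in tags if rng.random() >= tag_drop_rate]
    return ", ".join(kept or tags)  # fall back to all tags rather than none

print(dropout_caption(["1girl", "hatsune miku", "twintails"],
                      rng=random.Random(0)))
```

Because some fraction of steps sees the image without a given tag, the model is forced to learn the concept itself instead of memorizing the exact caption string.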
>>
File: d.png (3.93 MB, 1792x1024)
3.93 MB
3.93 MB PNG
>>102932083
I used Dalle through ChatGPT before and I don't believe that's the case. You're able to investigate the prompt for a generated image, so you know when the LLM has changed it, and you're able to tell ChatGPT not to change it so that you in the end do get the exact prompt sent to the image generator. Additionally, it was possible to fix the seed to get reproducibility. And in my experience doing this, I do think the base model was still pretty creative. It really is just a powerful model in my opinion.
>>
I just had a good sleep, can you sleep tonight anons?
>>
>>102931984
Imagine me selling my bitcoins to train a base model only to see them soar to 100k and regret missing the profits.
>>
>>102932158
It was a rare case when even normies noticed the sheer absurdity of all of it.
I want more of these.
>>
>>102932199
No doubt, the question is how many parameters you need to do that. If someone offered a 1 million dollar bounty to the first person to make an image model with the power of dalle 3, they may not need more than 4 billion.
Because, so far, we still haven't seen what the rest of the billions are being used for.
>>
>>102932302
Dalle isn't as impressive as you think, it really is just like a 4B model but with a mostly uncensored, fully curated dataset that it was properly trained on. Most of the models we get are slopified from top to bottom: censored dataset, rushed out the door before they're properly trained, and worse, usually given a lobotomy pass.
>>
>>102932281
I want those things to cause a realization and change of paradigm so that new people learn about it and stop the absurdity.
But, noooo, let's make a new definition of safety even if our investors run away from us.
>>
>>102932369
I'm still impressed by its outputs and can't wait for the day an open model surpasses it.
>>
File: ComfyUI_00118_.png (1.08 MB, 1280x960)
1.08 MB
1.08 MB PNG
Not quite
>>
>>102932407
SD3.5 Large will be able to once someone does a real uncensored finetune that includes pop culture knowledge.
>>
>>102932390
Honestly I think the google one was so bad it was a one off, I've never seen this safety/censorship bullshit being mocked in normie spaces like this in recent years.
>>
>>102932369
>with a mostly uncensored and fully curated dataset that it was properly trained on
I think that's the impressive part at this point.
All the open models seem to compete on is to make the same "super safe" stuff at the cost of nicer results.
Having played with it, I'd be very happy if dalle was leaked, since at this point I've lost all hope of any local model ever changing its ways.
>>
can someone summarize which models were out today and their size/goal (image or video)?
>>
>>102932426
nobody has ever done this, because nobody knows how to scrape a dataset at that scale. sure we have anime finetunes, but never a full scale "fix the art" finetune that pulls from all over while maintaining proper captions.
>>
>>102932563
wait for our bro cefurkan to release a detailed analysis
>>
The Mochi demo is up, here's my first attempt at a Miku
>>
>>102932584
So it's essentially impossible to fix on a local/non company level?
Damn that's grim.
>>
>>102932604
Damn, ordering my 5 H100s now
>>
>>102932617
It is possible but requires an extreme amount of curation and community effort to cover all possible character and style niches and ensure they are properly captioned
>>
>>102932680
And I guess something like the DALLE dataset has no open equivalent anywhere?
>>
>>102932680
I believe in autism but maybe we are reaching its limit
>>
>>102932158
>>102932281
Oh man the Gemini diversity debacle was hilarious. Even the main director at my company asked me what was up with that and I didn't even know where to begin haha.
>>
>>102932714
It's all public data really, it just needs to be scraped/curated. It's likely all in Common Crawl. The real stumper is how they managed to preserve niche things like Blazblue and Fire Emblem while also using their AI captions. As Lykon said, it's thanks to shitty VLMs that IP was lost >>102931357 so how was dalle able to preserve it?
>>
File: 00142-2894618249.jpg (687 KB, 1987x1325)
687 KB
687 KB JPG
>>
>>102932604
how long did inference take?
>>
>>102932789
Probably using a great early version of gpt vision model to mass caption, with no censorship?
>>
>>102932862
the question would be why all the others seem to suck, unless they're happy they suck because it kills IPs, artist recognition and most nsfw by design
>>
File: ComfyUI_00147_.png (1.35 MB, 1280x960)
1.35 MB
1.35 MB PNG
>>102932420
>>
>>102932740
As hilarious as it was, I don't think either Google or the public learned much from it in the long run.
>>
>>102932979
getting closer
>>
If they used exclusively synthetic captions, why was their sample image of the woman in the grass captioned
>~*~aesthetic~*~ #boho #fashion, full body 30-something woman laying on microfloral grass, candid pose, overlay reads Stable Diffusion 3.5, cheerful cursive typography font

I'd also love to know how they got such unremarkable and average-looking people for their example images, because all I'm getting is flux-tier Instagram clones with fish lips.
>>
File: _teflon_don_.jpg (703 KB, 1568x1568)
703 KB
703 KB JPG
got banned from /pol/ so dumping this here
>>
>>102933165
can you gen donald crying while kissing vladimirs feet?
>>
>>102932264
Imagine how cool the model will be tho with you at the helm
>>
File: swiftelontrump.png (3.51 MB, 1568x1568)
3.51 MB
3.51 MB PNG
>>102933179
do it yourself retard
>>
>>102933193
do some with donald in a diaper bro
>>
>>102933193
nice cleavage pose
>>
>>102930087
I wonder what reference images it used to copy
>>
File: 1729643735_0001.png (1.2 MB, 1024x1024)
1.2 MB
1.2 MB PNG
>>
File: 1707252284684013.png (1.25 MB, 1024x1024)
1.25 MB
1.25 MB PNG
>>102931168
flux still wins for mikus
>>
File: 2024-10-22_00001_.png (385 KB, 720x1280)
385 KB
385 KB PNG
>>102933441
Is that real?
>>
who has good prompts or methods to getting good results with prompts?
>>
File: 1712914851319773.png (2.34 MB, 1024x1024)
2.34 MB
2.34 MB PNG
>>102933497
Censored by Bing?
> Create a small prompt with a figure you want in it
> Once you can get the figure you want, play around with the background
> Only add one or two words each time, use complete sentences, and refer to anything censored indirectly
> Incrementally increase the complexity of your prompt, pushing it towards your desired contents.
> If you add words you think will trip the censor, space them out from the part of the prompt you are working on. Example: rabbi at the beginning, big nose at the end
> Bury naughty words in separate sentences. Even if that sentence is talking about something else, DALLE will figure out what you mean
Following this procedure will help you build an intuition about how to write the most effective prompts

https://dallery.gallery/the-dalle-2-prompt-book
>>
>>102933497
chatgpt has some good prompts
>>
File: 2024-10-22_00003_.png (423 KB, 720x1280)
423 KB
423 KB PNG
>>102933491

>>102933497
a lot is trial and error. Certain ways seem right, until someone does it a totally different way, showing that actually you can't be too sure.

One opinion is to be extraordinarily specific.

You can use a text (LLM) AI to help build long prompts, you lazy loser.
>>
>>102933530
cry harder faggot
>>
File: 1699443337899918.webm (1.51 MB, 720x720)
1.51 MB
1.51 MB WEBM
HANDS FREE SLOPPING
> Words -> salad with chatGPT
> Text -> image
> Image -> video
> Edit/Subtitle
> Converter to WebM
>?????
>PROFIT

OLD MEME GUIDES:
https://files.catbox.moe/3az283.jpg
https://files.catbox.moe/e5mzsc.png
https://files.catbox.moe/5ix69v.png
>>
>>102933544
Is this real?

>>102933536
set yourself on fire. ok thank you
>>
>>102933552
mad??
>>
>>102933568
I'm patient. Google "how to set myself on fire". thanks
>>
>>102933585
>regurgitate
blah blah blah
>>
File: file.png (102 KB, 1139x544)
102 KB
102 KB PNG
>>102932584
I have millions of captions, it's not that hard.
>>
File: Untitled.jpg (890 KB, 1204x3440)
890 KB
890 KB JPG
jesus this model comparison someone just posted on reddit is ruthless.

sd3.5 vs flux. which column is sd? the shit column.
>>
>>102933544
based
>>
>>102933713
skill issue
>>
>>102933713
They both look like high quality AI slop and I mean that in a derogatory way
>>
>>102933713
True, but I want to live in the ball.
>>
>>102933713
3/3 it's a tie for me
>>
File: 2024-10-22_00008_.png (1.22 MB, 720x1280)
1.22 MB
1.22 MB PNG
>>102933730
<he won't live in the ball
<he won't be happy
>>
>>102933713
The shit column is the one that says you will tune nothing and will be unhappy.
>>
SD3M finetune status?
>>
>>102933544
People look better in 3.5. I'm so SICK of seeing the shiny buttchin flux face
>>
is there any reason models can't draw an empty chessboard? Lack of parameters/dataset?

I think the problem is deeper than that, since you can't even do it with a lora. Diffusion is probably still bad at drawing repetitive patterns without hallucinating shit
>>
>>102933792
Because models are fundamentally hallucinating pixels, there really isn't much reasoning that happens with these models.
>>
>>102933713
Right is trying way too hard to be midjourney. Left feels closer to the outputs you would get from a proper base model, without the so-called "aesthetic tuning" (overfitting on artsy images). Also the fucking flux face, it never goes away.
>>
>>102933792
tagging issues.
tagging for different positions on the board
tagging for different types of boards
tagging for positions of a flying birds wing
you can get away with a lot with a 4-5b model, the issue is that other than dalle, all base models are tagged like garbage
>>
>>102933677
And yet no one was able to do that locally outside of mass importing tags from boorus.
Even less captions with artists, nsfw, IPs...
>>
File: 1706239273973789.gif (834 KB, 640x640)
834 KB
834 KB GIF
>>102933792
>>102933803
>INB4 /x/ schizo-analysis on why free masonry is 'cool'
>>
>>102933809
>Also the fucking flux face, it never goes away.
The buttchin is my go to, to detect flux made gens lol.
>>
>>102933792
sounds like a tagging issue. every image related to chess gets tagged with "chess" and there are more chess boards that have pieces on them, so the "chess" word gets stuck with always having some pieces. ("chessboard" too)
"empty chessboard" is two words and most AI are too dumb to handle this.
>I think the problem is deeper than that since you can't even do it with lora.
I bet you can, but the trigger word should not be "chess", instead a gibberish word.
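A sketch of what that looks like in practice, assuming a kohya-style trainer that reads a same-name .txt sidecar caption next to each image (the trigger token, folder name, and tags here are made up for illustration):

```python
from pathlib import Path

TRIGGER = "mptychss"  # made-up rare token so "chess" priors don't bleed in

def write_captions(image_dir, extra_tags):
    # One .txt sidecar per image with the same stem: the layout
    # kohya-style trainers read with caption_extension=".txt".
    image_dir = Path(image_dir)
    for img in sorted(image_dir.glob("*.png")):
        caption = ", ".join([TRIGGER, *extra_tags])
        img.with_suffix(".txt").write_text(caption, encoding="utf-8")

# usage sketch:
# write_captions("train/10_emptyboard", ["empty chessboard", "top-down view"])
```

At inference you then prompt with the gibberish token instead of "chess", so the base model's loaded associations don't drag pieces back onto the board.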
>>
>>102933785
probably never
>>
>>102933833
>should not be "chess", instead a gibberish word.
"empty-chess"
>>
>>102933833
The thing with captioning is that it's positively biased: the model describes what it sees, not what it doesn't see. Very few captions use terms like "empty", "blank", "void", etc.
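You can sanity-check that bias on any caption dump; a toy sketch with made-up captions (the term list is illustrative, not a standard vocabulary):

```python
# Toy audit: how many captions mention that something is absent?
ABSENCE_TERMS = {"empty", "blank", "without", "no", "void", "missing"}

def absence_ratio(captions):
    hits = sum(1 for c in captions if ABSENCE_TERMS & set(c.lower().split()))
    return hits / len(captions)

captions = [
    "a chessboard with pieces mid-game",
    "a girl holding an umbrella",
    "an empty chessboard on a wooden table",
    "a red car parked on the street",
]
print(f"{absence_ratio(captions):.0%} of captions mention absence")  # 25%
```

Run the same scan on a real dump and the ratio is typically far lower, which is exactly why "empty chessboard" is so weakly learned.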
>>
>>102933833
>"empty chessboard" is two words and most AI are too dumb to handle this.
and also this is like "create a room without a pink elephant in it".
>>
>>102933792
train a lora on one good image, with emptychess as its tag?
>>
>>102933814
All you have to do is combine the alt / search title with the caption. I built a lot of my dataset by searching by artist, character, etc.
>>
it seems to me that image models would greatly benefit from the knowledge that video models have.
>>
>>102933892
video models are just image models or what I like to call "motion pictures"
>>
>>102933867
Yeah, but you don't realize that actually ai hasn't been trained to know what a chessboard is. It knows what a checkerboard is. It knows what checkers are. It knows "chess" to be the pieces. No pieces, it's probably not chess.
>>
>>102933892
I'd say they're censored/limited the same way
>>
>>102933867
>without a
Negatives exist
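Mechanically, a negative prompt works through the standard classifier-free guidance combination: the negative embedding replaces the empty-prompt one in the unconditional branch. A toy numpy sketch of the formula (the arrays are stand-ins for real noise predictions, not a real sampler):

```python
import numpy as np

def cfg(eps_cond, eps_uncond, scale):
    # Classifier-free guidance: push the prediction away from the
    # "unconditional" branch and toward the conditioned one.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# toy stand-ins for the model's noise predictions
eps_room     = np.array([1.0, 0.0])   # conditioned on "a room"
eps_elephant = np.array([0.0, 1.0])   # negative branch: "pink elephant"
print(cfg(eps_room, eps_elephant, scale=2.0))  # [ 2. -1.]
```

So "a room without a pink elephant" is better expressed as positive "a room" plus negative "pink elephant" than as a "without" clause the text encoder has to parse.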
>>
>>102933867
Something to mention though: transformer-only models that work purely in tokens can handle this problem better, because the prompt and image share the same token space, which gives the model some ability to understand a prompt with "without". It's the same family of features as being able to modify an image through prompts, i.e. "change the monkey into an elephant".
>>
>>102933874
here >>102933833

base models know "chess" & "board" very well but can't differentiate between each chess piece and board.
Even with lora, it draws extra squares or uneven grid and shit.
>>
>>102933990
Yeah, because you literally don't understand that these models are hallucinating blobs that coincidentally align with text prompts. You're like someone who asks ChatGPT to count. These models don't work like that. Flux can't even consistently do normal hands and limbs, and you're asking for precision chessboard reconstruction.
>>
SD3.5 is underwhelming. It's still bad with human anatomy.
>>
>>102934009
>these models are hallucinating blobs that coincidentally align with text prompts
abstractartfags win again
>>
>>102933951
>Transformers only models that work only in tokens can handle this problem more because the prompt and images share the same token space
This is actually quite interesting. There's no making up for that with newer architectures?
>>
>>102934044
I'm not sure what you mean, but Emu3 is an example of a model that works like this. Basically you have one model that can chat, generate, and caption images.
>>
>>102934009
>because you literally don't understand that
dood, I said that in the original post.
>>
Next Bred

>>102934088
>>102934088
>>102934088
>>
File: asdasdad.jpg (177 KB, 862x652)
177 KB
177 KB JPG
>>102930087
What's arguably the best pony model right now
>>
>>102931393
is there a conference coming up? that's big shit for smaller industries t. am on a trip for one now
>>
>>102931505
that won't happen for China, and that's why it'll be the country that'll win in the long term
>>
>>102931785
>Can't someone make a version of SD3.5 that uses Flux's VAE?
you need to modify the VAE so that it can work with SD3.5 I guess
>>
>>102931836
it's not incompetence, it's intentional: they want to remove every single artist's and celebrity's name to avoid liability. they have 0 balls, only MJ has them
>>
>>102933165
>>102933193
what model you used anon?
>>
>>102934613
flux with some XL facedetailer


