Discussion of free and open source text-to-image models

Previous /ldg/ bred: >>102926788

Very Busy Day Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>SD3 Large
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://replicate.com/stability-ai/stable-diffusion-3.5-large

>SANA
https://github.com/NVlabs/Sana
https://ea13ab4f5bd9c74f93.gradio.live

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
What if you had MiniMax at home with an Apache 2.0 licence, but god said:
https://github.com/genmoai/models
>The model requires at least 4 H100 GPUs to run.
https://xcancel.com/genmoai/status/1848762405779574990
Blessed thread of frenship
what if you had sd 3.5 large but god said: now you need to wait for ggufs because vramlet lmao
yeah. img2img is fucked, I wonder if this is a safety measure
>>102930138It's just the denoiser that needs to be tweaked
>>102930135
you can't try the fp8?
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8
Is it over or are we back?
What if god gave you local SOTA (sana) but said: you must have skill
Blessed thread of keeping up the torch of threadly culture.
>>102930162back for now
>>102930155
>14.5gb
does that include the text encoders? i only have 10gb vram
>>102930193
oh nvm the text encoders are in a separate folder, i guess i'll wait a while
>>102930170weights when
>>102930214have faith in chang
always putting in a gen that says "LDG" is debo-coded
>>102930193
yeah, 14.5gb is everything (fp8 unet + fp8 text encoder), desu my recommendation would be to put the text encoder into your ram/cpu so that you have spare vram room for the rest
https://reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/
>>102930251
>yeah, 14.5gb is everything (fp8 unet + fp8 text encoder)
ooh okay, thank you anon, i'll give it a try
>>102930214Send Nvidia a strongly worded email, they're the ones having to approve it.
>>102930111
https://arxiv.org/abs/2405.14854
If only this model was a bitnet model (1.58bit), it would be way easier to run it :(
>>102930111
>4 H100s
I don't even remotely believe this is a hard requirement. I scanned through their github code, they have some weird multi-machine FSDP distributed implementation (likely taken from the training code).
I mentioned this last thread, but comparing with Allegro:
Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAM
Mochi: 10B parameters, 44k sequence length, 3072 hidden dim, runs in ? VRAM
Memory usage is a fixed amount for all the weights, plus the memory for activations which scales linearly with both sequence length and hidden dimension size. Mochi has half the sequence length, but less than double the hidden dim size, so it would theoretically use LESS activation memory per layer.
If you 8 bit quantized Mochi, that's 10GB of weights, compared with 5.6 GB of weights for Allegro bf16. Combine that with the lower activation memory per layer, and it probably can be squeezed to run in 24GB VRAM. Worst case you'd need to go model parallel with two 24GB cards.
Someone just needs to make an optimized single-machine inference implementation that uses quantization.
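Sketching that estimate as code. The weight sizes come straight from the post; treating per-layer activation memory as simply proportional to seq_len * hidden_dim is the assumption (the real constant depends on the implementation, attention kernel, etc.), so the activation side is a ratio, not bytes:

```python
# Back-of-envelope version of the Allegro vs Mochi comparison above.
# Only the weight sizes are solid numbers; "act_units" is just the
# seq_len * hidden_dim product that per-layer activation memory scales
# with, not actual bytes.

def weight_gb(params_billion, bytes_per_weight):
    # params_billion * 1e9 params * bytes each, reported in GB
    return params_billion * bytes_per_weight

def act_units(seq_len, hidden_dim):
    # per-layer activation memory is proportional to this product
    return seq_len * hidden_dim

allegro_weights = weight_gb(2.8, 2)    # bf16 -> 5.6 GB of weights
mochi_q8_weights = weight_gb(10, 1)    # int8 -> 10 GB of weights

allegro_act = act_units(80_000, 2304)  # ~184M units
mochi_act = act_units(44_000, 3072)    # ~135M units

# half the sequence length but less than double the hidden dim:
# Mochi really does need less activation memory per layer
assert mochi_act < allegro_act
print(allegro_weights, mochi_q8_weights, round(mochi_act / allegro_act, 2))
```

So quantized Mochi carries ~4.4 GB more weights than bf16 Allegro but only ~73% of its per-layer activation load, which is why the 24GB squeeze sounds plausible.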
Can i run 3.5 with a 4080?
Is it good?
>>102930334>Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAMyou include the text encoder to those 22gb vram or is it separate?
>>102930353
Yes
No
>>102930353i doubt it I can barely run it on a 5090
Hmm think i just have to crank up the steps at least at 30
>>102930353
>Is it good?
I'll let you judge
>George Costanza eating a Hamburger, there's a Hatsune Miku plush on the table
>>102930358I'm not including any text encoders at all, because it doesn't matter. You can load the text encoder to VRAM, compute embeddings, then unload it and load the transformer model. The time taken to do the text embeddings this way is still a negligible fraction of overall generation time.
>>102930334
>44k sequence length
what does that mean? it's the number of frames or something?
>>102930388Burger looks good at least
>>102930388I'll wait for the finetunes
>>102930184No way SD3.5 just allows you to draw pepes like that???
>>102930429SAI likely put in some fuckery to prevent that
>>102930424
The video gets compressed into the latent space, and then that 3d tensor is divided into a long list of embeddings. It's literally the same thing as imagegen models based on DiT, but with an extra time dimension.
So the actual input to the model is a long list of visual embeddings, each representing a tiny image patch from one frame. That's what the context length is referring to. For mochi, it's smaller than allegro due to some combination of slightly lower res video and better spatial + temporal compression by the VAE.
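A toy version of that patchification math. The compression factors here (8x spatial, 4x temporal, 2x2 latent patches) are generic assumed values for video VAEs, not Mochi's or Allegro's published configs, so the output only lands in the right ballpark:

```python
# Sketch of how a video becomes a token sequence, as described above.
# t_compress / s_compress / patch are assumed typical values, not the
# actual numbers either model uses.

def video_seq_len(frames, height, width,
                  t_compress=4, s_compress=8, patch=2):
    """Tokens = (latent frames) * (latent H / patch) * (latent W / patch)."""
    lat_t = frames // t_compress
    lat_h = height // s_compress
    lat_w = width // s_compress
    return lat_t * (lat_h // patch) * (lat_w // patch)

# e.g. ~3-4 seconds of 480p-ish video
print(video_seq_len(frames=96, height=480, width=848))  # 38160 tokens
```

Which is why a slightly lower resolution or a more aggressive VAE directly shrinks the sequence length the transformer has to attend over.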
>>102930474
I see, but in your post you're using the number of parameters + sequence length + hidden dim + quant to evaluate the vram requirement, what about the number of frames and the resolution?
>>102930087
>>SANA
>https://github.com/NVlabs/Sana
>https://ea13ab4f5bd9c74f93.gradio.live
real demo link btw https://8876bd28ee2da4b909.gradio.live
>>102930441
Eh, i remember when people panicked about sdxl being censored and now we have pony
>>102930441
The reason they'd do that would be to stop people from finetuning nipples onto girls' boobs, and the model does them natively, so they're navigating in a better direction and the poor stuff is due to incompetence.
>>102930538Demo works now?
>>102930538yeah I've seen them on twitter, they said there's a 480p model and a higher resolution one, I wonder which one it is on the demo
>>102930493
Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video, which is much lighter weight than the diffusion model.
The actual input to the model is what matters, and that's just a big tensor of 44k or 80k embeddings. That tensor IS the video that you're denoising, just in the latent space and represented as a bunch of tiny patches.
>>102930551
>Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video
true, that's what CogVideoX is doing actually
>>102930507
We can get a local video model that is better than proprietary video models and yet nobody has managed to make something better than Gradio?
Why is it still so hard?
>>102930432
>No way SD3.5 just allows you to draw pepes like that???
unfortunately no
>>102930609Reminds me of Meta.ai's output. Was Meta's image generator ever mentioned on 4chan? Anyone with Whatsapp can use it for free and nobody seemed to care, even Google's ImageFX got mentioned once.
What I like about SD3.5 is its diversity of outputs, you don't get the same rigid shit every time like on flux, but on the other hand there's a lack of consistency, for example one image is oversaturated somehow, and the 2 images below are "3d migu" even though I specified an anime style only, SD3.5 is too inconsistent, probably because it's undertrained or something?
>>102930672this is a local thread anon, that's why we're not talking about it
>>102930596Because Gradio works just fine and requires minimal work to implement.
>>102930706read their release note
>>102930771
this?
https://stability.ai/news/introducing-stable-diffusion-3-5
>>102930706>SD3.5 is too inconsistentIsn't all this because it's unet? Unet is like this, no way around that.Flux is DiT, so it's super-consistent, but then all seeds look very similar.
>Consistency is bad now
>>102930737
I said 4chan, didn't seem like /SDG/ mentioned it either, and they could say
>this is a stable diffusion thread anon, that's why we're not talking about it
>>102930815SD3 is also a DiT I believe
>>102930706
>>102930792
It's from QK Norms which is similar to what Pixart uses, makes training faster and more efficient but you lose some stability deep in the weights.
>>102930706
>you don't get the same rigid shit every time like on flux
that doesn't happen to me on flux
>>102930839
>makes training faster and more efficient but you lose some stability deep in the weights.
that's fucking retarded, couldn't they wait a bit more so that they got a better model forever at the end? why do they always want to rush
>>102930849Why because your vague prompt gets more artistic interpretation from the model? Boo hoo.
>>102930860I asked for an anime style, not a 3d style, and it gave me 3d style, that's just a mistake from the model, and you're a retard if you think otherwise
>>102930877Or maybe you're suffering from negative bias and you ignore every time other models interpret your prompt. It's not like Flux is the king of adherence either.
>>102930815Flux is distilled to produce not very diverse set of nice images. It will never have knowledge of the full, trained from scratch model and because of that 3.5 is a much better base for finetuning, even if it doesn't have good quality in all generations out-of-the-box.
>>102930706That looks like shit kek.
>>102930844
>that doesn't happen to me on flux
lucky you I guess, because flux is really rigid and tends to give you really similar pictures
>>102930996
>why are there no mid sized models
>wah everything looks like shit
Any day now, a Flux finetune lmao
It's great because the one you kept posting turned out to be shit
lmao
seriously
>>102931012
>Any day now, a Flux finetune lmao
it happened though
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>102931011at what guidance value?
>>102931027
Yeah, it's shit.
hahaha, omg I can't believe you were waiting for that
Doesn't even beat fp8 dev
>>102931012You seem to have either responded to the wrong post or think I'm someone specific in this thread when that was my first post.
>>102930831
It is bad if it's at the cost of creativity, the most creative model ever was Craiyon (formerly mini-dalle) and its inconsistency was so off the charts you couldn't generate a face.
>>102931037seemed to work because I got the reply I wanted :) because there's a coping Flux user in here thinking someone is going to drop a $20k finetune
>>102931027According to his donation page, he's doing finetuning with a single 4090. I doubt he used a lot of images. There's no way it's going to be good.
>>102931043No I want the things I type to appear, you can get creative with your prompts.
>>102930834Ah, well, kudos for achieving diversity of outputs with it then, and there's nothing wrong with the tech, black forest labs messed it up, and that's one thing I can say Stability did better.
>>102931072
It's not going to be good because you need to do like 10 epochs on a million images to properly stamp in new concepts. If he's seriously using a single 4090, that's like 15 seconds a step at batch size 1. At best he's doing what a merged Lora would do.
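The arithmetic behind that claim, using the poster's rough inputs (1M images, 10 epochs, batch size 1, 15 s/step; these are the post's guesses, not benchmarks):

```python
# Quick sanity check of the single-4090 finetune claim above. All the
# inputs are the poster's rough numbers, not measured throughput.

def finetune_days(images, epochs, batch_size, sec_per_step):
    steps = images * epochs // batch_size
    return steps * sec_per_step / 86_400  # 86,400 seconds per day

days = finetune_days(images=1_000_000, epochs=10, batch_size=1, sec_per_step=15)
print(f"~{days:,.0f} days")  # ~1,736 days, i.e. several years on one card
```

Even with gradient accumulation and a generous 10x speedup it stays in "months" territory, which is the point: a single-card run at that scale behaves like a merged Lora, not a proper finetune.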
>>102931034you tried it?
>>102931047Honestly as an outsider that came to this thread after hearing the news about the new models, you seem a bit obsessed with whoever that guy is living rent free in your head.
>>102931117Yes, for the same effect find a realism lora for Flux.
>>102931141
Wrong, Flux Pro doesn't have good diversity either, that's why I suspected it was DiT's fault.
But I'm glad I was wrong because unet just won't give text that looks as good as this.
>>102931141DiT's major flaw is it requires saturation, it needs the boomer prompts which increases quality with each word.
oh non ono sana bros this cant be happening
dawwww
>>102931168
it doesn't look really good, desu I couldn't tell the difference between SD3-2b (medium) and SD3-8b here, they're in the same range of quality, the fuck did they do all this time?
>>102931183>username>retarded pajeet
>>102931106That was me at the beginning, then I realized my best generations ever had something that the model put in there that I never could have imagined, sometimes it became a new fetish, one thing became my main fetish because I had never seen that before.When the model only does what you ask for, and not more, you're missing out.
>>102931195Sadly a lot of the quality is in the training and some of the concepts are overbaked. It seems to me the more they try to fix the training process (magic deduping, overbake detection, etc) the more they fuck things up.
>>102931204
>When the model only does what you ask for, and not more, you're missing out.
this, it's fucking boring if it only does its job, I want surprises, something like Dalle-3, you can go for a simple prompt and the model can add a shit ton of details that weren't in my orders but are still relevant to the image, that's where the fun is
>>102931195
never tried sd3, but this new 3.5 doesn't impress me that much, messed up anatomy
It is up to date with 'famous' people, i guess...
>>102931043I loved mini dalle, has that model been released?
whats with this faggot schizo mentioning how Flux is bloated every time a shitty alt-model releases? >>102917495 as if Sana and SD3 being shit is a good thing because Flux is too big? Just rushing to the slide defense every time someone says these turdbakes look like shit. sd3 8b being bad is 100% a fault of the trash dataset
>>102931239
>sd3 8b being bad is 100% a fault of the trash dataset
this, 100% this
>>102931239
you know you are really obvious because you say turd a lot
it would help if you didn't schizo post back
also you are completely ass blasted because your savior Flux fine tune is, as you say, a turd.
lmao lol
>>102931235kek he looks like putin in there
>>102931195
They added nipples.
https://files.catbox.moe/w0katp.png <- SD3.5Large
there he goes. what's this jeet's endgame? unironically what is wrong with him? treating it like a console war
>>102931274everything except Flux is a turd for you, what's your end game?
>>102931221>something like Dalle-3Well, with today's so many releases maybe we will get open Dalle-3 at last today!
>>102931268
ok I guess they stopped acting like lunatics towards nudity, which is always a good thing, but the model quality could've been way better, it's an 8b model and it still looks bad, especially the details
>>102931287
>we will get open Dalle-3 at last today!
you're talking about Sana?
>>102931298
>Sana
no, that was dead before it even got released.
>>102931287We got sora at home, except you can't run it at home kek
In the LLM world there's a pattern where the more creative models are dumber, while the smarter models are less creative. We want creative models but not necessarily at the cost of coherence/smarts. Something the LLM users have found to make the smart models more creative is to use the {{random}} function, which is a part of the prompt. I think I heard there was such a thing in the image gen frontends as well. I believe it was called wildcards? Basically it lets you insert random strings in the prompt. So in LLM world you could do something like
>Write in the style of {{random: Dracula from Castlevania, Kizuna AI the vtuber, Gordon Ramsey}}.
and each time you press generate, it would pick from one of the strings so you'd get a different style for each new reply. I imagine this could be pretty powerful for smart but uncreative image models as well as you use more and more wildcards in different parts of the prompt.
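A minimal sketch of that {{random}} expansion, assuming the exact comma-separated syntax from the example above; real frontends' wildcard implementations (wildcard files, nesting, weights) differ, this is just the core substitution:

```python
# Expand {{random: a, b, c}} placeholders in a prompt by picking one
# option per placeholder. The syntax is taken from the example above
# and is an assumption, not any specific frontend's implementation.
import random
import re

PATTERN = re.compile(r"\{\{random:\s*([^}]+)\}\}")

def expand_wildcards(prompt: str, rng: random.Random = random) -> str:
    """Replace each {{random: ...}} with one randomly chosen option."""
    def pick(match: re.Match) -> str:
        options = [opt.strip() for opt in match.group(1).split(",")]
        return rng.choice(options)
    return PATTERN.sub(pick, prompt)

prompt = ("Write in the style of {{random: Dracula from Castlevania, "
          "Kizuna AI the vtuber, Gordon Ramsey}}.")
print(expand_wildcards(prompt))
```

Since each placeholder is resolved independently, stacking several of them in one prompt multiplies the number of possible prompt variants per generate press.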
>>102931274
you vill get compressed eyes
you vill get mangled limbs
you vill generate mj slop
you vill like sana
>>102931312>We got sora at home, except you can't run it at home kek
>>102931281i dont give a shit about flux man, what the fuck is wrong with your brain? flux is shit because it's rigid untrainable junk. sd3 and sana are shit because they're melted synthetic slop trained on even worse data. none of the 3 are really good. seriously get some help, you're jumping at shadows over nothing
>>102931330
you should get checked for depression because everything sucks to you
maybe it's a you problem
>>102930529
>poor stuff is due to incompetence
It doesn't know dicks, many artists, a lot of well-known people, genitals.
It's intended.
Sana knows artists Flux and SD3 do not
>>102931343
>It's intended.
facts
Guys, Mochi passes the test
>>102931314It's not hard to have a diverse image model, you just have diverse captions and diverse images and then avoid the temptation to use a DPO.
>>102931335
let's not play pretend now. these models are just bad, no need to cope over it. sana was not the bigma everyone was hoping for. sd3 is just as bad as it was months ago, now with extra parameters. flux remains untunable airbrushed slop and is nowhere near dalle at home. these models are all underwhelming except that video one which isn't even runnable
>>102930135
3.5 Medium still to come on the 29th too BTW. Large is looking good so far though, it can do proper booba with nipples out of the box even, even with the 4-step turbo version.
>>102931357
>Finetune for it
Lykon, why would people spend thousands of dollars to finetune this giant 8b turd? You're delusional
>>102930849
>couldn't they wait a bit more
apparently everyone and their dog decided to release today
so either it's a crazy coincidence (could be), or they know about some legislation that will be pushed soon that would severely restrict image models in the US
>>102931367Seriously anon, you should take a break from this, come back in five years, being on the cutting edge isn't for you.
>>102931268Really? Looks like they added areolas and forgot the nipples
>>102931372
>in the US
SAI is a UK company though kek
>>102931372I'm actually going to bet there's some sort of deadline for some sort of AI event we don't know about. Maybe even for Nvidia's conference.
>>102931370
>Large is looking good so far though
no it fucking doesn't, the details are horrible, the anatomy is still fucked, it's a fucking 8b model, there's no excuse this time, they fucked it up
>>102931236
Yes:
https://huggingface.co/dalle-mini/dalle-mini
But it's garbage compared to this one:
https://huggingface.co/dalle-mini/dalle-mega/tree/main
Which is garbage compared to... one I'll post later...
>>102931372
>some legislation that will be pushed soon that would severely restrict image models in the US
Big if True
>>102931380none of these jeeted local releases are cutting edge. training on dreamshaper outputs with sai compute is nothing more than a griftbake. same with sana designed to guzzle research grants.
>>102930529Type this prompt I got from JoyCaption into SD 3.5 somewhere where there's not Web API level prompt filtering (e.g. locally), it really will do it out of the box, I swear:"a photograph of a topless woman with a light skin tone and platinum blonde hair styled in loose waves that cascade over her shoulders. She has striking blue eyes, full lips, and a slender, toned physique. Her breasts are medium-sized with prominent, erect nipples. She has an intricate tattoo of two roses, one red and one white, with detailed green leaves and vines, covering her upper chest and extending to her shoulders. Additional tattoos are visible on her left arm, which has a large, elaborate design, and her right arm, which has a smaller, more intricate design. Her left hip features a tattoo of a rose. The background is a plain, neutral beige color, which helps to focus attention on the subject. The lighting is soft and even, highlighting her smooth skin and the vibrant colors of her tattoos. The overall composition of the image is simple yet powerful, emphasizing both her natural beauty and the artistic elements of her body art. The photograph is professionally taken, with a clear and crisp quality that brings out every detail."
>>102931412Yeah you're completely fried, get off the internet.
>>102931367
>sana was not the bigma everyone was hoping for
Utter retardation you have
>>102931418
>no argument
>no u
yeah maybe post some images in your defense? these models look like shit. the only thing fried here is your trash quality outputs
>>102931398Nice bait lmao, I'd post topless 4-step Turbo gens if this wasn't a blue board, it literally does nudity better than any version of Flux does without Loras
Maybe I should just fucking lower my expectations.
>>102931365Idk if I'd call that easy. Building a both large and diverse dataset is always a challenge in ML.
WHY THE FUCK CAN'T 3.5 DO HIGH RESOLUTIONS!
>>102931436Anon, everything is shit to you because you unironically have severe depression. You should leave and come back when things stop looking like shit to you. Fix your life. I can't believe you're getting this upset about experimental software.
>>102931268no dick
>>102931451shame this isnt local. very unlike anything we have style wise desu.
>>102931442Not really, LAION for SD 1.5 proved the concept. The problem is everyone keeps on filtering their datasets based on flawed reasoning instead of letting the chaos happen. You'll notice these models make two major mistakes: overfiltering the data, and overtuning the outputs with a DPO. That's why everything turns into slop.
>>102931452ooooooh im close to cooming sir please give me the jeet schizo rant again i beg you
Can we stop talking about Flux.
What about other models? Right now, what matters is how good of a base model it is so that we can reason about what future models should be fine tuned off of. Does SD3.5 perform better than SDXL base at the things we care about like non-censored anatomy? Can SDXL base do higher resolutions than 1024x1024? What about Sana that people mentioned? I haven't used any of these models actually, so I don't really know.
>>102931484I'm not the one that says everything is shit and it's really upsetting you even though you just got many models to try out today, I don't know why you're here, this obviously upsets you. Take a break, come back in a year, you obviously can't handle the research.
Looking at 8b, it genuinely looks like 2b medium. Is the hype just Stockholm syndrome after Sana flopped and people want a life raft?
>>102931490
you need to ask this man's expert opinion about sana, he will be happy to oblige
>>102931496
>>102931384
Anything the US does in legislation in regards to bleeding edge stuff is copied everywhere else.
Anything the US is obsessed with culturally bleeds over into every other part of the world at some point.
It's like Apple and the rest of the smartphone brands, applied to countries/cultures.
>>102931501
>Sana flopped
Said who?
sana is shit
flux is shit
all sd3 variants are shit
xl for another year, at least
>>102931440
Actually I'll just box them:
https://files.catbox.moe/ckafyh.png
https://files.catbox.moe/rc6ni1.png
Both genned with the 4-step Turbo version (hence the kinda plasticy Fluxish look). The more saturated one is DPM++ 2M SGM Uniform, less saturated one is Euler SGM Uniform.
>>102931509
IT'S OVER, THE ONLY GOOD THING WE HAVE IS A FUCKING OUT OF VRAM VIDEO MODEL!!!! Time to leave the hobby and seek mental help.
>>102931490SDXL is stagnant because of a bad text encoder and generally slow training. Sana is going to be the small model king that replaces SD 1.5/SDXL because it can actually be trained locally. Feel free to refer back to the 600m Pixart gens from the other thread. Sana also has some good gens so I think outside of skill issue gens, it's going to be good especially after a finetune (which anyone can do). Pixart significantly improves with finetuning.
>>102931289
No, penises don't exist, well, cronenberg ones do.
>booba in a base model
WE ARE SO BACK BOIS
>>102930369noob mistake baka
>>102931526if it was 3b maybe. right now there is no reason to replace xl with a smaller model. maybe a from scratch model on the architecture, but neither 1.6b sana nor 0.6b will ever get mass adoption because they're simply worse
>>102931532
The future of all company trained models will be this weird thing where human anatomy and sexuality is always wrong or scrambled and no known person exists.
>>102931551
Wrong, 600m Pixart generates kino gens. So 1.6B Sona is going to generate even more kino gens. It's that simple.
It's ironic too because you just saw both 2B and 8B SD3 being ass. So maybe it's not all about parameters.
>>102931526
holy heck, im so excited to fix the autoencoder compression with just a little bit of training!!!!!!!!!
uhhh let me just sprinkle a bit of parameters as well
this is it, we are SO back
>>102931532I think half the reason they never do downstairs is cause they don't wanna risk having the "constantly gives dudes pussies" problem that like 80 or more percent of SDXL checkpoints still have to this day lol
>>102931575Somehow I think you'll be upset no matter what
>>102931546ya thats the stuff i've been missing from these generals
>>102931589
thats right, you got me, im a flux agent sent to sabotage the great PIXART supremacy
im malding here
im balding here
im farting here
but im still not using sana IM SORRY
Sometimes it seems that anon has a little too much of a vested interest in these fun toys
>>102931607No, I've determined that you are severely depressed and that you're incapable of experiencing happiness.
>>102931558nah it generates melty nonsense. if pixart was "kino" it would've been adopted. it never was. dead-end bad models
Excuse the kino posting
>>102931618
thats right... at first i was working for the great PIXART empire
but then, sana had sex with my dog....
and now i have joined the flux evil secret society and swore a revenge...
>>102931546post more man i need my fix
>>102931558If you think SD3.5 is ass there's no way you like anything or ever would lol
>>102931657
Sana won
is dis cute?
oh sweet new lykonslop just dropped? surely it's 2.5x as good as xl based on that beefy param count, and 10x as good as pixart!
>>102931401
Okay, found it:
https://github.com/kakaobrain/karlo
Which can be used here:
https://huggingface.co/spaces/kakaobrain/karlo
This was as far as the unCLIP technology went before being replaced by Diffusion technology
>>102931289
>especially the details
That's the VAE's job, the rest of the thing just makes a composition and what we love is added by the VAE
Is that it? Can't someone make a version of SD3.5 that uses Flux's VAE? That's the entire concept with the VAE being separated, that you can use it with other image models.
>>102931312
It's been estimated that Dalle 3 is a 4B model, so we could run it fine...
...Okay, at least give us its dataset, some dumbo could outdo Dalle with it!
can i run any locals on a 6700xt yet?
>>102931473
Well the issue I'm seeing right now is that just like with LLMs, there doesn't exist a single modern model to serve as proof that is both small-ish, smart, and creative. For LLMs, it's normally understood that a diverse dataset means both an opportunity cost, by training on everything under the sun instead of the targeted data you want your LLM to be good at, and a quality loss, due to architecture and training methods. You can't have something that is both SOTA in smarts and SOTA in creativity in a small model if you use a completely raw dataset with no subject area and quality control. Part of the reason that LLMs are usually not trained on old books despite all the data that could come from them.
SD1.5 may have been smart and creative at the time, but right now we only have DALLE 3 to serve as an example of what both a smart and creative model looks like, except that we don't know its parameter count nor how much data it's been trained on, meaning that its performance level could be due to "cheating" on both of those factors, and in fact it may not be possible for a small local model without major improvements in other areas like architecture.
Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance, I could imagine there are likely performance improvements (with respect to smart creativity) left on the table. Though my estimate is that it's probably not as much as one would hope and we need to push parameter sizes and/or dataset size in order to get to DALLE's level.
As for DPO, honestly that's a post-training method and if one has used it, that means their model is not a foundation model but a post-trained model, and you need to criticize them for not releasing the pre-trained foundation model, rather than for using DPO as their post-training method.
>>102931314
What people have come to is using LLMs to create prompts for image models to draw...
I guess if you want a surprise, you use one and copy-paste the prompt without reading it, so the image is random and surprising.
You don’t deserve the full weights. Sorry.
>t. Stability poojeet CEO
Is Hand Refiner still the best way of fixing hands, or there's something better now?
>>102931761long live the new flesh
>>102931802
I really hope dalle3 is like, 40b at least. if it's actually only 4b then it would make every single other model look like an absolute embarrassment.
>>102931357
Using an LLM to retag images, losing information like what character it is or what artist made the painting, is pure incompetence.
Intentional incompetence.
>>102931823c64 helldimension fleshlight integration
>>102931362
The face is really ugly and distorted when it's small.
Just like with every image model around here.
>>102931851me first
>>102931872fuck off thats my hole
>>102931370
SPOILER ALERT
SD3.5 Medium will NOT be better than SD3.5 Large. It's for people that can't run Large, and all the new technology is just there to alleviate the loss of quality; if it's not in Large, it makes things worse, it has to be used to save parameters.
>>102931372
I've released 11 image models for the month of October, averaging one every 2 days, but did not release one today, I guess whatever I released would have been eclipsed by everything else.
>>102931903
>I've released 11 image models for the month of October, averaging one every 2 days
jeetmixes dont count, anon
>>102931813
>Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance
I always wonder how much of that results in the fucked up anatomy for limbs or poses.
If you filter anything "unsafe" (aka just nsfw most of the time is what they mean), wouldn't the models be worse for it, vs an approach where their own hosted access is controlled like what DALLE does, but what they release isn't their responsibility anymore?
the pajeet model is very good at cultural enrichment :)
>>102931357
This seems like a mistake. The less meta-captioning you do, the less the model has the ability to separate its learning about things, so you essentially end up telling the model "a cartoon looks like this, but oh no, actually a cartoon looks like this other thing", so what it knew about past cartoons is partially overwritten and you need constant repetition of past data to make it not "catastrophically" forget the things it learned in the past. So basically by removing the metadata, you spend more money to make the model perform as well as it once did. And for transformers, the cost of this is HUGE.
>>102931473You'll notice they only do it when someone points out "the king has no clothes", what we need is a kind with the balls to run around naked, and not care. To stop being prude.
>>102931941Keep in mind most nsfw detection is really just skin tone detection, so you end up throwing away a lot of good images. And seriously at this point it's obvious no one is vetting anything, they're just trusting the numbers which is why we keep seeing these bullshit "look at our dumbass score" metrics for models that are clearly not the same quality.
>>102931954>To stop being prude.Tbdesu if they fear journalist retarded clickbait articles and faux outrage on social media, anything they do will lead to that anyway, so why care? I refuse to think all the people are prudes themselves.
>>102931509No, I'm sticking to SD1.5, people are still releasing good stuff for it.All SDXL based models look the same, and I was not a fan of the PonyXL's branch style.
lotta talk for a bunch of retards who've never trained a base model in their lives
>>102931960
>so you end up throwing away a lot of good images
a fucking shame
>>102931516
And that's it? It isn't in the news?
LATEST VERSION OF STABLE DIFFUSION ALLOWS FEMALE NIPPLES!!!
See? It's not a big deal, they should have allowed them since SD2.0
>>102931558
You made me realize how much Sona sounds better than Sana, BTW. Sana means "heal" in Spanish.
>>102931947
which is why all the "art" generated by these vlm models looks so fucking bland. it all just gets tagged as "a digital painting of" with no unique descriptors for the style, resulting in an absurd amount of information loss. the equivalent of tagging every wheeled vehicle as a "car"
I like it, I think it's neat
>>102931827
Well, it "cheats" by using GPT-4whatever to rewrite the prompts, and some of the creativity you see may have been added in that step, so raw Dalle would not be as good. But raw Dalle's quality could be achieved without the parameter bloat.
>>102931816
qrd on picrel?
>>102931920
Why not? Some of my favorite models were jeetmixes.
>>102931947
Really? So what we need is technology that replaces transformers; if training becomes cheap, then anybody can make the model of our dreams.
>>102931960
I think Playground was the worst offender, they were claiming to be better than Dalle 3 and Midjourney 5.
>>102931947
this is why flux learns things so quickly: you aren't really teaching it anything new with those 10-image loras, you are just making it remember something it forgot
>>102931974
It was funny how all the things they did at Google for their Imagen model release backfired, and they got exactly what they tried to avoid, and then removed humans entirely from their generations.
>we... huh... have no idea what race, ethnicity, gender and sexual preference the human in the drawing should represent when you ask for "person", so we're banning the generation of humans completely.
>>102931947
Wait until you realize we do something called "dropout" when we train
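For anyone unfamiliar, dropout here just means randomly zeroing parts of the input or activations during training so the model can't lean on any single feature (diffusion trainers also drop whole captions to learn an unconditional mode). A minimal pure-Python sketch of the standard inverted-dropout trick, not tied to any particular trainer:

```python
import random

def dropout(activations, p=0.1, training=True):
    """Inverted dropout: zero each value with probability p during
    training and rescale survivors by 1/(1-p) so the expected value
    stays the same; at inference time it's a no-op."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() >= p else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
# with this seed: first two values kept (and doubled), last two zeroed
print(out)  # [2.0, 4.0, 0.0, 0.0]
```

The rescaling is what lets you skip dropout entirely at inference without shifting the activation statistics.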
>>102932083
I used Dalle through ChatGPT before and I don't believe that's the case. You're able to investigate the prompt for a generated image, so you know when the LLM has changed it, and you're able to tell ChatGPT not to change it, so that in the end you do get the exact prompt sent to the image generator. Additionally, it was possible to fix the seed to get reproducibility. And in my experience doing this, I do think the base model was still pretty creative. It really is just a powerful model in my opinion.
I just had a good sleep, can you sleep tonight anons?
>>102931984
Imagine me selling my bitcoins to train a base model only to see them soar to 100k and regret missing the profits.
>>102932158
It was a rare case when even normies noticed the sheer absurdity of it all. I want more of these.
>>102932199
No doubt, the question is how many parameters you need to do that. If someone offered a 1 million dollar bounty to the first person that made an image model with the power of dalle 3, they may not need more than 4 billion. Because, so far, we still haven't seen what the rest of the billions are being used for.
>>102932302
Dalle isn't as impressive as you think, it really is just like a 4B model but with a mostly uncensored and fully curated dataset that it was properly trained on. Most of the models we get are slopified from top to bottom, including a censored dataset, and even rushed out the door before they're properly trained and, worse, usually getting a lobotomy pass.
>>102932281
I want those things to cause a realization and a change of paradigm so that new people learn about it and stop the absurdity. But, noooo, let's make a new definition of safety even if our investors run away from us.
>>102932369
I'm still impressed by its outputs and can't wait for the day an open model surpasses it.
Not quite
>>102932407
SD3 Large will be able to when someone does a real uncensored finetune that includes pop culture knowledge.
>>102932390
Honestly I think the google one was so bad it was a one-off, I've never seen this safety/censorship bullshit being mocked in normie spaces like this in recent years.
>>102932369
>with a mostly uncensored and fully curated dataset that it was properly trained on
I think that's the impressive part at this point. All the open models seem to compete on is making the same "super safe" stuff at the cost of nicer results. Having played with it, I'd be very happy if dalle was leaked, since at this point I've lost all hope of local models ever changing their ways.
can someone summarize which models came out today and their size/goal (image or video)?
>>102932426
nobody has ever done this, because nobody knows how to scrape a dataset at that scale. sure we have anime finetunes, but never a full scale "fix the art" finetune that pulls from all over while maintaining proper captions.
>>102932563
wait for our bro cefurkan to release a detailed analysis
The Mochi demo is up, here's my first attempt at a Miku
>>102932584
So it's essentially impossible to fix on a local/non-company level? Damn, that's grim.
>>102932604
Damn, ordering my 5 H100s now
>>102932617
It is possible, but it requires an extreme amount of curation and community effort to cover all possible character and style niches and ensure they are properly captioned
>>102932680
And I guess something like the DALLE dataset has no open equivalent anywhere?
>>102932680
I believe in autism, but maybe we are reaching its limit
>>102932158
>>102932281
Oh man, the Gemini diversity debacle was hilarious. Even the main director at my company asked me what was up with that and I didn't even know where to begin haha.
>>102932714
It's all public data really, it just needs to be scraped/curated. It's likely all in common crawl. The real stumper is how they managed to preserve niche things like Blazblue and Fire Emblem while also using their AI captions. As Lykon said, it's thanks to shitty vlm captions that IP knowledge was lost >>102931357 so how was dalle able to preserve it?
>>102932604
how long did inference take?
>>102932789
Probably using a great early version of the gpt vision model to mass caption, with no censorship?
>>102932862
the question would be why all the others seem to suck, unless they're happy it sucks because it kills IPs, artist recognition and most nsfw by design
>>102932420
>>102932740
As hilarious as it was, I don't think either google or the public learned much from it in the long run.
>>102932979
getting closer
If they used exclusively synthetic captions, why was their sample image of the woman in the grass captioned
>~*~aesthetic~*~ #boho #fashion, full body 30-something woman laying on microfloral grass, candid pose, overlay reads Stable Diffusion 3.5, cheerful cursive typography font
I'd also love to know how they got such unremarkable and average-looking people for their example images, because all I'm getting is flux-tier Instagram clones with fish lips.
got banned from /pol/ so dumping this here
>>102933165
can you gen donald crying while kissing vladimir's feet?
>>102932264
Imagine how cool the model will be tho with you at the helm
>>102933179
do it yourself retard
>>102933193
do some with donald in a diaper bro
>>102933193
nice cleavage pose
>>102930087
I wonder what reference images it used to copy
>>102931168
flux still wins for mikus
>>102933441
Is that real?
who has good prompts or methods for getting good results with prompts?
>>102933497
Censored by Bing?
> Create a small prompt with a figure you want in it
> Once you can get the figure you want, play around with the background
> Only add one or two words each time, use complete sentences, and refer to anything censored indirectly
> Incrementally increase the complexity of your prompt, pushing it towards your desired contents.
> If you add words you think will trip the censor, space them out from the part of the prompt you are working on. Example: rabbi at the beginning, big nose at the end
> Bury naughty words in separate sentences. Even if that sentence is talking about something else, DALLE will figure out what you mean
Following this procedure will help you build an intuition about how to write the most effective prompts
https://dallery.gallery/the-dalle-2-prompt-book
>>102933497
chatgpt has some good prompts
>>102933491
>>102933497
A lot is trial and error. Certain ways seem right until someone does it a totally different way, showing that actually you can't be too sure. One opinion is to be extraordinarily specific. You can use text (LLM) AI to help build long prompts, you lazy loser.
>>102933530
cry harder faggot
HANDS FREE SLOPPING
> Words -> salad with chatGPT
> Text -> image
> Image -> video
> Edit/Subtitle
> Convert to WebM
>?????
>PROFIT
OLD MEME GUIDES:
https://files.catbox.moe/3az283.jpg
https://files.catbox.moe/e5mzsc.png
https://files.catbox.moe/5ix69v.png
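For the WebM step, a small helper that just builds the ffmpeg argument list (ffmpeg itself, the input filename, and the CRF default are assumptions here; run it with subprocess once you actually have a clip):

```python
import subprocess  # only needed if you actually execute the command

def webm_cmd(src, dst, crf=32, no_audio=True):
    """Build an ffmpeg command that converts a clip to VP9 WebM.
    CRF-only rate control (-b:v 0) is the usual VP9 constant-quality
    recipe; imageboard webms are typically muted, hence -an by default."""
    cmd = ["ffmpeg", "-y", "-i", src,
           "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0"]
    if no_audio:
        cmd.append("-an")
    cmd.append(dst)
    return cmd

cmd = webm_cmd("slop.mp4", "slop.webm")
# subprocess.run(cmd, check=True)  # uncomment when ffmpeg is installed
print(" ".join(cmd))
```

Lower `crf` means higher quality and bigger files; tune it against whatever size limit you're posting under.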
>>102933544
Is this real?
>>102933536
set yourself on fire. ok thank you
>>102933552
mad??
>>102933568
I'm patient. Google "how to set myself on fire". thanks
>>102933585
>regurgitate blah blah blah
>>102932584
I have millions of captions, it's not that hard.
jesus this model comparison someone just posted on reddit is ruthless. sd3.5 vs flux. which column is sd? the shit column.
>>102933544
based
>>102933713
skill issue
>>102933713
They both look like high quality AI slop and I mean that in a derogatory way
>>102933713
True, but I want to live in the ball.
>>102933713
3/3 it's a tie for me
>>102933730
<he won't live in the ball
<he won't be happy
>>102933713
The shit column is the one that says you will tune nothing and will be unhappy.
SD3M finetune status?
>>102933544
People look better in 3.5. I'm so SICK of seeing the shiny buttchin flux face
is there any reason models can't draw an empty chessboard? Lack of parameters/dataset? I think the problem is deeper than that since you can't even do it with a lora. Diffusion is probably still bad at drawing repetitive patterns without hallucinating shit
>>102933792
Because models are fundamentally hallucinating pixels, there really isn't much reasoning that happens with these models.
>>102933713
Right is trying way too hard to be midjourney. Left feels closer to the outputs you would get from a proper base model, without the so-called "aesthetic tuning" (overfitting on artsy images). Also the fucking flux face, it never goes away.
>>102933792
tagging issues.
tagging for different positions on the board
tagging for different types of boards
tagging for the positions of a flying bird's wings
you can get away with a lot with a 4-5b model, the issue is that other than dalle, all base models are tagged like garbage
>>102933677
And yet no one was able to do that locally outside of mass importing tags from boorus. Let alone captions with artists, nsfw, IPs...
>>102933792
>>102933803
>INB4 /x/ schizo-analysis on why freemasonry is 'cool'
>>102933814
>Also the fucking flux face, it never goes away.
The buttchin is my go-to for detecting flux-made gens lol.
>>102933792
sounds like a tagging issue. every image related to chess gets tagged with "chess" and there are more chess boards that have pieces on them, so the "chess" word gets stuck with always having some pieces ("chessboard" too). "empty chessboard" is two words and most AI are too dumb to handle this.
>I think the problem is deeper than that since you can't even do it with lora.
I bet you can, but the trigger word should not be "chess", instead a gibberish word.
>>102933785
probably never
>>102933833
>should not be "chess", instead a gibberish word.
"empty-chess"
>>102933833
The thing with captioning is that it's positively biased: the model describes what it sees, not what it doesn't see. Very few captions use terms like "empty", "blank", "void", etc.
>>102933833
>"empty chessboard" is two words and most AI are too dumb to handle this.
and also this is like "create a room without a pink elephant in it".
>>102933792
train a lora on one good image, with emptychess as its tag?
>>102933814
All you have to do is combine the alt / search title with the caption. I built a lot of my dataset by searching by artist, character, etc.
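A sketch of the kind of merge anon describes: prepend whatever metadata the scrape already gave you (artist, character, source title) to the VLM caption, so the meta information isn't thrown away. The field names here are made up for illustration:

```python
def build_caption(meta, vlm_caption):
    """Combine scrape metadata with a VLM caption so artist/character
    info survives into the training caption instead of being lost."""
    parts = []
    for key in ("artist", "character", "title"):  # hypothetical metadata fields
        value = meta.get(key)
        if value:
            parts.append(f"{key}: {value}")
    parts.append(vlm_caption.strip())
    return ", ".join(parts)

caption = build_caption(
    {"artist": "sakimichan", "character": "hatsune miku"},
    "a digital painting of a girl with long teal twintails",
)
print(caption)
# artist: sakimichan, character: hatsune miku, a digital painting of a girl with long teal twintails
```

The point is only that the merge is trivial once the metadata exists; the hard part, as the thread says, is curating it at scale.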
it seems to me that image models would greatly benefit from the knowledge that video models have.
>>102933892
video models are just image models, or what I like to call "motion pictures"
>>102933867
Yeah, but you don't realize that actually ai hasn't been trained to know what a chessboard is. It knows what a checkerboard is. It knows what checkers are. It knows "chess" to be the pieces. No pieces, it's probably not chess.
>>102933892
I'd say they're censored/limited the same way
>>102933867
>without a
Negatives exist
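Mechanically, negatives work because classifier-free guidance extrapolates away from the negative conditioning rather than toward it: eps = eps_neg + scale * (eps_pos - eps_neg). A toy sketch of that arithmetic on plain lists (real samplers apply this per step to the model's predicted noise):

```python
def cfg(eps_pos, eps_neg, scale=7.5):
    """Classifier-free guidance step: start from the negative-prompt
    prediction and push `scale` times along the direction toward the
    positive-prompt prediction. With an empty negative this reduces
    to ordinary CFG against the unconditional prediction."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

# toy 4-dim "noise predictions" for positive and negative prompts
pos = [0.2, -0.1, 0.4, 0.0]
neg = [0.1, 0.1, 0.4, -0.2]
print(cfg(pos, neg, scale=2.0))
```

This is also why a negative prompt can only steer concepts the model already separates in its conditioning; it can't conjure an "absence" the captions never taught it.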
>>102933867
Something to mention though is that transformer-only models that work purely in tokens can handle this problem better, because the prompt and images share the same token space, which gives the model some ability to understand a prompt with "without". It's the same line of features as being able to modify an image through prompts, i.e. "change the monkey into an elephant".
>>102933874
here >>102933833
base models know "chess" & "board" very well but can't differentiate between individual chess pieces and the board. Even with a lora, it draws extra squares or an uneven grid and shit.
>>102933990
Yeah, because you literally don't understand that these models are hallucinating blobs that coincidentally align with text prompts. You're like someone who asks ChatGPT to count. These models don't work like that. Flux can't even consistently do normal hands and limbs and you're asking for precision chessboard reconstruction.
SD3.5 is underwhelming. It's still bad with human anatomy.
>>102934009
>these models are hallucinating blobs that coincidentally align with text prompts
abstractartfags win again
>>102933951
>Transformers only models that work only in tokens can handle this problem more because the prompt and images share the same token space
This is actually quite interesting. There's no making up for that with newer architectures?
>>102934044
I'm not sure what you mean, but Emu3 is an example of a model that works like this. Basically you have a model that can chat, generate and caption images.
>>102934009
>because you literally don't understand that
dood, I said that in the original post.
Next Bred
>>102934088
>>102934088
>>102934088
>>102930087
What's arguably the best pony model right now?
>>102931393
is there a conference coming up? that's big shit for smaller industries
t. am on a trip for one now
>>102931505
that won't happen for China, and that's why it'll be the country that'll win in the long term
>>102931785
>Can't someone make a version of SD3.5 that uses Flux's VAE?
you need to modify the VAE so that it can work with SD3.5, I guess
>>102931836
it's not incompetence, it's intentional; they want to remove every single artist's and celebrity's name to avoid liability. they have 0 balls, only MJ has them
>>102933165
>>102933193
what model did you use, anon?
>>102934613
flux with some XL facedetailer