Discussion of free and open source text-to-image models

Previous /ldg/ bred: >>102926788

Very Busy Day Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>SD3 Large
https://huggingface.co/stabilityai/stable-diffusion-3.5-large
https://replicate.com/stability-ai/stable-diffusion-3.5-large

>SANA
https://github.com/NVlabs/Sana
https://ea13ab4f5bd9c74f93.gradio.live

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
What if you had MiniMax at home with an Apache 2.0 licence, but god said:
https://github.com/genmoai/models
>The model requires at least 4 H100 GPUs to run.
https://xcancel.com/genmoai/status/1848762405779574990
Blessed thread of frenship
what if you had sd 3.5 large but god said: now you need to wait for ggufs because vramlet lmao
yeah. img2img is fucked, I wonder if this is a safety measure
>>102930138It's just the denoiser that needs to be tweaked
>>102930135
you can't try the fp8?
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8
Is it over or are we back?
What if god gave you local SOTA (sana) but said: you must have skill
Blessed thread of keeping up the torch of threadly culture.
>>102930162back for now
>>102930155
>14.5gb
does that include the text encoders? i only have 10gb vram
>>102930193
oh nvm the text encoders are in a separate folder, i guess i'll wait a while
>>102930170weights when
>>102930214have faith in chang
always putting in a gen that says "LDG" is debo-coded
>>102930193
yeah, 14.5gb is everything (fp8 unet + fp8 text encoder), desu my recommendation would be to put the text encoder into your ram/cpu so that you have spare vram room for the rest
https://reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/
>>102930251
>yeah, 14.5gb is everything (fp8 unet + fp8 text encoder)
ooh okay, thank you anon, i'll give it a try
>>102930214Send Nvidia a strongly worded email, they're the ones having to approve it.
>>102930111
https://arxiv.org/abs/2405.14854
If only this model was a bitnet model (1.58bit), it would be way easier to run it :(
>>102930111
>4 H100s
I don't even remotely believe this is a hard requirement. I scanned through their github code, they have some weird multi-machine FSDP distributed implementation (likely taken from the training code).
I mentioned this last thread, but comparing with Allegro:
Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAM
Mochi: 10B parameters, 44k sequence length, 3072 hidden dim, runs in ? VRAM
Memory usage is a fixed amount for all the weights, plus the memory for activations which scales linearly with both sequence length and hidden dimension size. Mochi has half the sequence length, but less than double the hidden dim size, so it would theoretically use LESS activation memory per layer.
If you 8 bit quantized Mochi, that's 10GB of weights, compared with 5.6 GB of weights for Allegro bf16. Combine that with the lower activation memory per layer, and it probably can be squeezed to run in 24GB VRAM. Worst case you'd need to go model parallel with two 24GB cards.
Someone just needs to make an optimized single-machine inference implementation that uses quantization.
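Sketching that estimate as code. The weight sizes come straight from the post; treating per-layer activation memory as simply proportional to seq_len * hidden_dim is the assumption (the real constant depends on the implementation, attention kernel, etc.), so the activation side is a ratio, not bytes:

```python
# Back-of-envelope version of the Allegro vs Mochi comparison above.
# Only the weight sizes are solid numbers; "act_units" is just the
# seq_len * hidden_dim product that per-layer activation memory scales
# with, not actual bytes.

def weight_gb(params_billion, bytes_per_weight):
    # params_billion * 1e9 params * bytes each, reported in GB
    return params_billion * bytes_per_weight

def act_units(seq_len, hidden_dim):
    # per-layer activation memory is proportional to this product
    return seq_len * hidden_dim

allegro_weights = weight_gb(2.8, 2)    # bf16 -> 5.6 GB of weights
mochi_q8_weights = weight_gb(10, 1)    # int8 -> 10 GB of weights

allegro_act = act_units(80_000, 2304)  # ~184M units
mochi_act = act_units(44_000, 3072)    # ~135M units

# half the sequence length but less than double the hidden dim:
# Mochi really does need less activation memory per layer
assert mochi_act < allegro_act
print(allegro_weights, mochi_q8_weights, round(mochi_act / allegro_act, 2))
```

So quantized Mochi carries ~4.4 GB more weights than bf16 Allegro but only ~73% of its per-layer activation load, which is why the 24GB squeeze sounds plausible.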
Can i run 3.5 with a 4080?
Is it good?
>>102930334>Allegro: 2.8B parameters, 80k sequence length, 2304 hidden dim, bf16 version runs in 22 GB VRAMyou include the text encoder to those 22gb vram or is it separate?
>>102930353
Yes
No
>>102930353i doubt it I can barely run it on a 5090
Hmm think i just have to crank up the steps at least at 30
>>102930353
>Is it good?
I'll let you judge
>George Costanza eating a Hamburger, there's a Hatsune Miku plush on the table
>>102930358I'm not including any text encoders at all, because it doesn't matter. You can load the text encoder to VRAM, compute embeddings, then unload it and load the transformer model. The time taken to do the text embeddings this way is still a negligible fraction of overall generation time.
>>102930334
>44k sequence length
what does that mean? it's the number of frames or something?
>>102930388Burger looks good at least
>>102930388I'll wait for the finetunes
>>102930184No way SD3.5 just allows you to draw pepes like that???
>>102930429SAI likely put in some fuckery to prevent that
>>102930424
The video gets compressed into the latent space, and then that 3d tensor is divided into a long list of embeddings. It's literally the same thing as imagegen models based on DiT, but with an extra time dimension.
So the actual input to the model is a long list of visual embeddings, each representing a tiny image patch from one frame. That's what the context length is referring to. For mochi, it's smaller than allegro due to some combination of slightly lower res video and better spatial + temporal compression by the VAE.
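A toy version of that patchification math. The compression factors here (8x spatial, 4x temporal, 2x2 latent patches) are generic assumed values for video VAEs, not Mochi's or Allegro's published configs, so the output only lands in the right ballpark:

```python
# Sketch of how a video becomes a token sequence, as described above.
# t_compress / s_compress / patch are assumed typical values, not the
# actual numbers either model uses.

def video_seq_len(frames, height, width,
                  t_compress=4, s_compress=8, patch=2):
    """Tokens = (latent frames) * (latent H / patch) * (latent W / patch)."""
    lat_t = frames // t_compress
    lat_h = height // s_compress
    lat_w = width // s_compress
    return lat_t * (lat_h // patch) * (lat_w // patch)

# e.g. ~3-4 seconds of 480p-ish video
print(video_seq_len(frames=96, height=480, width=848))  # 38160 tokens
```

Which is why a slightly lower resolution or a more aggressive VAE directly shrinks the sequence length the transformer has to attend over.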
>>102930474
I see, but in your post you're using the number of parameters + sequence length + hidden dim + quant to evaluate the vram requirement, what about the number of frames and the resolution?
>>102930087
>>SANA
>https://github.com/NVlabs/Sana
>https://ea13ab4f5bd9c74f93.gradio.live
real demo link btw https://8876bd28ee2da4b909.gradio.live
>>102930441
Eh, i remember when people panicked about sdxl being censored and now we have pony
>>102930441
The reason they'd do that would be to stop people from finetuning nipples onto girls' boobs, and the model does them natively, so they're navigating in a better direction and the poor stuff is due to incompetence.
>>102930538Demo works now?
>>102930538yeah I've seen them on twitter, they said there's a 480p model and a higher resolution one, I wonder which one it is on the demo
>>102930493
Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video, which is much lighter weight than the diffusion model.
The actual input to the model is what matters, and that's just a big tensor of 44k or 80k embeddings. That tensor IS the video that you're denoising, just in the latent space and represented as a bunch of tiny patches.
>>102930551
>Doesn't matter, you can unload the transformer and run the VAE once at the very end to decode the latent space to a video
true, that's what CogVideoX is doing actually
>>102930507
We can get a local video model that is better than proprietary video models and yet nobody has managed to make something better than Gradio?
Why is it still so hard?
>>102930432
>No way SD3.5 just allows you to draw pepes like that???
unfortunately no
>>102930609Reminds me of Meta.ai's output. Was Meta's image generator ever mentioned on 4chan? Anyone with Whatsapp can use it for free and nobody seemed to care, even Google's ImageFX got mentioned once.
What I like about SD3.5 is its diversity of outputs, you don't get the same rigid shit every time like on flux, but on the other hand there's a lack of consistency, for example one image is oversaturated somehow, and the 2 images below are "3d migu" even though I specified an anime style only, SD3.5 is too inconsistent, probably because it's undertrained or something?
>>102930672this is a local thread anon, that's why we're not talking about it
>>102930596Because Gradio works just fine and requires minimal work to implement.
>>102930706read their release note
>>102930771
this?
https://stability.ai/news/introducing-stable-diffusion-3-5
>>102930706>SD3.5 is too inconsistentIsn't all this because it's unet? Unet is like this, no way around that.Flux is DiT, so it's super-consistent, but then all seeds look very similar.
>Consistency is bad now
>>102930737
I said 4chan, didn't seem like /SDG/ mentioned it either, and they could say
>this is a stable diffusion thread anon, that's why we're not talking about it
>>102930815SD3 is also a DiT I believe
>>102930706
>>102930792
It's from QK Norms which is similar to what Pixart uses, makes training faster and more efficient but you lose some stability deep in the weights.
>>102930706
>you don't get the same rigid shit every time like on flux
that doesn't happen to me on flux
>>102930839
>makes training faster and more efficient but you lose some stability deep in the weights.
that's fucking retarded, couldn't they wait a bit more so that they got a better model forever at the end? why do they always want to rush
>>102930849Why because your vague prompt gets more artistic interpretation from the model? Boo hoo.
>>102930860I asked for an anime style, not a 3d style, and it gave me 3d style, that's just a mistake from the model, and you're a retard if you think otherwise
>>102930877Or maybe you're suffering from negative bias and you ignore every time other models interpret your prompt. It's not like Flux is the king of adherence either.
>>102930815Flux is distilled to produce not very diverse set of nice images. It will never have knowledge of the full, trained from scratch model and because of that 3.5 is a much better base for finetuning, even if it doesn't have good quality in all generations out-of-the-box.
>>102930706That looks like shit kek.
>>102930844
>that doesn't happen to me on flux
lucky you I guess, because flux is really rigid and tends to give you really similar pictures
>>102930996
>why are there no mid sized models
>wah everything looks like shit
Any day now, a Flux finetune lmao
It's great because the one you kept posting turned out to be shit
lmao
seriously
>>102931012
>Any day now, a Flux finetune lmao
it happened though
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>102931011at what guidance value?
>>102931027
Yeah, it's shit.
hahaha, omg I can't believe you were waiting for that
Doesn't even beat fp8 dev
>>102931012You seem to have either responded to the wrong post or think I'm someone specific in this thread when that was my first post.
>>102930831
It is bad if it's at the cost of creativity, the most creative model ever was Craiyon (formerly mini-dalle) and its inconsistency was so off the charts you couldn't generate a face.
>>102931037seemed to work because I got the reply I wanted :) because there's a coping Flux user in here thinking someone is going to drop a $20k finetune
>>102931027According to his donation page, he's doing finetuning with a single 4090. I doubt he used a lot of images. There's no way it's going to be good.
>>102931043No I want the things I type to appear, you can get creative with your prompts.
>>102930834Ah, well, kudos for achieving diversity of outputs with it then, and there's nothing wrong with the tech, black forest labs messed it up, and that's one thing I can say Stability did better.
>>102931072
It's not going to be good because you need to do like 10 epochs on a million images to properly stamp in new concepts. If he's seriously using a single 4090, that's like 15 seconds a step at batch size 1. At best he's doing what a merged Lora would do.
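The arithmetic behind that claim, using the poster's rough inputs (1M images, 10 epochs, batch size 1, 15 s/step; these are the post's guesses, not benchmarks):

```python
# Quick sanity check of the single-4090 finetune claim above. All the
# inputs are the poster's rough numbers, not measured throughput.

def finetune_days(images, epochs, batch_size, sec_per_step):
    steps = images * epochs // batch_size
    return steps * sec_per_step / 86_400  # 86,400 seconds per day

days = finetune_days(images=1_000_000, epochs=10, batch_size=1, sec_per_step=15)
print(f"~{days:,.0f} days")  # ~1,736 days, i.e. several years on one card
```

Even with gradient accumulation and a generous 10x speedup it stays in "months" territory, which is the point: a single-card run at that scale behaves like a merged Lora, not a proper finetune.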
>>102931034you tried it?
>>102931047Honestly as an outsider that came to this thread after hearing the news about the new models, you seem a bit obsessed with whoever that guy is living rent free in your head.
>>102931117Yes, for the same effect find a realism lora for Flux.
>>102931141
Wrong, Flux Pro doesn't have good diversity either, that's why I suspected it was DiT's fault.
But I'm glad I was wrong because unet just won't give text that looks as good as this.
>>102931141DiT's major flaw is it requires saturation, it needs the boomer prompts which increases quality with each word.
oh non ono sana bros this cant be happening
dawwww
>>102931168
it doesn't look really good, desu I couldn't tell the difference between SD3-2b (medium) and SD3-8b here, they're in the same range of quality, the fuck did they do all this time?
>>102931183>username>retarded pajeet
>>102931106That was me at the beginning, then I realized my best generations ever had something that the model put in there that I never could have imagined, sometimes it became a new fetish, one thing became my main fetish because I had never seen that before.When the model only does what you ask for, and not more, you're missing out.
>>102931195Sadly a lot of the quality is in the training and some of the concepts are overbaked. It seems to me the more they try to fix the training process (magic deduping, overbake detection, etc) the more they fuck things up.
>>102931204
>When the model only does what you ask for, and not more, you're missing out.
this, it's fucking boring if it only does its job, I want surprises, something like Dalle-3, you can go for a simple prompt and the model can add a shit ton of details that weren't in my orders but are still relevant to the image, that's where the fun is
>>102931195
never tried sd3, but this new 3.5 doesn't impress me that much, messed up anatomy
It is up to date with 'famous' people, i guess...
>>102931043I loved mini dalle, has that model been released?
whats with this faggot schizo mentioning how Flux is bloated every time a shitty alt-model releases? >>102917495 as if Sana and SD3 being shit is a good thing because Flux is too big? Just rushing to the slide defense every time someone says these turdbakes look like shit. sd3 8b being bad is 100% a fault of the trash dataset
>>102931239
>sd3 8b being bad is 100% a fault of the trash dataset
this, 100% this
>>102931239
you know you are really obvious because you say turd a lot
it would help if you didn't schizo post back
also you are completely ass blasted because your savior Flux fine tune is, as you say, a turd.
lmao lol
>>102931235kek he looks like putin in there
>>102931195
They added nipples.
https://files.catbox.moe/w0katp.png <- SD3.5Large
there he goes. what's this jeet's endgame? unironically what is wrong with him? treating it like a console war
>>102931274everything except Flux is a turd for you, what's your end game?
>>102931221>something like Dalle-3Well, with today's so many releases maybe we will get open Dalle-3 at last today!
>>102931268
ok I guess they stopped acting like lunatics towards nudity, which is always a good thing, but the model quality could've been way better, it's an 8b model and it still looks bad, especially the details
>>102931287
>we will get open Dalle-3 at last today!
you're talking about Sana?
>>102931298
>Sana
no, that was dead before it even got released.
>>102931287We got sora at home, except you can't run it at home kek
In the LLM world there's a pattern where the more creative models are dumber, while the smarter models are less creative. We want creative models but not necessarily at the cost of coherence/smarts. Something the LLM users have found to make the smart models more creative is to use the {{random}} function, which is a part of the prompt. I think I heard there was such a thing in the image gen frontends as well. I believe it was called wildcards? Basically it lets you insert random strings in the prompt. So in LLM world you could do something like
>Write in the style of {{random: Dracula from Castlevania, Kizuna AI the vtuber, Gordon Ramsey}}.
and each time you press generate, it would pick from one of the strings so you'd get a different style for each new reply. I imagine this could be pretty powerful for smart but uncreative image models as well as you use more and more wildcards in different parts of the prompt.
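A minimal sketch of that {{random}} expansion, assuming the exact comma-separated syntax from the example above; real frontends' wildcard implementations (wildcard files, nesting, weights) differ, this is just the core substitution:

```python
# Expand {{random: a, b, c}} placeholders in a prompt by picking one
# option per placeholder. The syntax is taken from the example above
# and is an assumption, not any specific frontend's implementation.
import random
import re

PATTERN = re.compile(r"\{\{random:\s*([^}]+)\}\}")

def expand_wildcards(prompt: str, rng: random.Random = random) -> str:
    """Replace each {{random: ...}} with one randomly chosen option."""
    def pick(match: re.Match) -> str:
        options = [opt.strip() for opt in match.group(1).split(",")]
        return rng.choice(options)
    return PATTERN.sub(pick, prompt)

prompt = ("Write in the style of {{random: Dracula from Castlevania, "
          "Kizuna AI the vtuber, Gordon Ramsey}}.")
print(expand_wildcards(prompt))
```

Since each placeholder is resolved independently, stacking several of them in one prompt multiplies the number of possible prompt variants per generate press.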
>>102931274
you vill get compressed eyes
you vill get mangled limbs
you vill generate mj slop
you vill like sana
>>102931312>We got sora at home, except you can't run it at home kek
>>102931281i dont give a shit about flux man, what the fuck is wrong with your brain? flux is shit because it's rigid untrainable junk. sd3 and sana are shit because they're melted synthetic slop trained on even worse data. none of the 3 are really good. seriously get some help, you're jumping at shadows over nothing
>>102931330
you should get checked for depression because everything sucks to you
maybe it's a you problem
>>102930529
>poor stuff is due to incompetence
It doesn't know dicks, many artists, a lot of well-known people, genitals.
It's intended.
Sana knows artists Flux and SD3 do not
>>102931343
>It's intended.
facts
Guys, Mochi passes the test
>>102931314It's not hard to have a diverse image model, you just have diverse captions and diverse images and then avoid the temptation to use a DPO.
>>102931335
let's not play pretend now. these models are just bad, no need to cope over it. sana was not the bigma everyone was hoping for. sd3 is just as bad as it was months ago, now with extra parameters. flux remains untunable airbrushed slop and is nowhere near dalle at home. these models are all underwhelming except that video one which isn't even runnable
>>102930135
3.5 Medium still to come on the 29th too BTW. Large is looking good so far though, it can do proper booba with nipples out of the box even, even with the 4-step turbo version.
>>102931357
>Finetune for it
Lykon, why would people spend thousands of dollars to finetune this giant 8b turd? You're delusional
>>102930849
>couldn't they wait a bit more
apparently everyone and their dog decided to release today
so either it's a crazy coincidence (could be), or they know about some legislation that will be pushed soon that would severely restrict image models in the US
>>102931367Seriously anon, you should take a break from this, come back in five years, being on the cutting edge isn't for you.
>>102931268Really? Looks like they added areolas and forgot the nipples
>>102931372
>in the US
SAI is a UK company though kek
>>102931372I'm actually going to bet there's some sort of deadline for some sort of AI event we don't know about. Maybe even for Nvidia's conference.
>>102931370
>Large is looking good so far though
no it fucking doesn't, the details are horrible, the anatomy is still fucked, it's a fucking 8b model, there's no excuse this time, they fucked it up
>>102931236
Yes:
https://huggingface.co/dalle-mini/dalle-mini
But it's garbage compared to this one:
https://huggingface.co/dalle-mini/dalle-mega/tree/main
Which is garbage compared to... one I'll post later...
>>102931372
>some legislation that will be pushed soon that would severely restrict image models in the US
Big if True
>>102931380none of these jeeted local releases are cutting edge. training on dreamshaper outputs with sai compute is nothing more than a griftbake. same with sana designed to guzzle research grants.
>>102930529Type this prompt I got from JoyCaption into SD 3.5 somewhere where there's not Web API level prompt filtering (e.g. locally), it really will do it out of the box, I swear:"a photograph of a topless woman with a light skin tone and platinum blonde hair styled in loose waves that cascade over her shoulders. She has striking blue eyes, full lips, and a slender, toned physique. Her breasts are medium-sized with prominent, erect nipples. She has an intricate tattoo of two roses, one red and one white, with detailed green leaves and vines, covering her upper chest and extending to her shoulders. Additional tattoos are visible on her left arm, which has a large, elaborate design, and her right arm, which has a smaller, more intricate design. Her left hip features a tattoo of a rose. The background is a plain, neutral beige color, which helps to focus attention on the subject. The lighting is soft and even, highlighting her smooth skin and the vibrant colors of her tattoos. The overall composition of the image is simple yet powerful, emphasizing both her natural beauty and the artistic elements of her body art. The photograph is professionally taken, with a clear and crisp quality that brings out every detail."
>>102931412Yeah you're completely fried, get off the internet.
>>102931367
>sana was not the bigma everyone was hoping for
Utter retardation you have
>>102931418
>no argument
>no u
yeah maybe post some images in your defense? these models look like shit. the only thing fried here is your trash quality outputs
>>102931398Nice bait lmao, I'd post topless 4-step Turbo gens if this wasn't a blue board, it literally does nudity better than any version of Flux does without Loras
Maybe I should just fucking lower my expectations.
>>102931365Idk if I'd call that easy. Building a both large and diverse dataset is always a challenge in ML.
WHY THE FUCK CAN'T 3.5 DO HIGH RESOLUTIONS!
>>102931436Anon, everything is shit to you because you unironically have severe depression. You should leave and come back when things stop looking like shit to you. Fix your life. I can't believe you're getting this upset about experimental software.
>>102931268no dick
>>102931451shame this isnt local. very unlike anything we have style wise desu.
>>102931442Not really, LAION for SD 1.5 proved the concept. The problem is everyone keeps on filtering their datasets based on flawed reasoning instead of letting the chaos happen. You'll notice these models make two major mistakes: overfiltering the data, and overtuning the outputs with a DPO. That's why everything turns into slop.
>>102931452ooooooh im close to cooming sir please give me the jeet schizo rant again i beg you
Can we stop talking about Flux.
What about other models? Right now, what matters is how good of a base model it is so that we can reason about what future models should be fine tuned off of. Does SD3.5 perform better than SDXL base at the things we care about like non-censored anatomy? Can SDXL base do higher resolutions than 1024x1024? What about Sana that people mentioned? I haven't used any of these models actually, so I don't really know.
>>102931484I'm not the one that says everything is shit and it's really upsetting you even though you just got many models to try out today, I don't know why you're here, this obviously upsets you. Take a break, come back in a year, you obviously can't handle the research.
Looking at 8b, it genuinely looks like 2b medium. Is the hype just Stockholm syndrome after Sana flopped and people want a life raft?
>>102931490
you need to ask this man's expert opinion about sana, he will be happy to oblige
>>102931496
>>102931384
Anything the US does in legislation in regards to bleeding edge stuff is copied everywhere else.
Anything the US is obsessed with culturally bleeds over into every other part of the world at some point.
It's like Apple and the rest of the smartphone brands, applied to countries/cultures.
>>102931501
>Sana flopped
Said who?
sana is shit
flux is shit
all sd3 variants are shit
xl for another year, at least
>>102931440
Actually I'll just box them:
https://files.catbox.moe/ckafyh.png
https://files.catbox.moe/rc6ni1.png
Both genned with the 4-step Turbo version (hence the kinda plasticy Fluxish look). The more saturated one is DPM++ 2M SGM Uniform, less saturated one is Euler SGM Uniform.
>>102931509
IT'S OVER, THE ONLY GOOD THING WE HAVE IS A FUCKING OUT OF VRAM VIDEO MODEL!!!! Time to leave the hobby and seek mental help.
>>102931490SDXL is stagnant because of a bad text encoder and generally slow training. Sana is going to be the small model king that replaces SD 1.5/SDXL because it can actually be trained locally. Feel free to refer back to the 600m Pixart gens from the other thread. Sana also has some good gens so I think outside of skill issue gens, it's going to be good especially after a finetune (which anyone can do). Pixart significantly improves with finetuning.
>>102931289
No, penises don't exist, well, cronenberg ones do.
>booba in a base model
WE ARE SO BACK BOIS
>>102930369noob mistake baka
>>102931526if it was 3b maybe. right now there is no reason to replace xl with a smaller model. maybe a from scratch model on the architecture, but neither 1.6b sana nor 0.6b will ever get mass adoption because they're simply worse
>>102931532
The future of all company trained models will be this weird thing where human anatomy and sexuality is always wrong or scrambled and no known person exists.
>>102931551
Wrong, 600m Pixart generates kino gens. So 1.6B Sona is going to generate even more kino gens. It's that simple.
It's ironic too because you just saw both 2B and 8B SD3 being ass. So maybe it's not all about parameters.
>>102931526
holy heck, im so excited to fix the autoencoder compression with just a little bit of training!!!!!!!!!
uhhh let me just sprinkle a bit of parameters as well
this is it, we are SO back
>>102931532I think half the reason they never do downstairs is cause they don't wanna risk having the "constantly gives dudes pussies" problem that like 80 or more percent of SDXL checkpoints still have to this day lol
>>102931575Somehow I think you'll be upset no matter what
>>102931546ya thats the stuff i've been missing from these generals
>>102931589
thats right, you got me, im a flux agent sent to sabotage the great PIXART supremacy
im malding here
im balding here
im farting here
but im still not using sana IM SORRY
Sometimes it seems that anon has a little too much of a vested interest in these fun toys
>>102931607No, I've determined that you are severely depressed and that you're incapable of experiencing happiness.
>>102931558nah it generates melty nonsense. if pixart was "kino" it would've been adopted. it never was. dead-end bad models
Excuse the kino posting
>>102931618
thats right... at first i was working for the great PIXART empire
but then, sana had sex with my dog....
and now i have joined the flux evil secret society and swore a revenge...
>>102931546post more man i need my fix
>>102931558If you think SD3.5 is ass there's no way you like anything or ever would lol
>>102931657
Sana won
is dis cute?
oh sweet new lykonslop just dropped? surely it's 2.5x as good as xl based on that beefy param count, and 10x as good as pixart!
>>102931401
Okay, found it:
https://github.com/kakaobrain/karlo
Which can be used here:
https://huggingface.co/spaces/kakaobrain/karlo
This was as far as the unCLIP technology went before being replaced by Diffusion technology
>>102931289
>especially the details
That's the VAE's job, the rest of the thing just makes a composition and what we love is added by the VAE
Is that it? Can't someone make a version of SD3.5 that uses Flux's VAE? That's the entire concept with the VAE being separated, that you can use it with other image models.
>>102931312
It's been estimated that Dalle 3 is a 4B model, so we could run it fine...
...Okay, at least give us its dataset, some dumbo could outdo Dalle with it!
can i run any locals on a 6700xt yet?
>>102931473
Well the issue I'm seeing right now is that just like with LLMs, there doesn't exist a single modern model to serve as proof that is both small-ish, smart, and creative. For LLMs, it's normally understood that a diverse dataset means both an opportunity cost, by training on everything under the sun instead of the targeted data you want your LLM to be good at, and a quality loss, due to architecture and training methods. You can't have something that is both SOTA in smarts and SOTA in creativity in a small model if you use a completely raw dataset with no subject area and quality control. Part of the reason that LLMs are usually not trained on old books despite all the data that could come from them.
SD1.5 may have been smart and creative at the time, but right now we only have DALLE 3 to serve as an example of what both a smart and creative model looks like, except that we don't know its parameter count nor how much data it's been trained on, meaning that its performance level could be due to "cheating" on both of those factors, and in fact it may not be possible for a small local model without major improvements in other areas like architecture.
Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance, I could imagine there are likely performance improvements (with respect to smart creativity) left on the table. Though my estimate is that it's probably not as much as one would hope and we need to push parameter sizes and/or dataset size in order to get to DALLE's level.
As for DPO, honestly that's a post-training method and if one has used it, that means their model is not a foundation model but a post-trained model, and you need to criticize them for not releasing the pre-trained foundation model, rather than for using DPO as their post-training method.
>>102931314
What people have come to is using LLMs to create prompts for image models to draw...
I guess if you want a surprise, you use one and copy-paste the prompt without reading it, so the image is random and surprising.
You don’t deserve the full weights. Sorry.
>t. Stability poojeet CEO
Is Hand Refiner still the best way of fixing hands, or there's something better now?
>>102931761long live the new flesh
>>102931802
I really hope dalle3 is like, 40b at least. if it's actually only 4b then it would make every single other model look like an absolute embarrassment.
>>102931357
Using an LLM to retag images, losing information like what character it is or what artist made the painting, is pure incompetence.
Intentional incompetence.
>>102931823c64 helldimension fleshlight integration
>>102931362
The face is really ugly and distorted when it's small.
Just like with every image model around here.
>>102931851me first
>>102931872fuck off thats my hole
>>102931370
SPOILER ALERT
SD3.5 Medium will NOT be better than SD3.5 Large. It's for people that can't run Large, and all the new technology is just there to alleviate the loss of quality; if it's not in Large, it makes things worse, it has to be used to save parameters.
>>102931372
I've released 11 image models for the month of October, averaging one every 2 days, but did not release one today, I guess whatever I released would have been eclipsed by everything else.
>>102931903
>I've released 11 image models for the month of October, averaging one every 2 days
jeetmixes dont count, anon
>>102931813
>Though since I do have a sense that SAI and BFL have filtered their datasets for safety reasons rather than to increase performance
I always wonder how much of that results in the fucked up anatomy for limbs or poses.
If you filter anything "unsafe" (aka just nsfw most of the time is what they mean), wouldn't the models be worse for it, vs an approach where their own hosted access is controlled like what DALLE does, but what they release isn't their responsibility anymore?
the pajeet model is very good at cultural enrichment :)
>>102931357
This seems like a mistake. The less meta-captioning you do, the less the model has the ability to separate its learning about things, so you essentially end up telling the model "a cartoon looks like this, but oh no, actually a cartoon looks like this other thing", so what it knew about past cartoons is partially overwritten and you need constant repetition of past data to make it not "catastrophically" forget the things it learned in the past. So basically by removing the metadata, you spend more money to make the model perform as well as it once did. And for transformers, the cost of this is HUGE.
>>102931473You'll notice they only do it when someone points out "the king has no clothes", what we need is a kind with the balls to run around naked, and not care. To stop being prude.
>>102931941Keep in mind most nsfw detection is really just skin tone detection, so you end up throwing away a lot of good images. And seriously at this point it's obvious no one is vetting anything, they're just trusting the numbers which is why we keep seeing these bullshit "look at our dumbass score" metrics for models that are clearly not the same quality.
>>102931954>To stop being prude.Tbdesu if they fear journalist retarded clickbait articles and faux outrage on social media, anything they do will lead to that anyway, so why care? I refuse to think all the people are prudes themselves.
>>102931509No, I'm sticking to SD1.5, people are still releasing good stuff for it.All SDXL based models look the same, and I was not a fan of the PonyXL's branch style.
lotta talk for a bunch of retards who've never trained a base model in their lives
>>102931960
>so you end up throwing away a lot of good images
a fucking shame
>>102931516
And that's it? It isn't in the news?
LATEST VERSION OF STABLE DIFFUSION ALLOWS FEMALE NIPPLES!!!
See? It's not a big deal, they should have allowed them since SD2.0
>>102931558
You made me realize how much Sona sounds better than Sana, BTW. Sana means "heal" in Spanish.
>>102931947
which is why all the "art" generated by these vlm models looks so fucking bland. it all just gets tagged as "a digital painting of" with no unique descriptors for the style, resulting in an absurd amount of information loss. the equivalent of tagging every wheeled vehicle as a "car"
I like it, I think it's neat
>>102931827
Well, it "cheats" by using GPT-4whatever to rewrite the prompts, and some of the creativity you see may have been added in that step, so raw Dalle would not be as good. But raw Dalle's quality could be achieved without the parameter bloat.
>>102931816
qrd on picrel?
>>102931920
Why not? Some of my favorite models were jeetmixes.
>>102931947
Really? So what we need is technology that replaces transformers; if training becomes cheap, then anybody can make the model of our dreams.
>>102931960
I think Playground was the worst offender, they were claiming to be better than Dalle 3 and Midjourney 5.
>>102931947
this is why flux learns things so quickly: you aren't really teaching it anything new with those 10-image loras, you are just making it remember something it forgot
>>102931974
It was funny how all the things they did at Google for their Imagen model release backfired, and they got exactly what they tried to avoid, and then removed humans entirely from their generations.
>we... huh... have no idea what race, ethnicity, gender and sexual preference the human in the drawing should represent when you ask for "person", so we're banning the generation of humans completely.
>>102931947
Wait until you realize we do something called "dropout" when we train
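For anyone unfamiliar, dropout here just means randomly zeroing parts of the input or activations during training so the model can't lean on any single feature (diffusion trainers also drop whole captions to learn an unconditional mode). A minimal pure-Python sketch of the standard inverted-dropout trick, not tied to any particular trainer:

```python
import random

def dropout(activations, p=0.1, training=True):
    """Inverted dropout: zero each value with probability p during
    training and rescale survivors by 1/(1-p) so the expected value
    stays the same; at inference time it's a no-op."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() >= p else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
# with this seed: first two values kept (and doubled), last two zeroed
print(out)  # [2.0, 4.0, 0.0, 0.0]
```

The rescaling is what lets you skip dropout entirely at inference without shifting the activation statistics.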
>>102932083
I used Dalle through ChatGPT before and I don't believe that's the case. You're able to investigate the prompt for a generated image, so you know when the LLM has changed it, and you're able to tell ChatGPT not to change it, so that in the end you do get the exact prompt sent to the image generator. Additionally, it was possible to fix the seed to get reproducibility. And in my experience doing this, I do think the base model was still pretty creative. It really is just a powerful model in my opinion.
I just had a good sleep, can you sleep tonight anons?
>>102931984
Imagine me selling my bitcoins to train a base model only to see them soar to 100k and regret missing the profits.
>>102932158
It was a rare case when even normies noticed the sheer absurdity of it all. I want more of these.
>>102932199
No doubt, the question is how many parameters you need to do that. If someone offered a 1 million dollar bounty to the first person that made an image model with the power of dalle 3, they may not need more than 4 billion. Because, so far, we still haven't seen what the rest of the billions are being used for.
>>102932302
Dalle isn't as impressive as you think, it really is just like a 4B model but with a mostly uncensored and fully curated dataset that it was properly trained on. Most of the models we get are slopified from top to bottom, including a censored dataset, and even rushed out the door before they're properly trained and, worse, usually getting a lobotomy pass.
>>102932281
I want those things to cause a realization and a change of paradigm so that new people learn about it and stop the absurdity. But, noooo, let's make a new definition of safety even if our investors run away from us.
>>102932369
I'm still impressed by its outputs and can't wait for the day an open model surpasses it.
Not quite
>>102932407
SD3 Large will be able to when someone does a real uncensored finetune that includes pop culture knowledge.
>>102932390
Honestly I think the google one was so bad it was a one-off, I've never seen this safety/censorship bullshit being mocked in normie spaces like this in recent years.
>>102932369
>with a mostly uncensored and fully curated dataset that it was properly trained on
I think that's the impressive part at this point. All the open models seem to compete on is making the same "super safe" stuff at the cost of nicer results. Having played with it, I'd be very happy if dalle was leaked, since at this point I've lost all hope of local models ever changing their ways.
can someone summarize which models came out today and their size/goal (image or video)?
>>102932426
nobody has ever done this, because nobody knows how to scrape a dataset at that scale. sure we have anime finetunes, but never a full scale "fix the art" finetune that pulls from all over while maintaining proper captions.
>>102932563
wait for our bro cefurkan to release a detailed analysis
The Mochi demo is up, here's my first attempt at a Miku
>>102932584
So it's essentially impossible to fix on a local/non-company level? Damn, that's grim.
>>102932604
Damn, ordering my 5 H100s now
>>102932617
It is possible, but it requires an extreme amount of curation and community effort to cover all possible character and style niches and ensure they are properly captioned
>>102932680
And I guess something like the DALLE dataset has no open equivalent anywhere?
>>102932680
I believe in autism, but maybe we are reaching its limit
>>102932158
>>102932281
Oh man, the Gemini diversity debacle was hilarious. Even the main director at my company asked me what was up with that and I didn't even know where to begin haha.
>>102932714
It's all public data really, it just needs to be scraped/curated. It's likely all in common crawl. The real stumper is how they managed to preserve niche things like Blazblue and Fire Emblem while also using their AI captions. As Lykon said, it's thanks to shitty vlm captions that IP knowledge was lost >>102931357 so how was dalle able to preserve it?
>>102932604
how long did inference take?
>>102932789
Probably using a great early version of the gpt vision model to mass caption, with no censorship?
>>102932862
the question would be why all the others seem to suck, unless they're happy it sucks because it kills IPs, artist recognition and most nsfw by design
>>102932420
>>102932740
As hilarious as it was, I don't think either google or the public learned much from it in the long run.
>>102932979
getting closer
If they used exclusively synthetic captions, why was their sample image of the woman in the grass captioned
>~*~aesthetic~*~ #boho #fashion, full body 30-something woman laying on microfloral grass, candid pose, overlay reads Stable Diffusion 3.5, cheerful cursive typography font
I'd also love to know how they got such unremarkable and average-looking people for their example images, because all I'm getting is flux-tier Instagram clones with fish lips.
got banned from /pol/ so dumping this here
>>102933165
can you gen donald crying while kissing vladimir's feet?
>>102932264
Imagine how cool the model will be tho with you at the helm
>>102933179
do it yourself retard
>>102933193
do some with donald in a diaper bro
>>102933193
nice cleavage pose
>>102930087
I wonder what reference images it used to copy
>>102931168
flux still wins for mikus
>>102933441
Is that real?
who has good prompts or methods for getting good results with prompts?
>>102933497
Censored by Bing?
> Create a small prompt with a figure you want in it
> Once you can get the figure you want, play around with the background
> Only add one or two words each time, use complete sentences, and refer to anything censored indirectly
> Incrementally increase the complexity of your prompt, pushing it towards your desired contents.
> If you add words you think will trip the censor, space them out from the part of the prompt you are working on. Example: rabbi at the beginning, big nose at the end
> Bury naughty words in separate sentences. Even if that sentence is talking about something else, DALLE will figure out what you mean
Following this procedure will help you build an intuition about how to write the most effective prompts
https://dallery.gallery/the-dalle-2-prompt-book
>>102933497
chatgpt has some good prompts
>>102933491
>>102933497
A lot is trial and error. Certain ways seem right until someone does it a totally different way, showing that actually you can't be too sure. One opinion is to be extraordinarily specific. You can use text (LLM) AI to help build long prompts, you lazy loser.
>>102933530
cry harder faggot
HANDS FREE SLOPPING
> Words -> salad with chatGPT
> Text -> image
> Image -> video
> Edit/Subtitle
> Convert to WebM
>?????
>PROFIT
OLD MEME GUIDES:
https://files.catbox.moe/3az283.jpg
https://files.catbox.moe/e5mzsc.png
https://files.catbox.moe/5ix69v.png
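For the WebM step, a small helper that just builds the ffmpeg argument list (ffmpeg itself, the input filename, and the CRF default are assumptions here; run it with subprocess once you actually have a clip):

```python
import subprocess  # only needed if you actually execute the command

def webm_cmd(src, dst, crf=32, no_audio=True):
    """Build an ffmpeg command that converts a clip to VP9 WebM.
    CRF-only rate control (-b:v 0) is the usual VP9 constant-quality
    recipe; imageboard webms are typically muted, hence -an by default."""
    cmd = ["ffmpeg", "-y", "-i", src,
           "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0"]
    if no_audio:
        cmd.append("-an")
    cmd.append(dst)
    return cmd

cmd = webm_cmd("slop.mp4", "slop.webm")
# subprocess.run(cmd, check=True)  # uncomment when ffmpeg is installed
print(" ".join(cmd))
```

Lower `crf` means higher quality and bigger files; tune it against whatever size limit you're posting under.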
>>102933544
Is this real?
>>102933536
set yourself on fire. ok thank you
>>102933552
mad??
>>102933568
I'm patient. Google "how to set myself on fire". thanks
>>102933585
>regurgitate blah blah blah
>>102932584
I have millions of captions, it's not that hard.
jesus this model comparison someone just posted on reddit is ruthless. sd3.5 vs flux. which column is sd? the shit column.
>>102933544
based
>>102933713
skill issue
>>102933713
They both look like high quality AI slop and I mean that in a derogatory way
>>102933713
True, but I want to live in the ball.
>>102933713
3/3 it's a tie for me
>>102933730
<he won't live in the ball
<he won't be happy
>>102933713
The shit column is the one that says you will tune nothing and will be unhappy.
SD3M finetune status?
>>102933544
People look better in 3.5. I'm so SICK of seeing the shiny buttchin flux face
is there any reason models can't draw an empty chessboard? Lack of parameters/dataset? I think the problem is deeper than that since you can't even do it with a lora. Diffusion is probably still bad at drawing repetitive patterns without hallucinating shit
>>102933792
Because models are fundamentally hallucinating pixels, there really isn't much reasoning that happens with these models.
>>102933713
Right is trying way too hard to be midjourney. Left feels closer to the outputs you would get from a proper base model, without the so-called "aesthetic tuning" (overfitting on artsy images). Also the fucking flux face, it never goes away.
>>102933792
tagging issues.
tagging for different positions on the board
tagging for different types of boards
tagging for the positions of a flying bird's wings
you can get away with a lot with a 4-5b model, the issue is that other than dalle, all base models are tagged like garbage
>>102933677
And yet no one was able to do that locally outside of mass importing tags from boorus. Let alone captions with artists, nsfw, IPs...
>>102933792
>>102933803
>INB4 /x/ schizo-analysis on why freemasonry is 'cool'
>>102933814
>Also the fucking flux face, it never goes away.
The buttchin is my go-to for detecting flux-made gens lol.
>>102933792
sounds like a tagging issue. every image related to chess gets tagged with "chess" and there are more chess boards that have pieces on them, so the "chess" word gets stuck with always having some pieces ("chessboard" too). "empty chessboard" is two words and most AI are too dumb to handle this.
>I think the problem is deeper than that since you can't even do it with lora.
I bet you can, but the trigger word should not be "chess", instead a gibberish word.
>>102933785
probably never
>>102933833
>should not be "chess", instead a gibberish word.
"empty-chess"
>>102933833
The thing with captioning is that it's positively biased: the model describes what it sees, not what it doesn't see. Very few captions use terms like "empty", "blank", "void", etc.
>>102933833
>"empty chessboard" is two words and most AI are too dumb to handle this.
and also this is like "create a room without a pink elephant in it".
>>102933792
train a lora on one good image, with emptychess as its tag?
>>102933814
All you have to do is combine the alt / search title with the caption. I built a lot of my dataset by searching by artist, character, etc.
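A sketch of the kind of merge anon describes: prepend whatever metadata the scrape already gave you (artist, character, source title) to the VLM caption, so the meta information isn't thrown away. The field names here are made up for illustration:

```python
def build_caption(meta, vlm_caption):
    """Combine scrape metadata with a VLM caption so artist/character
    info survives into the training caption instead of being lost."""
    parts = []
    for key in ("artist", "character", "title"):  # hypothetical metadata fields
        value = meta.get(key)
        if value:
            parts.append(f"{key}: {value}")
    parts.append(vlm_caption.strip())
    return ", ".join(parts)

caption = build_caption(
    {"artist": "sakimichan", "character": "hatsune miku"},
    "a digital painting of a girl with long teal twintails",
)
print(caption)
# artist: sakimichan, character: hatsune miku, a digital painting of a girl with long teal twintails
```

The point is only that the merge is trivial once the metadata exists; the hard part, as the thread says, is curating it at scale.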
it seems to me that image models would greatly benefit from the knowledge that video models have.
>>102933892
video models are just image models, or what I like to call "motion pictures"
>>102933867
Yeah, but you don't realize that actually ai hasn't been trained to know what a chessboard is. It knows what a checkerboard is. It knows what checkers are. It knows "chess" to be the pieces. No pieces, it's probably not chess.
>>102933892
I'd say they're censored/limited the same way
>>102933867
>without a
Negatives exist
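Mechanically, negatives work because classifier-free guidance extrapolates away from the negative conditioning rather than toward it: eps = eps_neg + scale * (eps_pos - eps_neg). A toy sketch of that arithmetic on plain lists (real samplers apply this per step to the model's predicted noise):

```python
def cfg(eps_pos, eps_neg, scale=7.5):
    """Classifier-free guidance step: start from the negative-prompt
    prediction and push `scale` times along the direction toward the
    positive-prompt prediction. With an empty negative this reduces
    to ordinary CFG against the unconditional prediction."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

# toy 4-dim "noise predictions" for positive and negative prompts
pos = [0.2, -0.1, 0.4, 0.0]
neg = [0.1, 0.1, 0.4, -0.2]
print(cfg(pos, neg, scale=2.0))
```

This is also why a negative prompt can only steer concepts the model already separates in its conditioning; it can't conjure an "absence" the captions never taught it.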
>>102933867
Something to mention though is that transformer-only models that work purely in tokens can handle this problem better, because the prompt and images share the same token space, which gives the model some ability to understand a prompt with "without". It's the same line of features as being able to modify an image through prompts, i.e. "change the monkey into an elephant".
>>102933874
here >>102933833
base models know "chess" & "board" very well but can't differentiate between individual chess pieces and the board. Even with a lora, it draws extra squares or an uneven grid and shit.
>>102933990
Yeah, because you literally don't understand that these models are hallucinating blobs that coincidentally align with text prompts. You're like someone who asks ChatGPT to count. These models don't work like that. Flux can't even consistently do normal hands and limbs and you're asking for precision chessboard reconstruction.
SD3.5 is underwhelming. It's still bad with human anatomy.
>>102934009
>these models are hallucinating blobs that coincidentally align with text prompts
abstractartfags win again
>>102933951
>Transformers only models that work only in tokens can handle this problem more because the prompt and images share the same token space
This is actually quite interesting. There's no making up for that with newer architectures?
>>102934044
I'm not sure what you mean, but Emu3 is an example of a model that works like this. Basically you have a model that can chat, generate and caption images.
>>102934009
>because you literally don't understand that
dood, I said that in the original post.
Next Bred
>>102934088
>>102934088
>>102934088
>>102930087
What's arguably the best pony model right now?
>>102931393
is there a conference coming up? that's big shit for smaller industries
t. am on a trip for one now
>>102931505
that won't happen for China, and that's why it'll be the country that'll win in the long term
>>102931785
>Can't someone make a version of SD3.5 that uses Flux's VAE?
you need to modify the VAE so that it can work with SD3.5, I guess
>>102931836
it's not incompetence, it's intentional; they want to remove every single artist's and celebrity's name to avoid liability. they have 0 balls, only MJ has them
>>102933165
>>102933193
what model did you use, anon?
>>102934613
flux with some XL facedetailer