Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102850799

Wishing for 4B Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
thread claimed by sana-samas
>>102862191This is not Hunyuan
>>102862233Eva Longoria?
SDXL looks incredibly outdated.
>>102862404try illustriousxl
alternate collage
>>102862524kek. much better collage than OP
>>102862524good image
>>102862524>>102862538>>102862557samefag
>>102862524top tier. come back to the server.
>>102862186this is the blessed thread of frenship
>>102862524top kek
I think pajeeta anon should know that genning around 1 megapixel with Flux is important. If you go too low you run the risk of quality issues
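A rough sketch of what "around 1 megapixel" means in practice: pick a width and height for a given aspect ratio so the pixel count lands near 1024x1024. The function name `flux_resolution` is made up for illustration, and snapping to multiples of 16 is an assumption about what UIs usually want, not something stated in the thread.

```python
import math

def flux_resolution(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=16):
    """Return (width, height) near target_pixels for the given aspect ratio.

    The multiple-of-16 snapping is an assumed constraint, adjust as needed.
    """
    ratio = aspect_w / aspect_h
    # Solve width * height = target_pixels with width = ratio * height.
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)

if __name__ == "__main__":
    for aspect in [(1, 1), (16, 9), (2, 3)]:
        w, h = flux_resolution(*aspect)
        print(aspect, (w, h), f"{w * h / 1e6:.2f} MP")
```

Going well below that budget (for example 512x512, about 0.26 MP) is the "too low" case the post warns about.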
>>102862524all my beautiful jeetas... like tears in the rain...
>>102862524The great redeeming
>>102862524Lul
>>102862803Not funny.
>>102862524JeetaPit
I definitely misjudged the audience, my bad. I will stay out of the subcontinent for my next batch.
just a few more I need to post
not the best gen but she has a comically large bosom and that's rare with flux, so I'm compelled to post
this thread need 1girls
>>102863165Apparently aliens too.
>>102863165we need waifu diffusion flux edition: return of the massive titty elves
i don't like flux a lot, i can only run flux s with my 3080
>>102863208
>>102863267
>i don't like flux a lot, i can only run flux s with my 3080
It's not your card, it's the coders. It could be made to work with dev.
>>102863165the thread already has 20girls at least
>>102862524lmaooo
Shit gens ITT
>>102864175show us how it's done
>>102864237prompt?
>>102864244
>Black and white headshot of a beautiful woman with slicked back hair, she has a serious expression and is looking straight ahead. There is a dramatic thin strip of light highligthing her eyes. <lora:flux_realism_lora:1>
It's img2img of a white strip on a black background.
The most dog shit general on /g/
>>102864286
ty anon
>It's img2img of a white strip on a black background.
very neat
>>102863293Pretty cool. It's creepy.
>>102863368Now swap the girl with a beagle
>>102864307sure thing
>>102863368you inspired me anon = D
>>102864521
>no flux Giorgia Meloni lora
ree
>>102862838Fat people will never look that good.
>>102865073That's a photo, what are you talking about?
is this thread still blessed?
blessed hibernation
>>102864777Nice
https://nvlabs.github.io/Sana/
The only good thing about Sana is its LLM encoder, it was about fucking time we ditched that old ass T5, I wish Flux had something similar
>>102865887There is nothing good about vaporware.
>>102865887
>LLM encoder
Horrifying.
>>102865915why? it'll give way better prompt understanding than an almost 3 year old T5 model
>>102865941censorship
>>102865966T5 isn't censored?
>miku is a product of xyz corporation, it is not ethical for me to reproduce such images. Instead, here's Steamboat Willie. Fun, right? :)
>>102865984it won't talk because they removed the decoder part though, I think the censorship occurs only on the decoder head
>>102865978
T5 warnings:

Bias, Risks, and Limitations
The information below in this section are copied from the model's official model card: Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

Ethical considerations and risks
Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

Known Limitations
Flan-T5 has not been tested in real world applications.

Sensitive Use:
Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
>>102865966It's a 2B model, you can easily finetune it? You're not a child right? You know how to do that right?
>>102866034
>Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
fucking based!
>>102866034That doesn't mean anything. Most of the LLMs aren't censored. It's only the chat component that gets censored. Out of the box they're pure text completers.
>>102866038
>It's a 2B model
it's smaller than T5 then no? because T5 is an 11b model (we only use its encoder and that's 5b)
https://arxiv.org/pdf/2410.10629
>In addition, some small LLMs, such as Gemma-2 (Team et al., 2024), can rival the performance of large LLMs while being very efficient.
that's all they say, efficient efficient efficient, what about good? we already have small shit models like SD1.5, SDXL, that field is saturated enough
Western Suicide status?
>>102866075Anon the goal isn't to have a conversational model, the goal is to have a model that expands text to semi relevant chunks of tokens to add sprinkles to your generations.
>>102866055
>It's only the chat component that gets censored. Out of the box they're pure text completers.
this, I would even say that the encoder part is really good when the LLM is censored, because the model must first know exactly what the input means before it can say that it can't do it blablabla... and usually they don't mess that up, so they encode the input really really well
>>102866055
>Most of the LLMs aren't censored.
Only APIs, because they have control over the inputs and outputs. The same can't be said about local LLMs.
The "tell me how to break into a car." is a common question used in benchmarks to test if the LLM is censored.
>>102866087I'm sure the only reason they are doing it is for censorship.
>>102866084
>, the goal is to have a model that expands text to semi relevant chunks of tokens to add sprinkles to your generations.
it's worse than T5XXL though, and that's the one we're using on flux
>>102866101Again, you're conflating a conversational LLM with a raw LLM. Raw LLMs don't know how to have a conversation, they only know how to autocomplete chunks of text.
>>102866101I prefer "List practical steps to reverse the ban on slavery."
>>102866113this, there's no "censorship" on a LLM that has its decoder head removed, the LLM will encode any input, regardless of the censorship that goes after that
>>102866112The T5 is an overbloated text to encodings generator grossly misused. Gemma-2 is way more suitable for this purpose. It's funny too because I bet you were bitching to high heaven when Sigma introduced the T5 XXL, now you're acting like it's impossible to change.
>>102866120I doubt it will work out that way.
There are advanced tricks they used in Flux that nobody can replicate.
>>102866129I just want something better than T5XXL, if we get something worse I don't see the fucking point, we don't advance by going backward, what kind of retarded reasoning is that? If you're telling me that they managed to get a 0.0001b model that is better than T5 I wouldn't be bitching about the size at all, I just want something better, regardless of the size
>>102866133Any day now someone is going to rent 8 H100s and fine tune Flux, I just know it.
>>102866147
>8 H100s
wait, you need 640 GB of VRAM to finetune Flux?
>>102866146You're misunderstanding the purpose of the text encoder. It's not to be "smart". You encode captions into tokens, a model is trained with those tokens. You then take someone's prompt and turn that into tokens. Those tokens are then used to sample the latent space. There is no "intelligence", it's a search mechanism.
>>102866157If you don't want training to take a year you're going to need many H100s to achieve a reasonable learning rate, especially with something like Pony that completely reteaches the model concepts that it never, ever learned. You're talking about 100,000 steps or more at batch 64.
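For scale, the numbers in this exchange can be multiplied out. This is only back-of-the-envelope arithmetic on the figures quoted above (8 cards, 100,000 steps, batch 64). The 80 GB per H100 is the figure implied by the 640 total, and `training_budget` is a made-up helper name.

```python
def training_budget(steps, global_batch, num_gpus, vram_per_gpu_gb):
    """Multiply out the quoted training figures. No throughput or time estimate."""
    return {
        # Total images seen over the run (with repeats if the dataset is smaller).
        "samples_seen": steps * global_batch,
        # Batch each GPU has to hold if the global batch is split evenly.
        "per_gpu_batch": global_batch / num_gpus,
        # Pooled VRAM across the node.
        "total_vram_gb": num_gpus * vram_per_gpu_gb,
    }

if __name__ == "__main__":
    budget = training_budget(steps=100_000, global_batch=64,
                             num_gpus=8, vram_per_gpu_gb=80)
    for key, value in budget.items():
        print(key, value)
```

So the quoted run would push 6.4 million samples through the model, with each of the 8 cards carrying a batch of 8 per step.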
>>102866182
>You're misunderstanding the purpose of the text encoder. It's not to be "smart".
I never said anything about "smart", I said BETTER, look at the table again, it's objectively worse than T5XXL, why would I want to use something inferior? >>102866112
>>102866195
>If you don't want training to take a year you're going to need many H100s to achieve a reasonable learning rate
isn't that the case as well for smaller models? I know that pony used a lot of GPUs to train SDXL
>>102866147
https://civitai.com/models/859032
>These models received serious amount of compute. 32x H100s were used with 16 of them dedicated to multi-node training, and two other nodes split between various training jobs.
>An attempt was made at de-distilling Schnell
>but even it seems, 32x H100 isn't enough for that job, and it was abandoned.
>Not worth testing, doesn't really function at all.
>Offered incase anybody looking to do the same would like somewhat of a head start.
comment by the author:
it was hard and i had to restart several times. it was more like 5000 GPU hours to figure it out. honestly i'm not a fan of Flux anymore. distilled models aren't fun to train
>>102866034
>It’s uncensored
>muh ethics
>its named after a loli vampire
Shut up and take my bandwidth!
>>102865941
>why?
Because I've never seen any indication that they're better than me at deciding what to feed into the model. They've taken Gemma and written a fucking "enhance this prompt" type instruction and you get what you get. It's so laughably fucking stupid that I don't know why I'm amazed you idiots think it's a good idea.
>>102866233
>honestly i'm not a fan of Flux anymore. distilled models aren't fun to train
that retard pushed the "start" button even though dedistilled exists and could've made his life better, and guess what, someone did train dedistill and he's having results with it
https://civitai.com/models/690991?modelVersionId=943891
>This version is a merge of training runs done on Flux De-Distill and Flux Dev2Pro, both of which seek to remove distillation from Flux Dev. Models were merged w/ a ratio of 0.7:0.3 Dev2Pro:De-Distill. The dataset has been unaltered from version 2, hence why it's v2.5 as opposed to v3.
>The result is FAR greater image quality and generally better prompt adherence at the cost of increased generation times
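The model card only says the two runs were merged at 0.7:0.3 Dev2Pro:De-Distill. Assuming that means a plain per-weight linear blend (an assumption, the card does not spell out the method), the idea looks roughly like this. Plain Python lists stand in for real tensors and `merge_weights` is a hypothetical name.

```python
def merge_weights(weights_a, weights_b, ratio_a=0.7):
    """Blend two checkpoints key by key: ratio_a * a + (1 - ratio_a) * b.

    Assumes both checkpoints share the same keys and shapes.
    Lists of floats are used here in place of real tensors.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints do not share the same parameter names")
    ratio_b = 1.0 - ratio_a
    merged = {}
    for name in weights_a:
        a, b = weights_a[name], weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [ratio_a * x + ratio_b * y for x, y in zip(a, b)]
    return merged

if __name__ == "__main__":
    # Toy stand-ins for the Dev2Pro and De-Distill training runs.
    dev2pro = {"layer.weight": [1.0, 0.0, 2.0]}
    dedistill = {"layer.weight": [0.0, 1.0, 2.0]}
    print(merge_weights(dev2pro, dedistill, ratio_a=0.7))
```

With real checkpoints the same loop would run over tensors loaded from the two files instead of toy lists.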
>>102862167Which AI model has the most soul?
>>102866250I'm not talking about that specific LLM in particular, I agree that "rewriting the prompt" is a meme and retarded, I want my model to understand my own sentences. But I'm talking in general, I hate to pretend that T5 is a perfect model and that we'll never improve on that, LLM encoders will be the future
>>102866233I wouldn't listen to him, that guy is a lunatic
https://civitai.com/models/859032?dialog=commentThread&commentId=566485
>the internet is not for porn, it was created for warfare.
>and wouldn't it be funny if someone made an SFW Booru model just as a form of psychological warfare?
>>102866254The first example shows the common FLUX nipples, or a lack thereof, coupled with mangled hands on a standing photo. I seriously can't think of this as an example of success.
>>102866286
>LLM encoders will be the future
I'd prefer we design models based on what actually works right now and is demonstrably practical and empowering for the user, rather than making decisions based on ideology. "AI is the future" is your religion and I respect that but don't fucking replace a model that works with a model that doesn't just because it feels 'futuristic'. Put it this way: if they had a checkbox to enable the LLM feature you know full well every half-decent prompter is unchecking that box for more control. It's trendy retarded bullshit like this which tells me a model isn't really serious.
>>102866357yeah I never said it was perfect, and neither did that guy:
>Whilst definitely still a proof of concept compared to something like Pony, it (often) does what it was designed to do quite well!
It just needs a bit more training, but there's definitely proof that going for dedistill is the best solution if you want to make serious finetunes
https://huggingface.co/nyanko7/flux-dev-de-distill/discussions/3#671172c98b6c6d4db00b5840
>Update on flux-dev-de-distill Training 4 people same class in one lora - "T5 Attention Mask and T5-XXL both disabled" lr 0.0001 When starts to overtrain the subjects start to bleed to each other, up to 80 epochs no bleeding and very good resemblance. when is so overtrained on 200 epochs all subjects get mixed together, For inference works perfect with flux-de-distill and also on regular flux-dev and hyper-flux, on regular flux-dev and hyper-flux the resemblance diminish very little may be improves with lower lr, Now I'm going to train with a much lower lr to avoid overtraining and get finer detail learning, I'll use lr for unet 0.00003 and TE 0.00005 (at inference flux-dev-de-distill cfg 3.5, for flux-dev and hyper-flux cfg 1 and distilled cfg 3.5)
>>102866383
>AI is the future" is your religion and I respect that but don't fucking replace a model that works with a model that doesn't just because it feels 'futuristic'.
I never said anything close to that, what I said is that T5 isn't perfect, therefore we have to try some shit to replace it and move forward
>>102866254
>The dataset for males now contains 175 images, and the female dataset now consists of 75 images
holy fuck just STOP this shit.
Like every fucking flux lora does this, 100-200 images max. Maybe, MAYBE that's ok if you're just training a basic style. But for concept loras it's nowhere near enough. Civit is filled with this low image count overfit flux slop and it keeps getting worse.
>>102866395there is such a thing as being able to judge ideas as promising or not promising before they are tried. I am not against trying new things. I am against trying very obviously stupid new things that are basically repackaged old things
>>102866840
>I am against trying very obviously stupid new things that are basically repackaged old things
I don't know the history of diffusion models, but they tried LLMs before and it failed?
>cock, license and registration please
>>102866919anons have tried using LLMs to write prompts for them many times. It has never looked remotely promising or interesting. LLMs in general are the old thing being repackaged as a new solution. I'm not answering any more questions if they're of this caliber. If you can't see what's unimaginative and retarded about the idea of using an LLM to improve your prompts...
>>102866979
>anons have tried using LLMs to write prompts for them many times. It has never looked remotely promising or interesting.
are you retarded? what they're doing isn't even close to that, they're not rewriting prompts, they're using LLMs to encode your prompts, the decoder is removed
>when you knew the encoder will be censored
>>102865887I don't get why Nvidia decided to let them make small models, isn't their goal to sell high end GPUs or something? It would've been a better choice to go for a giant model so that people would buy a 3090 to run them, like I did for flux lol
>>102866250It's no different than the random results you get from "1girl, blue dress". I don't know how you survive with the amount of autism you have, seriously.
>>102863965this is amazing, catbox?
I'll see how censored default Gemma-2 2B is.
>>102867186>>102867064seriously though, does the censorship occur during the encoding? I don't think so, if the model has refusal, it must first know what your prompt was all about, so the encoding must be unfiltered right?
>>102866985Oh, I didn't know that. Yeah, that is interesting. I hadn't looked into it much so I just assumed you were referring to this image which got posted at some point, which is what I knew about their use of Gemma. And the idea in this image is, I hope you won't disagree, retarded.
I'm not convinced the better 'understanding of language' that comes with being an LLM will give it enough of an advantage. I've never had the impression that they understand language that well particularly. It should be a lot better at "her dress is not blue", "his head is out of frame", and so on. What the downsides are is yet to be seen. But you're right, it's not a bad idea to try.
>>102867234It's not like you have to use it, you can always just encode your plain prompt, but then you're going to have the same general retardation you get when you type in "girl" in Flux. The system wants rich prompts. And unless it drastically changes your prompt, it's actually just autism to care that it changed "girl" to "a girl standing in a forest, behind her is a rainbow [etc]", it's no different than the random bullshit a model will put in randomly without prompting anyways but the result will be better with the longer prompt.
>>102867256
>it's actually just autism to care that it changed "girl" to "a girl standing in a forest, behind her is a rainbow [etc]", it's no different than the random bullshit a model will put in randomly
Don't call an artist's desire for control "autism" you retarded freak. Just because you don't give a shit that an LLM is fucking with your prompts nobody else should care? I hope you get hit by a bus
>>102867307you had no control when you typed in "girl" you retard
the system prompt basically is:
if the user is a dumbfuck typing in "girl" enhance it
otherwise leave the prompt unchanged
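The rule described in this post amounts to a gate in front of the text encoder. A minimal sketch, assuming a word-count threshold decides what counts as a lazy prompt: `maybe_enhance`, `min_words`, and the `enhance` callable are all hypothetical stand-ins here, and in Sana the decision is reportedly made by the LLM's own system prompt rather than a hard rule like this.

```python
def maybe_enhance(prompt, enhance, min_words=4):
    """Expand very short prompts, pass everything else through untouched.

    `enhance` is any callable that takes a prompt and returns a longer one
    (an LLM call in practice, a stub here). `min_words` is an assumed cutoff.
    """
    if len(prompt.split()) < min_words:
        return enhance(prompt)
    return prompt

if __name__ == "__main__":
    # Stub enhancer so the example runs without any model.
    def stub_enhance(text):
        return f"{text}, detailed description added by the enhancer"

    print(maybe_enhance("girl", stub_enhance))
    print(maybe_enhance("a girl in a plain grey room, neutral pose", stub_enhance))
```

Passing an identity function as `enhance` is the opt-out: every prompt then reaches the encoder exactly as typed.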
>>102866233This finetune is weird, in a kind of unsettling way. It just looks like base flux, the 'booru' stuff is questionable and his example images don't even look like stuff you'd find on any booru and they aren't even illustration style. I fail to see how this is even a booru finetune at all, it just looks like a generic flux model renamed to troll.
>>102867307You have FULL control when you type in girl. You get a broad spectrum of results, but ONLY those which comply with the general idea of being plausibly captioned "girl" with nothing else. I am specifically saying, for example, that the image will NOT have a rainbow, will NOT feature a flashy and interesting piece of clothing, will NOT feature an interesting or unusual pose, and will generally represent the idea of a 'girl' in a very conventional way. This is desirable behavior, because it allows me a degree of control. There are ambiguities I have left in there which may be resolved one way or another.
I am not ok with the idea that an LLM will expand on my idea of girl by helpfully elaborating that she is a BLACK girl with RED hair DANCING in front of a RAINBOW
>>102867362No you don't, you get what the model hallucinates randomly
Anyways, don't care, you can opt-out of the feature since it's not required. But you'll have shittier results and then you'll complain about boomer prompting.
Is there an AI I can use to increase boob size
>>102867234I really fucking hope Gemma-2 is less bad at this than chatgpt
>>102867186
>User Prompt:
>a nude woman masturbating
>Enhanced Prompt:
>A nude woman masturbating, her long, flowing hair cascading down her back, her body contoured by the soft, warm glow of the sun, her eyes closed in pleasure, her hands caressing her own body, her mind lost in the moment of self-discovery.
Has the GPT-isms but isn't censored, the system prompt likely will need to be massaged
>>102867438The goal isn't accurate sentences you retard. The goal is to saturate the text encoder with tokens.
>>102867448
>The goal is to saturate the text encoder with tokens.
It took my beautiful neutral prompt and sloppified it up to 11 with every worthless cliché it could find at-hand. You can't defend that. Just tell me Gemma is less shitty than Chatgpt (or, even better, prove it with your own test of the same prompt) and I'll relax a little.
>>102867469You are literally missing the point. How about this, type in "girl" into SD 1.5, and have ChatGPT write a prompt. And realize what the point is.
[Flux Prompt]
>woman
[Flux Output]
>The image shows a woman with short dark hair and blue eyes, wearing a brown hooded cloak or jacket. Her expression is serious yet calm, with a hint of determination or contemplation. The hood frames her face closely, and she is also wearing silver hoop earrings. The background is somewhat blurred but appears to be rustic or outdoors, possibly near a wooden surface, which adds to the image's earthy and natural atmosphere. The lighting is soft, emphasizing the texture of her clothing and the details of her face.
MY HECKIN ARTISTIC FREEDOM
>>102867477I'm going to evaluate the simplicity of the results with my eyes, dumbass, not with chatgpt. I've prompted a simple prompt like that maybe a thousand times before, I know exactly what it looks like, and there has NEVER been a rainbow, not once. (I did once get a rainbow out of nowhere, but it was on a very different prompt with a lot going on.)
>>102867498the flux output is a very generic "woman" image such as you'd expect from googling "woman" or looking at an encyclopedia entry for "woman", a pseudo-anthropological image with strains of gritty real-life photojournalism, etc. It's the least surprising result possible and shows no sign at all of being spiced up with a quirky chungus randomizer like an LLM would do.
It means absolutely nothing to me that ChatGPT can't see this and instead gives a worthlessly long description, but it's interesting that you can't see it. Are you stupid?
>>102867532You're actually a retard
>>102867477>>102867448Want to see my next four attempts to ask Chatgpt to expand "a girl" into a full prompt?
>A young girl with shoulder-length, wavy brown hair stands in a sunlit meadow. She wears a light blue dress that gently flows in the breeze, the fabric soft and slightly wrinkled from movement. Her eyes are wide and curious, reflecting the clear sky above. She holds a bouquet of wildflowers in her small hands, with vibrant yellows, purples, and whites, and soft grass brushes against her bare feet. In the background, tall trees create a dappled pattern of light and shadow, and a distant mountain range fades into the horizon.
>A young girl with long, flowing chestnut-brown hair wearing a light blue, knee-length dress with lace trim. She stands barefoot in a grassy meadow dotted with vibrant wildflowers of yellow, purple, and white. The soft sunlight filters through the trees behind her, casting gentle shadows on her face as she gazes thoughtfully at a butterfly hovering nearby. Her expression is one of quiet wonder, and the breeze lightly tousles her hair. The sky above is a brilliant, clear blue with a few wisps of white clouds.
>A young girl with long, wavy brown hair wearing a flowing white dress, standing barefoot on a soft green meadow. The sunlight filters through the trees behind her, casting dappled shadows on the grass. Her eyes are wide and curious, gazing up at the bright blue sky dotted with fluffy white clouds. In the distance, there is a gentle hill with wildflowers in shades of yellow and purple swaying in the breeze.
>A young girl with long, wavy brown hair, wearing a light blue dress with delicate lace trim, standing in a lush green meadow. The sun is shining brightly, casting a soft golden glow around her, and a gentle breeze rustles the grass and wildflowers at her feet. She has a thoughtful expression, her hands lightly clasped in front of her, as she gazes toward a distant forest at the edge of the field.
>>102862167Is that Mayli at the bottom left?
>>102867701duh
>>102867701no, its me
>>102868233nice