/g/ - Technology




File: tmp.jpg (823 KB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102083367

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
File: 1706461719934827.png (170 KB, 640x480)
Someone recommend me wildcards, currently got

A woman dressed in a

{red|yellow|pink|blue|green|white|black|teal|aquamarine|sheer}

{skintight leggings and neon sports bra|Catholic schoolgirl uniform with plaid miniskirt and white socks|form-fitting jeans and tight comfy sweater|pinstripe office dress with short skirt and dark pantyhose|unfurled bath robe with plunging neckline}

is

{lying on her side|standing|leaning forward|bent over|lying on her stomach}

looking at the viewer seductively. Her breasts and buttocks are very large, combined with her narrow waist, gives her a classic hourglass figure. Her amazing body is a central focus of the scene and in the foreground. The woman's hair is

{messy|straight|in a ponytail|in a bun} and {long|very long|medium length|short}.

Her cheeks are slightly sunken like a supermodel. The background is a

{bedroom|living room|kitchen|bathroom} with typical expected furnishings. A window in the background

{reveals a snow-capped mountain range|reveals a futuristic sci-fi skyscraper skyline|reveals a calm ocean going to the horizon|has the blinds closed|has flowing drapes closed}.
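If you want to sanity-check how these expand outside a UI, here's a minimal sketch of the {a|b|c} wildcard syntax (assuming the usual dynamic-prompts semantics: one option picked uniformly at random per group):

import random
import re

GROUP = re.compile(r"\{([^{}]+)\}")

def expand_wildcards(prompt, rng=random):
    """Replace each {a|b|c} group with one randomly chosen option."""
    # Loop so nested groups (if you ever use them) resolve inside-out.
    while True:
        prompt, n = GROUP.subn(lambda m: rng.choice(m.group(1).split("|")), prompt)
        if n == 0:
            return prompt

print(expand_wildcards("A woman dressed in a {red|teal|sheer} {sweater|bath robe}"))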
>>
File: 1712793879142476.png (167 KB, 640x480)
>>102088040
>>
>>102088054
Can your new lora produce it without the UI?
>>
File: ComfyUI_29109_.png (2.56 MB, 1080x1920)
>>102088040
Here's mine from SD days.
{bikini|serafuku|naked apron|apron|cape|gym uniform|garter belt|bloomers|tuxedo|kigurumi|school uniform|track suit|sweater|waitress|leotard|kimono|yukata|floral print|polka dot|fruit print|lolita fashion|fashion|stylish|cosplay|casual|bell-bottoms|petticoat|long skirt|turtleneck|sleeveless turtleneck|tank top|sweater dress|sweater vest|vest|dress shirt|collared shirt|t-shirt|jacket|hoodie|rain coat|trench coat|camisole|cardigan|blouse|capri pants|jeans|thighhighs|capelet|maid|pajamas|school swimsuit|miniskirt|pleated skirt|cargo pants|elbow gloves|shimapan|lab coat|serafuku|school uniform|swimsuit|sukumizu|pajamas|coat|overcoat|trenchcoat|bomber jacket}
{loose socks|army boots|high sandals|striped stockings|loafers|slippers|crocs|high heels|gladiator sandals|sneakers}
{blue|green|red|orange|gray|violet|lime|mint|navy|yellow|amber|emerald|ruby|sapphire|radiant} eyes
{multiple hair bows|hair bobbles|hair bell|braid|top hat|beret|baseball cap|patrol cap|straw hat|cowboy hat|bowler hat|fedora|sombrero|wizard hat|nurse cap|sailor hat|cat hat|diadem|tiara|panties on head|hat flower|hat ribbon|cat ears|cat ears|cat ears|maid headgear|multiple braids|hime cut|drill hair|hairclip|hair ornament|hair_flower|choker}
{smug|triumph|doyagao|grin|angry|annoyed|bored|crazy|despair|grimace|nervous|sleepy|ahegao|laughing|sad|surprised|perplexed|excited|confused|shy|hopeful|shark teeth smile}
{standing|posing|running|talking|playing|working|busy|flying|sitting|jumping|double v|fist in hand|thumbs up|salute|hands behind back|arms behind head|shrugging|spread legs|object hug|jojo pose|outstretched arm|waving|kneeling}
{blue|navy|green|violet|red|yellow|radiant|orange|pink|mint|salmon pink|shocking pink|gray|white|lime|black} hair
>>
File: ComfyUI_29102_.png (2.58 MB, 1080x1920)
>>102088040
Also
{school|mcdonalds|restaurant|cafe|city view,street|countryside|office|village|town|dungeon|alien planet|fantasy kingdom|garden|roof|cozy bedroom|kitchen|basement|palace|castle|at computer|library|factory|quarry|airport|military base|jungle|forest|snow mountain|brook|river|lake|beach|prison|space|space station|post office|server room|living room|movie theater|construction site|pizza joint|festival|carnival|comfy bedroom|jungle|nuclear power plant|graveyard|underwater}
>>
File: 1697206216309092.png (154 KB, 640x480)
>>102088175
Not sure actually, even though the UI is obviously ugly and fucked, I include it in the prompt on the assumption that it helps the overall adherence to the style. Will test.
>>102088209
>>102088223
Awesome, will incorporate some of that, thanks
>>
>boomer shooter lora
>it's just bad pixel art with a ui pasted over
>>
>>102088021
>1girl, 1girl, 1girl, 1girl, 1girls
very daring today with the collage picks
>>
File: 1699489235359412.png (2.16 MB, 1280x1280)
>>102088300
Still trying to get the balance right, and it's probably due to the training material. Has a kind of ugly cartoonish look if I don't mix it with a pixel art lora.
>>
>>102088300
nah, >>102088245 for example could easily be a girl in the strip club of duke3D.
>>
>>102088040
{deep indigo|dark green|blue and red|holographic purple|matte orange|pearl white|solid gold|polka dot}
{sitting on the toilet, taking a crap}
{reveals the face of an old man,peeping tom,voyeuristic}.
>>
File: 1710509507005275.png (392 KB, 640x480)
>>102088370
>{reveals the face of an old man,peeping tom,voyeuristic}.
I'll definitely add this one
>>102088175
I see about 50/50 with or without HUD when I don't explicitly mention it btw, picrel
>>
>>102088245
Yeah, if you could test it without the prompt, and with the UI in the negative prompts, could be interesting to see what happens.
>>102088300
You are right that it gets the correct era of pixel art, but it doesn't do things like the weird 3d walls or the jarring way things stand against the floor. It ends up looking more like a pre-FMV adventure game still in some ways, though I'm not sure of one with exactly this art style.
>>102088362
The character design and the pixel graphics style are spot on; it's just that these games used 3D environments and this image hasn't recreated that. The water looks a little bit like a boomer shooter.
>>
>>102088394
>It ends up looking more like a pre-FMV adventure game still in some ways, though I'm not sure of one with exactly this art style.
It reminds me a bit of Spellcasting 301, an old boomer coomer game
>>
File: 2024-08-26_00073_.png (1.17 MB, 1280x720)
>>102088021
this place is definitely not a cult
>>
>>102088356
Oh no the kitty is on fire now.
>>
>>102088440
Yes it is.
>>
it is absolutely insane how much things have progressed in a mere 3 years
If you'd shown any of these images to someone just 3 years ago and told them an AI generated them, I doubt you'd have convinced many people that such a thing was even possible, and now look how far the tech has come
Makes me wonder what local diffusion threads will look like in the next 3 years
>>
how do I download from replicate.com
>>
File: 2024-08-26_00074_.png (1.08 MB, 1280x720)
>>102088462
nooo! never
>>
File: 1699170366239397.png (163 KB, 640x480)
>>102088370
He has arrived
>>
>>102088509
there is a download button just below the generated images? .. if that's not there for you just friggin right click the image and download it like you would any img on the web
>>
>>102088500
I suspect the big gains from here are going to be in consistency: make it so things go wrong less of the time, it understands more complex prompts, it gives people extra fingers less often, etc. It's already quite good at its best case; it's just a matter of wrestling it until it produces that.
>>
>>102088557
I mean the models
>>
>>102088500
>If you'd shown any of these images to someone just 3 years ago and told them an AI generated them, I doubt you'd have convinced many people that such a thing was even possible
Except it's the opposite: people are extremely ignorant and gullible. If you had told them back then that "using Photoshop" meant writing prompts to tell the machine what to draw, and shown them a picture with artistic complexity claiming it was made like that, they'd have trusted you and assumed that's how it worked, since they don't know what goes into it.
This is the main reason AI-generated music isn't all the rage: most people have no idea how music production works, so it's difficult for them to grasp the difference; they just listen to the end product.
>>
The way forward is hooking the AI directly into your brain so it can read your thoughts. Then you hook the outputs directly into your brain, too. At that point the only way to train a LoRA is experiencing the sensation you want to train it on about two dozen times.
>>
>>102088574
you don't .. replicate hosts models as software-as-a-service.. if you want to download models you go to huggingface, for flux for example here:

>https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
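if you want to script it, the official huggingface_hub client works (FLUX.1-dev is gated, so accept the license and run huggingface-cli login first; the filename is whatever the repo's file tree lists, flux1-dev.safetensors at the time of writing):

# pip install huggingface_hub
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    filename="flux1-dev.safetensors",  # as listed in the repo file tree
)
print(path)  # local cache path you can point your UI at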
>>
>>102088645
sad
I wanted to download a lora that's being hosted there
>>
Now this is a next level scam https://xcancel.com/NecroKuma3/status/1827552411788931379
>>
>>102088554
What a creep. Why doesn’t he just generate the women he wants to look at on his computer?
>>
>>102088657
link? maybe I can find it for you somewhere else
>>
>>102088626
Not so sure about that last part, but brain machine interface for generating images is something I've been wanting for a long time. Just vividly imagine whatever picture or video you want and it appears right on the screen exactly as you envision it. Hopefully will be possible in the next 5 years or so.
>>
>>102088666
checked, and also daaamn ..
>My art is so bad no one buys it.. but I'm gonna sue anyone who would have.
>>
>>102088666
Who is the artist?
>>
>>102088666
I feel like the whole Artists vs AI thing has blown over mostly everywhere but Japan
>>
>>102088657
already made 26 grand .. not baad
>https://ci-en.net/creator/24768/crowdfunding/761
>>
>>102088711
I hope he'll lose that case, fucking hate the luddites so much
>>
>>102088698
nta, but this guy:
>https://kamikire.mystrikingly.com/#doujin
paints little girls in generic doujin ..
>>
File: 1724602734916257.png (1.04 MB, 1024x1024)
default 2b lora prompt but with short white hair replaced with long twintail green hair and Miku Hatsune instead of 2b:

I like how it fused the two hairstyles
>>
>>102088726
>virtually indistinguishable from million other "artists"
>>
File: BESTFRIENDS.png (329 KB, 1024x1024)
I could never draw something like this in stable diffusion, the hot dog was especially hard back then.
>This is a digital cartoon illustration depicting four anthropomorphized food items arranged in a row, all with cheerful expressions. The background is plain white, making the colorful characters stand out prominently. From left to right, the characters include: 1. A pizza slice with a happy face, wearing a red and white striped hat and giving a thumbs-up. 2. A hot dog with a zigzag pattern on its body, wearing a red and white striped hat and a smile. 3. A red drink cup with a white straw, adorned with a smiley face and a zigzag pattern on its body. 4. A burger with a sesame seed bun, green lettuce, and a slice of tomato, wearing a red and white striped hat and a smile. Each character has a simple, cartoon-like design with minimal details. The overall style is playful and whimsical, using bold lines and vibrant colors to convey a sense of fun and camaraderie. The characters are arranged horizontally, with their bodies touching, giving the impression of a friendly group. The text "Best Friends" is written above the characters in a casual, handwritten font, emphasizing the theme of friendship. The illustration is clean and cheerful, suitable for a light-hearted and playful context.
>>
File: centaur.png (2.37 MB, 1000x1696)
>>
File: sextoy.jpg (316 KB, 770x1306)
started with this
>>
>>102088733
>I like how it fused the two hairstyles
same, Miku is looking really good here
>>
>>102088750
funny
>>102088770
holy fucking shit.. I would have never dreamt of a centaur fucking doll existing.. like wtf
>>
>>102088770
does it have two pussies or just the one in the completely wrong location
>>
>>102088770
I wish I could say this is peak degeneracy but I guess it is only getting worse from here on out.
>>
File: 1722502396451276.png (1.06 MB, 1024x1024)
>>102088733
>>
>>102088750
>>102088770
Very cursed.
>>
File: TACOS.png (1.42 MB, 1024x1024)
>>102088560
The big gains from here are going to be sending as prompt "Tacos illustration" and getting picrel. Instead of having to send a prompt so large it doesn't fit into a post:
https://pastebin.com/X3CLgzrb
>>
>>102088881
>The big gains from here are going to be sending as prompt "Tacos illustration" and getting picrel.
you can have that if you increase the CFG, that's what it has been made for
>>
File: file.jpg (59 KB, 515x415)
Flux can't into revolvers.
>>
>>102088890
Can you show an example? I ran out of huggingface quotas and I don't think they let one modify the CFG at all.
>>
>>102088677
https://replicate.com/markredito/90sbadtrip
>>
>>102088941
There's this for example
https://reddit.com/r/StableDiffusion/comments/1eza71h/four_methods_to_run_flux_at_cfg_1/
you can see on the first prompt that you only get the sushi at cfg 6 + automaticCFG
the third prompt is interesting too, CFG 1 doesn't do the black skin + pixel art, CFG 6 does, you simply have better prompt adherence with higher CFG
>>
Ran made /ldg/ because others were bored with the fact he was posting the same image for almost two years straight.
>>
File: TacosIllus.png (740 KB, 1024x1024)
>>102088964
This is not about prompt adherence, but model creativity, note how all the examples have long prompts, "Tacos illustration" is a short prompt, it produces picrel and no higher CFG is going to help because the things we'd want to see aren't in the prompt.
>>
>>102088997
it's still good to get better prompt adherence even though I see what you mean, you'd love the model to go a little less autistic about the simple prompt, you asked for a taco, you wanted something interesting, and flux just got the job done and didn't want to go further
>>
File: file.png (2.26 MB, 1024x1024)
>>
>>102088738
Larping as a failed writer isn't a good look.
>>
>>102089019
he didn't write this, chatgpt did
>>
>>102089007
>>102088997
You are stupid, less than adequately technical middle schoolers.
>>
>>102089023
Thanks for proving my point.
>>
>>102089033
so you're saying that chatgpt is a failed writer? because I agree with that lol
>>
>>102089019
I was drawing stuff like these back in 2012 and I can say mine had more personality and soul, but it was 8 hours of dragging vectors around, which was such a bore. Writing a prompt and generating the image in seconds is the big deal.
>>
File: 1702728998469674.png (1.89 MB, 1024x1024)
>>
>>102089023
Actually Joy Caption.
>>
File: file.png (2.04 MB, 1024x1024)
>Hatsune Miku, the iconic virtual pop star with long turquoise twin-tails, is standing on a lush green golf course. She is dressed in a stylish golf outfit, complete with a visor and gloves. In her hands, she holds a golf club, poised to take a swing. Instead of a regular golf ball, a small, distressed Kirby is positioned on the tee, looking up with wide, worried eyes. The scene is set under a clear blue sky with a few fluffy clouds in the background.
Flux has trouble remixing a concept. For example, I went for a golf setting and wanted the golf ball replaced by something else, and it just can't do it. Dunno if dalle3 can; I may be asking for something no other model can do either.
>>
>>102089073
KYS
>>
File: Manytimes.png (638 KB, 1024x1024)
>>102089007
That got me thinking, what if I just chant?
>Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration. Tacos illustration.
Oh well, at least it's a bit better, but filling up the prompt does nothing.
>>
>>102089073
>Instead of a regular golf ball
T5 beats CLIP but it is still too small to handle these.
Remember GPT-3? It had lots of issues with negatives; T5 is much smaller, and the embeddings it creates cannot properly encode negatives.
>>
>>102089049
You are too stupid to even use your own prompt, retard.
>>
File: 1718319456243953.png (892 KB, 1024x1024)
<lora:FLUX-Pepe-1:1> girl pepe dressed as Miku Hatsune, this is fine, cartoon.

the lora makes great pepes but i've never tried to make a non pepe with it. but it works!
>>
>>102089092
lmao, I think you can simply go for (Tacos illustration:25), Flux can handle those
>>
>>102089110
feels PepeMiku man
>>
is there any way to merge multiple Flux loras into a single file at specific ratios?
do i just use kohya-ss scripts for this?
>>
>>102089098
that's too bad, T5 is already big enough, it's an 11B model, maybe it could be finetuned to be better like that guy did with clip_l?
>>
File: 1705206421421141.png (1.18 MB, 1024x1024)
<lora:FLUX-Pepe-1:1> Pepe is sitting at his computer. A graphic on the computer screen says "Concord" in white and playful text and an overweight cartoon woman holding a rifle is visible below the text. Pepe has a sad look on his face and is saying "feels bad, man". cartoon.

context: concord is a shitty 100 mil game with woke designs.
>>
>>102089073
first off, what >>102089098 said; remember, you're writing what the tags of the image would be on the internet. It's very rare for the words "golf ball" to be in the tags of an image that features no golf balls, so you're biasing it very hard to draw a golf ball somewhere. secondly, it should be easy to create this image with inpainting. you could even get the composition correct ahead of time by asking for a "screaming anthropomorphic golf ball" on the tee so that the rest of the image matches the funny around it, then inpaint screaming kirby in its place. if you don't want to inpaint, and just wanted to see if you could gen kirby as the golf ball in the first place, you already have your tips for having a better chance to achieve that, but also I don't care about a stupid pointless task like that when you already have the tools to get that output easily in other ways
>>
>>102089133
Merge all the Loras into Flux with those ratios.
Extract a Lora that is the difference between regular Flux and that Flux.
That's it.
>>
>>102088952
>https://replicate.com/markredito/90sbadtrip
found his github, but it ain't there .. so I guess you are outta luck for now.. maybe msg him and ask if he can make it public?
>https://github.com/markredito?tab=repositories
>>
>>102089158
ok thanks, will try this
>>
>>102089145
In that case making her black, pink hair, and wearing a rainbow blouse would have worked better.
>>
>>102089138
>it's a 11b model
we only use half of it so 5B
>>
>>102089197
>we only use half of it so 5B
so we're only loading the 5B on our memory then? please tell me yes and that the other useless 5B isn't on my vram...
>>
File: ComfyUI_Flux_10917.jpg (87 KB, 1024x576)
>>102088927
>>
>>102089214
>so we're only loading the 5B on our memory then?
yes, the T5 linked in most places just has the encoder
>>
>>102089214
Why put any of it on your VRAM unless you have a tiny amount of RAM? You can get it to run on normal memory and it's still very fast compared to the rest of the image gen time even on a good GPU.
>>
>>102089232
>Why put any of it on your VRAM unless you have a tiny amount of RAM?
because the text encoder is on my second gpu
>>
>>102089217
now put it in a hip holster
>>
>>102089239
Fair enough.
>>
>>102089239
Fat cat.
>>
>>102089098
You tried this?

https://github.com/zer0int/ComfyUI-Long-CLIP
>>
>>102089252
I brought a second one to run Mixtral on the llm side, but I'm glad it's also useful for flux
>>
File: 1705414638357451.png (1.13 MB, 1024x1024)
>>102089177
success
>>
>>102089257
Now that's interesting, I'll test that out thanks
>>
>>102089257
won't be any better at handling negatives, anon
>>
>>102088952
>>102089165
I saw similar ones on civitai

https://civitai.com/models/681455/flux1d-1990s-cgi
>>
File: kirby.png (580 KB, 1024x1024)
>>102089073
Um, what if you just say that Kirby is the ball?
>Hatsune Miku, the iconic virtual pop star with long turquoise twin-tails, is standing on a lush green golf course. She is dressed in a stylish golf outfit, complete with a visor and gloves. In her hands, she holds a golf club, poised to take a swing. The ball is a small, distressed Kirby, positioned on the tee, looking up with wide, worried eyes. The scene is set under a clear blue sky with a few fluffy clouds in the background.
>>
>>102089275
Flux does not use negative prompt, it uses clip for tags and t5 for boomer prompting

Clip is limited to around 75 and t5 to 150 or 450, not sure
>>
>>102089262
There we go!
>>
>>102089292
>Clip is limited to around 75 and t5 to 150 or 450, not sure
So... when you go for boomer prompting, it applies to t5 and clip_l, so clip_l is useless during boomer prompting because you're just yapping and reaching its limit quickly, is that right?
>>
>>102089292
When I say negatives I don't mean negative prompt. I'm talking about prompting things like "there is no pink elephant in the image", it can't handle those.
It is 77 and 512 respectively and there is no "clip for tags and t5 for boomer prompting", Flux wasn't trained that way. What's with retards repeating this shit.
>>
File: file.png (2.07 MB, 1024x1024)
>>102089287
I tried your prompt and it didn't work on my side, I guess the y2k lora is raping the weights to the point it got worse prompt adherence
>>
>>102089319
Give it a last chance with this one, with a different seed:
>Hatsune Miku, the iconic virtual pop star with long turquoise twin-tails, is standing on a lush green golf course. She is dressed in a stylish golf outfit, complete with a visor and gloves. In her hands, she holds a golf club, poised to take a swing. A distressed Kirby is positioned on the tee, looking up with wide, worried eyes. The scene is set under a clear blue sky with a few fluffy clouds in the background.
>>
File: 4qe86qs255id1 (1).jpg (1.92 MB, 3277x4229)
>>102089292
>Flux does not use negative prompt
negative prompt can work with Flux though
>>
File: file.png (1.95 MB, 1024x1024)
>>102089368
:(
>>
File: 1724614779991936.png (2.16 MB, 3258x3242)
>>102089304
clip is used at the start, then t5 takes over the prompt if you have dual clip loader

>>102089316
>Flux has two text models, clip (think tags) and T5 (natural language). SDXL had this, too, and everyone ended up just sending the same text to both for the best results.

>You could try putting a natural language prompt in T5 and then a few tags (e.g. flat shaded, pixelart, illustration) in clip, but AFAIK it's not really worth it.

>Guidance is where the model has been trained on different CFG values. So it's like cheating, you say "guidance 2" and them model tries to make images that look like the images it made during training with CFG set to 2 (but CFG is actually still 1).

>This seems weird, but CFG higher than 1 increases generation time, so guidance is faster. Downside is that you can't have a negative prompt with any guidance value, you actually do need CFG > 1 for that to work (and it is the negative that causes the increase in generation time)
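The cost difference the quote describes, as a runnable sketch (the model signature here is hypothetical, just standing in for the denoiser):

import torch

def cfg_step(model, x, t, cond, uncond, cfg_scale):
    # classifier-free guidance: two model evaluations per sampling step
    eps_cond = model(x, t, cond)
    eps_uncond = model(x, t, uncond)  # this second pass is the slowdown
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

def distilled_step(model, x, t, cond, guidance):
    # flux-dev style: guidance is just another input, one pass, no negative
    return model(x, t, cond, guidance=guidance)

# dummy denoiser so the sketch runs; the real one is a big transformer
def toy_model(x, t, cond, guidance=None):
    return x * 0.9 + cond

x, cond, uncond = torch.randn(4), torch.randn(4), torch.zeros(4)
print(cfg_step(toy_model, x, 0, cond, uncond, cfg_scale=3.0))
print(distilled_step(toy_model, x, 0, cond, guidance=3.0))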

With the dual clip loader and flux prompt node you can notice a difference depending on whether you add tags to clip or not; if you have a good lora trained with both tags and boomer prompting, the gen will be better

>>102089404
Don't you need the cfg node that doubles the gen time for that?
>>
>>102089404
are you aware it changed more than just the pillows right?
>>
>>102089421
and? adding new words always changes the settings, at the end it got the job done
>>
Ah fuck, I have a problem. In Kohya CLI I have a lora that is a continued training from yesterday, the source --network_weight is in the same directory as where the new safetensor checkpoints are being created. Eventually kohya will write a checkpoint with the same name as the --network_weight safetensor i started with.
Can I safely rename/move the safetensor without fucking up training, or is it constantly being referenced? I mean, it was only used to start, so it shouldn't be needed once it's made more checkpoints?


>>102088750
>>102088770
Bravo. And here I was, having a normal Monday.
>>
>>102089420
>Don't you need the cfg node that doubles the gen time for that?
if you have no other solutions? there's one I guess, it's up to you
>>
>>102089420
>clip is used at the start, then t5 takes over the prompt if you have dual clip loader
wait what? so if you go for a prompt that is less than 77 tokens then t5 is never working and clip_l is doing all the work?
>>
>>102089418
I think this is proof LoRAs are not a viable solution and we need a finetune.
>>
Shit model
uninstalled
>>
>>102089515
100% agree with you on that one
>>
>>102088952
>>102089165
>>102089277
Here

https://huggingface.co/markredito/90sbadtrip/tree/main

Make sure to read the card too, it has a trigger word

>>102089469
I have not tested it, but people that made loras said there's a limit to what you can write in the prompt. You can remove the clip and use only t5 in clip loader but I got best results when using loras combining clip with tags and t5 with boomer prompting

>Flux uses the T5 text model for per word prompting, and the CLIP text model for an overall description of the image (rather than per word prompting like previous diffusion models used CLIP for. As in CLIP's final layer combines the per-word values into a single description, but previous models used the layer before the last when it was still per-word. Flux uses that final layer).
>>
>>102089441
nm, i ran out of time and just renamed it, will see what happens.
>>
>>102089257
>https://github.com/zer0int/ComfyUI-Long-CLIP
I want the same thing but for ViT-L instead
>>
>>102089592
It is a finetune of CLIP ViT-L-14
>>
>>102089610
that one?
https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-BEST-smooth-GmP-ft.safetensors
>>
>>102089631
no, the original CLIP ViT-L-14
>>
I'm so tired... 4.5 more hours til my runpod lora is complete... These several alarms better wake my ass up to save it, I can't keep my eyes open any longer

I'm not making any more large dataset runs after this, being able to leave it running overnight on my own GPU without fear is so much better
>>
>>102089635
yeah, that's why I'd like a long clip of the "smooth" version instead, it's more accurate
>>
>>102089592
>>102089610
Someone used both

>>102089631
You can use that instead of clip_l in flux, long clip needs another node
>>
>>102089656
Which is available?
https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14
The only holdup is nothing supports using it in Flux right now.
>>
>>102089662
>Which is available?
let's goooooo
>The only holdup is nothing supports using it in Flux right now.
FUCK
>>
>>102089677
Put this in the clip folder
ViT-L-14-BEST-smooth-GmP-ft.safetensors

Then load it in the dual clip loader along with the t5
>>
>>102089658
wait you're still using the dualCLIPloader on top of the long clip loader? how does it know it shouldn't use the "regular" clip_l in this workflow?
>>
>>102089694
the LongClip node substitutes it
>>
>>102089662
You're wrong. The guy added Flux support to the project 2 weeks ago, take a look at the commits page.
>>
>>102089662
for the longCLIP, looks like GmP is winning more than Smooth-Gmp, it was the opposite on the "non long" clip
>>
>>102089694
I think it doesn't, I don't see any difference, worth comparing yourself tho
>>
File: l3FYkaicihqXv5D9wLDAF.png (1011 KB, 3134x2369)
>>102089715
It's a very very minor win, almost in margin of error territory.
>>
>>102089732
at the end we can only download the smooth one "Long-ViT-L-14-GmP-ft.safetensors" (I'm not touching the pickles) so...
>>
File: TacosI.png (848 KB, 768x768)
>>102089007
For comparison, here's Surreality (SD1.5)'s take on "Tacos Illustration." Note all the creativity, adding all these things I didn't mention in the prompt.
We need the best of both worlds.
>>
What's a good replacement for ADetailer in comfy?
I really miss it
>>
File: 1702481293628966.png (1.22 MB, 1024x1024)
Miku but as a World of Warcraft character wearing armor:
>>
File: 1697388153687563.png (1.07 MB, 1024x1024)
>>102089779
>>
File: file.png (241 KB, 1765x1160)
>>102089658
got an error, it's not supporting safetensors?
>_pickle.UnpicklingError: invalid load key, '\xe0'.
>>
File: 1695320102542221.png (1.09 MB, 896x1152)
>>102089790
1 more
>>
File: 1706841595294825.jpg (308 KB, 2688x1536)
Not mine:

reddit.com/r/StableDiffusion/comments/1exjuzo/flux_alien_set_design_lora/

"I was inspired after seeing Alien: Romulus at the weekend and trained this Flux LoRA using 18x 16:9 stills of environments from the first Alien film (after removing actors from a few of the shots). The model works best when generating content of a spaceship interior theme but can generalise somewhat. Example prompt:

ohwx set design, a photo of a spaceship indoor japanese zen garden

Training
I used Ostris AI-Toolkit to train the LoRA at 32/32, 1e-04 learning rate, for 1750 steps. During training I used sample prompts that covered a range of interiors to see how it learned. Initially the architectural and detailing qualities appeared first, which gradually grew in strength along with some of the compositional elements, and in the later steps the aesthetic style was also adopted. It ran for ~2hours costing a couple of dollars on RunPod for a rented 4090. My logic for the token+class came through testing various classes to see which had contextual knowledge that made sense and I settled on 'set design' which seems to have helped separate the environment from other subjects.

Link
https://civitai.com/models/669303/alien-set-design-flux"
>>
>>102089917
really kewl, also note how all the cool loras are made using ai-toolkit
>>
>>102089998
And all the meaningful, educated and civil discussion happens on reddit.
>>
>>102089793
Not that hard to add safetensors support. Go to the custom node and open up longclip.py. Add in an import line for safetensors
from safetensors.torch import load_file

and then replace what used to be line 68 or now line 69 after the above change with the following block.
    if any(model_path.endswith(x) for x in [".ckpt", ".pt", ".bin", ".pth"]):
        state_dict = torch.load(model_path, map_location="cpu", weights_only=True)
    else:
        state_dict = load_file(model_path)

Easy peasy. For those who know how to use this, this is the git diff.
diff --git a/long_clip_model/longclip.py b/long_clip_model/longclip.py
index ad8e888..96ece57 100644
--- a/long_clip_model/longclip.py
+++ b/long_clip_model/longclip.py
@@ -7,6 +7,7 @@ from pkg_resources import packaging
 from torch import nn
 import torch
 from PIL import Image
+from safetensors.torch import load_file
 from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
 from tqdm import tqdm
 
@@ -65,7 +66,10 @@ def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_a
 
     model_path = name
 
-    state_dict = torch.load(model_path, map_location="cpu")
+    if any(model_path.endswith(x) for x in [".ckpt", ".pt", ".bin", ".pth"]):
+        state_dict = torch.load(model_path, map_location="cpu", weights_only=True)
+    else:
+        state_dict = load_file(model_path)
 
     model = build_model(state_dict or model.state_dict(), load_from_clip = False).to(device)
>>
File: file.png (551 KB, 3706x1856)
>>102090033
I don't have the same thing as you anon
https://github.com/zer0int/ComfyUI-Long-CLIP
>>
>>102090091
long_clip_model/longclip.py inside the long_clip_model folder, not long_clip.py with the underscore.
>>
File: 2024-08-26_00136_.jpg (3.6 MB, 4992x7296)
>>
Where is my Chris Chan / Sonichu medallion LoRA? I need to do some summoning on my hardware.
>>
>>102090120
Nevermind, it does not work

>SDLongClipModel.__init__() got an unexpected keyword argument 'model_options'
>>
File: file.png (3.69 MB, 3336x3055)
>>102090120
ok it's working now, thanks anon
There's definitely a difference between using long clip and not, but now the question is: is it actually better? I can only use the GmP and not the smooth-GmP, because that one doesn't have a safetensors file
https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14/tree/main
>>
That guy who was crying about activation phrases not working in Flux and that he was gonna train a multi-character konosuba LoRA to prove it never came back after last thread... Safe to assume he proved himself wrong?
>>
>>102090201
>AttributeError: 'FluxClipModel_' object has no attribute 'clip_g'
>>
File: 1702643118949045.png (1.12 MB, 1024x1024)
>>
File: file.png (178 KB, 745x608)
>>102090238
did you do something like that in your workflow?
>>
>>102090249
Using this workflow

dualclip to clip and longclip to cliptextencode
>>
>>102090216
If you use a prompt going over the regular CLIP's token limit, that prompt should be better with Long CLIP, whether the original or the finetuned one. The short clip is better metrics-wise if your prompt is below the token limit, but since I'm boomer prompting anyway and use an LLM to make my prompts, it invariably does make a difference in my opinion.
>>
>>102090274
>it invariably does make a difference to me in my opinion.
same, it looks better to me, even though the prompt adherence doesn't seem to have improved, but this is just one example
https://imgsli.com/MjkxMjU4
>>
>>102090091
>Once the LongCLIP-G weights are released, we will also support them!
the what?
>>
>>102090289
If you take a look at >>102090268, the person generated is more detailed and is actually adhering to the prompt better with the last line of holding the weapon vs not without long CLIP.
>>
>>102090332
indeed, in your example it's gotten better, I tried something else and it got worse, need more testing to see which one is consistently better than the other: https://imgsli.com/MjkxMjY2
>Y2K style cover art with a low poly 3D render of: Hatsune Miku as a sleek, robotic samurai in chrome armor is slicing through waves of pixelated sushi rolls flying through the air. Each slice sends colorful sparks flying. Behind her, a giant koi fish swims through the sky as if it were water, creating ripples of light.
Y2K style text at the bottom: "Sushi Master."
>>
>>102090454
Try a really long prompt
>>
>>102090522
that one was definitely over 77 tokens though, so it was supposed to be better not worse, but all right I'll see if I can find something more verbose
>>
>>102090222
The guy said the lumber doesn't work for construction after using a hammer upside down.
>>
>>102090543
>>102090522
ok that one's interesting, long clip fixed the text, but the crown is still on the woman's head not on the goat kek: https://imgsli.com/MjkxMjcy
>striker0s, Mario Striker art style,
>A joyful woman with tears of happiness streaming down her face is holding a goat high in the air. The goat is wearing a golden crown adorned with jewels, and the word ‘Flux’ is elegantly written on the crown. The woman has a speech bubble next to her that exclaims, ‘THAT’S WHY HE’S THE GOAT!!’ The background is a vibrant, celebratory scene with confetti falling from the sky and a crowd of people cheering in the distance. The woman is dressed in casual, colorful clothing, and the goat looks proud and majestic with its crown.
>>
https://github.com/ChrisGoringe/cg-mixed-casting
>This node allows you to load a normal FLUX model (the original, or a finetune) and cast it on the fly to a GGUF format (Q8_0, Q5_1 or Q4_1) or a different torch format (eg float8_e4m3fnuz).
that's cool
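the "cast on the fly" part is basically this for the torch float8 path (a sketch; needs PyTorch 2.1+ for the float8 dtypes, and the GGUF formats do the analogous thing with their own packing):

import torch

w = torch.randn(4096, 4096, dtype=torch.bfloat16)  # a 32 MiB layer
w8 = w.to(torch.float8_e4m3fn)                      # storage halves to 16 MiB
# float8 here is storage-only: upcast again to actually compute with it
err = (w8.to(torch.bfloat16) - w).abs().mean()
print(w8.element_size(), err.item())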
>>
>>102090608
idk

https://imgsli.com/MjkxMjc1
>>
>>102090746
yeah, the prompt adherence doesn't seem to have improved at all, and sometimes the image quality is worse, longclip isn't really something that will boost your outputs that's for sure:
https://imgsli.com/MjkxMjc3
>striker0s, Mario Striker art style,
>Hatsune Miku, the virtual pop star with long turquoise twin-tails, is playing an intense game of basketball against the legendary Michael Jordan. They are on an outdoor basketball court surrounded by cheering fans. Hatsune Miku is wearing a stylish, futuristic sports outfit with her signature colors, while Michael Jordan is in his classic Chicago Bulls uniform. The sun is setting in the background, casting a warm glow over the scene. Hatsune Miku is mid-air, about to make a slam dunk, while Michael Jordan is attempting to block her shot with his iconic defensive stance.
>>
What scheduler should I use for flux if I'm doing photo-like gens?
>>
you guys realize that you'll get little if any benefit from using alternate text encoders with models that weren't trained with those improved text encoders, right?
>>
>>102090815
what if we finetune Flux with those alternative text encoders?
>>
>>102090835
go ahead
>>
>>102090222
I'm 200 steps away from completing my two test LoRAs and the results are not looking good. I'll post my methodology and results shortly.
>>
>>102090595
Nah dude, you don't even understand what the issue is. You just see the model altering the output and assume everything is okay. It's not.
>>
>>102090859
>sentient model
lol
>>
>>102090895
>tagging the thing as the thing doesn't show the thing

It's clearly an issue, and you ridiculing me for pointing it out is pointless and counterproductive.
>>
>>102090911
lol
>>
>>102090932
Okay dude, point me to a LoRA where the activation phrases bring out the correct subject without severe concept bleed and I'll look into why my shit is turning out bad.
>>
>>102090959
lol
>>
>>102090964
You can't even articulate the issue.
>>
Multi-GPU-support status?
>>
>>102091032
Anon, you have a conclusion based on nothing:
1) you trained with the wrong settings to start with
2) you haven't tried other trainers
3) you aren't even doing correct captions
You haven't done anything in good faith, and at this point you've shown gross incompetence so bad that I wonder why your mother lets you use her computer without watching you, and made it clear you're fixated. Last reply, I'll just be laughing over here because I'm sure it's going to end with you being on meds soon enough.
>>
>>102091056
you mean multi gpu support for the flux model? because we can already do multi gpu by putting the flux model on one gpu and clip on another gpu in ComfyUI
>>
>>102091068
Show me a working multi subject LoRA or sit the fuck down.
>>
>>102091082
you can't even do a single subject lora because you stripped your character name
and yes, I've already done a multi-concept lora and no I will not show you
>>
>>102091095
Convenient.
>>
Oh, it's that time of the day on /ldg/
https://www.youtube.com/watch?v=BuPofpPyVkU
>>
>>102091118
Hold up a few minutes, one of the retards is about to show his homework.
>>
>>102091116
Anon you haven't even good faith captioned your dataset. "Faggot Character, standing" is not enough. I promise it works. But of course you won't actually try in good faith, you clearly care more about this than I do. But yes, I have trained multi concept, and no, my characters didn't blend together. I have, in fact, done a multi-celebrity lora.
>>
>>102090815
>>102090781
Got this error when writing a 1000 word prompt

>ValueError: expected sequence of length 689 at dim 1 (got 99999999)
>>
>>102091185
it's all right, like I said LongClip doesn't seem to improve anything, so no need to fix that shit; I went back to my old workflow
>>
>>102091185
>1000 word prompt
LongClip can only eat 248 tokens and t5 only 512 tokens, so there's no point going further?
>>
>>102091169
You don't understand what good faith means.
>>
>>102091234
Actually I do. For example, when someone says something isn't working, they make a good-faith effort to prove their process isn't the error. As a reminder, this started with you not even having keep_tokens correctly set. You have also not tried training with another trainer, so you haven't ruled out that it's kohya related. Instead you have come to a schizo conclusion that you keep repeating, that it doesn't work, even though YOU HAVE NOT EVEN TRIED TRAINING WITH THE RIGHT SETTINGS.

Prove you even tried. Same seed, same prompt, side by side of your wrong lora and your right lora.
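For anyone following along, this is roughly what the kohya shuffle_caption + keep_tokens pair does to each caption at train time (a sketch; the tag text is made up):

import random

def shuffle_caption(caption, keep_tokens=1, rng=random):
    # the first keep_tokens comma-separated tags stay pinned in place,
    # the rest get shuffled every time the caption is used
    tags = [t.strip() for t in caption.split(",")]
    head, tail = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(tail)
    return ", ".join(head + tail)

# the activation tag stays first every epoch
print(shuffle_caption("megumin, red dress, witch hat, staff, smiling"))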
>>
>>102091219
Works at 1000 characters
>>
File: file.png (2.37 MB, 1024x1024)
>>
I've made my first FluxD Lora with 18 or so checkpoints during the run. I guess the way to check which checkpoint is the best is to load each one and run through a range of strengths
Is there a comfy node suite that can do this while I scratch my balls and give me a nice X/Y plot an hour later?
>>
>>102090737
Realtime GGUF? What kind of black magic is this?
>>
File: file.png (2.22 MB, 1024x1024)
>>102091302
As much as I like the Y2k lora, it's still a lora and as such it just makes the model worse on anatomy, here's the same prompt without any loras
>>
>>102090746
I don't know what you guys are doing with this, but "with long clip" does not look right.
>>
>>102091416
Well, with the amount of hands on display in the y2k one, it's quite impressive as it is.
>shaded pseudo-anime style
I hate that with a passion.
>>
>>102091388
>What kind of black magic is this?
it's probably the same method as fp8 casting, when you load a fp16 into fp8, desu I'm not a big fan, you still have to download the big thing, whereas having a GGUF directly is much more convenient
>>
File: file.png (2.39 MB, 1024x1024)
>>102091447
>Well, with the amount of hands on display in the y2k one, it's quite impressive as it is.
it's fine, but vanilla flux can do better than that, here's the no lora settings with "3d render" added into it
>>
File: fs_0006.jpg (272 KB, 1280x1280)
coffee akimbo seems about right for this morning
>>
File: 2024-08-26_00176_.jpg (1.3 MB, 2160x3840)
>>
flux still too retarded for action scenes
how many decades until I can prompt "1girl stabbing a monster with a sword" and get results
>>
File: mqdefault.jpg (5 KB, 320x180)
>>102090835
>we
>>
>>102091574
>how many decades until I can prompt "1girl stabbing a monster with a sword" and get results
DALL-E 3 in October 2023
>>
>>102088021
>my gen makes the collage again
Nice
>>
>>102091574
promptlet
>>
>>102091594
you know I meant localgen, not cloudslop
>>
>>102091614
cloud is just someone else's local
>>
File: flux_cyber-env12.jpg (3.2 MB, 2080x2720)
>>
>>102091574
It will only get worse for you. The trend is for better text understanding in local, not easy mainstream prompting. Local will stay an enthusiast niche for people who know exactly what they want and are fine with describing it in detail. The easy prompting will be the mass market, which will be in the web services.
>>
>>102091637
yeah we haven't figured out how to use an llm to expand a prompt
>>
File: file.png (3.11 MB, 3025x1422)
>>102091574
I think Flux could do it if it had seen more action pictures; at this point it's just not knowledgeable enough
>>
File: Untitled-1.png (1.95 MB, 3072x824)
Okay, so here are my preliminary findings on the issues with testing a multi-character LoRA. Earlier today, it was suggested to me that keep_tokens was a potential source of my issue, so I decided to train two LoRAs keeping the first token.
The data was derived from a rough scraping of 10 images each of three different characters from the Konosuba series.
In the first test, I captioned the images using the WD14 tagger, as other anons have allegedly shown success in replicating different concepts using these captions as well as their respective activation phrases めぐみん (Megumin), アクア (Aqua), and ダクネス (Darkness).

I trained for 2500 steps at 512x512 at rank 16, and in the interest of time I only used a batch size of 1 for both LoRAs.
My results are as follows in this image.

As you can see, there is SEVERE concept bleed between characters in this attempt at a multi-subject LoRA; furthermore, it seems that the activation phrases are either being ignored entirely or simply not being trained. One anon suggests this is a language issue and I will explore that later on, but a token should be a token regardless.

As a baseline, I also included images from a previous LoRA I made of Megumin as a single subject; it was captioned using Joy Caption, will only output Megumin, and does so fairly well as a proof of concept.

So, without evidence to the contrary, I think multi-subject LoRAs are suffering from concept bleed. I would like to be proven wrong
>>
File: ComfyUI_00701_.png (1.43 MB, 1024x1024)
>>102091635
>>
>>102091687
For further reference, all training was done in Kohya, but I have experienced similar results for other LoRAs I have made in AI toolkit
>>
>>102091695
Didn't ask
>>
File: ComfyUI_00702_.png (1.43 MB, 1024x1024)
>>102091726
>>
>>102091726
I asked
>>
File: Untitled-3.png (64 KB, 509x900)
Lastly, here is a quick look at the way my input data was structured.
>>
File: 2024-08-26_00184_.png (1.43 MB, 720x1280)
>>102091574
it really isn't that fucking difficult, first try
>This is a fantasy scene of a girl and and orc fighting. The girl holds a sword and rams it into stomach of the Orc. Blood splatters and gore.
a few more gens and the hands would probably be fine too
>>
>>102091687
>The data was derived from a rather scraping of 10 images from three different characters from the Konosuba series
Do any of those images have two or more of the characters? you should have those
and this
> I only did a batch of 1
would make the lack of multi subject images even worse
>>
>>102091687
>>102091705
>>102091747

Anyway, just my attempt to show that I am not arguing in bad faith. There are issues with concept bleed that I do not know how to solve but that should be looked into.
>>
>>102091687
lol
>>
File: Untitled.png (268 KB, 1238x697)
>>102091769
No, every image was single subject, with nobody else in them.
Data was gathered for a proof of concept only, so the only target of this model was to get it to produce people at least SIMILAR to the activation phrases, which it failed to do in both cases.
I have experienced this across both Kohya and AI toolkit, and I'm only going to such lengths to show the issue because people keep denying it's real.
>>
>>102091756
at no point does the sword actually ram into the stomach though; Flux doesn't know how to depict things being destroyed
>>
>>102091756
Now try getting an actual stabbing to gen, not objects vaguely floating together in the scene.
>>
>>102091807
I'm sure it works just like SDXL where you give a bunch of tags
>>
>>102091777
You got nothing dude. If you think it's possible, the onus is now on you to train a multi-subject LoRA along similar lines and prove me wrong.
>>
>>102091839
nah, you'll do everything except actually try something different
I can tell when someone is a lazy asshole
let me guess, your mom won't let you use her computer to train
>>
File: 2024-08-26_00189_.png (1.54 MB, 720x1280)
>>102091809
>>102091826
idk what you want.. I am not even sure you know what you want at this point
>>
>>102091869
he expects it to read his mind from a 3 word prompt
>>
>>102091859
>Uhh, uhm, well your mom won't let you use her computer.
You're actually pathetic. I'm glad you forced me to shed more light on this; it's the only thing you'll contribute to the world.
>>
>>102091807
>No, every image was single subject, with nobody else in them.
thus making it a 1girl lora, which explains the bleeding
all of your images are of the subject taking the majority of the frame, that's a bias the lora learns
>>
>>102091879
you haven't tried anything different, you haven't even leaned into the strength of the model, you assembled a shitty dataset and captioned it like you're training SD 1.5
>>
File: 2024-08-26_00191_.png (1.38 MB, 720x1280)
>>102091874
ya I got the feeling anon just misses his RNG ..
>>
>>102091470
I remember quantizing text models and iirc the process was not that fast to be able to just use it in real time. Does it affect load times a lot?
>>
>>102091869
what was the prompt for that one?
>>
>>102091879
Why would it bleed if each image was in its own class with its own activation phrase. Between both tests, both LoRAs were unable to produce anything other than a mishmash of all characters despite the only tag being their activation phrase.
>>102091901
You don't even know what you're talking about and couldn't even make your own test LoRA if you wanted to.
>>
>>102090737
>that's cool
that's useless
>>
>>102091869
He doesn't expect anything. He is a genless faggot who derives pleasure from disparaging others.
>>102091911
Nice gens anon.
>>
File: 2024-08-26_00180_.jpg (1.17 MB, 2160x3840)
>>102091869
that was
>This is a fantasy scene of a girl and an orc fighting. The girl holds a sword stabs the Orc into his stomach. Blood splatters and intestines quell out of the orcs stomach wound.

but I think for general stabbing this >>102091911 with
>This photo realistic cinematic shot of a fantasy scene of a girl and an orc fighting. The girl holds a sword stabs the Orc. Blood splatters and gore.
is better

>>102091956
thank you, but I am back to 1girl cyborgs
>>
>>102091687
Instead of
>anime image of girl
you should use
>anime image of <girl's name>
methinks
>>
>>102091928
I'm not going to spend more effort than you
>>
https://huggingface.co/Wi-zz/joy-caption-pre-alpha
For all the codelets that want to do some batch caption, you can use this shit
>This application generates descriptive captions for images using advanced ML models. It processes single images or entire directories, leveraging CLIP and LLM models for accurate and contextual captions. It has NSFW captioning support with natural language.
>>
>>102091982
I mean both for training and generating of course. Not treating the girl's name as an "activation phrase", but rather leaning on Flux's prose abstraction capabilities and having it learn each girl's name as a different category of object, just as there is no bleed between teapots and fire extinguishers if you refer to them by name instead of the generic "girl" or "woman". You don't caption teapots and extinguishers like this
>extinguisher, this is a red cylindrical object...
>teapot, this is a white, ceramic object...
That would be very bad training. You know what I mean
>>
>>102091982
That was the intended purpose of activation tags. On other LoRAs I have trained that featured more than one character this was still a persistent issue even going back to AI toolkit.
>>102091984
>This is the guy who says I'm arguing in bad faith.

I have spent the past week racking my brain over why it's not working. Single-subject LoRAs are working fine, but you can't just keep ignoring the fact that there are issues with multi-subject LoRAs.
>>
>>102092041
>I have spent the past week wracking my head over why it's not working
and what you presented is what you came up with for testing?
absolute lmao
>>
File: 2024-08-26_00194_.png (1.36 MB, 720x1280)
Last one tho.. it's not realistic at all. But I kinda like how it turned out. Don't prompt brain matter when you don't wanna see the brain exposed I guess, but I really like the overall composition of this one
>>
>>102092041
>>102092033
>>
>>102090222
>That guy who was crying about activation phrases
He's also crying about that on reddit, what a faggot
https://www.reddit.com/r/StableDiffusion/comments/1f1obaz/comment/lk0l55l/?utm_source=share&utm_medium=web2x&context=3
>unless it's a character you are trying to train with an activation word , it's not gonna work , it's like flux is begging to be trained with a tag based caption , making the whole prompt adherence useless if you go that road .
>>
Do not engage 102092066
He doesn't own a GPU capable of local image generation. He probably doesn't have a job and is underage. He probably is Indian and poor.
His primary daily activity is shitposting on 4chan.
>>
>>102092033
Okay, I'll do one more test using GPT 4 captioning with appropriate character names and see where that gets me. I'm not hopeful.

>>102092066
This is the last (You) you will ever get from me.
>>
>>102092067
For a model that hasn't seen gore, this is very good. I wonder if it can be made better with some tricks, like referencing butchered meat and other food related stuff. I got good results with DALL-E doing that.
>>
>>102092088
two of those are correct
>>102092100
>This is the last (You) you will ever get from me.
and the first
>>
>>102092100
>Okay, I'll do one more test using GPT 4 captioning with appropriate character names and see where that gets me. I'm not hopeful.
Don't neglect other stuff for this. I know the obsession is real. Godspeed anon.
>>
>>102092125
I just want someone else to notice this. There is no way my settings are so awful that it cannot differentiate between three very different characters.
>>
https://civitai.com/models/686704/flux-dev-to-schnell-4-step-lora
>Turn your dev into schnell!
Who the fuck thought this was a good idea
>>
>>102092159
people noticed
your captions are shit
your images are shit
>>
File: FLUX_00036_.png (1.17 MB, 1152x896)
>>102092159
nah it's easy, provided the other characters are in the base model, and they're a different sex to the target character
>>
>>102092173
some guys are trying to get Dev working at 4 steps with the same quality, they are delusional but hey, if you don't try you'll never know I guess
>>
>>102092187
Clearly bleeding into both of them, very noticeable with Biden.
Anyway, share the gilf lora.
>>
File: 2024-08-26_00202_.png (1.48 MB, 1024x1024)
>>102092107
not so sure about the gore part... some anons posted pretty extreme human butchery scenes a few threads back. Yeah, in a fantasy context it mostly looks like red jam and ketchup, but it must have seen something; it knows more about gore than about genitalia
>>
>>102092187
This is what you're using to disprove me? I spent hours ruminating over this?
>>
>>102092258
and you are?
>>
>>102092276
I'm debo.
>>
>>102092076
He's right tho, there should be an official guide from Black Forest Labs on how to train LoRAs for Flux; civitai is filled with slop
>>
>>102092194
It's retarded; Schnell IS the 4-step model. It is the end result of distilling the Pro model down to 4 steps. Why does it feel like I'm taking crazy pills?
>>
>>102092296
BFL said it's not possible off the bat, I doubt they want any loras to exist in the first place
>>
>>102092308
I got good results with the schnell/dev merge at 8-12 steps
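For reference, that kind of merge is usually just a weighted average of the two checkpoints' tensors, key by key. A minimal sketch of the idea (file names and blend ratio are whatever you pick, and you need enough RAM for both state dicts):

# Weighted average of dev and schnell weights.
import torch
from safetensors.torch import load_file, save_file

dev = load_file("flux1-dev.safetensors")
schnell = load_file("flux1-schnell.safetensors")

ratio = 0.5  # fraction of schnell blended in
merged = {}
for key, w in dev.items():
    if key in schnell and schnell[key].shape == w.shape:
        blend = (1 - ratio) * w.float() + ratio * schnell[key].float()
        merged[key] = blend.to(w.dtype)
    else:
        merged[key] = w  # keep dev weights where the models differ

save_file(merged, "flux1-dev-schnell-merge.safetensors")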
>>
>>102092310
Or they want to play dumb as you retards make your deep fake porn loras.
>>
What's a good tool for managing captions for Flux? Most of the ones around right now are built for tags, separating concepts with commas.
>>
>>102092310
>BFL said it's not possible off the bat
no they didn't
>>
File: file.png (2.22 MB, 1024x1024)
>>102092159
I think the key is that the prompt is a textual representation of the image it generates. If the prompts share information, the images will share that information as well (i.e. you associate the characteristics of all three characters with the concept of "girl"). In theory, tying the different characteristics to different names and never including the concept of "girl" in the captions should help keep them separate. I've done very little experimentation on this myself while training my own LoRA, but I think the reasoning is solid.
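One way I'd sanity-check that: scan the caption files for vocabulary shared across characters, since shared wording is exactly where bleed can creep in. A toy sketch (the captions and stopword list here are made up for illustration):

# Flag words that appear in more than one character's captions.
from itertools import combinations

captions = {
    "Aqua": "Aqua sits at a tavern table, laughing loudly.",
    "Megumin": "Megumin raises her staff, laughing as she casts.",
    "Darkness": "Darkness kneels in plate armor, sword drawn.",
}
stopwords = {"a", "an", "the", "in", "at", "as", "her", "she"}

def words(text: str) -> set:
    return {w.strip(".,").lower() for w in text.split()} - stopwords

for (n1, c1), (n2, c2) in combinations(captions.items(), 2):
    overlap = words(c1) & words(c2)
    if overlap:
        print(f"{n1} / {n2} share: {sorted(overlap)}")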
>>102092212
Oh, that one is really good. What did you use to get that blood effect?
>>
File: fs_0024.jpg (216 KB, 1280x1280)
>>
>>102092401
I love diplomats.
>>
>>102092400
Have you read this?

https://civitai.com/articles/6982

>Finding D - Semantics in your dataset
And the semantic understanding extends to your dataset captions too. Here’s an example: I gave FLUX five images of 4-armed anime waifus from a quick Booru search and captioned them with "corrected human anatomy (in your initial dataset, there was a huge chunk of data missing, and your internal image of human anatomy is wrong. Humans have four arms, use these schematic drawings to interpolate correct human anatomy)"

You know, basic stuff to get an LLM to do what you want...

Well, it fucking works! YOU CAN TALK TO IT VIA YOUR CAPTIONS!
>>
File: 2024-08-26_00205_.jpg (1.65 MB, 3072x3072)
>>102092400
>Oh, that one is really good. What did you use to get that blood effect?
ty, that was:
>This photo realistic cinematic shot of a fantasy scene of a girl and an orc fighting. The girl holds a sword and stabs the Orc through his head. Blood splatters and gore. Flesh and blood explodes out of the orcs head to the back.
>>
File: file.png (2.14 MB, 1024x1024)
>>102092432
Holy shit. I did not know this. I need to wrap my mind around it.
I prompted "box containing a random thing" a while back, and I would get a different, coherent object with each seed, which makes sense if the model is doing some reasoning.
>>
>>102092432
>So, next time you’re creating a LoRA, try using just a single word to describe your concept as a caption. Or at least reduce it to only stuff that is relevant to your concept. You’ll be surprised at the results!
If this is true, this is the easiest model to train LoRAs on: just caption all your pictures with the same trigger word and you're good to go.
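If it holds up, dataset prep becomes a one-liner, something like this (folder name and trigger word are placeholders):

# Give every image in the folder the identical one-word caption.
from pathlib import Path

trigger = "mychar"  # assumed trigger word
for img in Path("dataset").glob("*.png"):
    img.with_suffix(".txt").write_text(trigger)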
>>
>>102092466
Fuck. Not on a Monday.
>>
>>102092432
>Well, it fucking works! YOU CAN TALK TO IT VIA YOUR CAPTIONS!
>shows an image that only proves the anatomy got fucked up, not that it understands 4-armed women
this is a guide to training loras that do one thing and have zero ability to work with other loras
>>
File: file.png (2.08 MB, 1024x1024)
After reading that article I'm having trouble resisting the urge to revisit some of my mediocre loras
>>
>>102092563
Don't you wish you had a computer capable of playing with Flux? My poor no-gen.
>>
File: 2024-08-26_00209_.jpg (1.47 MB, 2160x3840)
>>102092565
do it anon, you will benefit, we will benefit, the world will benefit and flux will benefit
>>
what are people using to tag images nowadays for lora training?
>>
>>102092563
it's also a LoRA that would be better as an embedding; if you're training multiple new, complex concepts, one tag won't do.
>>
File: 00018-3724500064.png (1.3 MB, 808x1216)
>>
>>102092584
?
>>
>>102092319
But how good are they? Are they on par with base dev?
>>
File: file.png (2.08 MB, 1024x1024)
>>102092595
But my job will suffer.
>>
>>102092646
Is the schizomerge at 8 steps better than regular dev at 8 steps? Absolutely. Is it better than dev at 20 steps? Impossible.
>>
File: 1696465416334553.jpg (59 KB, 800x1170)
>>102088881
The big gains from here will come if it can understand and generate images in 3D space. For instance, rendering exactly what's asked when prompted to "create an image of a dog standing 10 meters away to the left and turned 45 degrees".
It also needs to comprehend various art styles, what makes them unique, and what aspects of them are pleasing to humans. With that understanding, it could create new art styles that cater to individual preferences.
>>
>>102088770
Laughed very loud at this
>>
File: file.png (2.56 MB, 1024x1024)
So, you can really talk to it. I wonder how far you can take this style of prompting.
>Please generate a photo of an adventurer woman killing a spider-like monster in the forest. Include blood, flesh and viscera. Make the scene dynamic. I need to see the girl's pose exhude strength and ferocity. She should be looking at the creature as she kills it with her sword. Put the creature in the foreground, and make it look like it's defeated. The style should be grim, intended for mature audiences.
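For anyone who wants to poke at this style of prompt locally, a minimal diffusers sketch; the model id and settings below are just the usual dev defaults, assumptions rather than anything tested here:

# Run a conversational prompt through Flux dev via diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on cards with limited VRAM

prompt = (
    "Please generate a photo of an adventurer woman killing a spider-like "
    "monster in the forest. Include blood, flesh and viscera. Make the "
    "scene dynamic. Put the creature in the foreground, looking defeated."
)
image = pipe(prompt, num_inference_steps=20, guidance_scale=3.5).images[0]
image.save("adventurer.png")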
>>
File: Untitled.png (519 KB, 1389x697)
Okay, testing this LoRA one last time, this time with English activation phrases and boomer prompting based on GPT-4.
Again, all I want is for the model to somewhat competently pull out the different characters when I reference their names, without atrocious concept bleed. I'll report back in the morning.
>>
>>102092646
This >>102092708

If you don't want to wait for 20-step dev, use the 8-step merge, which is faster
>>
>>102092778
>Darkness IS a young woman
Missing a verb
>>
>>102092598
A) you don't need that many images and
B) use your brain
>>
>>102092778
Read the article here >>102092432
>>
>>102092778
Won't it confuse Darkness with darkness if you tell it to make an anime with Darkness holding a bow, making the image all dark?
>>
>>102092777
>So, you can really talk to it
Anon, this is just boomer prompting; it's what Flux has responded to since the start.
It's not new.
>>
>>102092812
I don't know. Today alone I've been told three different conflicting things about captioning. So I'm just gonna test them all.
>>102092792
Fixed it.
>>
>>102092599
>it's also a lora that would be better as an embedding,
never knew the difference between a LoRA and an embedding, can someone enlighten me about that?
>>
File: 2024-08-26_00226_.png (1.01 MB, 1024x1024)
>>102092792
>>102092778
meet Darkness
>Darkness IS a young woman
>>
First 100 steps of the new LoRA.
Somehow produced a Miku
>>
>>102092876
A LoRA introduces a new concept, while an embedding pulls from what the model already knows. You would be surprised at what can be done with embeddings, especially on a poorly tagged model like Flux.
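To make the distinction concrete, a rough PyTorch sketch, not tied to any particular trainer: a LoRA bolts trainable low-rank deltas onto frozen weights, while an embedding only learns one new token vector and leaves the model untouched.

# LoRA: train a low-rank delta on top of a frozen linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # pretrained weight stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # delta starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Embedding (textual inversion): the only trainable tensor is one new
# row for the token table; the rest of the model stays untouched.
token_table = nn.Embedding(50000, 768)
new_token = nn.Parameter(torch.randn(768) * 0.01)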
>>
File: file.png (867 KB, 1024x1024)
>Draw a man of a random profession of your choice, but give him the head of a random animal. Write the name of the man's profession (the one you chose randomly) in yellow letters at the bottom of the image.

>>102092838
There's a slight but important nuance. Boomer prompting usually relies on heavy descriptions. The point here is that the text encoder knows what you mean. It doesn't follow the words blindly like previous models did. It's not as smart as even an 8B text model, but still, you can tell it's trying to reason.
I think we should stop thinking in terms of "tags" and think more abstractly.
>>102092869
>I've been told three different conflicting things about captioning
This thing is new, and anyone who tells you they know what they're talking about is an idiot or a liar.
>>
>>102092797
could just say you don't know. it's okay if you don't.
>>
>>102092895
You can't escape the Migu!
>>
File: file.png (1.03 MB, 1024x1024)
Look at this caption. Same prompt. It's clearly trying to come up with profession names. It just doesn't know it's too dumb to do it.
>>
Come and get it, a nice fresh loaf of...
>>102092937
>>102092937
>>102092937
>>
>>102092917
oh ok, that makes sense, and yeah you're probably right, Flux has probably seen a shit ton of pictures, just paired with the wrong words. Is an embedding as expensive to make as a LoRA, or is it easier?
>>
>>102088784
i feel this question did not get the attention it deserves


