/g/ - Technology

File: tmp.jpg (1.17 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102009692

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>PixArt Sigma & Hunyuan DiT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/g/sdg
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
Roundhouse kick Debo through a wall.
>>
Are we blessed
>>
File: 1708652430111543.png (1.05 MB, 1024x1024)
someone put knowledge in my base model
>>
>>102013108
I ask this because I assume the model expects everything to be described, so if there's a broom in one corner and I don't mention it, the model will end up thinking that broom is part of something else I did describe.
>>
File: fs_0076.jpg (99 KB, 720x1024)
>>
Blessed thread of frenship
>>
>>102013129
>someone put knowledge in my base model
you heard what that anon said? DO IT
>>
File: 1697314320809988.png (1.22 MB, 1024x1024)
>>102013150
yeah do it
>>
File: 00013-1178910297.png (3.63 MB, 1280x1920)
>>102013131
Lol
>>
File: Hapu_anime_SM.png (503 KB, 1280x720)
>>102013133
The idea, at least as I've gleaned from my time training, is to use tags to describe everything that isn't intrinsic to the character or concept. For example, Hapu (picrel) has brown skin, black hair, purple eyes, and a purple bonnet. You would want to tag purple bonnet, and not tag black hair, purple eyes, brown skin. Those would all fall under the "Hapu" tag you're training.
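
To make it concrete, a caption for one random Hapu image might look something like this (a made-up booru-style example, not from an actual dataset):

    hapu, 1girl, purple bonnet, standing, outdoors, open mouth, smiling

black hair, purple eyes and brown skin are left out on purpose: they're meant to get absorbed into the "hapu" trigger tag itself, so the trigger carries the intrinsic traits and everything else stays controllable.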
>>
>>102013192
Ah, makes sense. Thanks.
>>
>>102013192
yeah no shit
>>
>>
>>102013214
He wanted it explained, anon. Not everyone has the same level of knowledge, we all start somewhere on this shit.

>>102013213
Of course.
>>
>>102013192
If I may ask another question. How many images ideally should I use? And how long can I expect the training to take with a 3090?
>>
File: bComfyUI_107008_.jpg (985 KB, 1424x2048)
nips
>>
File: image.jpg (107 KB, 1536x1024)
>>
I'm seeing the reference images ai-toolkit is generating with the unquantized model (at 40s/it) and holy shit I didn't expect the quality gap with Q8_0 would be this noticeable. That or the prompts for the references are excellent.
>>
Been away for a little over half a year and haven't been able to keep up. Are there any good extensions worth noting that have come out, especially for auto1111?
>>
File: image.jpg (108 KB, 1536x1024)
>>102013068
>they all look germanic and shit
include an ethnicity/nationality in your prompt
>>
File: file.png (1.7 MB, 1024x1024)
>>102012483
>why doesn't the model just make up bullshit that I didn't even ask for?
If you want random bullshit use an LLM that makes your prompt random bullshit. Don't hate a new printer because it randomly stopped spewing ink everywhere like your old one.
>>
File: flux_00318_.png (1.62 MB, 968x1120)
>says here you were heard saying you don't approve of women in the military. care to explain that statement, soldier?
>>
>>
>>102013256
Well, people have had success with as few as 30, but the more you have, the better. With the caveat that they're all pretty high quality and diverse, too. You want your character at various angles in various states of undress, different outfits, etc. You also want to avoid multi-character pictures like the plague. Don't use too many black and white or sketchy images of the character, unless one or both of those is the style they're typically drawn in.

>How long can I expect the training to take with a 3090?
Well, I haven't trained Flux before. I did a lot of SD 1.5 training and SDXL training, but they're probably a lot faster to do than a model many times their size.
>>
>>102013256
i'm doing dim 16 768x768 batch 2 2000 steps on the full fp16 model and it takes about 3.5 hours, but that old anime lora posted last thread was trained for less than 300 steps so i clearly have no fucking idea what's going on
>>
File: file.png (10 KB, 359x104)
>>102013311
Fug wrong thread
>>102013310
>include an ethnicity/nationality in your prompt
I meant they look like models.

>pic related
Can I expect training to take this long no matter how many images I use? I just used 11 here as a test.
>>
12gb flux lora anon reporting in
here's 30 epochs of my latest lion cosine, 8 dim/8 alpha test. captioned with both local joycaption and booru tags, using the wildcard arg. style is ibuki satsuki. genned at 512*512, dev-q4_0 & t5-v1_1-xxl-encoder-q5_k_m
>random seeds
>euler, simple
>20 steps
>prompt:
a man with long white hair and Chinese style clothing, 1boy, long hair, white hair

trained at 512*512 (I overshot the LR and got crap last run). tonight/tomorrow I will run the same settings/dataset on 1024*1024 to compare the difference

will post 1024*1024 sized gens in a bit, it'll take me a while to gen 30 of them rip. from these 512*512 gens it looks to have gotten the artists' style down pretty well, if 1024*1024 gens aren't a total shitshow I'll probably be pretty happy with these training settings
>>
File: file.png (1.75 MB, 1024x1536)
1.75 MB
1.75 MB PNG
>>
File: miniature.jpg (88 KB, 1024x1024)
Remember back when people would just post their gens with the prompt? What I care about is the style, and no VLM will give me that part.
>small miniature mario bros with girlfriend princess peach toadstool. disney pixar cartoon 3d texture details light landscape background city The little mushroom is there too. Whimsical.
>>
File: file.png (12 KB, 552x108)
>>102013364
So far cooler than my old AMD
>>
>>102013347
why would you use different seeds for comparisons
>>
>>102013347
you're using kohya?
>>
>>102013344
>2000 steps
Jeez. Could we see the output?
>>
>>102013300
Auto1111 became obsolete because it doesn't support the model that replaced Stable Diffusion.
>>
File: fs_0100.jpg (118 KB, 1920x1080)
>>
File: image.jpg (106 KB, 1536x1024)
>star of david turns into triforce
What did flux mean by this?

>>102013345
>I meant they look like models.
80% flux issue 20% prompt issue. you'll need to wait for something equivalent to juggernaut to generate normal ugly people by default
for my use case it's actually a good thing they're all hyperbeautiful
>>
>>102013088
why do you leave sdg in the OP?
>>
>>102013434
when it's done sure
>>
>>102013406
I want to see how the training settings have done overall, not just compare each epoch to the other on a single seed. I will do that later to select the "best" epoch over a variety of seeds/prompts
>>102013418
yes
>>
>>102013520
why do you care?
>>
Datasetting is kind of fun.
>>
>>102013541
>yes
did you find a guide to get kohya to train flux or did you figure it out yourself? i'm running into some tokenizer error right now
>>
Apparently ai-toolkit generates a set of sample images every 250 iterations. What are these for? Seeing when the lora shat the bed in case of trouble?
>>
>>102013629
yes
>>
>>102013629
I can't imagine why you'd want to see the progress of your image model training through generated images
>>
>>102013487
Flux has given me lots of variety in faces, nothing like a lot of SDXL finetunes that had that bimbo sameface engraved hard. Whenever I saw that face the model went to the recycling bin, I hated it so much.
>>
>>102013635
Well, so far, so good. People in the images have randomly lost clothing items and become cuter.
I also had to open my computer case.
>>
File: flux_00321_.png (1000 KB, 968x1120)
>>
File: bComfyUI_107015_.jpg (301 KB, 768x1024)
>>
>>102013653
>People in the images have randomly lost clothing items and become cuter.
lewd
>>
>>102013441
And which model is that?
>>
File: ComfyUI_00802_.png (1.18 MB, 896x1152)
>>
Big if it works. IP Adapter looks like the way to get varied styles back
>>
>>102013641
it's fun(ny)
sometimes you get some crazy gens in the samples that are irreproducible
>>
File: ifx135.png (1.2 MB, 1024x1024)
>>
File: ComfyUI_00804_.png (1.75 MB, 1152x1536)
>>
>>102013801
What does it do...?
>>
>>102013801
So wait, any character I want? just a picture, and I can prompt that character to do stuff?
DAMN. I need to copy this workflow IMMEDIATELY. This solves so many knowledge issues
>>
File: bComfyUI_107090_.jpg (274 KB, 768x1024)
>>102013819
wtf is that
>>
>>102013810
are you training 200px images?
>>
>>102013869
1024x1024
that was an early sample
>>
fuck it. i give up. kohya traning seems impossible right now
>>
>>102013883
early samples shouldn't be that bad
>>
>>102013888
that's the spirit
>>
File: 00096-4138489953.png (2.23 MB, 1024x1528)
>load lora
>pray memory doesn't randomly balloon out to 30+gb
I swear, using loras in forge is like playing fucking slots.
>>
>>102013834
Think of it as a 1-image lora. That's the best way to describe it, though surely not the most accurate. Anyway, I've already been using IPAdapter to transfer style from an SDXL image to a Flux one. I can't wait to try it.
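
for anyone who'd rather see the idea in code than in Comfy nodes, here's a rough diffusers sketch of the same trick on SDXL (not the Flux node setup, just the concept: one reference image conditions the gen; paths and the scale value are made up):

    import torch
    from diffusers import StableDiffusionXLPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    # attach the image-prompt adapter; the reference image then acts like a 1-image lora
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                         weight_name="ip-adapter_sdxl.bin")
    pipe.set_ip_adapter_scale(0.6)  # how strongly the reference steers the gen

    ref = load_image("style_reference.png")  # your reference image
    out = pipe(prompt="a man with long white hair", ip_adapter_image=ref).images[0]
    out.save("styled.png")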
>>
>>102013898
says you
>>
>images are watermarked with simple white text
>i get to train flux on text AND closeups of prime pussy at the same time
>>102013888
Use ai-toolkit. It's seriously easy. I don't know if my lora will be usable at all, but at least I'm able to do trial and error.
>>
>>102013888
kohya's bugged as shit especially if you're using the gui
>>
>>102013913
>Use ai-toolkit. It's seriously easy. I don't know if my lora will be usable at all, but at least I'm able to do trial and error.
16gb vram
>>
File: file.png (105 KB, 116x673)
>>102013909
if you're not blowing it out it should progress nicely
>>
>>102013908
god i love girls in knitted tops
>>
>>102013908
How expensive is it to run? Would it work with 8gb of vram? I know I can run flux dev on Q_0 and 512x512 images so that's not an issue, but if IP adapter is expensive I might not be able to run this workflow
>>
File: file.png (98 KB, 116x666)
>>102013939
etc etc etc
The progression of it learning Vaporeon
>>
>>102013921
AAAH SORRY I FORGOT
>>
>>102013939
show me some settings and i can try it your way
>>
>>102013908
Interesting. It makes me wonder: Is there a way to make it transfer the style without transferring the character in the image? It seems like it's just for the character of the image, as it is right now.
>>
>>102013959
      train:
        batch_size: 4
        steps: 40000 # total number of steps to train, 500 - 4000 is a good range
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false # probably won't work with flux
        content_or_style: content # content, style, balanced
        gradient_checkpointing: true # need this on unless you have a ton of vram
        noise_scheduler: "flowmatch" # for training only
        optimizer: "adamw8bit"
        lr: 2.5e-4
>>
>>102013921
>>102013956
Can't you low-vram it (presumably at an extreme speed penalty)?
>>
>>102013973
the github says 24gb and the low vram option doesnt help
>>
>>102013971
>steps: 40000 # total number of steps to train 500 - 4000 is a good range
lel
wait you're not using kohya
i'll try tonight, i leave it training overnight
>>
>>102013973
Yes, but I don't know how severe the penalty will be. Only one way to find out.
>>
I am trying to use https://github.com/ostris/ai-toolkit

Why is it trying to download the full massive flux model again, why can't I point it to the already downloaded one I have?
>>
>>102013996
kohya is absolutely ass right now (always was honestly)
>>
File: d_0009.jpg (163 KB, 1920x1080)
>>
>>102014000
      model:
        # huggingface model name or path
        name_or_path: "F:/ai/models/FLUX.1-dev"
>>
Do scripts or nodes for batch captioning with JoyCaption exist?
>>
File: 1724268407.png (9 KB, 287x280)
>>102014000
you can but it needs the whole setup structured in the same way as black forest labs' HF page
>>
>>102014003
are you on ai-toolkit?
i just switched to branch sd3-flux.1 on kohya and read through this
https://github.com/kohya-ss/sd-scripts/blob/sd3/README.md
i'm still testing but one model has come out ok'ish
>>
>>102013963
In the regular IPAdapter node there is a list of modes: one is just the style, and it works quite well. I don't know about these nodes, they seem to all be custom. I'll set everything up after dinner and see.
>>
>>102014043
>diffusers
>>
>>102014061
yeah ai-toolkit
tried kohya for full finetuning but it was extremely slow and they don't even have samples over time so no way to properly see if you're not wasting time
for now I'm sticking with high rank Loras
>>
>>102013609
I just followed kohya's instructions on the GitHub page for the sd3 branch (it covers flux, don't use the branch called flux) and then edited the command line to my preference with some trial and error. make sure you update your python, accelerate etc like the page suggests, I got errors otherwise. and this is what I used for this one if you want to edit it to your liking:
files.catbox.moe/w3teku.txt
(comment too long if I put it in code tags, sorry)

adamw8bit is probably a "safer" bet. it also might be worth trying 2e-4 LR, I found 1e-4 to be undercooked but it looks like 3e-4 might've fried fingers a little (if that wasn't from the bucketing). I recommend keeping your dataset under 100 images if you want it to be done overnight. if you need longer training sessions utilize --save_state and --resume, I did 3 nights of a 500+ image dataset run without testing like a dolt only to realize I typo'd the LR and fried the entire thing. vowed to prune all my datasets after that lmao. I love making loras but I'm a certified retard desu
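
for a rough idea of the shape of the thing before you open the catbox file, a kohya sd-scripts flux lora launch looks something like this (a sketch with made-up paths and values, not the exact args from the file above):

    accelerate launch flux_train_network.py \
      --pretrained_model_name_or_path /models/flux1-dev.safetensors \
      --clip_l /models/clip_l.safetensors --t5xxl /models/t5xxl_fp16.safetensors \
      --ae /models/ae.safetensors \
      --network_module networks.lora_flux --network_dim 16 --network_alpha 16 \
      --optimizer_type adamw8bit --learning_rate 2e-4 \
      --max_train_steps 2000 --save_every_n_steps 250 \
      --dataset_config dataset.toml --output_dir out --output_name mylora \
      --mixed_precision bf16 --gradient_checkpointing --cache_latents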
>>
>>102014027
yeah, I wrote one
>>
Where do I actually learn what nodes are doing? I see my prompt connect to guidance, which connects to a guider, and all I can think is "what the fuck is guidance? is 3.5 correct guidance? what the fuck is a guider?"
>>
File: ComfyUI_00727_.png (1.45 MB, 832x1216)
from the Busty mesa
a Gooning shadow gross
standing in the Gushes
the Femboys of Dark Souls
>>
File: 00045-2927397766_cleanup.png (2.58 MB, 1280x1920)
>>
>>102014091
Good job!
>>
>>102014043
hm I just tried this but am getting
ValueError: Z:\black-forest-labs\FLUX.1-dev\transformer\ does not appear to have a file named Z:\black-forest-labs\FLUX.1-dev\transformer\diffusion_pytorch_model-00001-of-00003.safetensors which is required according to the checkpoint index
>>
I've manually described 55 images so far and I'm hard as diamonds. I think I've got a good variety of poses and outfits. When should I stop?
>>
Any recommended temperature settings for JoyCaption?
>>
File: ComfyUI_00735_.png (1.32 MB, 832x1216)
font reminds me of the Colombian drug lord's apartment in Predator 2
>>
>>102014210
>>102014043
oh ffs do I need these 3 10gb files as well as flux-1-dev in the higher level? wtf
>>
>>102013088
Kinda want to make a RuneScape Lora for no other reason than to generate funny pictures. Haven’t even touched the game in over a decade. Midjourney can do it natively but flux doesn’t know it at all.
>>
File: 2178570291.png (1.61 MB, 896x1152)
>>
>>102014241
the whole setup, structured the same way

it's a pain but at least it works
>>
>>102014261
reeee my internet is so slow that will take hours, whatever thanks tho
>>
File: 00050-545778476.png (2.34 MB, 1280x1920)
>>
>>102014269
it would be faster for you to learn how to modify the script to not be so retarded
GPT/Claude can help
>>
File: ComfyUI_00810_.png (1.78 MB, 1152x1536)
>>
TroonMix 7.7
>>
>>102014269
another problem i hit was the config.json in /text_encoder etc getting downloaded as text_encoder_config.json and then ai-toolkit couldn't find it

copy/pasted the config.json and renamed it so both paths were covered just in case
>>
File: 00049-295344137_cleanup.png (2.53 MB, 1280x1920)
>>
>>102014341
>ai face sloppa
>>
File: bComfyUI_107072_.jpg (234 KB, 768x1024)
>>
Is there a big naturals lora for Flux yet?
>>
File: 625907938.png (1.35 MB, 832x1216)
>>
File: ComfyUI_00814_.png (1.1 MB, 864x1152)
>>
File: bComfyUI_106887_.jpg (952 KB, 3424x1728)
>>102014440
>breathe the sun
>>
Literally every single Flux lora I tried completely destroys its ability to make pixel art
>>
File: flux_00345_.png (1.46 MB, 968x1120)
>>
How do I use flux loras in comfyui? When I try to load it like a normal lora it hangs and doesn't finish generating an image. I'm using the fp8 checkpoint
>>
File: ComfyUI_00815_.png (1.54 MB, 1032x1376)
>>
I suppose it's not ideal, but what are the implications of stopping the training at 500/2000 and adding more images to the dataset?
>>
File: 00071-AYAKON_1248183.png (1.75 MB, 1280x1280)
>>
>>102014525
I don't actually know but do all loras work with quantized versions of the model? ie if it was trained using full dev does it work with fp8
>>
File: ComfyUI_21215_.png (2.33 MB, 1920x1080)
>>
What's the best option for img to img, local or not? Specifically I want to turn a 3d image of a woman into a realistic one, pic rel. If you want to give it a try, i'd appreciate it.
>>
>>102014565
Yes.
>>
File: ComfyUI_21129_.png (2.35 MB, 1920x1080)
>>
>>102014576
is this your boyfriends second life avatar?
>>
I'm having trouble with flux prompting, having been so used to SDXL. They say just use natural language but I'm stumped
>>
>>102014597
yes my username is bardfinn
>>
File: ComfyUI_21283_.png (2.54 MB, 1920x1080)
>>
>>102014597
yes
>>
File: ComfyUI_21143_.png (2.57 MB, 1920x1080)
>>
So, how long should I expect to wait until we start getting some finetuned models that support danbooru knowledge/tagging? I miss the power that gave me to be specific with outfits, plus character knowledge..
>>
>>102014626
two weeks
>>
>>102014576
>describe girl in prompt
>use control net to keep pose, canny, etc
>roll
>pray
>>
>>102013347
Full size: https://files.catbox.moe/brpmk3.png
here are the 1024*1024 gens with the same settings. early to mid epochs had my sides in stitches, but the later ones turned out better than I expected desu. I will be back either tomorrow night or the next day to compare how training at 1024*1024 turns out instead. fingers seem a bit wonky and the finer details are messy, but I left resizing for 512*512 to the fate of kohya's bucketing system and I can only imagine that made things worse
>>
>>102013347
>captioned with both local joycaption and booru tags, using the wildcard arg
How do you do this and what do the final prompts end up looking like?
>>
File: ComfyUI_21152_.png (1.81 MB, 1920x1080)
>>
for flux loras is it looking better to just include the joytagger output or to also include the booru tags of the image if you have them?
>>
>>102014691
no one knows, anon. We're all just trying things out. Tbh I think captions and tags are pretty irrelevant atm, the current scripts aren't even training the text encoder and t5 is recommended not to train at all.
>>
>>102014105
>Yvonne Strahovski
Don't even need to ask if it's flux. Catbox or prompt? I really like the artstyle. Is it a lora?
>>
File: 1706834065102747.png (1.12 MB, 1024x1024)
damn bitch you live like this?
>>
File: ComfyUI_21221_.png (2.65 MB, 1920x1080)
>>
File: flux_00355_.png (1.35 MB, 968x1120)
>>
>no news for months
>flux
>that's not really bringing something new to the table
Are we in an age of stale or something?
>>
File: ComfyUI_212321_.png (3.25 MB, 1920x1029)
>>
>>102014781
kek
nothing ever happens
>>
>>102013479
And out of those 12 thousand the lowest tier they are paying is 4.50 quid while the highest tier is 1,186 quid.

So this guy must be making a minimum of 50k quid a month...

And he's STILL hungry for more.
>>
File: ComfyUI_21295_.png (2.39 MB, 1920x1080)
>>
>>102014781
bro the boomer prompting and the text bro
>>
>>102014637
Are you new to this?
>>
>>102014781
>t5
>up to 2mp native support
>super effective loras
>trainable with local hardware
if Flux doesn't get your nipples hard, then I'm not sure what would
>>
File: 00013-1151662440.png (3.4 MB, 1280x1920)
>>102014728
>Is it a lora?
Yes, and it's not Flux. Just copy/paste the prompt from civit.

https://civitai.com/models/87167?modelVersionId=311382
>>
>>102014691
Write comprehensive and well written descriptions of the images by hand. That’s the Flux way.
Also holy shit is this thing easy to train.
>>
File: ComfyUI_00615_.png (1.12 MB, 1024x1024)
Where do I see which models are loaded in VRAM and which in system RAM? The stdio output of Comfy only notes if and when models are loaded (and if “completely” or “partially”), but I can't see what they're loaded into.
I tried the Q5_K_S quants for both unet and t5xxl, and my gens actually take a few seconds longer (24.5 seconds per image with Schnell and no LoRas) than with my older setup of fp8_e4m3fn for the weight type and the t5 model. (19 seconds per image)
nvidia-smi shows VRAM being almost fully utilised in either scenario, system RAM usage seems to be noticeably higher when not using quants, but the quant setup still uses some system RAM
12 GB VRAM btw
>>
File: 00138-486044689.png (2.21 MB, 1024x1528)
>>
>>102014819
My mistake. Yvonne Strahovski shows up all the time unprompted for me on flux. I just assumed it was her again. Yeah I can do that artstyle on pony models easily enough.
>>
>>102014723
???
You don't need to train the text encoder in the same way you don't train the clip. The text encoder converts the prompt into machine language, so your captions absolutely *do* matter. The only difference is it's not braindead retarded to begin with, unlike SDXL which doesn't even function without training if you're doing a new concept. The T5 doesn't give a shit about new concepts, it converts everything fine because it's basically trained on everything to begin with, so it knows basically every concept from The Pile (or the equivalent).
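
if "converts the prompt into machine language" sounds abstract, this is roughly all it means (a minimal transformers sketch; the base-size t5 here is just for illustration, flux actually ships t5-v1_1-xxl):

    from transformers import T5EncoderModel, T5Tokenizer

    tok = T5Tokenizer.from_pretrained("google/t5-v1_1-base")
    enc = T5EncoderModel.from_pretrained("google/t5-v1_1-base")

    ids = tok("a man with long white hair and Chinese style clothing",
              return_tensors="pt").input_ids
    # one embedding vector per token; novel words still get encoded fine
    emb = enc(input_ids=ids).last_hidden_state
    print(emb.shape)  # (1, seq_len, hidden_dim) - what the diffusion model conditions on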
>>
>>102014654
I got chatgpt to edit the joycaption script to mass queue all the images in a folder then save the caption output as individual .txt files. I do the booru tags the normal way with BooruDatasetTagManager and randomly shuffle them (I made a script for this with chatgpt also, but I think the setting is built into the dataset tagger; I just had these ones pretagged from sdxl). I combine them into one .txt file that looks like this:
Line 1:
>Booru tags
New paragraph
Line 2:
>joy caption prompt (make sure there are no new paragraphs and it's all one solid block)

then in my kohya ARGs I use --enable_wildcard
this makes it so it randomly switches between the two lines for the chosen caption during training (booru tags vs boomer prompt). whether this is optimal for captioning or not I don't know, being a vramlet I haven't done extensive testing and am happy enough with this process as is

here is an example image from my ibuki satsuki dataset and the accompanying captioning: https://files.catbox.moe/8km0lp.jpg
https://files.catbox.moe/28qr0q.txt

as for how that translates to prompting when genning, well, you can see what tiny shitty prompt I used for my test images lol.. didn't need to echo the boomer prompt by any means
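
if anyone wants to replicate the combining step, it's basically this much python (a sketch written from the description above; the folder layout is my assumption, not the actual chatgpt script):

    import random
    from pathlib import Path

    tags_dir = Path("tags")        # booru tags, one .txt per image (assumed layout)
    caps_dir = Path("joycaption")  # joycaption outputs, one .txt per image
    out_dir = Path("dataset")      # combined caption files go next to the images

    for tag_file in tags_dir.glob("*.txt"):
        tags = [t.strip() for t in tag_file.read_text(encoding="utf-8").split(",")]
        random.shuffle(tags)  # shuffled booru tags
        caption = (caps_dir / tag_file.name).read_text(encoding="utf-8")
        caption = " ".join(caption.split())  # one solid block, no paragraph breaks
        # line 1: booru tags, blank line, line 2: boomer prompt;
        # kohya's --enable_wildcard then picks one of the two lines per step
        (out_dir / tag_file.name).write_text(
            ", ".join(tags) + "\n\n" + caption, encoding="utf-8")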
>>
>>102014841
Also, you can train on 24GB without low vram mode with KDE or any other light DE (or if you don’t have a display server running at all, I guess). Which is neat.
>>
First attempt testing Flux IpAdapter (focusing on style transfer for fine arts) I rate it: hey it works! and 2/10
>>
Sub 100 properly tagged/described images > quantity of sloppy automated tagging
>>
>>102014906
Thank you keep up the good work anon
>>
>>102014923
lmao whatever helps you cope
>>
>>102014890
>Yvonne Strahovski shows up all the time unprompted for me on flux
show us an image that you think looks like her
I think your brain is fried
>>
File: bComfyUI_107210_.jpg (243 KB, 768x1024)
>>
File: flux_tmp~2.png (2.94 MB, 2304x1792)
>>
https://education.civitai.com/quickstart-guide-to-flux-1/
>We’re finding and hearing that captionless training is better than long narrative style captions (or Danbooru captioning)! Try your next training session with no captions!
What did they mean by this?
>>
>>102014963
Neat. Prompt?
>>102014973
Neat, as always.
>>
File: Heresy detected.gif (1.56 MB, 498x498)
>>102014978
what DID they mean by this?
>>
File: chibimetroid.png (961 KB, 1024x1024)
>>102013314
Because if I wanted to fill up the prompt with something, I'd do it with more relevant things to get closer to what I'd imagine the prompt to produce, like picrel.
I could probably get closer if I kept removing things and adding things to the prompt, but I already have a 8MB text file with prompts to try out, and it keeps growing, so working in a single one seems like a waste.
It's just that with Flux I'm not really looking forward to seeing what it does for "The invention of gravity" because it'll probably be something bland.
>Pixel Art. Classic video game screenshot from the Super Nintendo. On the left there's a chibi Girl wearing a golden armor bodysuit with a red helmet and emerald pointing bazooka at giant lizard with red eyes on the right. It is night with black sky. Bricks tunnel. Blue floor made of squares, teal ceiling with bushes, a closed pink door at the right and orange spikes at the right. Life counter at the top with a checkered pattern.
>>
>>102014978
Huh? As in training with nothing associated to the images at all? I guess it works for style LoRAs and other things you want to apply indiscriminately. But not for characters or objects.
>>
>>102013088
Can I get top left with flux yet?
>>
File: grid-0463.jpg (319 KB, 1792x2304)
>>
>>102014929
thanks fren, good luck on your ventures if you're training
>>102014978
idc what they say I'm captioning
>>
>>102014999
My point is you get the bullshit generator you wish you had by putting an unhinged LLM on your prompts. It's a stupid thing to whine about anyways, SD 1.5 still works if what you want is somewhat relevant random images.
>>
>>102014978
>Phenomenal likeness capture can be achieved with Kohya, 20 images, ~1000 steps.
I hate when people say shit like this without stating the repeat quantity. 1,000 steps can mean a lot of different combinations of repeat/epochs
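
to spell out the ambiguity (illustrative numbers): total steps = images x repeats x epochs / batch size, so at batch size 1 all of these are "1000 steps":

    20 images x 10 repeats x 5 epochs = 1000 steps
    20 images x 5 repeats x 10 epochs = 1000 steps
    20 images x 1 repeat x 50 epochs = 1000 steps

same headline number, but per-epoch saves and LR schedules land in very different places across those runs.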
>>
How many people on civitai are just generating a bunch of images with sdxl and using them and their prompts to train loras?
>>
>>102015028
You need control anyway, if you are doing something with different clothes, characters, expressions and so on, you would want it to know that when you want to generate the images lol
>>
>>102014978
Sounds absolutely retarded if you're training an Emma Watson Lora. It doesn't even make sense, words mean something, words are intent.
>>
File: 04200-Maybe an image.png (1.07 MB, 896x1152)
>>
File: 1692978537356205.png (1.45 MB, 1536x612)
>>102014781
Flux is a huge improvement, but it's still lacking the ability to follow prompts as well as corpo models. I know for a fact the knowledge can and will be expanded, but I'm not sure the complexity of prompts will be. The reason "bing image creator", as grubby as it sounds, took off was because people were able to type "goku robbing a gas station CCTV camera" and get close to what they were thinking about. Reminds me of craiyon which, while garbage quality, would ALWAYS try to the best of its ability to do what you were asking it to do. Flux is doing its best, but it drops concepts in the prompt left and right (in this example first person, doomguy, old doom graphics, etc).

>>102014999
very cute output, good example of what I mean about following instructions and getting the details of the prompt right
>>
File: flux_00087_.png (1.77 MB, 896x1160)
>>102014959
You're just jealous that Yvonne never appears to you.
>>
>>102015062
There are loads using Pony images and it baffles my mind and pisses me off, so damn dumb.
>>
>craiyon
>>
>>102013681
Welp, it's like technology is supposed to work in some way but in practice people just have to point at stuff like this as proof it's all slop.
>>
>>102013088
patiently waiting for some nerd to create a local generator that is a simple .exe file with big buttons on all the commands I need. cba installing a bunch of random shit and doing python commands etc
>>
File: flux_00104_.png (1.69 MB, 896x1160)
>>102014959
>>
>>102015089
>>102015111
yea, your brain is fried, anon
lmao
>>
>>102015097
It was trash, if it's still around it's still trash. But I will stand behind what I said, every prompt was a complete slurry mess but by god did it somehow get within the ballpark of what the prompt asked for in natural language.
>>
>>102015082
>it doesn't do Messi drinking a can of coke in the style of Doom 92 while he's playing Blitzball where the art direction is inspired by a dunk Picasso so it's bad
>so what if you can train a Lora doing exactly what you want in an hour, I expect it to compete against megacorpo models running on A100s and trained by people who hand curate and caption datasets
>>
>>102015097
>>102015127
256x256 craiyon images were more detailed than any model on any architecture; 4 stitched together would be higher than stuff frfr ong
>>
File: flux_00110_.png (1.61 MB, 896x1160)
>>102015126
I mean, it's not literally Yvonne Strahovski, but it definitely has some of her features right? I'm not a schizo. I'm not.
>>
I trained a lora on a particular girl who sports a healthy bush. I described her pussy as “hairy pussy” every time. But my samples are showing images of her with hair covering her whole belly and part of her nipples too more and more as the training progresses.
What might I have done wrong? Using the default settings from ai-toolkit.
>>
>>102015145
kek I remember that schizo
>>
>>102015152
>Yvonne Strahovski
Looks more like Megan Fox with blonde hair
>>
>>102015152
if by "some of her features" you mean generic pretty blonde then yeah... otherwise no, I'd never say any of these look like Yvonne
>>
>>102015133
>in an hour
How do you do that without creating something shitty?

Fuck. It’s a million degrees in this room right now. Holy shit this will be great in winter.
>>
>>102015102
no, retard. that is not slop. THIS is slop. get it right, monkey.
>>
>>102014978
my first train attempt was with no captions and it was shit compared to when i added the captions
i guess if you use class images no captions might work (or not)
>>
File: 00053-2014742755.png (2.89 MB, 1280x1920)
>>102015152
>but it definitely has some of her features right?
Other than being blonde, no.
>>
>>102015156
the model has zero concept of "hairy pussy", you will need other images of hairy pussies to demonstrate what you mean, otherwise you can keep training and eventually it will learn after overbaking on that person
>>
>>102015159
Does anyone remember the guy that always wanted people to make gens of the asian girl at his work he's obsessed over, who he would describe in great detail to get a good gen?
>>
File: you.png (1.35 MB, 1036x1200)
>>102014576
>If you want to give it a try, i'd appreciate it.
kys beggar
>>
File: FluxDev_02386_.jpg (252 KB, 832x1216)
>>
>>102015082
>google image fx
what the fuck is that?
>>
>>102015184
>30 images of Doom screenshots
>get magic in 250 steps
>>
alright, so i have a dataset of ~150 pics i want to train a lora on
synthetic captioning hallucinates too much and makes too many errors, i want to ensure quality gens
just how detailed would my captions need to be if i'm training a character? if all the pics contain the character (but maybe different angles, or together with other characters) could i just copy-paste the character's description into each caption and be done with it? what parts would i actually need to describe?
>>
>>102015216
if it is recent then it is the Imagen 3 model by Google
>>
>>102015224
I don't mean to alarm you, but the model wasn't trained on perfect captions
>>
>>102015224
>synthetic captioning hallucinates too much and makes too many errors, i want to ensure quality gens
You can always fix the captions manually
>>
File: flux_00198_.png (1.62 MB, 896x1160)
Yvonne Strahovski has blessed me once again. Praised be!
>>
>>102015204
From what I'm seeing so far, it will output a Chewbacca monster if I try to gen her, but it will do normal nakedness for other random girls.
But thanks. Next time I try this I will include a wider variety.
>has zero concept of hairy pussy
Or any pussy at all, really
>>
File: result.jpg (188 KB, 750x520)
>>102014921
I don't think this can improve without an advanced ipadapter node. The text capabilities seem damaged, but it shows promise (1st image is the reference for ipadapter and the others are different attempts with different seeds, guidance and ipmodel strength)
>>
Who is this? Is it like Matgeek's son or something?
>>
>>102015071
my thoughts exactly. even for a style lora, my ibuki satsuki style choice is a perfect example of where captioning can help a lot - no way it can tell the androgynous ass xianxia men apart from the females in 75% of the dataset images without captioning. I'm also stubborn in that when training for SDXL no captioning on style was also "recommended" but looked like total ass compared to captioned. you just can't convince me that no captions genuinely beat captions, rather than the captions in their test group simply being bad. but end of the day, if no captioning works for them then power to them... I'm still not sold, personally
>>
File: woodshop.png (930 KB, 1024x1024)
>>102013345
Well, that got me curious. I wonder if there's a "wood shop" short word to make something very detailed like this but for a video game like super metroid. Also I had to change "little" for "chibi" or it would go to a place like this instead >>102013314
>Chibi girl in Super Metroid video game in a wood shop.
>>
File: joybatch.png (65 KB, 816x615)
https://files.catbox.moe/a9tbk3.json
A simple batch Joy Caption workflow for fellow brainlets, change the path and set the batch count in the menu to match the number of images in your folder.
>>
>>102015207
Yes. Coworker anon. The girl with the green cardigan.
>>102015219
Just 250? What other params?
>>
File: 00208-2024-08-21-cJak.jpg (3.1 MB, 2048x2688)
>>
>>102015271
The joy is in learning. Despite what they say, this space is quite creative. Want me to pick your images for you too? Maybe I'll write your captions as well.
>>
deis+ddim_uniform gives me sharper images but also gives me extra limbs, and even heads once, more often than euler+normal
>>
>>102013704
FLUX.1, what everyone in the thread is using:
https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev
https://huggingface.co/black-forest-labs/FLUX.1-dev
>>
>>102015211
Actually pretty funny
>>
File: bComfyUI_107247_.jpg (262 KB, 768x1024)
>>
>>102015234
yeah, i know, but i've done 5k steps on my synthetic captions without any manual intervention and it still hasn't learned the concept well
looked at the captions afterwards and they're often riddled with errors about what item is what and where it's placed on the character's body

>>102015242
fair, just figured it might take longer scrutinising each sentence for errors then correcting them than it would be to write something from scratch
>>
>>102015266
thanks dude
>>
>>102015252
Yeh I agree, at the end of the day we are training these loras for our own needs, and I think we know best what our needs are.
>>
File: 1695564861871046.png (1.18 MB, 1024x1024)
>>102015133
>it doesn't do Messi drinking a can of coke in the style of Doom 92 while he's playing Blitzball where the art direction is inspired by a dunk Picasso so it's bad
I didn't say it was bad, I said it didn't improve in an area I was hoping it would. It's better than any previous SD at base. And yes, if one model can follow instructions in a prompt and the other one can't, one model is doing worse than the other in that way.
>so what if you can train a Lora doing exactly what you want in an hour
LoRAs are not a magic bullet for fundamental model issues. Used right they can let you produce images better than any corpo model but the model is not going to get better at understanding NLP
>I expect it to compete against megacorpo models running on A100s and trained by people who hand curate and caption datasets
You sound offended and I have to ask why because I'm not here to support corpo models. Obviously microsoft will do better than a team of less than 10. The point is not to prove that it does worse, but to discuss in what ways it got better than SD and in what ways it didn't.

>>102015286
>The joy is in learning.
Exactly. I'm learning Flux, making comparisons with the current image gen sphere, and having fun. Don't take my explorations as a personal attack on the base model or something like that.
>>
>>102013644
I know, right? And then there's lnkdn.safetensors which gives the greatest variety of ugly faces of any image model and nobody has given it a single heart at huggingface.
>>
>>102015224
i assume that image recognition models prompted to caption a picture produce a caption structured roughly like what they were trained on, so feed a few pictures to gpt4 because it's the SOTA and ape what it does
>>
>>102015349
I'm tired of faggots that can't even run the model bitching that it doesn't compete against 80GB VRAM corpo models. You really don't have anything interesting to say because at best you have the opinion of a spoiled child.
>>
>>102015266
this is probably a dumb question, but why does that need to be in comfy? can't you just run a script to caption things?
>>
>>102015336
Nice, using the Elden Ring lora right?
>>
>>102015382
that's basically a visual programming workflow, no different than a for loop in code doing captions, but it took someone 5 minutes to do instead of 15
>>
>>102015286
I don't know what any of the parameters in training mean, so you telling me why and how you can train with only 250 steps would lead me to ask other questions and learn. But enjoy that weird superiority fetish. It's not like you're the only source of information in the world.
>>
>>102015382
you can, i used this
https://github.com/bigdata-pw/florence-tool
it just werks
make sure you use
--task "<MORE_DETAILED_CAPTION>"
for some good captioning
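if you'd rather skip the tool and call Florence-2 directly, it's roughly this (a sketch following the pattern on the Florence-2 model card; single image, no batching):

    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-large"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    task = "<MORE_DETAILED_CAPTION>"
    image = Image.open("example.jpg")  # one of your dataset images

    inputs = processor(text=task, images=image, return_tensors="pt")
    ids = model.generate(input_ids=inputs["input_ids"],
                         pixel_values=inputs["pixel_values"],
                         max_new_tokens=512, num_beams=3)
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    caption = processor.post_process_generation(text, task=task,
                                                image_size=image.size)[task]
    print(caption)  # write this out to example.txt for training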
>>
>>102015405
Zoomers are so pathetic, it's like they've forgotten how to use the internet. There are better places to get your spoon feeding.
>>
I finally managed to get Flux lora training to start with kohya, but the loss is very high at 1.93. Is this normal for Flux?
>>
>>102015382
needs top_kek parameter
>>
>>102015209
hot
(nta)
>>
>>102015425
my last attempt was at 3.43 lel
it did mostly work, i'm gonna try another tonight
>>
>>102015405
>it turns out to be that weird ugly guy from github and he won't give you any info because he needs it for his patreon
>>
>>102015425
I haven't had anything above .4
>>
>>102014288
Was this in flux?
>>
>>102015453
>no I won't even do the exact instructions posted on the Github, you need to tell me what to do exactly
>I expect you to spend way more time teaching me than the time I will spend training (spoiler: I won't be training)
>>
JoyCaption or CogVLM-2? How do they compare? (And how do I get the latter running on Windows, preferably without WSL?)
>>
>>102015478
TroonMixXL
>>
File: bComfyUI_107250_.jpg (260 KB, 768x1024)
no it's just flux, i haven't fucked with loras yet doubt i will any time soon.
>>
>>102015487
joycap is llama3.1, and it's alright
>>
>>102015423
It's because late 90s/early 00s kids were all actually using PCs a lot, for MSN Messenger, video games, hacking etc, which came with all the issues PCs have that we had to deal with.

Meanwhile the new generation is growing up using iphones and ipads where everything is finetuned to cater to their needs, so they never actually understood the inner workings of anything.
>>
File: 1699087107261870.png (588 KB, 821x821)
>this is very dangerous for our democracy
>>
>>102014601
Post a pic that already looks like what you want, so people can help.
>>
>>102015501
We're so fucked, they have zero confidence and initiative and they are paralyzed the second they have to make their own decisions. Apps have absolutely fried their brains and make them dysfunctional.
>>
File: 1699921812062305.png (55 KB, 769x863)
>>102015480
>i wont spoonfeed, learn it yourself
>the official documentation
>>
>>102015453
You are just Jealous of my brain, come join patreon and I can teaching you 1 to 1 and maybe you brain can becomed amaze like me.

I have 12 THOUSAND patreon fans, you are NOTHING to me
>>
number of loras going up on civitai for flux is insane
>>
>>102015480
jokes on you, I'm the 12gb lora anon who's been replying with detailed explanations of what I do when anons ask. gatekeeping is dumb, so is trolling with it - shit or get off the pot
>>
>>102015532
yeah that snark and bad faith image really showed me
>>
>>102015478
No, it's 1.5

>>102015490
Get back to your containment general
>>
>>102015539
why are you asking stupid questions if you already know the answers?
>>
lel
>>
>>102014815
Something like this:
https://huggingface.co/spaces/gokaygokay/FLUX.1-dev-with-Captioner
You input a picture, you get a picture and a prompt, that's the dream.
Except right now the picture you get is completely different, specially if it's a style it doesn't recognize.
>>
>>102015538
>tfw will NEVER EVER get a nazi lora on there
my waifus...
>>
>>102015519
imagine life in 15-20 yrs
>>
>>102015558
OK I admit I'm that guy, I just a bit desperate because 12000 fans are waiting for me to teahcing them , so I need YOU TO TELL ME WHAT TO DO
>>
>>102015558
I'm not the anon asking, I'm just the one making fun of you for acting like it's so hard to give them an answer. if I had to guess, you haven't actually trained what you claimed and have no answer to give them - you'd rather troll for your (You)s for some godforsaken reason
>>
>>102015581
probably working while the zoomers get put in the pods as their parents drop dead
>>
>>102015581
The generation after won't even be capable of imagining, that's how spoonfed they will be, that's how low their attention span will be, they will have no need to actually think or imagine anything.
>>
>>102014898
>The T5 doesn't give a shit about new concepts it converts everything fine because it's basically trained on everything to begin with
It's clueless about Final Fantasy's Blitzball.
>>
>>102015591
probably just wants to scare new local users away so they give up and use corposlop instead
>>
>>102015211
did you prompt it as a body pillow? anyways i need a pillow like that in real life now
>>
can one abuse HF to train a lora?
>>
>>102015591
you have spent more time complaining about getting answers than the time it would take to google search your answers (or read the thread)
again, bad faith, disingenuous with zero intent
I'm not one of your Zoomer youtube videos that you pretend to watch to "learn" something
speaking of, there is a Youtube video for Flux Lora training, go watch it
>>
>>102014815
>if Flux doesn't get your nipples hard, then I'm not sure what would
There are no nipples
>>
Training fluxd takes too damn long desu
>>
>>102015610
yeah, tried dakimakura first but it has no idea what that is
>>
>>102015618
I made that video
>>
>>102015605
The T5 doesn't make images you ignoramus.
>>
how much faster can i gen if i rent a a100 or h100 pcie compared to say something like a 4090? this is just for genning, not for creating a lora.
>>
>102015618
confirmed troll. no more (You)s for you, you can't even read what you're replying to
>>102015622
I leave it on overnight/into the morning if needed while I do other stuff and am trying to keep to a sub 100 dataset. makes it a lot less painful. definitely miss the 30 minute sdxl training though
>>
>>102015648
You're a very helpful person anon and not at all a tool.
>>
>>102015045
>SD 1.5 still works
I still have to jump from model to model depending on what I want. If only someone gave me my Omni SD 1.5 model, a Stable Diffusion 1.5v2 of sorts, I think I could move on with my life.
But a jump like
SDXL -> Flux -> the next big thing
seems more likely.
>>
File: ComfyUI_00627_.png (1.11 MB, 1024x1024)
>>102014842
I took the faster original workflow and only replaced the unet with the Q5_K_M one, and generations slow down to 24.5 seconds as well
The Q5_K_M t5xxl model alone had no negative effects, generations still at 19 seconds.
Why is it that the quantized unet slows down generation by a whopping 25%? (Using the latest version of comfy, freshly git fetch/git pulled.)
>>
Is there a way to get a real time graph showing loss in ai-toolkit like kohya has?
>>
>>102015662
I'm going to give you a protip: no one is making a shitty SD model that generates random gachas 20% based on what you wrote.
>>
>>102015082
>Reminds me of craiyon which, while garbage quality, would ALWAYS try to the best of its ability to do what you were asking it to do.
Yeah, so why did nobody make some "Craiyon makes the compositions" and another model finishes the picture workflow? Wasn't DalleMini open source?
Craiyon was much better at this than Stable Diffusion 1.4 itself, why hasn't anybody been able to replicate such a thing?
>>
>>102015698
>>102015145
>>
File: bComfyUI_107292_.jpg (246 KB, 768x1024)
>>
bumbo
>>
File: result2.jpg (311 KB, 750x520)
Above SDXL images as reference for IPadapter. Below Flux.

I think this shows more promise than the style loras I've tested to date.
>>
please someone tell me an addon exists for comfy where it auto generates prompts, and then gens however many you want and then goes to the next prompt. this would be a lazy but cool way of finding cool stuff.
>>
>>102015647
if you are just genning images it wouldn't actually be that much faster. H100s are power limited (around 350 W for the PCIe card), they just have a ton of VRAM on a huge memory bus. Renting a 6000 Ada would be interesting, way cheaper than an H100 and you still get 48 GB of VRAM so you can easily load the full model and loras.
>>
>>102015671
It dequants on-the-fly I believe.
So you might be paying the toll in the sense that there's that extra dequant compute overhead.
I think it dequants to FP16 - but maybe someone can correct me?
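as a purely illustrative sketch of where that toll comes from (toy blockwise layout, not the real GGUF Q5 format):

    import numpy as np

    # a quantized block stores small int codes plus a per-block scale and offset
    def dequant_block(codes: np.ndarray, scale: float, zero: float) -> np.ndarray:
        # on-the-fly dequant runs this multiply/add for every weight on every
        # forward pass, instead of paying the cost once at load time like fp8
        return codes.astype(np.float16) * np.float16(scale) + np.float16(zero)

    codes = np.random.randint(0, 32, size=32, dtype=np.uint8)  # 5-bit codes
    w = dequant_block(codes, scale=0.01, zero=-0.16)
    print(w.dtype, w.shape)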
>>
My attempt at a porn lora is genning literal necrotic limbs.
>>
>>102015127
>if it's still around it's still trash
Last time I used craiyon it was worse than SDXL and rewrote your prompts to censor them, causing it to draw nothing like what you sent or to refuse them outright. But apparently they turned it into a very profitable venture, so it turns out most people are like flies attracted to a bad smell.
>>
>>102015767
dogkennel
>>
ultimate sd upscale seems to be adding ghost limbs to the right side of every gen i make now, what settings affect this? i figure it might be the sampler but i usually just set the sampler to the same one i use for imagegen.
>>
>>102013300
yeah the other guy is right, I was using a1111 for over a year right up until flux, you wanna move to comfyui immediately
>>
>>102015186
Different kinds of slop.
>>
File: 00069-993734854_cleanup.png (2.93 MB, 1280x1920)
>>102015790
Show side by side example. What's your denoising
>>
>>102015765
Also, Flux support apparently added to SD.cpp.
>>
>>102015423
You're strawmanning hard, friend.
>>102015501
>>102015519
I still remember hopping on IRC to ask questions to more knowledgeable people in the early 2000s when I was learning to compile the kernel or whatever. This is no different. You're projecting a gripe you have onto someone who isn't displaying what you complain about, at all.
Like I said, I will figure it out. You can stop tugging on your dick now lest you break it.
Or not, keep dooming hard about zoomies. Whatever gets you off.
>>
File: its me hatsune miku.png (1.4 MB, 1024x1024)
>>102015698
craiyon as a model has interesting qualities but sometimes you're missing the forest for the trees when you inspect a model for its specific strengths. the forest is a nightmarish slurry of pixels, i blame nobody for not iterating on this in the foss environment
>>
>>102015812
>I always needed to be spoon fed
>>
had been generating exactly what I wanted and then...
https://files.catbox.moe/52vte7.jpg
>>
My migu keeps making this face bros...
>>
okay using my lora, I can make a 1024x1024 image every <60s on my GPU, but as soon as I change the CFG to anything greater than 1, it turns into 20 minutes per image. wat
>>
File: aseet.jpg (20 KB, 542x375)
>>102015767
>>
>>102015082
All of these closed models have an LLM layer to enhance your prompt and fill in the details. How hard is that to understand? Flux dev is using exactly what you put in and you're surprised? ffs
>>
>>102015228
Apparently it's been there since February?
This is some mandela effect shit, people haven't ever mentioned there being free access to Google Imagen generations and when they do it's 6 months old tech?
>>
>>102015249
>it shows promise
It's the most terrible style copier I've ever seen.
>>
>>102015811
https://github.com/leejet/stable-diffusion.cpp/tree/flux
whoa!
>>
>>102015842
I think availability is what changed.
>>
>>102015838
At this point I'm convinced they're OpenAI employees. Either that or they're retarded thinking that local AI is going to compete against models with hundreds of millions behind them and enterprise GPU hardware. Flux is here but we're already on the cusp of another major release from OpenAI.
>>
>>102015804

bad slop >>102015186

good slop >>102013681
>>
>>102015249
I'd rather just get a textual embedding of the style at that point
>>
File: file.png (2.61 MB, 1024x1024)
>>102015782
It just doesn't understand what genitals are. It tries to emulate them by rendering deformed hands and limbs imitating the correct shape where the genitals should be. And suddenly this pic related lmao. It's really confused.
But now I think I understand how my captioning was deficient. Much of the stuff I described, the model has no reference to compare it to, and I threw complex scenes at it. With more simple images and granularity in introducing it to different things, it should work better. I think. I have no fucking clue what I'm doing. Also probably more training.
>>102015837
https://litter.catbox.moe/ds4pho.jpg
https://litter.catbox.moe/o4t8c9.jpg
>>
File: wudduhfuck.png (1.13 MB, 1463x902)
>>102015810
yeah i think it was the sampler scheduler, i shouldve set it to normal and i think the cfg scale fucked it too
shit im still learning how all this works, i really regret going lazy mode with forge all year till the past 3 weeks.
not getting ghost limbs now but i am getting weird artifacts like that strange black thing on her foot, which came from the first pass.
>>
>>102015879
>compete against models with hundreds of millions behind them and enterprise GPU hardware
SAI has had hundreds of millions behind them, BFL has at least 33 million behind them
>>
>>
>>102015924
SAI was embezzling money. No way they spent that on training. BFL still has to stay within the realms of 24 GB of VRAM, OpenAI can train on 192GB+ VRAM beasts and inference on 80 GB H100s. It's completely different targets.
>>
>>
>>102015936
DALL-E 3 is 8B+5B
>>
>>102015948
seriously doubt that lmao
>>
>>102015951
Believe it. It's all in the training data.
>>
>>102015968
>source: my ass
Either way I win, because 12B with proper finetuning will far exceed DE3 if it's 8B
>>
>>102015646
When you ask for it, it has no idea how to encode it in a way the image generator can use.
>>
>>102015895
This is done on the fly. There is no substitute for proper recognition of styles, but that needs a finetune or another version of flux. Are loras that different from embeddings? I have only tested loras and what I've seen has been very disappointing.
>>
>>102015879
>thinking that local AI is going to compete
Again, it's not a competition, it's a comparison. My goal isn't to find which model is "better" because that's a stupid and obvious thing to look for. I'm looking for how the local scene is improving on different fundamental skills like NLP comprehension. The other results just give a good idea of what a successful result looks like, if corpo models can't do it then it's not a good test to try and prompt Flux for it.
>>
>>102015978
It came to me in a dream and finetuning Flux won't be enough.
>>
>>102015995
???
I've already made Loras that far exceed the shit DE3 can do. Have fun with your dogs. I can't wait for DE4 with its thought crime layer.
>>
>>102015936
It's a nice meme to say SAI embezzled or sniffed their funding but the reality is they oversubscribed to AWS to feed Emad's ego on having over 4000 A100. Their AWS bill is millions per month.
>BFL still has to stay within the realms of 24 GB of VRAM
No they don't. The community has made it fit in 24GB. BFL train on A100 or H100 like everyone else.
>>
>>102016004
>talking about the censorship layered on top instead of the ability of the model
I accept your concession.
>>
>>102015981
I'm half tempted to make a blitzball lora to prove you wrong. But I won't. I'm going to give you a protip: the word "blitzball" is in the pile of trillions of tokens the T5 was trained on.
>>
>File upload IP range ban
Life is pain, a man can't even share some cute pictures nowadays.
>>
>>102016020
the largest model is 23.3 gb
>>
>>102016022
I already made photorealistic results based on what I wanted, well beyond the trite stuff DE3 does
>that's a concession
I'm merely pointing out that the fun you had with DE3 won't even be possible in DE4 as problematic prompts will be stopped without even needing to dog.
>>
>>102015487
I had better results with:
https://huggingface.co/spaces/SakanaAI/Llama-3-EvoVLM-JP-v2
JoyCaption is only good if you need it to recognize characters.
>>
>>102016045
>he thinks it's about photorealism and not the control and builtin knowledge
>>
>>102016022
Why would the uncensored ability of the model matter when you'll never have access to it?
>>
File: file.png (523 KB, 1513x836)
523 KB
523 KB PNG
>>102016067
I have ultimate control with LoRAs, whatever cursory knowledge DE3 might have of random topics.
>pic related
>>
File: Capture.png (274 KB, 762x654)
274 KB
274 KB PNG
>>102016050
>>
>>102015611
They only allow free abuse of their CPUs; if you can train a LoRA on a CPU, then sure.
>>
>>102016108
it's not wrong
>>
File: helpme.jpg (9 KB, 250x237)
9 KB
9 KB JPG
>>102015909
>https://litter.catbox.moe/ds4pho.jpg
>https://litter.catbox.moe/o4t8c9.jpg
>dat body horror
>>
>>102016108
I can see why it said that but damn. Do you know any models that pass this test?
>>
>>102016068
Because I'm not an entitled shit flinging monkey that only cares about what it can grab and destroy.
>>102016100
LoRAs will never match good base training; how many can you stack before shit falls apart?
>>
>>102016139
you don't need to stack on a 12B model
>>
>>102016133
>when you notice her face in the first pic
>>
>>102015133
holy fuck you're in every thread seething whenever someone drops a hint of criticism about the model. get over it. stop being a freetard.
>>
>>102016133
what's so bad about it?
sd3 user btw
>>
>>102016143
why? if you have LoRAs of x, y, and z, how do you use them together without stacking?
>>
>>102016150
you're in every thread repeating the same complaint, and it's not even accurate
>>
File: raygun.webm (3.52 MB, 640x385)
3.52 MB
3.52 MB WEBM
Anons is there anything as "good" as luma yet? I want to make trippy videos and laugh.
>>
>>102016185
>using a 12B model like SD
Flux Loras are exponentially more capable and have more surface area to work with as a rule
>>
File: Capture.png (877 KB, 1531x606)
877 KB
877 KB PNG
>>102016108
More detail but it's pretty garbage
>>
>>102016200
I need to see her cunny
>>
>>102016200
what do you think retard?
obviously if we had local luma there'd be an entire videogen thread. people even stopped sharing those attempts with SD a while ago; we're far from there yet.
>>
>>102016201
how does that change the problem if the concepts are in the x, y, and z LoRAs?
how do you use them together without stacking? you haven't explained that
>>
File: 1699570651828244.png (16 KB, 435x364)
16 KB
16 KB PNG
the hell is this???
>>
>>102015834
I think I know where his left hand is.
>>
>>102016210
She's like 40, she doesn't have one, retard.
>>
>>102016200
that is not Luma tho
>>
>>102016213
maybe having a lora for every single actress is a retarded idea? start there
>>
>>102016004

This guy is right: >>102016022
If you take into account that a paid API does not need to censor the model itself, there is no way a censored local model will ever surpass the potential of an API one. Whether you are allowed to fully utilize that potential is a different story.
>>
>>102016215
Sirs please think to the village
>>
>>102016224
>because women lose their genitals at 40
ok, bummer
>>
>>102016224
(Editor's note: "cunny", before the pedos took it, was a general term for a perfect, petite pussy.)
>>
File: file.png (614 KB, 1532x825)
614 KB
614 KB PNG
>>102016232
lmao
>>
>>102016242
That word doesn't mean what you think it means.
>>
>>102016249
this
>>
>>102016230
ah, so you'll make your own all-in-one LoRA, adding more and more data into it and retraining every time you do so
is that it? did I get that right?
>>
>>102016200
That took me off guard hard
>>
>>102016161
top kek
>>
>>102016256
stop being a pedo
>>
File: TickleYourFancy.png (944 KB, 768x768)
944 KB
944 KB PNG
>>102015683
Well, ZootVision is really close; it can probably give me the "by Flux" style in a future version, and it may even be more detailed than Flux's outputs after I merge it with a model like picrel.
>chibi girl in super metroid by dalle 3
(by dalle 3 is just a style in the prompt)
>>
>>102016232
Why do hypotheticals matter? Companies will never take the risk of completely uncensoring their API.
>>
>>102016261
Every model has a knowledge cut-off point. Does DALL-E 3 retrain for every new movie?
>>
File: kablam 800 percent mad.gif (657 KB, 193x298)
657 KB
657 KB GIF
you guys might think judging an imagegen model's knowledge here is stupid, but /lmg/ faggots are unironically asking local LLMs obscure or often stupid gaming trivia, then calling them shit if they don't get it

>mfw the faggots asking mistral large if it knows a specific quote from castlevania symphony of the night then calling it slop for making something up
>>
File: ComfyUI_09193_.jpg (1.16 MB, 2048x2048)
1.16 MB
1.16 MB JPG
>>102016271
>>
>>102016284
feel like you've completely lost track of what was being discussed anon
we were talking about loras being used to surpass DALL-E 3
>>
>>102016211
open source AI is over
>>
>>
File: file.png (2.68 MB, 1024x1024)
2.68 MB
2.68 MB PNG
>>102016312
you've lost track because you keep moving the goalpost. no one needs to stack dozens of LoRAs, and if you were, you'd fucking merge them into a model
you just want a desperate win for DE3 even though it's completely shit; it doesn't even do blitzball, which is hilarious
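for what it's worth, stacking and merging are both a few lines in diffusers now. sketch (assumes the PEFT integration; file and adapter names are made up):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# file and adapter names are made up; any flux loras work
pipe.load_lora_weights("my_loras", weight_name="style.safetensors", adapter_name="style")
pipe.load_lora_weights("my_loras", weight_name="character.safetensors", adapter_name="char")

pipe.set_adapters(["style", "char"], adapter_weights=[0.8, 0.6])  # "stacking", weighted
pipe.fuse_lora()              # "merging": bake the active adapters into the base weights
pipe.unload_lora_weights()    # drop the adapter modules; the fused weights stay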
>>
now I'm curious, what does flux do when you prompt for football players swimming in a giant sphere of water and throwing balls around
my GPU is occupied
>>
File: file.png (105 KB, 1139x461)
105 KB
105 KB PNG
Rate my temps
>>
File: file.png (584 KB, 926x1098)
584 KB
584 KB PNG
>>102016322
Flux's next model is literally video.
>>
File: 1703318082117541.png (1.28 MB, 1149x1862)
1.28 MB
1.28 MB PNG
tried to compare prompting styles; I don't think it makes much difference.
yeah, I know this has been done to death. I'm just autistic and had to see for myself.
model is flux
>>
>>102015822
Except its creators DID iterate on this, but once it could draw a face that wasn't deformed, it lost all the detail and composition and became pointless, because we already had the best SD1.5 finetunes by that point.
>>
>>102016271
never
>>
>>102016339
>you've lost track because you keep moving the goalpost
no anon, I did no such thing.
>you'd fucking merge them into a model
that's still stacking, try again
>>
>>102016362
go do something if you're bored, waiting only makes it take longer
>>
>>102015209
nice

is this flux? it keeps giving every girl a cleft chin for some reason lol
>>
>>102016385
I'm actually scared something might catch on fire. I just posted that to see what people would say.
>>
>>102016362
undervoltyourshit/1.5v
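unironically though, a plain power cap gets you most of the way there. sketch with pynvml (numbers are examples; setting the limit needs root, and nvidia-smi -pl does the same thing):

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
# NVML reports milliwatts
print(pynvml.nvmlDeviceGetPowerManagementLimit(gpu) / 1000, "W current cap")
# setting needs root; 250 W is an example, same as `nvidia-smi -pl 250`
# pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 250_000)
pynvml.nvmlShutdown()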
>>
>>102016407
take the tinderbox out of your pc tower and you should be good
>>
File: images.jpg (5 KB, 223x226)
5 KB
5 KB JPG
>>102015909
>https://litter.catbox.moe/ds4pho.jpg
>https://litter.catbox.moe/o4t8c9.jpg
Make it stoooop!
>>
File: Capture.png (15 KB, 239x505)
15 KB
15 KB PNG
>>102016362
>>
>>102016427
show the last images, you just did the same ones again
>>
File: file.png (11 KB, 249x138)
11 KB
11 KB PNG
How does this look? Are these sizes enough, or should I try to find larger ones?
>>102016438
Ah, OK.
>>
>>102016305
>obscure or often stupid
>"make this guy sit in a chair"
>*there is a guy and a chair*
It's not a knowledge problemmmmm, it's NLP comprehensionnnnn. Knowledge of things is only going to get better, and there's already injectable knowledge, but understanding how to put the concepts together intelligently, the way the prompt asks, is a different skill.
>>
>>102016004
Show me your LoRAs surpassing this; it has the prompt and all.
>>
File: flux_00399_.png (1008 KB, 1200x936)
1008 KB
1008 KB PNG
>>
>>102016508
nah but enjoy your DE3 grain
>>
>>102016200
That kind of performance would give her this and the next four titles.
>>
>>102016108
Skill issue, now try this prompt:
>Make a full and detailed long description of everything in this picture.
>>
File: Capture.png (315 KB, 1500x817)
315 KB
315 KB PNG
>>102016589
>>
>>102015909
Looks like insufficient training.
>>
>>102016626
Yeah, that's what SD1 looked like when I asked it for something obscure.
>>
>>102016215
Civitai implemented a system where most of the stuff on there is generated by people who only care about money, and the rest gets buried.
>>
>i can generate additional pics of my favorite porn actress past her prime
The future is here.
>>
>>102016363
fuck yea
I haven't bothered with flux at all. any good?
>>
>>102016508
>Dall-E 3 is very good at recognising characters
It should join forces with Akinator.
>>
>>102016680
>past her prime
you mean during her prime? why would you want a porn actress past her prime? you got a batwing beef flaps fetish?
>>
>>102016249
I used remote vision to look at it and it's very rough.
>>
>>102016689
>any good?
It's good but slow
>>
>>102016696
I meant she's past her prime now; I can still get new pictures of her (in her prime)
>>
>>102016689
there's an entire thread of 400+ posts of flux fanboys posting.
>>
>>102016339
Nobody else would know; you're the only one who knows what the fuck Blitzball is.
>>
File: file.png (679 KB, 768x432)
679 KB
679 KB PNG
>>102016721
Excuse me.
>>
>>102016696
Your tastes change as you grow older, anon. You will catch yourself looking at grannies in the future.
>>
>>102016721
OMG you never played final fantasy 69?
>>
>>102016735
>in the future
nah i do it now, but porn roastoids are different from hot 50-60 milfs.
>>
File: chibi girls.png (1.03 MB, 1024x1024)
1.03 MB
1.03 MB PNG
>>102016361
Who cares? But here's one with chibi girls at the start of your prompt.
>>
File: file.png (1.33 MB, 1440x832)
1.33 MB
1.33 MB PNG
>>102016689
Its meme value is off the charts and we're still figuring out how to train it.
>>
>>102016735
>your tastes change as you grow older
I'm sure nobody here would touch an actual cunny model with a ten foot pole
>>
File: ComfyUI_31_.jpg (1.54 MB, 2048x2048)
1.54 MB
1.54 MB JPG
>>102016735
>>
File: 1718185990179516.png (1.11 MB, 1024x1024)
1.11 MB
1.11 MB PNG
>>
>>102016624
Perfect.
>>
>>102016624
what is the actual term for that, a half pipe?
>>
>>102016624
>the car in the middle is slightly ahead
>also the car on the right side is further ahead
>>
>>102016695
Meta.ai is another model that's good at characters (it can be used from WhatsApp, for some reason) and can do Betty Boop just fine. But it could never get close to anything like this.
>>
>>102016826
>>102015145
>>
>>102016821
banking or banked corner or banked turn
>>102016799
Generate it:
 
The image depicts a thrilling scene of three cars on a winding road. Two cars are driving on the road, while one car is in the air, performing a stunt. The car in the middle is slightly ahead of the other two cars, with the car on the right side of the road is further ahead.

The road is curving and has a concrete wall on the side, adding to the sense of speed and excitement. The cars' positions and the road's layout create a dynamic and thrilling atmosphere.

The cars' colors are not specified, but their shapes and sizes are distinct, with one car being particularly large. The image captures the
>>
https://x.com/69420digits/media How is he making these?
Also this is the best or at least most interesting ai generated music I've heard so far
https://x.com/69420digits/status/1817466612921831740/video/1
>>
>>102016707
Nope, when I post I complain about how Flux doesn't cover what I want to draw.
>>
>>102016754
Would
>>
>>102016825
Further ahead compared to what? You can write perfect captions and get the same result as the imperfect ones, because you're ultimately competing against the 15 million other captions that were auto-generated. The model is not suddenly going to get better at positioning things. Good enough is actually enough.
>>
>>102016850
>Good enough is actually enough.
No.
>>
>>102016739
Characters dying in Final Fantasy 3 broke my heart, and I quit the series.
>>
>>102016847
>Also this is the best or at least most interesting ai generated music I've heard so far
And the music generation has to do with the image somehow right? That's why I said this
>>
>>102016856
Actually it is, but keep letting your autism make you take 10x as long for the same result. But please, prove me wrong: do 100 auto-generated captions with a high-quality captioning tool, then 100 hand-crafted perfect captions, train two models with the same settings, and prove the difference (you can't).
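and for the record, the auto side of that experiment is a few lines. sketch with BLIP standing in for whatever captioner you prefer (the "dataset" folder is made up):

from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
).to("cuda")

# writes the sidecar .txt files most trainers expect, one per image
for img_path in Path("dataset").glob("*.png"):
    inputs = processor(Image.open(img_path).convert("RGB"), return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=75)
    img_path.with_suffix(".txt").write_text(processor.decode(out[0], skip_special_tokens=True))

100 images takes minutes; now do your 500 minutes of hand captions and compare.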
>>
>>102016751
The best generations I've seen from Flux have been blurry; the crisper, the faker.
>>
>>102016752
What do you mean an actual cunny model? Like Lehina Model?
>>
>>102016873
i can, it's intuitive; any fuckfreak could do it
>>
>>102016873
think I care about your requests?
>>
>>102016887
it's not intuitive because otherwise you'd know your captions were competing against a million captions that said left is right

>>102016889
of course you don't care, you have autism so you'll waste time because you have a mental disorder
>>
>>102016899
who shit in your cereal
>>
>>102016906
>make a stupid-ass claim that will waste other people's time
>get mad when confronted
>>
>>102016899
>>102016920
>>102016887
don't get it twisted, I use Gigacaptioner; 11/10 it gets it right, that way I can set it and forget it, you lone graboid
>>
>>102016920
>make stupid ass claim
nigga I just pointed out a funny mistake by the model
>>
ayo Niggas anyone bakin?
>>
File: Fluxs.png (764 KB, 1280x720)
764 KB
764 KB PNG
>>102016837
Damn, I wish I could tell you I won't because I'm very busy doing something else, but here it is: a perfect reproduction of your initial picture, thanks to the flawless description provided.
>>
baking, hold on, 66% there
>>
>>102016940
yamero
>>
>>102016826
I don't care how well it does one random character. A local base model that knew how to make every famous IP-protected character, but didn't know the names of famous painters whose public-domain works it could imitate for style, would be quite sad.
>>
>>102016940
>but if I wrote the caption perfectly I'm sure the result would be different, it'll be worth the 5 minutes per image
>of course we'll ignore that in the process of captioning 100 images for 500 minutes I'll make lots of mistakes or omit details the auto-caption would have
>>
File: 1715566822213630.png (495 KB, 512x768)
495 KB
495 KB PNG
is it just me or is it impossible to generate a proper image with "death to israel" on it using flux?
you can generate images with death to any other country but israel tho.
conspiracy???
>>
>>102016108
Use Gemini 1.5 Pro. It's free
>>
>>102016994
Gemini makes stupid mistakes too.
>>
>>102016975
>I'll make lots of mistakes or omit details
no because I'm not a blind ESL
>>
>>102016975
It's the schnell version; it took 2 seconds to generate.
>>
>>102016227
then what is it?
>>
>>102017007
missed a comma
>>
>>102016994
Where?
>>
>>102017023
Gen 3 or Kling
>>
>>102016994
google is an irrelevant company
>>
>>102017033
Geocities
>>
>>102017043
Do both of those gen sound with it?
>>
>>102017033
google aistudio
>>102017005
every VLM makes mistakes
>>
File: 1704778394833557.png (547 KB, 512x768)
547 KB
547 KB PNG
>>102016986
somehow if you lower the steps to 1 the AI gets it right, but the more steps you add, the more it wraps the text.
what is going on here??
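for reference, this is the kind of fixed-seed step sweep I'm doing. sketch (schnell in diffusers; values are examples):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

prompt = 'a protest sign that says "death to israel"'  # the prompt under test
for steps in (1, 2, 4, 8):
    img = pipe(
        prompt,
        guidance_scale=0.0,   # schnell is guidance-distilled, 0 is the intended setting
        num_inference_steps=steps,
        generator=torch.Generator("cuda").manual_seed(0),  # same seed every run
    ).images[0]
    img.save(f"steps_{steps}.png")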
>>
>>102016508
why would i want to gen that?
>>
>>102017110
curious...
>>
>>102017076
>Signs in to use it
>Remembers the other anon never uploaded the original picture so I can't send it.
Damn.
>>
>>102017156
Those pics sell for $50 on Alibaba.
>>
time for bed
>>
File: Rememberthese.png (661 KB, 768x768)
661 KB
661 KB PNG
>>102016938
Of the 4 threads I created for /sdg/, 3 were nuked; I'm not risking a repeat with /ldg/
>>
I can redeem the bake
>>
>>102017300
>>
>>102016825
this is good enough for flux because it doesn't even know the difference between left and right
>>
>>102016751
>>102016707
>flux good
This is exciting. I switch between /lmg/ and /sdg/ and I'm obviously out of the loop, still running Pony XL.
How's Flux for landscape paintings and art and shit? Not just coomer trash or memes. Though those are important too
>>
>image thread
>not reach image limit
ancestors cry



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.