/g/ - Technology


Thread archived.
You cannot reply anymore.




File: tmp.jpg (1.15 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102209070

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>PixArt Sigma & Hunyuan DiT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/tg/slop
>>>/trash/sdg
>>
File: 2024-09-03_00124_.png (1.07 MB, 832x1216)
>>102212255
ty bakerman
>>
>>102212250
>I remember SAI bragging about the hundreds of thousands they spent on training SD15
Emad still did that on SD3, said it cost him 10 million to make this piece of shit, what a fucking moron
>>
Blessed thread of frenship
>>
File: 2024-09-03_00129_.png (1.18 MB, 1280x720)
>>102212301
bad business practice, well at least he's already out of business.. probably not long till SAI is gone completely

also if I were an AI engineer I really would prefer working in the Schwarzwald (it's idyllic) to fucking London
>>
File: file.png (2 MB, 1024x1024)
I don't get how the hands can be so flawless on realistic pictures, but the anime hands are SD1.5-tier on Flux
>>
File: ComfyUI_03002_.png (927 KB, 1024x768)
>>
>>102212335
>also if I were an AI engineer I really would prefer working in the Schwarzwald (it's idyllic) to fucking London
That's a very fair point anon
https://www.youtube.com/watch?v=t1p2QBAkKgs
>>
File: 2024-09-03_00160_.png (575 KB, 1024x1024)
>>102212342
bad anime data.. I blame the data selector for not being a weeb.
>>
File: 2024-09-03_00161_.png (745 KB, 1024x1024)
>>102212370
I added "Golden rings" to the prompt and got something straight out of the fingers dungeons in Elden Ring
>>
>>102212385
imagine...
>>
File: file.png (2.14 MB, 1024x1024)
I'll never get tired of this model. Even if we never get a better base model than Flux, I'm ok with that
>>
File: 2024-09-03_00122_.png (1.05 MB, 832x1216)
>>102212393
I sure would like an even bigger model with my 144GB Blackwell ..

nah, jokes aside, it will get harder and harder to secure a good dataset. Not only is the web now littered with AI generations (and training AI on AI is kinda bad), but the license holders are also getting more and more aware
>>
>>102212437
at this point the best we can do is to finetune this base model, because it's likely we won't get a better base model than this one yeah
>>
File: file.png (79 KB, 1487x486)
1.5m art set processing to wds
high res, variety of mediums
>>
File: harold.png (1.46 MB, 1024x1024)
>>
File: file.png (2.26 MB, 1024x1024)
>>102212385
Only sanic can handle this much rings anon
>>
>>102212472
are you gonna make a finetune with this dataset anon?
>>
>>102212487
i'm data acquisition, training is someone else's problem
>>
>>102212499
So, you're gathering data on your drive and that's all? You're putting this shit on huggingface at least?
>>
>>102212475
jej
>>
>>102212510
yes anon it all goes on hugging face, this is the third art set i'm releasing
>>
>>102212553
cool
>>
File: file.png (1.84 MB, 1024x1024)
bruh
>>
can someone spoonfeed me what you use for flux training? i am pretty sure people still use kohya trainer, but do you use a gui? any settings to avoid? any min/max dataset size? scheduler? optimizer?
>>
File: 394001699.png (1.02 MB, 1152x896)
>>
>>102212599
archives
>>
File: file.png (1.36 MB, 1024x1024)
>>
>>102212553
>yes anon it all goes on hugging face
link my dude
>>
>>102212599
I use ai toolkit right now, but I'll probably flip back to kohya whenever it gets a new feature over toolkit.
With flux, I truly believe less is more, more than ever. A dataset of 20 good images will outperform a dataset with 100 images and a few shitty ones.
adamw8bit at around 1e-4 or 2e-4, or prodigy at 1.
Kohya has a GUI, so does ai toolkit now, but I found it's honestly easier to organize my settings on the command line than in some clusterfuck gradio interface.
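As a sketch, the two optimizer setups above differ mainly in whether you hand-pick the learning rate. Key names here are illustrative only, not any real trainer's config schema:

```python
# Hypothetical sketch of the two optimizer setups mentioned above.
def optimizer_preset(name: str) -> dict:
    presets = {
        # AdamW 8-bit: a fixed learning rate you choose yourself.
        "adamw8bit": {"optimizer": "adamw8bit", "lr": 1e-4},  # 2e-4 also common
        # Prodigy estimates its own step size, so "lr" is just a multiplier left at 1.
        "prodigy": {"optimizer": "prodigy", "lr": 1.0},
    }
    return presets[name]
```

This is why "prodigy at 1" isn't a typo: the number is a scale factor on Prodigy's internal estimate, not a learning rate in the AdamW sense.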
>>
Training a new Dr Furkan LoRA at 256 dim with 512 alpha on the following blocks

network_kwargs:
  only_if_contains:
    # strings in the lora module names
    - "transformer.single_transformer_blocks.4.proj_out"
    - "transformer.single_transformer_blocks.5.proj_out"
    - "transformer.single_transformer_blocks.6.proj_out"
    - "transformer.single_transformer_blocks.7.proj_out"
    - "transformer.single_transformer_blocks.20.proj_out"
    - "transformer.single_transformer_blocks.12.proj_out"
    - "transformer.single_transformer_blocks.19.proj_out"
    - "transformer.single_transformer_blocks.21.proj_out"
    - "transformer.single_transformer_blocks.22.proj_out"
    - "transformer.single_transformer_blocks.17.proj_out"
    - "transformer.single_transformer_blocks.18.proj_out"
    - "transformer.single_transformer_blocks.23.proj_out"


What's gonna happen? Fucked if I know. Let's find out.
>>
>>102212627
https://huggingface.co/datasets?search=bigdata-pw
the 1.5m art set is processing, it will be up in a few hours
>>
>>102212673
did he actually make a data set of himself public or are you training on his AI releases?
>>
File: Untitled.png (8 KB, 587x101)
>>102212673
Prodigy has decided that my LR will be 3. Not 3e-3 or 3e-5. Just 3.
>>
>>102212682
I grabbed his face from reddit.
>>
>>102212685
>cerfukin3
kek
>>
>>102212662
i do prefer text usually but the #1 feature i used a gui for was queuing stuff while i sleep
>>
>>102212673
alpha is a scalar for dim, making it more doesn't do anything
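For context, most LoRA implementations (the kohya-style convention) scale the adapter's output by alpha/dim, so alpha only matters relative to the rank; a quick sketch of that convention:

```python
def lora_scale(rank: int, alpha: float) -> float:
    """Effective multiplier applied to the LoRA delta (common kohya-style convention)."""
    return alpha / rank

# alpha == dim gives scale 1.0; doubling alpha just doubles the update
# magnitude, which you could equally get by raising the learning rate.
assert lora_scale(256, 256) == 1.0
assert lora_scale(256, 512) == 2.0
```

So "alpha 512 on dim 256" is functionally a 2x gain on the adapter, not a different architecture.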
>>
>>102212673
reasons for alpha != dim? what difference will alpha 512 vs 256 make?
>>
in all my bakes alpha = dim was best anyway
>>
File: FD_00001_.png (1.5 MB, 1024x1024)
>>
>>102212701
You don't understand. I am following the maximum reddit settings. All the reddit self LoRAs have the alpha set to double the dim. Therefore I will do the same. 1250 steps now.
>>
Oh yeah. He's getting nice and burned in there.
>>
File: 2024-09-03_00180_.png (1.23 MB, 1280x720)
>>102212720
so you are memeing Dr Furkan so hard you even follow his stupid settings? glorious!
>>
>>102212720
dare I say, based?
>>
>>102212720
You glorious bastard
>>
Oh and no captions because a redditor said they're bad.
>>
>>102212731
every time I look at his face my penis shrinks, like permanently.
>>
File: 2364485853.png (979 KB, 896x1152)
>>
>>102212758
Captions are the work of the devil!
>>
>>102212679
great work with those
>>
File: mr bones.jpg (265 KB, 1024x1024)
>>
by a few hours i actually mean probably tomorrow. it's 2.5tb so ~6-7 hours upload time after another ~3-4 hours processing to webdataset and ~1-3 hours hashing preupload
flickr update with ~1.35b will be finished uploading in an hour though, it takes so long to process to parquet and upload that it's already 1.45b in the database
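As a sanity check on the ~6-7 hour estimate, 2.5 TB at roughly gigabit speed works out as:

```python
def upload_hours(size_tb: float, gbps: float) -> float:
    """Hours to push `size_tb` terabytes over a `gbps` gigabit/s link, ignoring overhead."""
    bits = size_tb * 1e12 * 8            # terabytes -> bits
    return bits / (gbps * 1e9) / 3600    # bits per second -> hours

hours = upload_hours(2.5, 1.0)  # about 5.6 hours before protocol overhead
```

With hashing and protocol overhead on top, 6-7 hours is consistent with a ~1 Gbps uplink.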
there's plenty of good data left if you look for it instead of just parsing common crawl desu
>>102212835
thanks!
>>
>>102212824
language is a shitty translation of thoughts
10 people reading the same book will have 10 different interpretations
if you can't communicate raw thoughts, what's the point
>>
File: ComfyUI_Chrome.png (9 KB, 906x446)
>Still get this error as of yesterday loading up, but if I keep smashing reload it works..
>>
File: 3698351555.png (1 MB, 896x1152)
>>
File: 00004-4148449261.png (3.59 MB, 2560x1440)
>>
File: 00459-4187818973.jpg (277 KB, 768x1024)
>>
File: 1826142629.png (1.13 MB, 1152x896)
>>
>>102212874
It's the Vatman
>>
File: 3789116541.png (959 KB, 832x1216)
>>
File: FLUX-D-03-09-24y-0001.png (1.24 MB, 1024x1024)
City finally fixed the issue with gguf models OOMing when changing the prompt. Good shit.
>>
File: ComfyUI_03003_.png (1005 KB, 1024x768)
LoRA's done. Accidentally had a second LoRA loaded when testing, but it was too good not to post.
>>
File: ComfyUI_03004_.png (1013 KB, 1024x768)
Oh yeah, this LoRA is crispy.
>>
>>102212938
Perfect settings
>>
>>102212912
some interesting limb configurations but I'm liking the style
>>
File: ComfyUI_03006_.png (934 KB, 1024x768)
>>
File: 3957965321.png (1.02 MB, 1024x1024)
>>102212951
Yeah it gets a bit wonky.
>>
>>102212938
lmao
>>
File: ComfyUI_03020_.png (963 KB, 1024x768)
>>
>>102213030
oh god
>>
I don't think flux will be doing porn any time soon
https://files.catbox.moe/kd3tqx.png
>>
File: ComfyUI_03024_.png (1.07 MB, 1024x768)
Okay, I'm done with this LoRA.

Here it is for your genning pleasure. 256 dim, but only 113MB.

https://gofile.io/d/g5Fje4
>>
>>102213080
Thanks, uploading to civit now.
>>
8GB VRAMlets might wanna hold off on updating GGUF loader
I went from 0.25 it/s to 0.19 it/s, but more importantly, there is a minute+ long pause after it finishes all the steps, before it saves the image
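For scale, the it/s drop alone adds real time per image even before the pause; quick arithmetic at a typical 20-step gen:

```python
def seconds_per_image(steps: int, it_per_s: float) -> float:
    """Wall-clock time for one image at a given sampler speed."""
    return steps / it_per_s

old = seconds_per_image(20, 0.25)  # 80 seconds per image
new = seconds_per_image(20, 0.19)  # roughly 105 seconds per image
```

Add the minute-plus pause before saving and the regression nearly doubles the per-image time on 8GB cards.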
>>
>>102212885
is this WIP or something that's been released?
>>102213043
lol. well pyro is cooking something. https://civitai.com/models/706026/pyros-nsfw-proof-of-concept-for-flux?modelVersionId=790113
still, I think our current hardware just isn't up to spec.
>>
File: 4085760676.jpg (2.1 MB, 2048x2048)
>>102213107
released https://civitai.com/models/461926/240madostyle-pony-sdxl?modelVersionId=799806
>>
>>102213101
Where is the pause?
>>102213107
Nudes are fine but porn is hard to do in a LoRA.
>>
>>102213109
thank you. oh look there is also a pony version. yay
>>102213112
yeah. even if we get a finetune or a huge porn lora.. you know how it is with the full on shit, quite a few outtakes, and I am not a patient man. I'd rather gen in pony and upscale that special gen with flux (or inpaint to get an actual realistic expression). something something along those lines.
>>
File: file.png (76 KB, 1722x380)
>>102213112
>Where is the pause?
looks like when it goes to VAE Decode
>>
>>102213165
Try a tiled vae decode. Doesn't fix the issue but might help you gen.
Also how come comfy isn't showing you the node times? Is it out of date?
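Tiled decode trades one big VAE pass for several small overlapping ones, so peak VRAM tracks the tile size instead of the full latent. The tiling itself is just index math, roughly like this (a sketch, not ComfyUI's actual implementation):

```python
# Sketch of how a tiled VAE decode carves one axis into overlapping tiles.
def tile_coords(size: int, tile: int, overlap: int):
    """1-D start offsets so consecutive tiles overlap by `overlap` pixels."""
    if size <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)  # last tile hugs the edge
    return starts

# Decoding a 128-px axis in 64-px tiles with 16-px overlap needs 3 positions;
# the overlapping regions get blended to hide seams between tiles.
xs = tile_coords(128, 64, 16)
```

Each (x, y) tile is decoded independently, which is why it's slower but survives on low VRAM.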
>>
File: 434698890.png (1.12 MB, 1024x1024)
>>
File: FLUX-D-040924-0003.png (901 KB, 1024x1024)
>>
>>102213165
that efficient ksampler can do the VAE decode (and a tiled one too) if you activate it in the node. no need to use the VAE decode node afterwards.
>>102213191
damn.
>>
File: FLUX-D-040924-0007.png (1.2 MB, 1024x1024)
>>102213212
Thanks. It's stacked as fuck with LoRAs but I think the output is good even if it is 4s/it.
>>
File: 00176-AYAKON_1248198891.png (988 KB, 768x1280)
Marsey the catgirl is back
Adopt her here
https://civitai.com/models/328431?modelVersionId=801572
>>
>>102213230
She lives with ponies. She can stay there.. I won't adopt her.
>>
>>102213107
2.01Gb Lora....
uhhh, oh well, why not, let's give it a spin.
>>
File: 517740307.png (1.84 MB, 1344x768)
>>
>>102213281
he's the real deal but read what he has to say. very much experimental
>>
here we go again
>>
>>
>>102213310
I am reading the whole thing. I like him, he raises a lot of valid problems and reflects on them with good problem-solving approaches.
>>
>>102213237
I'll try to make a flux lora for her lol
>>
File: FLUX-D-040924-0014.png (891 KB, 1024x1024)
>>
File: 670298238.png (1.48 MB, 768x1344)
>>
ty for bringing this lora to my attention, it's great!
>>102213359
yes. he did release a large sdxl lora too, that is where I remember him from.
>>
>>102213107
>well pyro is cooking something
>"Look at this model, where I tried to de-bias it around pussy shapes. Man, what a jokester I was."

He is going to prove that it can't be uncucked.
>>
>>102213107
>first heading is "Can't he just shut the fuck up?"
okay, based
>>
>>102213406
well yeah, but then I know.
>>
>>102213406
the nipples alone prove it's all overbaked already
>>
>>102213281
>high-rank adapters
very silly trend

>>102213406
pray tell
what is unique about the arch that makes backprop impossible?
>>
>>102213107
the most interesting part of this writeup is that overcaptioning is apparently really fuckin bad for anything that uses T5 like flux
>>
File: gamingcat.png (2 MB, 1024x1536)
This is a vibrant, digital illustration in an anime style, showcasing a young woman with prominent feline features. She has short, orange hair with distinctive white streaks and perky cat ears, accentuating her feline allure. Her striking, large red eyes are adorned with thick, black eyelashes, and she bears a playful, slightly mischievous smile, with one hand casually placed near her mouth. Her voluptuous figure is clad in an orange, high-waisted overall with long white sleeves, complemented by a white turtleneck sweater peeking from underneath. White socks and orange high-heeled shoes complete her playful yet cozy outfit.

The background presents a snug, contemporary room with wooden paneling and a large, naturally-lit window veiled by sheer curtains. To her left, a desk holds a laptop, a glass of orange juice with a straw, and a small white mug containing a warm, yellow beverage. A charming potted plant with lush green leaves sits on the right, enhancing the room's inviting atmosphere. The overall color scheme is warm and appealing, with oranges, whites, and gentle browns harmoniously blending to create a welcoming and cozy ambiance. Notably, the girl sports a blush on her cheeks and a fish-shaped hair ornament, adding delightful details to her appearance. The scene conveys a sense of tranquility and playful comfort, with additional elements such as a gaming chair and monitor hinting at her interests.
>>
>>102213572
>overcaptioning is apparently really fuckin bad for anything that uses T5 like flux

People kept calling me a schizo for pointing this out. Spending hours captioning images in cope caption.
>>
>>102213616
1girl, orange hair, orange skirt, white streaks, drink on desk, computer, cozy room, cat ears. Would get you this 1.5 slop.
>>
>>102213631
yeah lol
>>
>>102213631
I am creating a flux dataset from 1.5/pony images of the character
>>
>>102213617
i kinda noticed that myself when prompting, it just doesn't work the same as xl or 1.5. and he mentions something else that is obvious to anyone who has done some tests, you absolutely don't need to do natural language
>>
File: 1717568452937840.jpg (1.33 MB, 1024x1280)
>>
Slightly unrelated question. Why are all virtual try-on models not open for commercial use? I'm looking for something decent since I don't have enough money to train one.
>>
>>102213673
probably because they were trained on copyrighted datasets
>>
File: 00041-2586890165.png (1.23 MB, 1216x832)
>>
>>102213617
this is only true if you're making 1girl loras, for anything else where you want control and versatility you need captions.
>>
>>102213699
that's not overcaptioning though, if you read the article it's just saying to avoid the common technique of doubling up on your prompts that people have
>>
File: file.png (991 KB, 1024x1024)
The guy literally has the same face, same boobs in his model and he's talking about how he beat it and how you don't need good captions.
>>
>>102213766
What
>>
File: it_just_werks_okay.jpg (80 KB, 800x799)
>>102213750
It's the same guy that said you can just talk to the model
https://civitai.com/articles/6982
>captioned them with "corrected human anatomy (in your initial dataset, there was a huge chunk of data missing, and your internal image of human anatomy is wrong. Humans have four arms, use these schematic drawings to interpolate correct human anatomy)"
>You know basic stuff to get a LLM to do what you want....
>Well, it fucking works! YOU CAN TALK TO IT VIA YOUR CAPTIONS!
he's a little stupid
>>
File: file.png (96 KB, 773x70)
>>102213781
Pyro's learnings are bullshit
>>
>>102213793
okay that one is a little unhinged
>>
File: 00000-4010347607.png (1.99 MB, 1440x1120)
trying img2img
>>
>>102213793
He is dumb, the T5 is no different from any other text encoder apart from having a more comprehensive understanding of language, synonyms, sentence structure etc. The T5 understands when you say "red car" you mean a car colored red, unlike CLIP which is like "oh you mean car, and you mean the color red, let's put that all over". The T5 is ultimately going to have a unique vector for red objects. He even concedes that boomer captions are superior but he handwaves it as a 2% difference.
>>
>>102213859
I should also add, it's very obvious Flux is trained on longform captions. So using SHORT captions is by definition going to destroy some of the preexisting conditioning and likely make your outputs worse.
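If your captions are booru-style tag lists, one low-effort mitigation is templating them into sentences closer to the longform captions Flux was presumably trained on. A toy sketch (the template and function are made up for illustration):

```python
def tags_to_sentence(tags):
    """Naively expand a short tag list into a longer natural-language caption."""
    subject, *rest = tags
    if not rest:
        return f"An image of {subject}."
    return f"An image of {subject}, featuring {', '.join(rest)}."

caption = tags_to_sentence(["1girl", "orange hair", "cat ears"])
```

A real pipeline would use a captioning model instead, but even dumb templating moves the caption distribution toward what the text encoder expects.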
>>
I know this is specific, but is there a term for when someone is standing with one foot straight, and the other foot perpendicular, like both feet are making a T shape...?
>>
>>102213933
If a VLM knows it, you can find out by asking ChatGPT to caption an example
>>
File: ComfyUI_01651_.png (2.19 MB, 960x1280)
>>
>>102213859
>T5 is ultimately going to have a unique vector for red objects
This is it. The embeddings CLIP produces are too similar to each other.
>>
File: 00275-AYAKON_1248198905.png (1.26 MB, 768x1280)
>>
File: 00010-81202515.jpg (517 KB, 1664x2432)
>>
File: file.png (248 KB, 1537x512)
>>102213212
oh shit, THANK YOU. that fixed the pause.
but, it is definitely slower. I rolled back the GGUF loader to the 9/1 version, it's back to giving me 0.25 it/s instead of 0.19

>>102213177
it's updated, so I'm not sure why the timers aren't showing up. i'll look into it
>>
File: ComfyUI_hgdf_00041_.png (620 KB, 1024x1024)
here is a strange one
>>
File: file2.jpg (123 KB, 1400x992)
There's cake at work today. It's 9:57 AM.
Yesterday there was a christmas tree behind the anchor on the daily news

what
is
happening
>>
has anyone tried CivitAI's Lora trainer for Flux?
is it significantly worse than running your own local trainer?
>>
>>102213988
so what does that mean for captioning datasets then?
>>
>>102214167
this was the prompt btw
>(disembodied heart), blood,
>>
File: file.png (1.42 MB, 1024x1024)
>>102214245
The more words the better
everything else is cope
>>
>>102214293
Observable reality is at odds with this statement.
>>
>>102214159
nice, but the speed decrease is not good. hm.
>>
>>102214293
that can't be true
>>
>>102213659
AI trained on AI .. this won't end well.
>>
File: file.png (856 KB, 879x425)
>>102214341
wrong, and it's clear you people are all blind
Your hero, Pyro, successfully turned Flux into SDXL. An actually impressive feat. The truth is you fags just want to type in five words, get a shit result and then say it's good.
>>
>>102214370
why are you posting like this? i want a real answer not to talk about ecelebs or post headcanons about other anons mental states
>>
>>102214384
I just posted proof. Long captions are better.
>>
>>102214370
this, so much this .. I tried the no-caption meme, the lora was "meh" at best .. carefully curated JoyCaption on the other hand (yeah you have to look at the shit it produces, cause sometimes it gets it all wrong) and the lora turned out fantastic

also single word mushroom looks like a dick with balls.. maybe they are just into that
>>
>>102214395
i read that article. obviously no caption and single word are shit, we all learned that in 2022, and WD14 is a shit tagger model, in the past the best results have been from doing autotagging then manually curating, which was something that was not tried in the article
>>
File: file.png (1.47 MB, 824x863)
>>102214398
when you look at all their examples JoyCaption always has the most pleasing, most accurate renditions of the style, every other one has clearly bad gens
>>
File: 00049-3395158368.png (1.29 MB, 896x1152)
>>
>>102214418
tags won't work because you're going to have 20 words at best and Flux is conditioned on 50+ words
>>
File: 1700613261856931.png (1.41 MB, 832x1280)
>>
>>102214188
>he wasn't told
Hee hee hee.
>>
>file.png
>>
>>102214436
again this is a total strawman headcanon about what other people's captions are going to look like, are you trolling or just too up in your own head?
>>
don't bother answering btw because i already hid the conversation mr bad faith and/or stupid poster
>>
>>102214467
>I don't believe you
I don't care. You have no proof and all sources that say otherwise look like SDXL
>>
>>102214364
synthetic data is absolutely fine
>>
File: 1708680902320108.png (1.32 MB, 896x1152)
>>
>>102214510
yeah SDXL low contrast grey filter, great
>>
File: 1712940569876622.png (1.78 MB, 1152x896)
>>
File: 1707601141825784.png (1.27 MB, 896x1152)
>>
prompt bleed might be the biggest midwit term yet
>>
>>102214555
what do you call it then
>>
>>102214566
90% of the time it's conditioning bias
for example, "small breasts" isn't prompt bleed because the model was never trained on "small breasts" captions, so traditionally when "breasts" are talked about they're large and notable by default
>>
>>102214566
I communicate the concept telepathically
>>
>>102214449
>Hee hee hee.
We get it, you're gay
>>
>>102214370
this was created before clip training was possible. in my experience tags, or a combination of tags with a short caption, have produced better results than long captions when enabling clip training
>>
>>102214579
A true example of prompt bleed is when you say "small breasts" and what it does is make everything small.
>>
>>102214615
>source: it came to me in a dream
Feel free to post an identical article with your findings. Do the above with and without clip training. Let me guess, long form captions are only 2% better?
>>
>>102214579
funny enough cause of this if you put breasts into negatives and use a CFG hack you actually make em smaller, not much, but a bit
>>
>>102214579
who the fuck calls the small breasts problem "prompt bleed"
regardless, you use "prompt bleed" to refer to prompt bleed, are you a midwit then?
>>
>>102214579
that's not prompt bleed though lol
>>
>>102214635
Reddit literally just did, which is why I called it a midwit term: midwits on Reddit use it, and half the people who say it here also say it incorrectly, because it's one of those "haha look how smart I am, I know buzzwords" terms. It basically outs you as a poser.
>>
why the fuck are you reading reddit then complaining about reddit shit here

retard
>>
>>102214648
>reddit
>>
inb4
>they're the same
kill yourself
>>
>>102214656
because I noticed you fags saying it wrong first but as I said, it's a midwit term, it's exactly the type of technical buzzword midwits love to cling to because it makes them feel smart
>>
>>102214682
>because I noticed you fags saying it wrong first
link it
>>
>>102214682
Mat 6:34
>>
>>102214692
you being upset proves my point
>>
>>102214708
>no link
>makes the most midwit argument in history
>>
>no img reddit melties
>>
>>102214245
that clip sucks as conditioning for a unet/transformer
>>
>>102214719
i knew that in 2022
>>
>>102214626
source: I trained loras with identical settings where the only difference was what was in the .txt file.
Whereas you're parroting the work of someone else who didn't even keep the training parameters the same between tests, before clip-training could be enabled, you braindead mongoloid.
>>
File: file.png (1005 KB, 1024x1024)
>>
File: file.png (1.13 MB, 1024x1024)
>>102214733
>>
File: 2024-09-03_00196_.png (611 KB, 1024x1024)
>>102212370
retrained the model with 12 blocks on dim 512 (insane that even works locally) and now hands are better
>>
>>102214738
funny enough this image is a great example of prompt bleed
>>
>>102214727
cool
>>
>>102212385
Early step issues.
>>
>>102214779
both examples are 5000 steps, first was dim 256 with 10 blocks, second as stated dim 512 with 12 blocks
>>
>>102214750
>being so retarded that you think training multiple loras is a difficult enough task to have to lie about.
meds can help with that paranoia, schizo
>>
File: 9UO1crRcte.jpg (279 KB, 2069x1035)
training a Yoshiaki Kawajiri style lora... trying to match the sovl of my prev SDXL lora (left) but Flux (right) just seems sterile.. fuck

just a few mins left on the last epoch
>>
File: file.png (851 KB, 1024x1024)
>>102214801
>>
>>102214807
>Yoshiaki Kawajiri
I'm gonna be honest. The FLUX version looks more like his work tho. The SDXL version makes the hair too fancy for Kawajiri artwork, also the eyes are different from his style. I'd go for the flux version if I want it more true to the original artwork .. (yeah the background of the SDXL picture you posted is more pleasing.. but I am not sure if that actually matters)
>>
>>102214799
I mean steps in generation. idk, does it help to train really low resolutions too?
>>
Which trainer allows single block training?
>>
>>102214849
the ai-toolkit folk say its good to train on many resolutions .. so I did

Bucket sizes for F:\flux_train\harada:
896x1088: 6 files
768x1280: 7 files
768x1088: 3 files
576x1024: 1 files
832x1216: 5 files
832x1152: 21 files
512x640: 1 files
448x832: 1 files
640x1536: 2 files
1152x832: 2 files
896x1152: 8 files
640x1152: 1 files
704x1024: 2 files
1344x768: 4 files
960x704: 1 files
768x1344: 2 files
1024x1024: 8 files
1088x960: 1 files
704x1408: 3 files
960x1088: 2 files
832x960: 1 files
832x448: 1 files
384x512: 1 files
512x704: 1 files
704x960: 1 files
1024x960: 1 files
576x1216: 1 files
960x640: 1 files
768x960: 1 files

was my bucket list
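Buckets like these are aspect-ratio-preserving resolutions snapped to a multiple of 64 under a pixel budget. A rough sketch of how a trainer might derive one; the step size and ~1024x1024 budget are my assumptions, not a specific trainer's defaults:

```python
# Rough sketch of aspect-ratio bucketing: scale the image to fit a pixel
# budget, then snap each side down to a multiple of `step`.
def nearest_bucket(w: int, h: int, step: int = 64, max_pixels: int = 1024 * 1024):
    scale = (max_pixels / (w * h)) ** 0.5
    bw = max(step, int(w * scale) // step * step)
    bh = max(step, int(h * scale) // step * step)
    return bw, bh

# A 3:4 source image lands in the 832x1152 bucket, the most common one above:
bucket = nearest_bucket(1536, 2048)
```

Images sharing a bucket can then be batched together without cropping away their aspect ratio.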
>>
>>102213378
lora/prompt for this style?
>>
>>102214863
any of them that allow for editing blocks? i don't think you should train only 1 block though, you can ignore this guy's conclusions if you want, just read the diagrams that explain what happens in each block

https://civitai.com/articles/76/bdsqlsz-lora-training-advanced-tutorial1lora-block-training
>>
>>102214513
>>102214550
Nice, how are you achieving the motion blur/long exposure effect?
>>
File: Flux_02523_.png (950 KB, 1024x1024)
>>102214807
i already fucked one attempt training it yesterday, with network dim 64 alpha 64, it got overcooked and deformed way too fast, this one now is alpha 1 which gives it more time to marinate

the 64 64 one did have a strong style at epoch 1 ~ 2 but not refined (picrel is yesterday's at epoch 2)
>>
>>102214847
yeah you're right, it's just the background of flux bugging me, it should be more painted

maybe it's all in there and i just haven't cracked it with the proompt yet
>>
>>102214869
Can they be lower resolution than that?
>>
File: Deltacompareayakon.jpg (1.62 MB, 2153x2073)
I really don't know how civitai sloppers make such terrible LoRAs, we must have used almost identical data sets for this character as there are only like 20 good images.
Theirs basically looks nothing like Delta
>>
>>102214927
I guess so? but who wants to go lower than 512p?
>>
File: 1721677826593.jpg (647 KB, 1024x1024)
>>102214902
nta but "motion blur" works
>>
File: 00040-3615633769.png (1.52 MB, 1024x1440)
>>
>>102214895
Flux doesn't use a UNET
>>
File: 1722022963105.jpg (273 KB, 1024x1024)
>>
>>102214963
What "resolution" is Flux operating at in the first steps? I mean, if you look, it's basically just doing a wash (in painting terms).
>>
File: 2024-09-02_00030_.png (1.48 MB, 1280x720)
>>
File: 2024-09-03_00001_.png (1.49 MB, 1280x720)
>>102215078
oops, meant this one.
>>
>>102215065
damned if I know exactly.. but the way inference works it should work at max resolution starting from step 1
>>
>>102213107
>2gb Lora
this is getting ridiculous, he should make a finetune at this point
>>
>>102215105
I suspect some lower resolution images should be processed so they have greater contrast.
>>
File: 1712924537943.jpg (567 KB, 1024x1024)
>>
File: 1721380215901.jpg (359 KB, 1024x1024)
>>
Trying to set up a Huggingface Space for the first time, what all should I keep in mind to prevent large bills?
>>
File: web1.jpg (2.4 MB, 4322x1866)
https://github.com/google/RB-Modulation/
https://huggingface.co/spaces/fffiloni/RB-Modulation
>Given reference images of preferred style or content, our method, RB-Modulation, offers a plug-and-play solution for (a) stylization with various prompts, and (b) composition with reference content images while maintaining sample diversity and prompt alignment.
could this be used on flux?
>>
File: 1715046238108.jpg (806 KB, 1024x1024)
>>
File: file.jpg (674 KB, 2335x1710)
>>102215362
that's not bad at all
>>
>>102215362
With neural network models everything is possible. I'm actually surprised someone hasn't made a basic aesthetics finetuner that basically lets you A|B images for 100 images. Really should be standard practice with base models especially for tuning professional vs amateur photograph, bokeh preferences, etc.
>>
>>102214807
It gives the flux plastic skin even to cartoons. It's a prompt issue
>>
When will migu death
>>
>>102214807
that's something I noticed as well, Flux loves to give that hard light to anime characters, gives them a plastic feel, it's probably a training issue, we'll figure that out
>>
File: FLUX_00139_.png (1.8 MB, 1280x1280)
1.8 MB
1.8 MB PNG
grubs up
>>
File: 1712972406561.jpg (947 KB, 1024x1024)
947 KB
947 KB JPG
>>
File: ComfyUI_06012_.png (1.35 MB, 1024x1024)
1.35 MB
1.35 MB PNG
>>102215451
never, you can't escape the migu
>>
>>102215490
she's a little long
>>
File: 2024-09-03_00002_.png (1.52 MB, 1280x720)
1.52 MB
1.52 MB PNG
>>
File: FLUX_00140_.png (1.78 MB, 1280x1280)
1.78 MB
1.78 MB PNG
>>
>>102215572
Waiter, waiter! There's a woman in my salad!
>>
>>102215435
>I'm actually surprised someone hasn't made a basic aesthetics finetuner that basically lets you A|B images for 100 images.
like you continue to pretrain the model by telling it what pictures you prefer over the other?
>>
>>102215572
finger toes
>>
>>102215615
Yeah pretty much, a tiny aesthetics finetune with a reward-based training algorithm
>>
File: 1716594780045.jpg (416 KB, 1024x1024)
416 KB
416 KB JPG
>>
>>102215640
I guess you could do that with Flux's own outputs? like you go for a batch of 2, you prompt something and you tell it that one was better than the other? that could be a cool concept, but I'm sure you'd need millions of A/B tests to make it work, no?
>>
File: 2024-09-03_00230_.png (1.21 MB, 832x1216)
1.21 MB
1.21 MB PNG
>>
Flux is a gremlin generator.
>>
>>102215656
I mean maybe, but you would probably structure it with a defined list of 100 prompts and then lie to the AI about the prompt. For example:
1girl professional photograph with bokeh vs 1girl candid amateur photograph
and then tell the AI it was for the prompt "1girl photograph"
hope that makes sense
You could probably bootleg that as a Lora right now, almost curious to try
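The "lie about the prompt" idea boils down to a caption-relabeling step before training: keep only the A/B winners and caption them with the generic prompt. A minimal sketch of building such a dataset file — the filenames are hypothetical, and the metadata.jsonl layout assumes the Hugging Face imagefolder convention (kohya-style sidecar .txt captions would express the same thing):

```python
import json
import os
import tempfile

# hypothetical A/B picks: (image the user preferred, the generic prompt
# both candidates were made for) -- the "true" detailed caption is dropped
picks = [
    ("amateur_001.jpg", "1girl photograph"),
    ("amateur_002.jpg", "1girl photograph"),
    ("soft_light_003.jpg", "1girl portrait"),
]

# write the relabeled captions; training a LoRA on this teaches the model
# to produce the preferred style whenever the generic prompt is used
out_path = os.path.join(tempfile.mkdtemp(), "metadata.jsonl")
with open(out_path, "w") as fh:
    for file_name, generic_caption in picks:
        fh.write(json.dumps({"file_name": file_name, "text": generic_caption}) + "\n")
```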
>>
File: FLUX_00142_.png (1.55 MB, 1280x1280)
1.55 MB
1.55 MB PNG
>>
Why does ComfyUI reload the unet each time I change a lora?
>>
>>102215696
>You could probably bootleg that as a Lora right now, almost curious to try
if you manage to make it work, don't hesitate to talk about your advancements here, I'm also curious to see how it plays out
>>
>>102215704
good news for you, the GGUF loader node has fixed that shit, I see no reason why comfy won't take inspiration from that for his own "Load Diffusion Model" node as well
https://github.com/city96/ComfyUI-GGUF/pull/92
>>
>>102215718
>LoRA/etc should no longer reload the model on weight changes or when enabling/disabling/muting them.
>reverted for now, causes issues with model management
kek
>>
File: 2024-09-03_00238_.png (1.21 MB, 832x1216)
1.21 MB
1.21 MB PNG
>>102215681
indeed
>>
File: 1702935399536.jpg (392 KB, 1024x1024)
392 KB
392 KB JPG
>>
>>102215730
loooool, I'm not updating his package then, I won't live without this feature anymore, for those who want to pin the latest commit before the revert, do this:
>git checkout c8923a4
>>
>>102215711
Thinking about it there's two major ways to do it:
1) an active training session where you train from outputs from the model over multiple epochs (every epoch you do preference picks)
2) a predefined training session where real pictures are used to categorize various style preferences and then you train for X epochs on maybe 400 real images

The hard part is putting together 100 prompts that encapsulate aesthetic preferences.
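Either variant ultimately needs the preference picks turned into a loss. The standard trick is a Bradley-Terry pairwise loss: score both images with a small head, and push the winner's score above the loser's. A toy, pure-Python sketch on made-up 4-d "embeddings" (everything here is illustrative, not anyone's actual training code):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # numerically stable logistic
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

def softplus(x):
    # stable log(1 + e^x), used as -log sigmoid(-x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def score(w, x):
    # linear "aesthetic head" over a frozen embedding
    return sum(wi * xi for wi, xi in zip(w, x))

# toy data: winners lean along a hidden aesthetic direction (coord 0)
winners = [[random.gauss(0, 1) + (2.0 if i == 0 else 0.0) for i in range(4)]
           for _ in range(100)]
losers = [[random.gauss(0, 1) - (2.0 if i == 0 else 0.0) for i in range(4)]
          for _ in range(100)]

def loss(w):
    # Bradley-Terry: -log P(winner beats loser), P = sigmoid(score diff)
    return sum(softplus(-(score(w, a) - score(w, b)))
               for a, b in zip(winners, losers)) / len(winners)

w, lr = [0.0] * 4, 0.1
for _ in range(200):
    grad = [0.0] * 4
    for a, b in zip(winners, losers):
        s = sigmoid(-(score(w, a) - score(w, b)))  # how wrong this pair still is
        for i in range(4):
            grad[i] -= s * (a[i] - b[i])
    w = [wi - lr * gi / len(winners) for wi, gi in zip(w, grad)]
```

A few hundred pairs are enough for the head to recover the hidden preference direction on toy data; a real setup would then fine-tune the diffusion model against such a scorer, and sample efficiency is exactly the open question.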
>>
>>102212475
KEK
>>
babe wake up, new improvement on clip_l
https://reddit.com/r/StableDiffusion/comments/1f83d0t/new_vitl14_clipl_text_encoder_finetune_for_flux1/
>>
>>102215784
In the folder of the node?

>>102215718
Can't city96 fix the load model too since he made the gguf unet loader?
>>
>>102212958
There was some redditor crying about him being a right winger (someone posted this ages ago)
>>
File: 00138-936906005.png (1.08 MB, 832x1216)
1.08 MB
1.08 MB PNG
https://huggingface.co/2vXpSwA7/iroiro-lora/blob/main/test_controlnet2/CN-anytest_v4-marged_pn_dim256.safetensors
Canny has been improved upon.
You can literally just throw a photo into this and it'll do a version of it in any style you like and it will do a much better job than Canny.
Want an anime-style version of a photograph? Just throw a photo into this and it will shit out an anime girl.
But wait, real people have much smaller heads than anime girls.
No problem. Literally just paint white over the head and it'll gen an anime-sized head on her.
Canny cannot do this pose correctly - the crossed legs confuse it and fuck it up every time.
Original image: https://litter.catbox.moe/8436qp.jpg
"Isolated on dark gray background" prompt was used on this. It's smart enough to realize you want the girl from the photo but nothing else and it completely isolates the girl from the rest of the photo.
This is fucking voodoo magic.
>>
>>102215845
>europoor electric company rape
le kek they will have no electricity, and be happy.
>>
>>102215852
>In the folder of the node?
yes

>Can't city96 fix the load model too since he made the gguf unet loader?
I mean he could, but I think he's focusing on his own node. Comfy should just improve his own repo by himself, at this point it should be easy enough. If comfy is too much of a retard to do that, we already have the code that fixes it, just "copy" that logic onto the "Load Diffusion Model" node and make a PR I guess
>>
>>102215875
this isn't Flux
>>
File: 3hh9rkt4zokd1.png (88 KB, 2327x625)
88 KB
88 KB PNG
pic rel:
, a screenshot of a software interface, likely a graphical user interface (GUI) for a video game or application. The image features a complex, layered layout with multiple text fields and buttons. The background is a dark, grid-like texture, possibly representing a virtual workspace or a development environment. 

The interface consists of several sections, each with its own set of controls and options. The top section includes a "Load Mag" button, followed by a "Mag" button, and a "Mag" drop-down menu with options like "Mag" and "Mag". Below this, there's a "Mag" button, a "Mag" drop-down menu, and a "Mag" button.

The middle section contains a "Mag" button, a "Mag" drop-down menu, and a "Mag" button. Below this, there's a "Mag" button, a "Mag" drop-down menu, and a "Mag" button.

The bottom section includes a "Mag" button, a "Mag" drop-down menu, and a "Mag" button. Each section has a consistent layout with text fields and buttons arranged in a grid-like pattern. The overall design is clean and functional, with a focus on clear organization and easy navigation. The colors used are primarily dark gray and black, with some green and blue accents for the text fields and buttons.


for photographs it's kinda ok tho.
>>
>>102215241
All the paid Space upgrade options are expensive. The only viable choice is signing up for HF Pro and using ZeroGPU.
>>
File: file.png (2.09 MB, 1080x1040)
2.09 MB
2.09 MB PNG
>>102215845
it's that one for those wondering
https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors
I liked his older finetune, it was a significant improvement over clip_l, I hope the improvement will be as big for this one as well
>>
File: 2024-09-03_00003_.png (1.52 MB, 1280x720)
1.52 MB
1.52 MB PNG
>>
>>102215845
>Though I just got a 'loveletter' (annual bill) from my electricity provider, saying, approximately: "YOU! $350, now! You have two weeks! Also, you're paying $95/month from now on. GLHF!". So, if you wanna help feed the AI critters running local like a mad dog here in crazy-country (luxury power prices, humble electrons all the same) - thanks. ¯_(ツ)_/¯
What the fuck? I'm glad I'm not living in Germany, that country is crazy for abandoning nuclear power
>>
>>102215942
They will have no electricity, and they will be happy.
>>
i thought it was austria because of that character they use that looks like beta
>>
File: fs_0020.jpg (148 KB, 1280x896)
148 KB
148 KB JPG
>>
>>102215997
Electricity users are terrorists against Mother Earth.
>>
>>102216015
I know you're trolling but there are some crazy people who genuinely believe that, in the era of nuclear power, isn't that insane?
>>
>>102216026
the juiciest part of the irony is that these people are the good goy consumers posting on Xer with the iPhones they replace every year
>>
How do i inpaint with FLUX?
>>
File: 1709077517521.jpg (278 KB, 1024x1024)
278 KB
278 KB JPG
>>
File: file.png (3.84 MB, 2360x1673)
3.84 MB
3.84 MB PNG
>>102215362
for real that's pretty impressive, maybe that won't replace loras but it does the job pretty well
>>
File: Flux_02730_.png (684 KB, 1024x768)
684 KB
684 KB PNG
>>
File: ComfyUI_06096_.png (1.44 MB, 1024x1024)
1.44 MB
1.44 MB PNG
>>102215845
that man wasn't lying, there's improvement on text... but I feel we're losing details elsewhere, especially when you look at her left knee
https://imgsli.com/MjkzNzEz
>>
File: file.png (1.96 MB, 1024x1024)
1.96 MB
1.96 MB PNG
>>102216158
I think overall it's better https://imgsli.com/MjkzNzE1
>>
File: file.png (68 KB, 167x279)
68 KB
68 KB PNG
>>102216158
that is stylistic, artistic preference; the knee isn't objectively wrong. there are some objective errors on the left image, especially where the fingers are near the sign
>>
>>102215903
I threw >>102215875 in there and it gave me:
>Her toes are neatly trimmed and slightly curled.
lol
>>
>>102216158
nice & this time I am smarter and rename the file before saving my workflows
>>102216229
heavy difference tho
>>
>>102216229
the one on the right has more issues with shadows and the bike itself.
>>
>>102215362
We've come full circle back to aesthetic gradients
>>
>>102216066
just use a bog standard inpainting workflow and switch over to flux. it's really good at inpainting too. hands, feet, faces, everything except the naughty bits. want a cleft chin? np, flux can do it
>>
>>102216356
as long as the ai is sufficiently good at grasping the style it's all you need
>>
Where is the LoRA training guide for dummies? I can't find it.

I just want to make some waifus.
>>
>>102216356
>We've come full circle back to aesthetic gradients
oh this was a thing before?
>>
File: Flux_02814_.png (633 KB, 1024x768)
633 KB
633 KB PNG
>>
File: 2024-09-03_00265_.jpg (1.14 MB, 2496x3648)
1.14 MB
1.14 MB JPG
>>
>>102216457
so what is this lora? I want to gen those girls in swimsuits all happy and candid and shit.
>>
>>102215845
Yikes, I think that's a fail overall
https://imgsli.com/MjkzNzI0
>a pulp cult anime illustration from japan,
>Twin drill red hair, Kasane Teto, is on her car, hands in the air, her eyes are closed and she is sad, her text bubble speech says: "Please Hatsune Miku, save me!", red hair
>>
File: Flux_02823_.png (716 KB, 1024x768)
716 KB
716 KB PNG
>>
>>102216452
https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
>This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images.
Different source of conditioning, same principle of reference images
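The original aesthetic-gradients trick is simple enough to sketch: mean-pool the CLIP image embeddings of your reference set into one "aesthetic" direction and nudge the prompt's CLIP embedding toward it before conditioning. A dependency-free toy version with list-based vectors — real usage would feed actual CLIP embeddings, and `strength` is the personalization knob (both names are this sketch's, not the repo's):

```python
import math

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def aesthetic_shift(text_emb, ref_embs, strength=0.2):
    # mean-pool the reference image embeddings into one aesthetic direction
    dim = len(text_emb)
    target = normalize([sum(e[i] for e in ref_embs) / len(ref_embs)
                        for i in range(dim)])
    # nudge the prompt embedding toward it, then put it back on the sphere
    mixed = [(1 - strength) * t + strength * g
             for t, g in zip(normalize(text_emb), target)]
    return normalize(mixed)
```

The output stays unit-norm (CLIP embeddings are compared by cosine similarity), so the shifted conditioning is a valid drop-in for the original one.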
>>
>>102216500
try doing these comparisons with cfg 1 and no loras, the more elements you add to your test case, the more noise in the results.
>>
>>102216584
It should work as well on cfg > 1 + loras because that's where I enjoy my gens, I don't want it to only work in a place I never go, but yeah I'll make a cfg = 1 comparison for that one if you want, just a sec
>>
>>102215881
How do you use the same thing for gguf unet loader that lora gguf fix uses?
>>
>>102216619
I don't think I understand your question :(
>>
>>102216584
>>102216500
https://imgsli.com/MjkzNzI2
it's more subtle on cfg 1 + no loras, but like I said, I don't want it to only work here, it should work in the fun CFGmaxxing + loras zone
>>
>>102216576
I see, seems like Google found a better way to make it work, they've become quite good in the AI space (was about time)
>>
>>102216636
You add the code from the lora loader into the unet loader?

mmap_released = False

def load(self, *args, force_patch_weights=False, **kwargs):
    # always call `patch_weight_to_device` even for lowvram
    super().load(*args, force_patch_weights=True, **kwargs)

    # make sure nothing stays linked to mmap after first load
    if not self.mmap_released:
        linked = []
        if kwargs.get("lowvram_model_memory", 0) > 0:
            for n, m in self.model.named_modules():
                if hasattr(m, "weight"):
                    device = getattr(m.weight, "device", None)
                    if device == self.offload_device:
                        linked.append((n, m))
                        continue
                if hasattr(m, "bias"):
                    device = getattr(m.bias, "device", None)
                    if device == self.offload_device:
                        linked.append((n, m))
                        continue
        if linked:
            print(f"Attempting to release mmap ({len(linked)})")
            for n, m in linked:
                # TODO: possible to OOM, find better way to detach
                m.to(self.load_device).to(self.offload_device)
        self.mmap_released = True
>>
>>102216659
thanks, there is some issue with the prompt, it's easier to notice without the loras and cfg.
>>
>>102216704
>You add the code from the lora loader into the unet loader?
>lora loader
you mean the "Unet loader GGUF"? because that's the node that has the fix that prevents the unloading/reloading
>unet loader
you mean the "Load Diffusion Model" node?
>>
>>102216715
>there is some issue with the prompt
what do you mean? the prompt was that one
>a pulp cult anime illustration from japan,
>Hatsune Miku is on her car, hands in the air, her eyes are closed and she is sad, her text bubble speech says: "Please Hatsune Miku, save me!", red hair
I didn't put Teto this time because vanilla flux doesn't have it
>>
>>102216734
Didn't he fix only the lora node? There's a fix for the gguf unet loader node too?
>>
>>102216758
>Didn't he fix only the lora node?
how could he do that? his GGUF repo doesn't have a lora node, it only has a GGUF unet loader node
>>
>>102212437
I'm curious what prompts you use to get it to do art in Disgaea style.
>>
File: fs_0024.jpg (140 KB, 1280x896)
140 KB
140 KB JPG
>>
>>102216754
i mean the way the prompt is being interpreted, that prompt needs to change to recognize the interior of the car. it's one of those cases where flux just doesn't get it with a simple "on her car"
>>
File: file.png (66 KB, 953x693)
66 KB
66 KB PNG
>>102216700
>find a better way
Maybe, but there are limited examples of aesthetic gradients because it never took off for some reason, so it's not a fair comparison. Also it looks like Google's only works on Stable Cascade until someone does the needful, whereas aesthetic gradients should still work for any model using CLIP.
>>
>>102216754
hands in the air might be conflicting with the car prompt
>>
>>102216810
finetuning clip_l won't make such a drastic difference/improvement, finetuning t5 however...
>>
>>102216827
in reality that's possible though kek
https://www.youtube.com/watch?v=rSoo9WS8t7k
>>
>>102216841
yeah but flux follows llm logic and not real logic
>>
>>102216817
>Since Flux was released after we had completed this project, I'm curious to see what will happen if we integrate Flux in RB-Modulation
hype
>>
>>102216863
Except by we he means someone who isn't me
>>
>>102216885
yeah you may be right :(
>>
Oven ready bread...
>>102216905
>>102216905
>>102216905
>>
>>102216817
it didn't take off because it didn't really work
>>
>>102216841
for flux, how many examples do you think the LLM tagged with "hands in the air" inside a car instead of "hands up"?
>>
File: 00043-2531242920.png (1.14 MB, 1216x832)
1.14 MB
1.14 MB PNG
>>102216909
:3
>>
File: 00020-2100863672.jpg (716 KB, 2432x1664)
716 KB
716 KB JPG
>>
>>102216766
I use this lora
>https://civitai.com/models/709964/disgea-style-by-takehito-harada-for-flux
>>
>>102216404
Which inpainting workflow is a good starter?


