/g/ - Technology

File: tmp.jpg (1.15 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102209070

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out

>Model Ranking

>Models, LoRAs & training


>Pixart Sigma & Hunyuan DIT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools

>GPU performance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality

>Related boards
File: 2024-09-03_00124_.png (1.07 MB, 832x1216)
ty bakerman
>I remember SAI bragging about the hundreds of thousands they spent on training SD15
Emad still did that on SD3, said it cost him 10 million to make this piece of shit, what a fucking moron
Blessed thread of frenship
File: 2024-09-03_00129_.png (1.18 MB, 1280x720)
bad business practice, well he at least already is out of business.. probably not long till SAI is completely

also if I was an AI engineer I really would prefer working in the Schwarzwald (it's idyllic) than in fucking London
File: file.png (2 MB, 1024x1024)
I don't get how the hands can be so flawless on realistic pictures, but the anime hands are SD1.5 level tier on Flux
File: ComfyUI_03002_.png (927 KB, 1024x768)
>also if I was an AI engineer I really would prefer working in the Schwarzwald (it's idyllic) than in fucking London
That's a very fair point anon
File: 2024-09-03_00160_.png (575 KB, 1024x1024)
bad anime data.. I blame the data selector for not being a weeb.
File: 2024-09-03_00161_.png (745 KB, 1024x1024)
I added "Golden rings" to the prompt and got something straight out of the fingers dungeons in Elden Ring
File: file.png (2.14 MB, 1024x1024)
I'll never get tired of that model, even if we'll never get a better base model than Flux, I'm ok with that
File: 2024-09-03_00122_.png (1.05 MB, 832x1216)
I sure would like an even bigger model with my 144GB Blackwell ..

nah joke aside, it will get harder and harder to secure a good data set. Not only is the web now littered with AI generations (and training AI on AI is kinda bad) but also the license holders are getting more and more aware
at this point the best we can do is to finetune this base model, because it's likely we won't get a better base model than this one yeah
File: file.png (79 KB, 1487x486)
1.5m art set processing to wds
high res, variety of mediums
File: harold.png (1.46 MB, 1024x1024)
File: file.png (2.26 MB, 1024x1024)
Only sanic can handle this much rings anon
are you gonna make a finetune with this dataset anon?
i'm data acquisition, training is someone else's problem
So, you're gathering data on your drive and that's all? You're putting this shit on huggingface at least?
yes anon it all goes on hugging face, this is the third art set i'm releasing
File: file.png (1.84 MB, 1024x1024)
can someone spoonfeed me what you use for flux training? i am pretty sure people still use kohya trainer, but do you use a gui? any settings to avoid? any min/max dataset size? scheduler? optimizer?
File: 394001699.png (1.02 MB, 1152x896)
File: file.png (1.36 MB, 1024x1024)
>yes anon it all goes on hugging face
link my dude
I use ai toolkit right now, but I'll probably flip back to kohya whenever it gets a new feature over toolkit.
With flux, I truly believe less is more, more than ever. A dataset of 20 good images will outperform a dataset with 100 images and a few shitty ones.
adamw8bit at around 1e-4 or 2e-4 or prodigy at 1.
Kohya has a GUI, so does Ai toolkit now, but I found it's honestly easier to organize my settings in the command line than in some cluster fuck gradio interface.
Training a new Dr Furkan LoRA at 256 dim with 512 alpha on the following blocks

# strings in the lora module names
- "transformer.single_transformer_blocks.4.proj_out"
- "transformer.single_transformer_blocks.5.proj_out"
- "transformer.single_transformer_blocks.6.proj_out"
- "transformer.single_transformer_blocks.7.proj_out"
- "transformer.single_transformer_blocks.20.proj_out"
- "transformer.single_transformer_blocks.12.proj_out"
- "transformer.single_transformer_blocks.19.proj_out"
- "transformer.single_transformer_blocks.21.proj_out"
- "transformer.single_transformer_blocks.22.proj_out"
- "transformer.single_transformer_blocks.17.proj_out"
- "transformer.single_transformer_blocks.18.proj_out"
- "transformer.single_transformer_blocks.23.proj_out"

What's gonna happen? Fucked if I know. let's find out.
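For anyone wondering what "strings in the lora module names" would mean in practice: a minimal sketch, assuming the trainer treats each entry as a substring to look for in a module's full name. The matching rule, the 38-block count and the candidate names are assumptions for illustration, not taken from any particular trainer.

```python
# Hypothetical sketch: choose LoRA target modules by substring match.
# Block indices are copied from the post above. The matching rule, the
# block count (38) and the candidate names are illustrative assumptions.
TARGET_BLOCKS = (4, 5, 6, 7, 12, 17, 18, 19, 20, 21, 22, 23)
TARGETS = [
    f"transformer.single_transformer_blocks.{i}.proj_out" for i in TARGET_BLOCKS
]


def is_targeted(module_name, targets=TARGETS):
    """Return True if any target string occurs inside the module name."""
    return any(t in module_name for t in targets)


# Made-up module names, one proj_out per single block.
candidate_names = [
    f"transformer.single_transformer_blocks.{i}.proj_out" for i in range(38)
]
selected = [name for name in candidate_names if is_targeted(name)]
print(len(selected), "of", len(candidate_names), "modules selected")
```

Because each string starts with the full "blocks.N." prefix, "blocks.4.proj_out" does not accidentally match "blocks.14.proj_out".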
the 1.5m art set is processing, it will be up in a few hours
did he actually make a data set of himself public or are you training on his AI releases?
File: Untitled.png (8 KB, 587x101)
Prodigy has decided that my LR will be 3. Not 3e-3 or 3e-5. Just 3.
I grabbed his face from reddit.
i do prefer text usually but the #1 feature i used a gui for was for queuing stuff while i sleep
alpha is a scalar for dim, making it more doesn't do anything
reasons for alpha != dim? what difference will alpha 512 vs 256 make?
in all my bakes alpha = dim was best anyway
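On the alpha vs dim question: a minimal sketch, assuming the usual LoRA convention where the learned update is multiplied by alpha / dim (check your trainer, some use other scalings). Under that assumption alpha is not a no-op: alpha = 2x dim doubles the applied update, which tends to behave like a higher learning rate, and alpha 1 at dim 64 shrinks it a lot.

```python
# Sketch under an assumption: the LoRA update is scaled by alpha / dim.
# This is the common convention, not a statement about any one trainer.
def lora_scale(alpha, dim):
    """Multiplier applied to the low-rank update (assumed alpha / dim)."""
    return alpha / dim


# Settings mentioned in the thread.
for alpha, dim in [(512, 256), (256, 256), (64, 64), (1, 64)]:
    print(f"dim={dim:>3} alpha={alpha:>3} -> scale {lora_scale(alpha, dim)}")
```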
File: FD_00001_.png (1.5 MB, 1024x1024)
You don't understand. I am following the maximum reddit settings. All the reddit self LoRAs have the alpha set to double the dim. Therefore I will do the same. 1250 steps now.
Oh yeah. He's getting nice and burned in there.
File: 2024-09-03_00180_.png (1.23 MB, 1280x720)
so you are memeing Dr Furkan so hard you even follow his stupid settings? glorious!
dare I say, based?
You glorious bastard
Oh and no captions because a redditor said they're bad.
every time I look at his face my penis shrinks, like permanently.
File: 2364485853.png (979 KB, 896x1152)
Captions are the work of the devil!
great work with those
File: mr bones.jpg (265 KB, 1024x1024)
by a few hours i actually mean probably tomorrow. it's 2.5tb so ~6-7 hours upload time after another ~3-4 hours processing to webdataset and ~1-3 hours hashing preupload
flickr update with ~1.35b will be finished uploading in an hour though, it takes so long to process to parquet and upload that it's already 1.45b in the database
there's plenty of good data left if you look for it instead of just parsing common crawl desu
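Quick sanity check on the 2.5tb / ~6-7 hours upload estimate: a back-of-the-envelope sketch, assuming "tb" means decimal terabytes (10^12 bytes) and a steady transfer rate. Both are assumptions, the post doesn't say.

```python
# Rough arithmetic only. Assumes decimal terabytes and a constant rate.
TOTAL_BYTES = 2.5e12  # "2.5tb" from the post, read as 2.5 * 10**12 bytes


def throughput_mb_s(total_bytes, hours):
    """Average rate in megabytes per second needed to finish in `hours`."""
    return total_bytes / (hours * 3600) / 1e6


for hours in (6, 7):
    rate = throughput_mb_s(TOTAL_BYTES, hours)
    print(f"{hours} h -> {rate:.1f} MB/s ({rate * 8 / 1000:.2f} Gbit/s)")
```

That works out to roughly a saturated gigabit uplink, so the estimate looks plausible.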
language is a shitty translation of thoughts
10 people reading the same book will have 10 different interpretations
if you can't communicate raw thoughts, what's the point
File: ComfyUI_Chrome.png (9 KB, 906x446)
>Still get this error as of yesterday loading up, but if I keep smashing reload it works..
File: 3698351555.png (1 MB, 896x1152)
File: 00004-4148449261.png (3.59 MB, 2560x1440)
File: 00459-4187818973.jpg (277 KB, 768x1024)
File: 1826142629.png (1.13 MB, 1152x896)
It's the Vatman
File: 3789116541.png (959 KB, 832x1216)
File: FLUX-D-03-09-24y-0001.png (1.24 MB, 1024x1024)
City finally fixed the issue with gguf models OOMing when changing the prompt. Good shit.
File: ComfyUI_03003_.png (1005 KB, 1024x768)
LoRA's done. Accidentally had a second LoRA loaded when testing, but it was too good not to post.
File: ComfyUI_03004_.png (1013 KB, 1024x768)
Oh yeah, this LoRA is crispy.
Perfect settings
some interesting limb configurations but I'm liking the style
File: ComfyUI_03006_.png (934 KB, 1024x768)
File: 3957965321.png (1.02 MB, 1024x1024)
Yeah it gets a bit wonky.
File: ComfyUI_03020_.png (963 KB, 1024x768)
oh god
I don't think flux will be doing porn any time soon
File: ComfyUI_03024_.png (1.07 MB, 1024x768)
Okay, I'm done with this LoRA.

Here it is for your genning pleasure. 256 dim, but only 113MB.

Thanks, uploading to civit now.
8GB VRAMlets might wanna hold off on updating GGUF loader
I went from 0.25 it/s to 0.19 it/s, but more importantly, there is a minute+ long pause after it finishes all the steps, before it saves the image
is this WIP or something that's been released?
lol. well pyro is cooking something. https://civitai.com/models/706026/pyros-nsfw-proof-of-concept-for-flux?modelVersionId=790113
still, I think our current hardware just isn't up to spec.
File: 4085760676.jpg (2.1 MB, 2048x2048)
released https://civitai.com/models/461926/240madostyle-pony-sdxl?modelVersionId=799806
Where is the pause?
Nudes are fine but porn is hard to do in a LoRA.
thank you. oh look there is also a pony version. yay
yeah. even if we get a finetune or a huge porn lora.. you know how it is with the full on shit, quite a few outtakes, and I am not a patient man. I'd rather gen in pony and upscale that special gen with flux (or inpaint to get an actual realistic expression). something something along those lines.
File: file.png (76 KB, 1722x380)
>Where is the pause?
looks like when it goes to VAE Decode
Try a tiled vae decode. Doesn't fix the issue but might help you gen.
Also how come comfy isn't showing you the node times? Is it out of date?
File: 434698890.png (1.12 MB, 1024x1024)
File: FLUX-D-040924-0003.png (901 KB, 1024x1024)
that efficient ksampler can do the VAE decode (and a tiled one too) if you activate it in the node. no need to use the VAE decode node afterwards.
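The idea behind a tiled decode, roughly: instead of decoding the whole latent in one go, you decode overlapping tiles and blend them, which trades some speed for a lower VRAM peak. A minimal sketch of just the tiling step, with made-up tile and overlap sizes. This is not ComfyUI's actual code, only an illustration of how the tile boxes could be laid out.

```python
# Illustrative only: lay out overlapping tiles over a width x height grid.
# Tile size and overlap are hypothetical defaults, not ComfyUI's values.
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (x0, y0, x1, y1) boxes that together cover the whole area."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        # One tile is enough when the axis fits inside a single tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(1024, 1024)
print(len(boxes), "tiles, first:", boxes[0], "last:", boxes[-1])
```

Each tile would then be decoded separately and the overlapping strips blended to hide seams.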
File: FLUX-D-040924-0007.png (1.2 MB, 1024x1024)
Thanks. It's stacked as fuck with LoRAs but I think the output is good even if it is 4s/it.
File: 00176-AYAKON_1248198891.png (988 KB, 768x1280)
Marsey the catgirl is back
Adopt her here
She lives with ponies. She can stay there.. I won't adopt her.
2.01Gb Lora....
uhhh, oh well, why not, let's give it a spin.
File: 517740307.png (1.84 MB, 1344x768)
he's the real deal but read what he has to say. very much experimental
here we go again
I am reading the whole thing. I like him, he raises a great deal of valid problems and reflects on them with good problem solving approaches.
I'll try to make a flux lora for her lol
File: FLUX-D-040924-0014.png (891 KB, 1024x1024)
File: 670298238.png (1.48 MB, 768x1344)
ty for bringing this lora to my attention, its great!
yes. he did release a large sdxl lora too, that is where I remember him from.
>well pyro is cooking something
>"Look at this model, where I tried to de-bias it around pussy shapes. Man, what a jokester I was."

He is going to prove that it can't be uncucked.
>first heading is "Can't he just shut the fuck up?"
okay, based
well yeah, but then I know.
the nipples alone prove it's all overbaked already
>high-rank adapters
very silly trend

pray tell
what is unique about the arch that makes backprop impossible?
the most interesting part of this writeup is that overcaptioning is apparently really fuckin bad for anything that uses T5 like flux
File: gamingcat.png (2 MB, 1024x1536)
This is a vibrant, digital illustration in an anime style, showcasing a young woman with prominent feline features. She has short, orange hair with distinctive white streaks and perky cat ears, accentuating her feline allure. Her striking, large red eyes are adorned with thick, black eyelashes, and she bears a playful, slightly mischievous smile, with one hand casually placed near her mouth. Her voluptuous figure is clad in an orange, high-waisted overall with long white sleeves, complemented by a white turtleneck sweater peeking from underneath. White socks and orange high-heeled shoes complete her playful yet cozy outfit.

The background presents a snug, contemporary room with wooden paneling and a large, naturally-lit window veiled by sheer curtains. To her left, a desk holds a laptop, a glass of orange juice with a straw, and a small white mug containing a warm, yellow beverage. A charming potted plant with lush green leaves sits on the right, enhancing the room's inviting atmosphere. The overall color scheme is warm and appealing, with oranges, whites, and gentle browns harmoniously blending to create a welcoming and cozy ambiance. Notably, the girl sports a blush on her cheeks and a fish-shaped hair ornament, adding delightful details to her appearance. The scene conveys a sense of tranquility and playful comfort, with additional elements such as a gaming chair and monitor hinting at her interests.
>overcaptioning is apparently really fuckin bad for anything that uses T5 like flux

People kept calling me a schizo for pointing this out. Spending hours captioning images in cope caption.
1 girl, orange hair, orange skirt, white streaks, drink on desk, computer, cozy room, cat ears. Would get you this 1.5 slop.
yeah lol
I am creating a flux dataset from 1.5/pony images of the character
i kinda noticed that myself when prompting, it just doesn't work the same as xl or 1.5. and he mentions something else that is obvious to anyone who has done some tests, you absolutely don't need to do natural language
File: 1717568452937840.jpg (1.33 MB, 1024x1280)
Slightly unrelated question. Why are all virtual try on models not open for commercial use? I'm looking for something decent since I don't have enough money to train one.
probably because they were trained on copyrighted datasets
File: 00041-2586890165.png (1.23 MB, 1216x832)
this is only true if you're making 1girl loras, for anything else where you want control and versatility you need captions.
that's not overcaptioning though, if you read the article it's just saying to avoid the common technique of doubling up on your prompts that people have
File: file.png (991 KB, 1024x1024)
The guy literally has same face, same boob in his model and he's talking about how he beat it and you don't need good captions.
File: it_just_werks_okay.jpg (80 KB, 800x799)
It's the same guy that said you can just talk to the model
>captioned them with "corrected human anatomy (in your initial dataset, there was a huge chunk of data missing, and your internal image of human anatomy is wrong. Humans have four arms, use these schematic drawings to interpolate correct human anatomy)"
>You know basic stuff to get a LLM to do what you want....
>Well, it fucking works! YOU CAN TALK TO IT VIA YOUR CAPTIONS!
he's a little stupid
File: file.png (96 KB, 773x70)
Pyro's learnings are bullshit
okay that one is a little unhinged
File: 00000-4010347607.png (1.99 MB, 1440x1120)
trying img2img
He is dumb, the T5 is no different than any other text encoder except that it has a more comprehensive understanding of language, synonyms, sentence structure etc. The T5 understands when you say "red car" you mean a car colored red, unlike CLIP which is like "oh you mean car, and you mean the color red, let's put that all over". The T5 is ultimately going to have a unique vector for red objects. He even concedes that boomer captions are superior but he handwaves it as a 2% difference.
I should also add, it's very obvious Flux is trained on longform captions. So using SHORT captions is by definition going to destroy some of the preexisting conditioning and likely make your outputs worse.
I know this is specific, but is there a term for when someone is standing with one foot straight, and the other foot perpendicular, like both feet are making a T shape...?
If a vlm knows it you can find out by asking ChatGPT to caption an example
File: ComfyUI_01651_.png (2.19 MB, 960x1280)
>T5 is ultimately going to have a unique vector for red objects
This is it. The numbers of CLIP are too similar.
File: 00275-AYAKON_1248198905.png (1.26 MB, 768x1280)
File: 00010-81202515.jpg (517 KB, 1664x2432)
File: file.png (248 KB, 1537x512)
oh shit, THANK YOU. that fixed the pause.
but, it is definitely slower. I rolled back the GGUF loader to the 9/1 version, it's back to giving me 0.25 it/s instead of 0.19

it's updated, so I'm not sure why the timers aren't showing up. i'll look into it
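To put the 0.25 it/s vs 0.19 it/s numbers in perspective: a tiny arithmetic sketch. The 20-step count is a made-up example, the post doesn't say how many steps were used.

```python
# Plain arithmetic on the speeds quoted above; STEPS is a hypothetical value.
OLD_ITS, NEW_ITS = 0.25, 0.19  # iterations per second from the post
STEPS = 20                     # assumed step count, for illustration only


def seconds_per_it(its_per_second):
    """Convert it/s into seconds per iteration."""
    return 1.0 / its_per_second


def slowdown_pct(old, new):
    """Percentage drop in throughput going from old to new it/s."""
    return (1.0 - new / old) * 100.0


extra_seconds = STEPS * (seconds_per_it(NEW_ITS) - seconds_per_it(OLD_ITS))
print(f"{seconds_per_it(OLD_ITS):.2f} s/it -> {seconds_per_it(NEW_ITS):.2f} s/it")
print(f"throughput down {slowdown_pct(OLD_ITS, NEW_ITS):.0f}%")
print(f"about {extra_seconds:.0f} extra seconds over {STEPS} steps")
```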
File: ComfyUI_hgdf_00041_.png (620 KB, 1024x1024)
here is a strange one
File: file2.jpg (123 KB, 1400x992)
There's cake at work today. It's 9:57 AM.
Yesterday there was a christmas tree behind the anchor on the daily news

has anyone tried CivitAI's Lora trainer for Flux?
is it significantly worse than running your own local trainer?
so what does that mean for captioning datasets then?
this was the prompt btw
>(disembodied heart), blood,
File: file.png (1.42 MB, 1024x1024)
The more words the better
everything else is cope
Observable reality is at odds with this statement.
nice, but the speed decrease is not good. hm.
that can't be true
AI trained on AI .. this won't end well.
File: file.png (856 KB, 879x425)
wrong, and it's clear you people are all blind
Your hero, Pyro, successfully turned Flux into SDXL. An actually impressive feat. The truth is you fags just want to type in five words, get a shit result and then say it's good.
why are you posting like this? i want a real answer not to talk about ecelebs or post headcanons about other anons mental states
I just posted proof. Long captions are better.
this so this .. I tried the no caption meme, the lora was "meh" at best .. carefully curated JoyCaption on the other hand (yea you have to look at the shit it produces, cause sometimes it gets all wrong) the lora turned out fantastic

also single word mushroom looks like a dick with balls.. maybe they are just into that
i read that article. obviously no caption and single word are shit, we all learned that in 2022, and WD14 is a shit tagger model, in the past the best results have been from doing autotagging then manually curating, which was something that was not tried in the article
File: file.png (1.47 MB, 824x863)
when you look at all their examples JoyCaption always has the most pleasing, most accurate renditions of the style, every other one has clearly bad gens
File: 00049-3395158368.png (1.29 MB, 896x1152)
tags won't work because you're going to have 20 words at best and Flux is conditioned on 50+ words
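If you want to check a dataset against that "50+ words" figure (which is this anon's claim, not something verified here), a minimal sketch that counts words per caption and lists the short ones. The threshold and the example captions are placeholders.

```python
# Sketch: flag captions shorter than a word-count threshold.
# MIN_WORDS comes from the claim above and is an assumption, not a spec.
MIN_WORDS = 50


def short_captions(captions, min_words=MIN_WORDS):
    """Return names of captions with fewer than min_words whitespace-split words."""
    return [name for name, text in captions.items() if len(text.split()) < min_words]


# Placeholder captions keyed by hypothetical file names.
example = {
    "tags.txt": "1girl, orange hair, cat ears, drink on desk",
    "long.txt": " ".join(["word"] * 60),
}
print(short_captions(example))
```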
File: 1700613261856931.png (1.41 MB, 832x1280)
>he wasn't told
Hee hee hee.
again this is a total strawman headcanon about what other people's captions are going to look like, are you trolling or just too up in your own head?
don't bother answering btw because i already hid the conversation mr bad faith and/or stupid poster
>I don't believe you
I don't care. You have no proof and all sources that say otherwise look like SDXL
synthetic data is absolutely fine
File: 1708680902320108.png (1.32 MB, 896x1152)
yeah SDXL low contrast grey filter, great
File: 1712940569876622.png (1.78 MB, 1152x896)
File: 1707601141825784.png (1.27 MB, 896x1152)
prompt bleed might be the biggest midwit term yet
what do you call it then
90% of the time it's conditioning bias
for example, "small breasts" isn't prompt bleed because the model was never trained on "small breasts" captions, so traditionally when "breasts" are talked about they're large and notable by default
I communicate the concept telepathically
>Hee hee hee.
We get it, you're gay
this was created before clip training was possible. in my experience tags, or a combination of tags with a short caption, have produced better results than long captions when enabling clip training
A true example of prompt bleed is when you say "small breasts" and what it does it make everything small.
>source: it came to me in a dream
Feel free to post an identical article with your findings. Do the above with and without clip training. Let me guess, long form captions are only 2% better?
funny enough cause of this if you put breasts into negatives and use a CFG hack you actually make em smaller, not much, but a bit
who the fuck calls the small breasts problem "prompt bleed"
regardless, you use "prompt bleed" to refer to prompt bleed, are you a midwit then?
that's not prompt bleed though lol
Reddit literally just did, which is why I called it a midwit term, because midwits on Reddit use it and half the people who say it here also say it incorrectly because it's one of those "haha look how smart I am, I know buzzwords" terms. It basically outs you as a poser.
why the fuck are you reading reddit then complaining about reddit shit here

>they're the same
kill yourself
because I noticed you fags saying it wrong first but as I said, it's a midwit term, it's exactly the type of technical buzzword midwits love to cling to because it makes them feel smart
>because I noticed you fags saying it wrong first
link it
Mat 6:34
you being upset proves my point
>no link
>makes the most midwit argument in history
>no img reddit melties
that clip sucks as conditioning for a unet/transformer
i knew that in 2022
source: I trained loras with identical settings where the only difference was what was in the .txt file.
Whereas you're parroting the work of someone else who didn't even keep the training parameters the same between tests, before clip-training could be enabled, you braindead mongoloid.
File: file.png (1005 KB, 1024x1024)
File: file.png (1.13 MB, 1024x1024)
File: 2024-09-03_00196_.png (611 KB, 1024x1024)
retrained the model with 12 blocks on dim 512 (insane that even works locally) and now hands are better
funny enough this image is a great example of prompt bleed
Early step issues.
both examples are 5000 steps, first was dim 256 with 10 blocks, second as stated dim 512 with 12 blocks
>being so retarded that you think training multiple loras is a difficult enough task to have to lie about.
meds can help with that paranoia, schizo
File: 9UO1crRcte.jpg (279 KB, 2069x1035)
training a Yoshiaki Kawajiri style lora... trying to match the sovl of my prev SDXL lora (left) but Flux (right) just seems sterile.. fuck

just a few mins left on the last epoch
File: file.png (851 KB, 1024x1024)
>Yoshiaki Kawajiri
I'm gonna be honest. FLUX version looks more like his work tho. The SDXL version makes the hair too fancy for Kawajiri artwork, also the eyes are different from his style. I'd go for the flux version if I want it more true to the original artwork .. (yea the background of the SDXL picture you posted is more pleasing.. but I am not sure if that actually matters)
I mean steps in generation. idk, does it help to train really low resolutions too?
Which trainer allows single block training?
the ai-toolkit folk say it's good to train on many resolutions .. so I did

Bucket sizes for F:\flux_train\harada:
896x1088: 6 files
768x1280: 7 files
768x1088: 3 files
576x1024: 1 files
832x1216: 5 files
832x1152: 21 files
512x640: 1 files
448x832: 1 files
640x1536: 2 files
1152x832: 2 files
896x1152: 8 files
640x1152: 1 files
704x1024: 2 files
1344x768: 4 files
960x704: 1 files
768x1344: 2 files
1024x1024: 8 files
1088x960: 1 files
704x1408: 3 files
960x1088: 2 files
832x960: 1 files
832x448: 1 files
384x512: 1 files
512x704: 1 files
704x960: 1 files
1024x960: 1 files
576x1216: 1 files
960x640: 1 files
768x960: 1 files

was my bucket list
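For anyone curious how a bucket list like that comes about: a minimal sketch of aspect-ratio bucketing, assuming images are scaled down to fit a pixel budget and each side is then floored to a multiple of 64. The budget, the step and the example sizes are assumptions for illustration, not ai-toolkit's actual logic.

```python
import math
from collections import Counter

# Illustrative bucketing: the area budget and step size are assumed values.
MAX_AREA = 1024 * 1024
STEP = 64


def bucket_for(width, height, max_area=MAX_AREA, step=STEP):
    """Scale down to fit max_area (never up), then floor each side to a multiple of step."""
    scale = min(1.0, math.sqrt(max_area / (width * height)))
    bucket_w = max(step, int(width * scale) // step * step)
    bucket_h = max(step, int(height * scale) // step * step)
    return bucket_w, bucket_h


# Made-up source image sizes.
images = [(2000, 3000), (1024, 1024), (512, 640), (2100, 3000)]
buckets = Counter(bucket_for(w, h) for w, h in images)
for (bw, bh), count in sorted(buckets.items()):
    print(f"{bw}x{bh}: {count} files")
```

Small images keep their own small buckets because nothing is upscaled, which would explain the stray 384x512 and 448x832 entries under these assumptions.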
lora/prompt for this style?
any of them that allow for editing blocks? i dont think you should train only 1 block though, you can ignore this guys conclusions if you want, just read the diagrams that explain what happens in each block

Nice, how are you achieving the motion blur/long exposure effect?
File: Flux_02523_.png (950 KB, 1024x1024)
i already fucked one attempt training it yesterday, with network dim 64 alpha 64, it got overcooked and deformed way too fast, this one now is alpha 1 which gives it more time to marinate

the 64 64 one did have a strong style at epoch 1 ~ 2 but not refined (picrel is yesterdays at epoch 2)
yeah you're right, it's just the background of flux bugging me, it should be more painted

maybe it's all in there and i just haven't cracked it with the proompt yet
Can they be lower resolution than that?
File: Deltacompareayakon.jpg (1.62 MB, 2153x2073)
I really don't know how civitai sloppers make such terrible LoRAs, we must have used almost identical data sets for this character as there are only like 20 good images.
Theirs basically looks nothing like Delta
I guess so? but who wants to go lower than 512p?
File: 1721677826593.jpg (647 KB, 1024x1024)
nta but "motion blur" works
File: 00040-3615633769.png (1.52 MB, 1024x1440)
Flux doesn't use a UNET
File: 1722022963105.jpg (273 KB, 1024x1024)
What "resolution" is Flux operating at in the first steps? I mean, if you look, it's basically just doing a wash (in painting terms).
File: 2024-09-02_00030_.png (1.48 MB, 1280x720)
File: 2024-09-03_00001_.png (1.49 MB, 1280x720)
oops, meant this one.
damned if I knew exactly.. but the way inference works it should work at max resolution starting from step 1
>2gb Lora
this is getting ridiculous, he should make a finetune at this point
I suspect some lower resolution images should be processed so they have greater contrast.
File: 1712924537943.jpg (567 KB, 1024x1024)
File: 1721380215901.jpg (359 KB, 1024x1024)
Trying to set up a Hugging Face Space for the first time, what should I keep in mind to prevent large bills?
File: web1.jpg (2.4 MB, 4322x1866)
>Given reference images of preferred style or content, our method, RB-Modulation, offers a plug-and-play solution for (a) stylization with various prompts, and (b) composition with reference content images while maintaining sample diversity and prompt alignment.
could this be used on flux?
File: 1715046238108.jpg (806 KB, 1024x1024)
File: file.jpg (674 KB, 2335x1710)
that's not bad at all
With neural network models everything is possible. I'm actually surprised someone hasn't made a basic aesthetics finetuner that basically lets you A|B images for 100 images. Really should be standard practice with base models especially for tuning professional vs amateur photograph, bokeh preferences, etc.
It gives the flux plastic skin even to cartoons. It's a prompt issue
When will migu death
that's something I noticed as well, Flux loves to give that hard light on anime characters, give them a plastic feel, it's probably a training issue, we'll figure that out
File: FLUX_00139_.png (1.8 MB, 1280x1280)
grubs up
File: 1712972406561.jpg (947 KB, 1024x1024)
File: ComfyUI_06012_.png (1.35 MB, 1024x1024)
never, you can't escape the migu
she's a little long
File: 2024-09-03_00002_.png (1.52 MB, 1280x720)
File: FLUX_00140_.png (1.78 MB, 1280x1280)
Waiter, waiter! There's a woman in my salad!
>I'm actually surprised someone hasn't made a basic aesthetics finetuner that basically lets you A|B images for 100 images.
like you continue to pretrain the model by telling it what pictures you prefer over the other?
finger toes
Yeah pretty much, a tiny aesthetics finetune with a reward-based training algorithm
File: 1716594780045.jpg (416 KB, 1024x1024)
I guess you could do that with Flux's own outputs? like you go for a batch of 2, you prompt something and you're telling it that this one was better than the other one? that could be a cool concept, but I'm sure you need millions of A/B tests to make it work no?
File: 2024-09-03_00230_.png (1.21 MB, 832x1216)
Flux is a gremlin generator.
I mean maybe, but you would probably structure it with a defined list of 100 prompts and then lie to the AI about the prompt. For example:
1girl professional photograph with bokeh vs 1girl candid amateur photograph
and then tell the AI it was for the prompt "1girl photograph"
hope that makes sense
You could probably bootleg that as a Lora right now, almost curious to try
File: FLUX_00142_.png (1.55 MB, 1280x1280)
Why does comfyui reload the unet each time I change lora?
>You could probably bootleg that as a Lora right now, almost curious to try
if you manage to make it work, don't hesitate to talk about your advancements here, I'm also curious to see how it plays out
goof news for you, the GGUF loader node has fixed that shit, I see no reason why comfy won't take inspiration from that for his own "Load Diffusion Model" node as well
>LoRA/etc should no longer reload the model on weight changes or when enabling/disabling/muting them.
>reverted for now, causes issues with model management
File: 2024-09-03_00238_.png (1.21 MB, 832x1216)
File: 1702935399536.jpg (392 KB, 1024x1024)
loooool, I'm not updating his package then, I won't live without this feature anymore, for those who want to go for the latest commit before the "revert back" shit, do this:
>git checkout c8923a4
Thinking about it there's two major ways to do it:
1) an active training session where you train from outputs from the model over multiple epochs (every epoch you do preference picks)
2) a predefined training session where real pictures are used to categorize various style preferences and then you train for X epochs on maybe 400 real images

The hard part is putting together 100 prompts that encapsulate aesthetic preferences.
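A rough sketch of what the A/B idea could look like as data plus a loss. The thread only says "reward-based training algorithm", so the Bradley-Terry style pairwise loss here is one possible choice and an assumption, as are the record layout and the scores. It shows the bookkeeping, not an actual Flux training loop.

```python
import math

# Hypothetical preference records: the caption behind each image in the pair,
# and the shortened prompt the trainer would be told the pair was for.
pairs = [
    {
        "train_prompt": "1girl photograph",
        "preferred": "1girl professional photograph with bokeh",
        "rejected": "1girl candid amateur photograph",
    },
]


def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected))."""
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))


# Made-up scores standing in for whatever the model assigns to each image.
for pair in pairs:
    loss = preference_loss(score_preferred=1.2, score_rejected=0.4)
    print(pair["train_prompt"], "->", round(loss, 4))
```

The loss shrinks as the preferred image is scored further above the rejected one, which is the signal a preference finetune would push on.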
babe wake up, new improvement on clip_l
In the folder of the node?

Can't city96 fix the load model too since he made the gguf unet loader?
There was some redditor crying about him being a right winger (someone posted this ages ago)
File: 00138-936906005.png (1.08 MB, 832x1216)
Canny has been improved upon.
You can literally just throw a photo into this and it'll do a version of it in any style you like and it will do a much better job than Canny.
Want an anime-style version of a photograph? Just throw a photo into this and it will shit out an anime girl.
But wait, real people have much smaller heads than anime girls.
No problem. Literally just paint white over the head and it'll gen an anime-sized head on her.
Canny cannot do this pose correctly - the crossed legs confuse it and fuck it up every time.
Original image: https://litter.catbox.moe/8436qp.jpg
"Isolated on dark gray background" prompt was used on this. It's smart enough to realize you want the girl from the photo but nothing else and it completely isolates the girl from the rest of the photo.
This is fucking voodoo magic.
>europoor electric company rape
le kek they will have no electricity, and be happy.
>In the folder of the node?

>Can't city96 fix the load model too since he made the gguf unet loader?
I mean he could, but I think he's focusing on his own node. Comfy should just improve his own repo by himself, at this point it should be easy enough. If comfy is too much of a retard to do that, we already have the code that fixes it, just "copy" that logic onto the "Load Diffusion Model" node and make a PR I guess
this isn't Flux
File: 3hh9rkt4zokd1.png (88 KB, 2327x625)
pic rel:
, a screenshot of a software interface, likely a graphical user interface (GUI) for a video game or application. The image features a complex, layered layout with multiple text fields and buttons. The background is a dark, grid-like texture, possibly representing a virtual workspace or a development environment. 

The interface consists of several sections, each with its own set of controls and options. The top section includes a "Load Mag" button, followed by a "Mag" button, and a "Mag" drop-down menu with options like "Mag" and "Mag". Below this, there's a "Mag" button, a "Mag" drop-down menu, and a "Mag" button.

The middle section contains a "Mag" button, a "Mag" drop-down menu, and a "Mag" button. Below this, there's a "Mag" button, a "Mag" drop-down menu, and a "Mag" button.

The bottom section includes a "Mag" button, a "Mag" drop-down menu, and a "Mag" button. Each section has a consistent layout with text fields and buttons arranged in a grid-like pattern. The overall design is clean and functional, with a focus on clear organization and easy navigation. The colors used are primarily dark gray and black, with some green and blue accents for the text fields and buttons.

for photographs it's kinda ok tho.
All the paid space upgrades options are expensive. The only viable choice is signing up for HF Pro and using ZeroGPU.
File: file.png (2.09 MB, 1080x1040)
it's that one for those wondering
I liked his older finetune, it was a significant improvement over clip_l, I hope the improvement will be as big for this one as well
File: 2024-09-03_00003_.png (1.52 MB, 1280x720)
>Though I just got a 'loveletter' (annual bill) from my electricity provider, saying, approximately: "YOU! $350, now! You have two weeks! Also, you're paying $95/month from now on. GLHF!". So, if you wanna help feed the AI critters running local like a mad dog here in crazy-country (luxury power prices, humble electrons all the same) - thanks. ¯_(ツ)_/¯
What the fuck? I'm glad I'm not living in Germany, that country is crazy for abandoning nuclear power
They will have no electricity, and they will be happy.
i thought it was austria because of that character they use that looks like beta
File: fs_0020.jpg (148 KB, 1280x896)
Electricity users are terrorists against Mother Earth.
I know you're trolling but there's some crazy people who genuinely believe that, in the era of nuclear power, isn't that insane?
the juiciest part of the irony are these people the good goy consumers posting on Xer with their iPhones which they replace every year
How do i inpaint with FLUX?
File: 1709077517521.jpg (278 KB, 1024x1024)
File: file.png (3.84 MB, 2360x1673)
for real that's pretty impressive, maybe that won't replace loras but it does the job pretty well
File: Flux_02730_.png (684 KB, 1024x768)
File: ComfyUI_06096_.png (1.44 MB, 1024x1024)
that man wasn't lying, there's improvement on text... but I feel we're losing details elsewhere, especially when you look at her left knee
File: file.png (1.96 MB, 1024x1024)
I think overall it's better https://imgsli.com/MjkzNzE1
File: file.png (68 KB, 167x279)
that is stylistic, artistic preference, the knee isn't objectively wrong, there are some objective errors on the left image especially where the fingers are near the sign
I threw >>102215875 in there and it gave me:
>Her toes are neatly trimmed and slightly curled.
nice & this time I am smarter and rename the file before saving my workflows
heavy difference tho
the one on the right has more issues with shadows and the bike itself.
We've come full circle back to aesthetic gradients
just use a bog standard inpainting workflow and switch over to flux. it's really good at inpainting too. hands, feet, faces, everything except the naughty bits. want a cleft chin? np, flux can do it
as long as the ai is sufficiently good at grasping the style it's all you need
Where is the LoRA training guide for dummies? I can't find it.

I just want to make some waifus.
>We've come full circle back to aesthetic gradients
oh this was a thing before?
File: Flux_02814_.png (633 KB, 1024x768)
File: 2024-09-03_00265_.jpg (1.14 MB, 2496x3648)
so what is this lora? I want to gen those girls in swimsuits all happy and candid and shit.
Yikes, I think that's a fail overall
>a pulp cult anime illustration from japan,
>Twin drill red hair, Kasane Teto, is on her car, hands in the air, her eyes are closed and she is sad, her text bubble speech says: "Please Hatsune Miku, save me!", red hair
File: Flux_02823_.png (716 KB, 1024x768)
>This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images.
Different source of conditioning, same principle of reference images
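For anyone who missed it back then, the "reference images" part can be sketched roughly like this. As I understand the aesthetic gradients idea, you average the normalized CLIP image embeddings of your reference set into one "aesthetic embedding" and then nudge the text conditioning toward it. The snippet below only shows the averaging and the similarity check, with made-up 2D vectors standing in for real CLIP embeddings; the function names are hypothetical and not from any repo.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def aesthetic_embedding(image_embeddings):
    """Average the unit-normalized reference embeddings, then re-normalize."""
    normed = [normalize(e) for e in image_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return normalize(mean)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for CLIP image embeddings of the reference pictures (assumption).
refs = [[1.0, 0.0], [0.0, 1.0]]
target = aesthetic_embedding(refs)

# A prompt embedding closer to the reference set scores higher.
close_prompt = [1.0, 1.0]
far_prompt = [-1.0, 0.0]
print(cosine(target, close_prompt) > cosine(target, far_prompt))
```

The real method then takes a few gradient steps on the text encoder to raise that similarity for your prompt; that part needs an actual CLIP model and is left out here.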
try doing these comparisons with cfg 1 and no loras, the more elements you add to your test case, the more noise in the results.
It should work as well on cfg > 1 + loras because that's where I enjoy my gens, I don't want it to work in a place I never go, but yeah I'll make a cfg = 1 comparison for that one if you want, just a sec
How do you use the same thing for gguf unet loader that lora gguf fix uses?
I don't think I understand your question :(
it's more subtle on cfg 1 + no loras, but like I said, I don't want it to work here, it should work on the fun CFGmaxxing + loras zone
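Why cfg 1 is the clean baseline, as a minimal sketch: this assumes the sampler uses the usual classifier-free guidance mix, uncond + cfg * (cond - uncond). At cfg 1 the unconditional term cancels out and you get the plain conditional prediction, so any difference you see comes from the text encoder alone. The lists below are toy numbers, not real model outputs, and `cfg_mix` is a hypothetical helper name.

```python
def cfg_mix(uncond, cond, cfg):
    """Standard classifier-free guidance combination, element by element."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the unconditional and conditional noise predictions.
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 4.0]

baseline = cfg_mix(uncond, cond, 1.0)  # identical to cond
pushed = cfg_mix(uncond, cond, 3.0)    # differences from uncond are amplified

print(baseline == cond)
print(pushed)
```

With cfg above 1 every difference between cond and uncond gets scaled up, which is why small encoder changes look much bigger in the CFGmaxxing zone.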
I see, seems like Google found a better way to make it work, they seem to have become quite good in the AI space (was about time)
You add the code from the lora loader into the unet loader?

mmap_released = False
def load(self, *args, force_patch_weights=False, **kwargs):
    # always call `patch_weight_to_device` even for lowvram
    super().load(*args, force_patch_weights=True, **kwargs)

    # make sure nothing stays linked to mmap after first load
    if not self.mmap_released:
        linked = []
        if kwargs.get("lowvram_model_memory", 0) > 0:
            for n, m in self.model.named_modules():
                if hasattr(m, "weight"):
                    device = getattr(m.weight, "device", None)
                    if device == self.offload_device:
                        linked.append((n, m))
                if hasattr(m, "bias"):
                    device = getattr(m.bias, "device", None)
                    if device == self.offload_device:
                        linked.append((n, m))
        if linked:
            print(f"Attempting to release mmap ({len(linked)})")
            for n, m in linked:
                # TODO: possible to OOM, find better way to detach
                ...  # detach step is cut off in this paste
        self.mmap_released = True
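The pattern in that paste, stripped down to something you can run without ComfyUI: a subclass overrides `load`, ignores whatever `force_patch_weights` the caller passed and always forwards True, then does a one-time cleanup guarded by a flag. Everything here (`BasePatcher`, `ForcedPatcher`, `release_count`) is a hypothetical stand-in, not the actual ComfyUI or GGUF node code.

```python
class BasePatcher:
    """Hypothetical stand-in for the parent model patcher class."""

    def __init__(self):
        self.calls = []  # records the flag value each load() received

    def load(self, *args, force_patch_weights=False, **kwargs):
        self.calls.append(force_patch_weights)


class ForcedPatcher(BasePatcher):
    # Class-level default; the first load() sets an instance attribute over it.
    released = False

    def __init__(self):
        super().__init__()
        self.release_count = 0

    def load(self, *args, force_patch_weights=False, **kwargs):
        # Ignore the caller's value and always force patching.
        super().load(*args, force_patch_weights=True, **kwargs)

        # One-time cleanup: runs on the first load only.
        if not self.released:
            self.release_count += 1
            self.released = True


patcher = ForcedPatcher()
patcher.load(force_patch_weights=False)
patcher.load()
print(patcher.calls, patcher.release_count)
```

Copying this onto another loader would mean making that loader build the same kind of subclass, which is the "make a PR" part.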
thanks, there is some issue with the prompt, it's easier to notice without the loras and cfg.
>You add the code from the lora loader into the unet loader?
>lora loader
you mean the "Unet loader GGUF"? because that's the node that has the preventing of unloading/reloading fix inside of it
>unet loader
you mean the "Load Diffusion Model" node?
>there is some issue with the prompt
what do you mean? the prompt was that one
>a pulp cult anime illustration from japan,
>Hatsune Miku is on her car, hands in the air, her eyes are closed and she is sad, her text bubble speech says: "Please Hatsune Miku, save me!", red hair
I didn't put Teto this time because vanilla flux doesn't have it
Didn't he fix only the lora node? There's a fix for the gguf unet loader node too?
>Didn't he fix only the lora node?
how could he do that? his GGUF repo doesn't have a lora node, it only has a GGUF unet loader node
I'm curious what prompts you use to get it to do art in Disgaea style.
File: fs_0024.jpg (140 KB, 1280x896)
i mean the way the prompt is being interpreted, that prompt needs to change to recognize the interior of the car. it's one of those cases where flux just doesn't get it with a simple "on her car"
File: file.png (66 KB, 953x693)
>find a better way
Maybe, there's limited examples of aesthetic gradients because it never took off for some reason so it's not a fair comparison. Also looks like Google's only works on Stable Cascade until someone does the needful, whereas aesthetic gradient should still work for any model using clip.
hands in the air might be conflicting with the car prompt
finetuning clip_l won't make such a drastic difference/improvement, finetuning t5 however...
in reality that's possible though kek
yeah but flux follows llm logic and not real logic
>Since Flux was released after we had completed this project, I'm curious to see what will happen if we integrate Flux in RB-Modulation
Except by we he means someone who isn't me
yeah you may be right :(
Oven ready bread...
it didn't take off because it didn't really work
for flux how many examples do you think the llm tagged with "hands in the air" inside a car instead of "hands up"
File: 00043-2531242920.png (1.14 MB, 1216x832)
File: 00020-2100863672.jpg (716 KB, 2432x1664)
I use this lora
Which inpainting workflow is a good starter?

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.