/g/ - Technology




File: tmp.jpg (1.24 MB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102017300

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
https://civitai.com/models/672541/may-li-flux?modelVersionId=752879
Bros
>>
>>102021045
I hate IP Range Ban
>>
File: delux_flebo_00034_.png (1.5 MB, 1216x832)
>mfw
>>
cursed thread of avatarfags
>>
>>102021057
I like it, without that the schizos would've destroyed 4chan
>>
>>102021043
I'm running it through joy-caption, which arguably isn't that great but it's good enough.
>>102021044
It's mostly that some elements are just straight up missing, so I'm having to add in more context. Also not sure if I should get rid of the flavored descriptions it tends to output.
>>
>>102020766
looks like bernie, kinda kek
>>
flux, upscaled, no interp. original 48x96
>>
>>102021087
i'm currently messing with getting minicpm to take an image + some manual boorutag style captions i made and getting it to spit out a natural language caption. seems to work decently well for adding in info that the vlm wouldn't ever be able to get on its own, though i havent trained on it yet or tried it out too much.
i'd think the flavored text is moderately good since it's kinda what the model was trained on?
>>
File: ComfyUI_00105_lg.png (17 KB, 480x960)
>>102021135
>>
>>102021070
Yeah but they are banning the wrong schizos man, now some retard is using my IP to upload shit somewhere here and I am here seething
>>
File: ComfyUI_00109_.png (14 KB, 96x96)
tiny.
>>
>>102021136
>It's kinda what the model was trained on?
Well we're not sure what the model was trained on. It doesn't seem to be impacting it negatively though
>>
File: ComfyUI_32712_.png (1.35 MB, 1024x1024)
>>
File: 00046-2024-08-22-cJak.jpg (2.36 MB, 2048x2688)
>>
>>102021197
That will make some ants happy
>>
File: ComfyUI_00008_.png (1.25 MB, 1024x1024)
>>
>>102021054
this one actually looks decent
>>
>>102020975
Cool
>>102021213
Nice expression
>>
>>102021216
Fusion generates strange results with basically icon sizes. It's not what you expect. I can upscale. This is a 20 step Flux image originally 96x96, upscaled without interpolation. The prompt was "anime feet, painted toenails"

>>102021229
lol
>>
File: ComfyUI_00111_lg.png (29 KB, 960x960)
>>102021236
>>
File: Capture.png (36 KB, 895x419)
>>102021229
I see you're improving your workflow on AutomaticCFG, that's cool, what are your parameters right now? For me it's cfg 6 + this piecel picture
>>
File: temp_hayxh.png (2.13 MB, 1120x1440)
>>
>>102020684
Can this be mitigated by having some training examples that have the lora subject beside other unrelated people?
>>
What's the experience of running flux on mid end hardware? I can run sd-xl on a 3050 with 16 gigs of ram, but big F looks like a way bigger model.
>>
File: settings.jpg (118 KB, 776x718)
>>102021245
These are my current settings. Seems i need to have a lot of steps to get the text not fucked up. Using q8 and t516 not quanted
>>
>>102021245
for me it's Basic simpler + Basic guider
it gives much better results and is faster, what does autocfg change in that aspect?
>>
File: 3_methods.jpg (3.94 MB, 7381x6083)
>>102021289
>for me it's Basic simpler + Basic guider
can you elaborate on that? is this a workflow on CFG > 1

>it give much better results and is faster, what does autocfg change in that aspect?
it allows CFG > 1 and therefore better prompt understanding
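
For context, the guidance scale being argued about here is usually applied with the standard classifier-free guidance formula. A minimal sketch in plain Python, using short lists as stand-ins for the model's noise predictions (the function name and the toy numbers are illustrative, and this is not AutomaticCFG's actual code):

```python
def apply_cfg(uncond, cond, scale):
    """Blend two noise predictions with classifier-free guidance.

    uncond: prediction for the empty/negative prompt
    cond:   prediction for the actual prompt
    scale:  the CFG value; 1.0 returns cond unchanged
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for model outputs (hypothetical values).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

at_cfg_1 = apply_cfg(uncond, cond, 1.0)  # identical to cond
at_cfg_6 = apply_cfg(uncond, cond, 6.0)  # pushed further away from uncond
```

At scale 1.0 the unconditional prediction drops out of the result, so that pass can be skipped. Above 1.0 each step needs both predictions, which is where the extra time comes from.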
>>
File: ComfyUI_32714_.png (2.44 MB, 1920x960)
>>
>>102021288
>Seems i need to have a lot of steps to get the text not fucked up. Using q8 and t516 not quanted
75 steps is a lot to get good text, but yeah I also noticed that you need more steps to get consistency in text, I stopped at 30 steps because it's already long enough kek
>>
File: 00062-2024-08-22-cJak.jpg (2.51 MB, 2048x2688)
>>
File: file.png (1.04 MB, 1024x1024)
>>
>>102021273
>Can this be mitigated by having some training examples that have the lora subject beside other unrelated people?
I think the best solution to that is to simply pray for some autist to finetune Flux to the point it knows every single celebrities/anime characters
>>
>>102021309
>better prompt understanding
You can solve that by letting an LLM generate the prompt for you and then play around with it, like joycaption for example, you don't really need a high CFG to get the desired result
>>
>>102021321
I'm willing to wait to bring my artistic vision to life lol
>>
>>102021338
boomer prompting is one tool, increasing CFG is another tool, what about boomer prompting + CFG? I do that and my Flux model never misses the details now :^)
>>
File: file.png (1.36 MB, 1536x1024)
>>
File: GriftiusMaxiumus.png (1.4 MB, 800x804)
I heard you were talking shit about me.
>>
>>102021315
>>102021357
:o !
>>
>>102021359
>that file name
kek
>>
>>102021349
Well, if you feel more comfortable with that then why not, but elaborating more on my point, let's say you want a girl with a chinese dress, not describing it well won't give you what you have in mind, but passing a photo of that specific dress to an LLM will output the keyword they use for it, which in turn you can easily use for your own prompt to get more accurate results. So with that it's still necessary to have proper prompt.
>>
File: 00067-2024-08-22-cJak.jpg (2.44 MB, 2048x2688)
>>102021359
I still mog him easily
>>
>>102021391
>let's say you want a girl with a chinese dress, not describing it well won't give you what you have in mind
at least when you prompt "a girl with a chinese dress" it should give a girl with a chinese dress, and that's not what's happening with CFG = 1. You ask it for "pixel art style" and it gives you a regular drawing, whereas CFG = 6 gives you the pixel art style. I'm sorry, but when you ask for something you should get it, that's how good prompt adherence should be >>102021309
>>
>>102021408
Maybe it's your workflow at fault, I tried the pixel art prompt and got gens according to it.
>>
>>102021430
Here's my workflow, if you find something suspicious then tell me: https://files.catbox.moe/e3vvik.png
>>
File: file.png (3.2 MB, 1536x2048)
>>
can't you invert a TE to get an image classifier
>>
File: ComfyUI_00011_.png (1.36 MB, 1024x1024)
close enough
>>
>>102021045
Can a homie get the prompt on the picture on right?
>>
File: 19240467.jpg (21 KB, 460x460)
>Casually discovers how to train on 8gb GPUs and also discovers how to increase speed and lower VRAM usage (By upgrading to the latest pytorch) all by himself on one day and gives out the secrets on his patreon.

Are you ready to apologize?
>>
>>102021602
https://pastebin.com/xUXWBDy2
>>
>>102021608
Grifter.
>>
>>102021608
>ENTERS your github discussion
>POSTS ai pictures of his own face
>DOESN'T understand how anything works
>STEALS your helpful advice
>SELLS it on patreon
>>
File: ComfyUI_32715_.png (3.22 MB, 1920x1080)
>>
>>102021638
One day he will come here.
>>
>>102021359
This dude is more interested in money than AI. All of his images are portraits of himself. He doesn't make anything else.
>>
>>102021638
You forgot the part where he posts about it on reddit and refuses to elaborate anywhere but within his patreon.

Also, it's as simple as
pip install torch==2.4.0 --extra-index-url https://download.pytorch.org/whl/cu118
in your Kohya venv and disabling the validation when you launch the GUI, if you even use the GUI.
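
To confirm what actually ended up in the venv after the upgrade, a small check like this can help. It is only a sketch: it reads package metadata through the standard library, and the helper name is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this inside the Kohya venv; it prints None if torch is not installed there.
print("torch:", installed_version("torch"))
```

A CUDA-specific wheel will usually show a local suffix in the version string, so that is the part worth eyeballing.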
>>
>>102021653
I have no idea how he managed to get thousands of dollars per month on Patreon, his pictures are everything but interesting, always the same pose
>>
>>102021658
It's because he's successfully inserted himself as the authority on LoRAs. If you lacked shame, you could do it too.
>>
these stupid fucking slop AI threads somehow account for around 50% of traffic on /g/
>>
File: ComfyUI_00013_.png (1.03 MB, 1024x1024)
>>
File: ComfyUI_04831_.png (841 KB, 1024x1024)
Stop fixating on some literal whos. You anons are stupid as fuck wasting attention span on any grifting pajeets you comes across.
>>
>>102021686
You probably don't keep up to date with Kohya and training development so you're insulated from him, but the man is a legitimate menace.
>>
>>102021698
>but the man is a legitimate menace.
what do you mean?
>>
>>102021675
Yes sir, sorry sir
>>
>>102021704
See
>>102021638
Imagine you go into the github issues section looking at the discussion around training Flux and this guy is every third post, some of which contain giant collages of his face, others just asking the owner of the repo for something barely on topic, and others just straight up linking to his patreon.
Then he goes on reddit and teases some fix or secret sauce he's learned and links to his patreon again. You cannot escape him and I am not speaking in hyperbole.
>>
What's your guys biggest LoRA request for Flux? Gonna train something but don't know what.
>>
File: file.png (1.08 MB, 1024x1024)
>>102021686
>>
>>102021745
Can you make a 1930's Germany LoRA?
>>
>>102021745
Always gonna be the WoW screenshot LoRA for me because I can't be bothered to log on and grab the screenshots I need.
>>
>>102021762
I can but it's already good at that so it will be as useless as the Donald Trump LoRA
>>
>>102021763
I can consider that. New WoW or WoW classic?
>>
>>102021745
Frutiger Aero lora
>>
>>102021745
FFX cutscenes LoRA
>>
>>102021763
>>102021781
How did you miss the WoW Mikus they're posted at least once every two days
>>
File: 1713175077529310.png (899 KB, 768x1024)
oh I figured out what I was doing wrong
>>
>>102021745
>biggest LoRA
no LoRA. finetune the whole thing.
>>
File: ComfyUI_00014_.png (1018 KB, 1024x1024)
>>102021799
special delivery from israel
>>
>>102021781
Classic WoW, of course.

>>102021796
I didn't, but it made me realize that an actual WoW LoRA would be mind-blowingly accurate because there is already knowledge on the subject. Right now, the images are only WoW-esque
>>
File: ComfyUI_00015_.png (1.24 MB, 1024x1024)
>>
>>102021851
>>102021820

You can probably take these on /pol/ desu.
>>
>>102021861
does /pol/ even have an AI generated shitpost thread?
>>
>>102021861
oh i will
>>
>>102021871
It did when DALL-E 3 came out.
>>
>>102021885
/pol/ should jump on the Flux train
>>
>>102021871
They call it memetic warfare general. Which is pretty cringe desu, but it's really just AI-generated politically incorrect stuff. Haven't seen it around lately though. Sometimes I like to be a little spicy with my shitposts too.
>>
File: ComfyUI_32637_.png (1.08 MB, 1280x640)
With 12GB VRAM, the biggest lora training batch sizes without overspilling are 4 for 512 and 2 for 1024 (adamw8bit, dim 16 or less).
>>
>>102021891
Looks like I'll have to start the (nigger) party
>>
>>
>>102021332
I don't see why it shouldn't work. There is nothing discouraging the lora from making the association that all humanoids should be Costanza if that's all you show it with the tag.
>>
>>102021893
Forgive the retard question but what does batch size do? I've always left it at 1
>>
>>102021929
you can make it work by adjusting the strength of the lora, but I'm not willing to do a series of unload/reload the model just to find a specific setting that will only work for this specific lora combination, it's too tedious
>>
>>102021937
Whether you generate or train, batch size is the number of images you're processing at the same time. With enough VRAM you can bake multiple images at once.
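
As a rough illustration, batching is just grouping the dataset into chunks that get processed together, so a bigger batch means fewer steps per epoch. A sketch with placeholder file names (the helper names are made up for this example):

```python
import math


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


images = [f"img_{n:03d}.png" for n in range(10)]  # placeholder dataset
batches = make_batches(images, 4)  # chunk sizes: 4, 4, 2
```

With these 10 placeholder images, batch size 1 takes 10 steps per epoch and batch size 4 takes 3, at the price of holding more images in VRAM at once.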
>>
>spend literal 5+ hours having chatgpt send me on some wild goose chase to fix "pytorch isn't compiled with libuv" error when trying to use multi GPU training with kohya sd scripts
>install 5765746 different things it says I need
>nothing works
>tired, angry, could've been training a single GPU lora during this time
>uninstall everything
fuck this why the fuck is it so retarded
only thing I can even find related to the error are some posts from 3 weeks ago on an unrelated github saying to downgrade to an older torch version which didn't fix it either.
why is this shit so retarded
it's bad enough I have to rely on gloo because I don't want to use troonix
I'm so goddamn tired
>>
>>102021638
I make at least 100k a month and I have OVER 12000 FANS on Patrron. No joke but I will explaing to you right now you are nothing to me..

But if you wanting to be something I can helping you with private session that will making you understand how to train flux with 3gb Vram.

It will only cost 3000 dollar. Please contact me soon because this is limited offer, if you lose out you will remaining NOTHING to me.
>>
>>102021937
It speeds up the process at the cost of accuracy, with character loras the best results can be achieved with batch size 1, but for style loras you can crank it up as high as possible without downsides.
>>
File: ComfyUI_32635_.png (1.32 MB, 1024x1024)
>>
>>102021959
>>102021986
Makes sense. Thanks
>>
>>102021980
Hey man I respect the hustle though, keep it up.
>>
>>102021737
He's also on my discord
>>
>>
>>102021686
The disrespect... this guy isn't a WHO, he IS the number 1 hustler in the scene.

Your comment shows the ignorance you have on this guy, he's a million times higher than any random grifter.

You have to respect how hard this guy goes.
>>
>>102021737
the wildest part for me is that his reddit thread is full of literal retards cheering him on like he did some amazing, mind breaking, community bending thing

In my headcanon this guy is the furniture room spazz and he needed it as part of his patreon grift. It's why shortly before his recent "revelation" there were a bunch of similar troll posts trying to egg 12gb lora creators into showing every time they did. I really don't know if I hate this guy more or the people who support him.
>>
>>102022128
i think he should be killed, personally
>>
>>102022173
No this is not how we deal with people in our same hobby, we respect each other even if they slightly annoy us.

Do NOT say anything like that again.

Yeah he's a hustler, but he's OUR hustler. Share some of that love man.
>>
>>102021980
Down with consumerism. I will pay more than what I need to purchase a 4090 so I don't need it. Sign me up.
>>
>>102022128
Idk about that. SD was easy money, even I made around 40k without trying too much
>>
>>102022173
at the very least several papercuts to the inside of his eyelids and jalapeño juice dripped in each pupil
>>
>>102022182
>>102022173
>>102022164
>>102022128
Can you guys take it back to your containment thread? Thanks in advance.
>>
Quickly somebody change the topic.
>>
>>102022209
I need a minimum of several more a-logging posts towards that guy to feel satisfied. they don't have to be my posts, they just have to exist...
>>
>>102021444
Here is the result of proper prompting, the pic fed to the LLM is just two pixel art pics of Miku and the Sailor girl.
The secondary description of the characters and art style applied helps the main prompt in generating a more accurate image.
https://litter.catbox.moe/1jv01h.png
>>
sweaty futa butthole lora for flux status?
>>
File: 00009-1342.png (1.05 MB, 896x1152)
>>
File: Capture.png (27 KB, 538x426)
Training on Kohya, once my vram was close to full (but not quite) it would start utilizing system ram, slowing everything to a crawl and never recovering. This Nvidia control panel setting fixed that. Posting in case anyone else comes across a similar issue, it was driving me insane
>>
>>102021750
>>102022124
>>102022289
neat
>>
>>102022296
thanks babe, wait..no patreon? Respect
>>
>>102022173
we all know you'd do it too anon, earning thousands of dollars a month doing nothing
>>
>>102022308
Yeah must be why that anon is so upset, he sees the guy as competition.
>>
File: ComfyUI_00019_.png (1.24 MB, 1024x1024)
last one i promise
>>
>>102022255
Here are some further gens as example
>https://litter.catbox.moe/1g2moo.png
>https://litter.catbox.moe/sl3ci9.png
>https://litter.catbox.moe/d7725f.png
>>
>>102022314
Nah he's a cunt. Entering discussions unprompted on local training, offering nothing and shilling your paid Patreon to people who are explicitly NOT looking to pay for online training/renting GPUs/etc. is just a shitty thing to do
>>
is there a way to connect comfy to an LLM via oobaschlongo? I got the ollama thing working but I don't like ollama.
>>
>>102022265
Try other thread and just ask the resident tranny for a selfie
>>
>>102022337
I don't see what's new compared to what you said before. Yes, boomer prompting works, but my point is that boomer prompting increases prompt understanding and CFG also increases prompt understanding, so having both (boomer prompting + CFG) will increase prompt understanding even further. I don't see why we should avoid one tool or the other when we can combine both of them and get something even better and more consistent
>>
>>102022353
CFG slows it down a little, which is why I said it's fine if you can afford it, but you can still get the same results even without it, and faster.
>>
>>102022370
but it won't be the same results, let's say boomer prompting adds a +2 points in prompt understanding, and CFG adds a +2 points in prompt understanding, combining the both of them adds a +4, you'll always win more by combining the two of them, but yeah I get it, it's slower and maybe some people are content enough with boomer prompting, but if one day you notice that boomer prompting still isn't accurate enough, you can think of CFG combine as another solution
>>
>>102022340
this. he could be making $0 on his Patreon and the behavior would still be annoying/deplorable. kind of person you genuinely just want to kick in the teeth for good measure.
>>
>>102022395
Pretty much that
>>
>>
>>
>>102022415
>the face is burned in
fucking kek
>>
>>102022415
>>102022422
Can you post the lora on civit lol
>>
File: cihuatlamacazqui.jpg (1.82 MB, 1920x1072)
been having a blast putting my old schizo dall-e 3 prompts into flux
>>
any models for generating svg images for icons and shit?
>>
File: 00002-3490383230.png (1.28 MB, 896x1152)
>>
Are you not happy with 12b params?
>>
File: 00005-1415479755.png (1.22 MB, 1152x896)
>>102022561
I prefer 2B
>>
>>102022561
I'm not happy in general.
>>
You could get Flux quality in under a billion parameters if you know what you're doing, which Black Forest Labs does not.
>>
>>102022596
go on, show us how it's done
>>
>>102022596
and who does know what they are doing?
>>
>>102022596
Ok Lykon, you say that and your 2b model can't even lie women on grass
>>
>>102022604
smart people
>>102022603
NDA
>>102022609
I'M NOT LYKON
>>
>larping
>>
>>102022615
confirmed for salty comfy
>>
>>102022481
No, I cannot
>>
>>102022621
dismiss me at your own peril
>>
>>102022635
i shall
tits or gtfo
>>
>>102022629
this fuckface needs to go. I'd rather see 100 brazilian bubble butt babies
>>
>>102022596
>flux quality
flux quality isn't good thoughbeit. it's clearly not much better visually than a 3b model.
>>
>>102022629
Could you prompt him being smug and standing in a triumphant pose next to a "very sad old ancient Greek philosopher from 1000 BC, wearing a chiton and a laurel wreath, the philosopher looks dumbfounded and in despair, he holds his chin with one hand in an exaggerated thinking pose"?
>>
>>102021045
can 1girls do any other pose?
>>
File: 00160-3311128505.png (1.03 MB, 832x1216)
>>102022812
>>
>>102022696
These aren't my images. These are the ones he posts in the GitHub discussions and Reddit
>>
>>102022812
nope.
>>
>>102022870
Alright, that was unexpected. Made me look dumbfounded and in despair and hold my chin with one hand in an exaggerated thinking pose while giggling.
>>
File: 00015-2744568527.png (1.29 MB, 832x1216)
>>
>>102022833
now get rid of her buttchin
>>
>>102022812
>>
>>
>>102022874
Nice
>>
File: 00102-2523225278.png (1.18 MB, 896x1152)
>>102022936
>>
>>102022870
There's enough of his face in those images to train a lora and post him on civitai.
>>
>>102022977
now make them kiss and lesbian sex
>>
>>102022991
That would be a pretty funny way to get back at him for posting his face everywhere. Call the model: the grifter
>>
buy an ad
u
y

a
n

a
d
>>
File: art.jpg (2.63 MB, 2048x3072)
>>102022812
there you go

I can't tell if this is skill issue, flux issue or strictly schnell issue. I know dev looks better but idk if it's also better with anatomy
>>
>>102023021
last one is a burner, bro
>>
>>102023029
they are all fucking deformed
>>
File: ComfyUI_00006_.png (3.53 MB, 1536x1536)
>>
>>102021359
This nigga thinks he knows somthing
>>
File: ComfyUI_00008_.png (3.35 MB, 1536x1536)
>>
File: 1718618526507520.jpg (92 KB, 800x1198)
>>
File: JealousFaggot.jpg (34 KB, 400x400)
https://xcancel.com/Lykon4072/status/1826357045026755039#m
>Impressive how you still get better results on a 6+2gb model using 8 steps. We've been going backwards
>>
File: 2024-08-22_00197_.jpg (565 KB, 2304x3072)
Has anyone else noticed that using "--fast" in comfy to make gens ~20% faster gives inconsistent gens? I mean I can gen a pic, then gen the next, go back an increment, and the gen comes out ever so slightly different when using --fast, just like with xformers before they patched it. Has anyone else experienced that too? Or is my GPU spooked?

pic related cause ...
>>
File: 2024-08-22_00195_.jpg (568 KB, 2304x3072)
>>102023198
... this is the exact same gen just after another gen then going back, but there are obvious differences.
>>
>>102023208
>this is the exact same same gen just after another gen then going back
nigga just say same seed why do you have to complicate things
>>
Trying to caption these WoW images for a style LoRA but it takes way too fucking long. Even using JoyCaption. Stupid GPU time limits
>>
File: lol.png (1.4 MB, 1024x1024)
>>102023183
>>
File: 2024-08-22_00181_.png (1.39 MB, 1152x1536)
>>102023216
kek, sorry to implicate your brain
>>
>>102023222
The model already kind of knows WoW, so short prompts on the specific things you want to add should be all that's really necessary. A close up shot of a mage in netherwind or something. A view of the searing gorge etc, a few specific captions and some uncaptioned ones to fill the "vibe"
>>
Does anyone have a JSON file on hand for Flux finetuning in Kohya?
>>
>>102023237
I don't believe you. For style especially, captioning is important
>>
File: ComfyUI_32721_.png (2.76 MB, 1536x1536)
>>
>>102023280
What? No. It's the exact opposite.
>>
File: again.png (1.81 MB, 1536x1024)
>>
>>102023325
So this is what futuristic memes will look like
>>
>>102023336
This is what now memes look like
>>
File: FLUX__00001_.png (915 KB, 1024x1024)
3 weeks already, this year is disappearing quick
>>
Thinking of dabbling in data prep for my first lora ( no experience), watching some yt vids and can see a nice basic workflow in comfyui for tagging, they are using Florence for a japanese woodblock lora, are some Vision llms better than others?
>>
>>102023366
Only in nsfw department.
>>
What are the best options for local video generation? Are any local models good yet?

like this level of uncensored/good: https://www.instagram.com/reel/C-8N-oIhj53/?utm_source=ig_web_copy_link
>>
File: ComfyUI_32724_.png (1.26 MB, 1024x1024)
>>
>>102023398
>Are any local models good yet?
no but the guys who made Flux are working on one
>>
>>102023436
I would LOVE to see what eldritch abominations a quantised text>vid model produces
>>
I bought a new case instead of a CPU. This will make my gens faster, right? It has RGB. I can set them to blue for cooler temps
>>
>>102023459
kek, hello me from 2 weeks ago
6 case fans actually keeps my cpu and gpu at around 60 under load, so I can sleep easy knowing I won't die from monoxide inhalation
>>
>>102023459
What prompted the case over the new cpu?
>>
>>102023478
tell me you're joking, anon
>>
>>102023484
Yes I'm joking, I don't fear death, I welcome it
>>
Since another Anon mentioned that Flux was also trained on French and German prompts, I've been experimenting a little and discovered that:
1.) Flux understands certain concepts better in one language than another. For example, prompting German „Windmühle“, even specifying „Holländerwindmühle“ or „niederländisch“, still generates Mediterranean-looking, small, round windmills made of stone. “Dutch windmill” in English, meanwhile, immediately gets you a typical Dutch windmill.
2.) You can freely mix languages in a prompt and the model will understand it. Pic related is from my current tests. Prompt:
>Ein pittoreskes sommerliches Landschaftsbild geprägt durch das sanfte Auf und Ab of rolling hills, hohes Gras und Wildblumen, einsame grüne Bäume, dem tiefblauen Meer in der Ferne und saftem Sonnenschein.
>Nahe der Küste steht ein einstöckiges rustikales niederdeutsches Fachhallenhaus mit roten Backsteinen und Krüppelwalmdach, daneben a large Dutch windmill with four blades and with clapboards.
>>
I have a bunch of images in subdirectories. Is there a tool that will show me each image in turn with a text area where I can write the caption and press ctrl+enter to go to the next one or something like that?
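
If no ready-made tool fits, the bare-bones version of this is easy to script. The sketch below only handles the bookkeeping: it walks the subdirectories, finds images that have no caption yet, and asks for one in the terminal. It does not display the image, so keep a viewer open next to it. The same-name .txt sidecar layout and the extension list are assumptions; check what your trainer expects.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types


def find_uncaptioned(root):
    """Yield image paths under root (recursively) that lack a .txt sidecar."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in IMAGE_EXTS and not path.with_suffix(".txt").exists():
            yield path


def caption_interactively(root):
    """Ask for a caption per image in the terminal and save it next to the image."""
    for image in find_uncaptioned(root):
        caption = input(f"{image}: ").strip()
        if caption:  # empty input skips the image
            image.with_suffix(".txt").write_text(caption, encoding="utf-8")
```

Calling `caption_interactively("path/to/dataset")` resumes where you left off, since already-captioned images are skipped.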
>>
>>102023515
>Since another Anon mentioned that Flux was also trained on French and German prompts
It wasn't
>>
>>102023515
Interesting stuff
>>
>>102023459
>blue
Yellow.
>>
>>102023179
I hate when she stiffens her lips like that
>>
is the capitalisation of proper nouns true? does it make any difference at all
>>
>>102023622
it does for T5
>>
>>102023622
It's a basic courtesy when talking to a machine with superior grammar skills.
>>
File: 2024-08-22_00105_.png (2.04 MB, 1024x1280)
>>102023622
yes it does and it's especially important if you quote artist styles.. "by vincent van gogh" works worse than "by Vincent van Gogh" (yes the van is smol officially)
>>
Is joycaption available for download?
>>
>>102023703
Nevermind, found the repo
>>
I've trained a lora on flux using AI-Toolkit and while the lora clearly works on the sample images created during training, when I try to use it in forge it has zero effect on the image. how did i screw this up?
>>
>>102023482
Waiting for the next gen so I can come in when the price drops and pick up a cheaper 14700k. Case is to prepare for the new build.
>>
>>102023549
I've been prompting mostly in German today and it understands it as well as any English prompts.
>>
File: FD_00241_.png (1.14 MB, 1024x1024)
>>
File: ComfyUI_32732_.png (2.33 MB, 1536x1536)
>>
>>102023740
yes but that doesn't mean Flux was trained on French and German prompts
>>
File: FD_00242_.png (931 KB, 1024x1024)
>>102023741
>>
>>102023759
How does this work then? Does it translate prompts into English?
>>
>>102023772
T5 has incidentally seen some languages other than English
>>
File: bComfyUI_107331_.jpg (258 KB, 768x1024)
>>
File: FD_00244_.png (1.06 MB, 1024x1024)
It's really hard to train away the butt chin
>>
I made my own automatic mass captioning script with joycaption and I want you to know it.
>>
>>102023820
your captions are full of mistakes and I don't even have to look at them to know
>>
File: FD_00246_.png (1.25 MB, 1024x1024)
>>
>>102023820
I found Florence 2 to be significantly better for captioning.
>>
>>102023824
I review them manually to ensure my fetish is properly represented.
>>102023836
I'll take a look, thanks.
>>
>>102023841
Running this too, really quick, really easy.
https://github.com/jhc13/taggui?tab=readme-ov-file
You could feed joycaption into it if you prefer that.
>>
File: bComfyUI_107358_.jpg (261 KB, 768x1024)
she cute
>>
>>102023850
This is great, actually. Thank you.
I'm running out of disk space. I should've installed all this in my main partition.
>>
File: 2024-08-22_00276_.png (1.1 MB, 1024x1024)
>>102023703
>>102023715
thanks anon for making me aware of joycaption, amazing tool makes great prompts out of my old sdxl pictures for flux, pic related
>>
In a1111 I can see the image being generated in real time, notice the extra arm starts growing at step 8, and add this to the negative prompt.
>[[extra arms:8]::10]
This just nudges the model in the right direction by applying it only between steps 8 and 10 and it barely changes the final result.

Is there an easy way to do it in ComfyUI? From my newbie perspective looks like I have to add a new sampler and separate prompt nodes for every change.
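
One way to think about the ComfyUI side: its conditioning nodes tend to work with fractions of the sampling run rather than step numbers, so a step window like 8 to 10 has to be converted first. A sketch of that conversion, assuming the window is half-open and that a node such as ConditioningSetTimestepRange takes start/end values between 0.0 and 1.0 (check the node in your install before relying on this, the mapping to actual steps may only be approximate):

```python
def step_window_to_fraction(start_step, end_step, total_steps):
    """Convert a [start_step, end_step) window into fractions of the run."""
    if not 0 <= start_step <= end_step <= total_steps:
        raise ValueError("expected 0 <= start_step <= end_step <= total_steps")
    return start_step / total_steps, end_step / total_steps


# Steps 8 to 10 of a 20-step gen (the 20 is an example value).
start, end = step_window_to_fraction(8, 10, 20)
```

The resulting pair would go on the extra negative conditioning only, with the rest of the prompt left on the full range.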
>>
File: ComfyUI_32735_.png (2.71 MB, 1536x1536)
Sharing a lora for generally depressed dead-eyed girls with sorta realistic backgrounds.
https://mega.nz/folder/mtknTSxB#cGzjJnEqhEXfb_ddb6yxNQ
Artist https://xcancel.com/toyonosaki/media
>>
File: image.jpg (91 KB, 1536x1024)
>>
>>102021657
And it's funny because redditors are retarded and praise him
>>
>>102023850
>https://github.com/jhc13/taggui?tab=readme-ov-file
how would I add joycaption to this
>>
File: ComfyUI_32740_.png (1.38 MB, 1152x1152)
Baking took 3 hours on a 3060 with 1024x1024, batch size 2.
>>
File: ComfyUI_03869_.png (1.22 MB, 1024x1024)
>>102021045
>collage has less and less flux images as time goes on

It's over, isn't it? Flux was a flop.
>>
>>102023977
how many steps sweetheart?
>>
>>102023991
>it's over
What is? This is the Local Diffusion General, sir.
Also, if what I'm doing is any indication, most vram flux chads are busy training.
>>
>>102023991
How do you know they aren't flux? There are so many loras now it's hard to say
>>
It's over.... Fuck AMD. Fuck Nvidia.
Antitrust law is fucking dead.
We should import chinese vram-rich GPUs.
It's sad that commies are doing well now.
>>
File: image.jpg (93 KB, 1536x1024)
>Flux was a flop.
there is literally no one on the internet who has this opinion organically
>>
>>102024015
It's never sad that someone is doing well.
>>
>>102024021
this
>>
File: 2024-08-22_00284_.png (1.37 MB, 1024x1024)
>>102023991
the few ones that claim flux is a flop are the coomers that desperately seek a fix cause the finetunes are still baking

>>102024021
this.. so this.
>>
>>102024015
Imagine a chinese company comes to beat nvidia
>>
File: FD_00255_.png (1.24 MB, 1024x1024)
>>
>>102024031
yeah they are in the middle of an edging session and want everything now before they cum.
>>
File: FD_00251_.png (1.29 MB, 1024x1024)
>>
I can't find any style lora on civit or anywhere else that is both trained enough and doesn't degrade the anatomy and other overall stuff. Am I the only one who's noticed this?
>>
>>102021309
and in forge?
>>
File: file.png (642 KB, 1024x1024)
>>102023966
super cool
>>
File: Capture.jpg (633 KB, 3840x1785)
>Midjourney allows everyone to use their site API now and give us 25 free gen per week
They're terrified of Flux, aren't they?
>>
>>102024057
No, that's why I'm training my own. I'm tired of wading through sludge, so I wade through my own and learn things in the process. Plus the result is catered to my tastes.
>>
>>102024069
can I gen anime there or is that a different thing
>>
>>102024063
On forge you can only use Dynamic Thresholding
>>
File: bComfyUI_107383_.jpg (264 KB, 768x1024)
>>
What is currently the best way to generate stylized fonts/logos? Are there specific checkpoints/models for this or is it basically a "use flux" situation? I only really need the text/logo, not an entire photo that has non-garbled text as a bonus.
>>
File: ComfyUI_32742_.png (2.01 MB, 960x1920)
>>102023992
20 epochs with 35 images, 1 repeat, so 700 steps (350 with batch size 2), dim and alpha 16, learning rate 0.0001 (gonna try 0.0002 next)
>>
File: 2024-08-22_00286_.png (1.32 MB, 1024x1024)
>>102024047
kek
>>
File: Capture.png (235 KB, 2719x1627)
>>102024084
dunno how you can find the Midjourney Niji here
>>
>>102021054
Wow. The model I made is shared here. I'm honored.

Please, if you use it, share the results; just send them directly to my civitai account.
>>
>>102024034
It would be better to have another AI winter.
If things go well and AGI/Super AI is born, I think the West will be defeated, at least technologically.
>>
>>102024084
there is a niji phone app
>>
>>102024069
MJ doesn't even let you prompt for swimsuits and you get blocked after 2 wrongthinks according to orange reddit
>>
>>102024101
Nice, not bad, better than the 12 hours that Turkish guy said.
>>
File: Capture.jpg (452 KB, 3482x1070)
>>102024069
ehh, I thought MJ would've nailed Picasso, it's barely better than on Flux
>>
File: file.png (169 KB, 1071x1230)
Say thank you /g/.
>>
File: xay1ip.png (2.54 MB, 1280x1024)
>>102024155
>it's barely better than on Flux
You're wrong
>>
>>102024094
I've done that with dall-e 3 with varying degrees of success

>>102024106
eh guess I'll give it a shot anyway

>>102024135
I don't understand their retardation with not having shit be easily available on their website but I suppose this works for their target audience

>>102024140
I wonder which one is worse, MJ or dall-e 3
>>
File: 2024-08-22_00292_.png (1.34 MB, 1344x768)
>>102024069
kek, fucking 25, I can't
>>
File: 2024-08-22_00277_.png (1.19 MB, 1024x1024)
>>102024155
this >>102024168 , when used in the right hands FLUX is far superior
>>
>>102024168
I'm sure you can get the same result on MJ if you use the same boomer prompting, I was just going for "Picasso Style" there, let's compare apples to apples and be fair
>>
>>102024165
thank you
>>
>>102024165
He did the meme!
>>
>>102024171
>I don't understand their retardation with not having shit be easily available on their website but I suppose this works for their target audience
me neither, I really don't get them, why going for Discord or a fucking phone when sites exist
>>
>>102024165
This guy is making like 100k dollars a month and lives in Turkey where that money is waaay better.

And yet...he doesn't slow down. He's HUNGRY
>>
>>102024165
I hate this sandnigger cunt so much. Always spamming his fucking ugly mug all over r*ddit to shill his overtrained garbage
>>
File: bComfyUI_107468_.jpg (337 KB, 768x1024)
>>
>>102024227
that's why I blocked him on leddit, he's insufferable
>>
>>102022596
True. Flux is bloated af
>>
>>102023366
Some VLMs take 10-20 seconds per image and some take 1 second (Florence)
>>
>>102022596
yea but 4B is images of text lel
>>
File: 1girl1cup.jpg (93 KB, 1536x1024)
i think boomer prompting and having a 2:1 ratio between "young girl" and "young woman" helps with getting teens consistently
>>
>>102024220
>Being the only man in Turkey who knows how to navigate a github repository makes you a professor of AI and earns you 100k a month
God I wish I was a third worlder
>>
Is this an OK caption for Flux?
>The image is a high-resolution, explicit photograph focusing on the lower body and genital area of a nude, light-skinned female with long, straight, light brown hair cascading down her back. She is positioned with her legs slightly apart, knees bent, and is holding a black, twisted leather paddle or flogger in each hand, pulling it towards her buttocks and exposing her vulva. Her nails are manicured with a pink and white design. She wears a silver bracelet on her left wrist. The background features a plain, off-white wall with minimalistic, possibly indoor decor, suggesting the setting is intimate and private. The text [removed] is superimposed at the bottom of the image in white, bold letters, indicating the source or possibly the subject's online presence or affiliation. The overall atmosphere of the photograph is provocative and focuses on erotic themes, intended for an adult audience. The lighting is soft and even, highlighting the smooth texture of her skin and the material of the leather accessory.
>>
File: Capture.jpg (232 KB, 1855x1407)
>>102024165
Holy cocksucker
>>
>>102024284
I genuinely believe that he sockpuppets his posts because I can't imagine anyone actually wanting to see his "content"
>>
first epoch of wow classic LoRA style
>>
>>102024276
ok Teebs
>>
>>102024298
>I can't imagine anyone actually wanting to see his "content"
yet he's making thousands on his patreon, am I living in an alternate universe or what? this retard doesn't deserve a single penny
>>
>>102024301
>>
>>102024142
Is there a basic rundown of how epochs, images, batch size, repeats, everything else, and steps interact? I'm using the default ai-toolkit batch size 1 with steps bumped up to 3500 (because I still noticed some slight anatomical errors with 2000). But I have no idea what I'm doing. You don't even have configurable epochs or repeats in this config file, apparently.
>>
>>102024313
>>102024301
And this is what it thinks a Tauren is so far.
>>
>>102024313
>>102024301

Already looking better than base
>>
>>102024301
>>102024313
>>102024325
>512x512
>>
>>102024315
This
I only ask because I can't really see a link on this: how do you train on flux? Can kohyaSS do it now?
Kinda want to train a world of horror 1 bit lora.
>>
>>102024335
yes, why would I sample on higher res? I am training at 1024x1024 don't worry.
>>
>>102024335
feels more authentic
>>
I think one of the reasons loras fuck up text generation is because of automatic captioning getting the watermarks from porn sites wrong.
>>102024346
I don't know if you're the 16GB anon from yesterday (to whom I already told this twice, forgetting he did not have 24GB), but you can train with ai-toolkit.
>world of horror 1 bit lora
Fuck yeah. Upload a dataset and I'll do it if you can't.
Oh shit, I just realized the potential for making mods. Got shivers down my spine.
>>
>>102024315
batch size*repeats*epochs=steps
>>
>high res images
>joycaption does not mention resolution at all
>shitty screencap from a video
>"THIS IS A HIGH DEFINITION IMAGE..."
>>
>>102024401
Yes, that much I know, but is there somewhere I can read about what each represents? Should I increase/decrease one or another in a particular scenario and why?
>>
>>102024406
I don't get it, you need 20 pictures to make a good lora from Flux, why can't you do the captions by yourself at this point?
>>
>>102023396
>>102024262
ty.
>>
>>102024401
>batch size*
images*
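
Putting the correction together with the 35-image, 20-epoch run mentioned earlier in the thread: a minimal sketch of the step arithmetic, assuming every image is seen once per repeat per epoch and partial batches are ignored (some trainers round up, so real counts can come out slightly higher). The function names are hypothetical, not taken from any trainer.

```python
def total_steps(images, repeats, epochs, batch_size=1):
    """Optimizer steps for a run: samples seen divided by batch size."""
    return images * repeats * epochs // batch_size


def equivalent_epochs(steps, images, repeats=1, batch_size=1):
    """Inverse view for step-only configs: how many passes over the data."""
    return steps * batch_size / (images * repeats)


if __name__ == "__main__":
    print(total_steps(35, 1, 20))                # batch size 1
    print(total_steps(35, 1, 20, batch_size=2))  # batch size 2
    print(equivalent_epochs(700, 35))
```

The second function is the handy one when the config only exposes a step count: divide by your own image count to see how many passes you are really doing.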
>>
>>102024406
This is why I prefer Florence2.
>>
File: image.jpg (85 KB, 1536x1024)
>>
>>102024428
You need about 100 for a style LoRA
>>
>>102024389
For what it's worth I have a 4090. I'm not that guy but I'll take a look at the toolkit

Currently trying to make a dataset by ripping the game files
>>
>>102024457
it's completely possible to do that by hand
>>
>>102024468
Tedious as shit, plus Flux takes way better to VLM captioning than it does human captioning
>>
Another loaf of bread is right here...
>>102024460
>>102024460
>>102024460
>>
>>102024481
I mean, why not both? you let the VLM do its flowery prose shit and you add some of your shit to correct the wrong stuff it has probably said
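
Something like this would cover the "both" approach: a hedged sketch, not any captioner's actual API. It assumes captions are plain strings, drops whole sentences containing phrases you know the VLM gets wrong (resolution claims, misread watermarks), and appends your own correction. `clean_caption` and its parameters are made-up names.

```python
import re


def clean_caption(caption, drop_phrases=(), extra=""):
    """Remove sentences containing any unwanted phrase, then append a manual note."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    kept = [
        s for s in sentences
        if s and not any(p.lower() in s.lower() for p in drop_phrases)
    ]
    if extra:
        kept.append(extra.strip())
    return " ".join(kept)


if __name__ == "__main__":
    raw = "This is a high definition image of a cat. The cat sits on a mat."
    print(clean_caption(raw, drop_phrases=("high definition",),
                        extra="Screencap from a video."))
```

Dropping the whole sentence is crude, so still skim the output before training on it.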
>>
>>102024428
I have about 200 assorted porn pictures to try and get Flux to understand what a naked woman looks like. They're not too many to proofread, but I'd rather have a good boomerprompt base.
>>102024467
Alright. Looking forward to that WoH lora.
>>
>>102024057
I just want to see an artist style Lora that works well enough to give me a glimmer of hope that Loras can do it. The sample images of most Loras I've tried seem cherrypicked.
>>
>>102021745
lohse from divinity original sin 2


