/g/ - Technology






File: 1724354373248384.jpg (2.24 MB, 2311x2923)
2.24 MB
2.24 MB JPG
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102030143

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/c/kdg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/u/udg
>>>/trash/sdg
>>
File: delux_me_00051_.jpg (295 KB, 896x512)
295 KB
295 KB JPG
>mfw
>>
File: Untitled.png (607 KB, 1103x826)
607 KB
607 KB PNG
Will report back in a couple of hours, if not sooner.
>>
Blessed thread of frenship
>>
>>102033938
neck yourself
>>
File: v69.png (63 KB, 1058x466)
63 KB
63 KB PNG
Thank god, he's not going to waste any more time with SDXL.
>>
>>102033957
When he said he was making a flux backup, is that a confirmation that it IS coming or that he will make one if Auraflow turns out to be shit, which it will? Can he train both at the same time?
>>
>>102033939
>>102033947


The timing
>>
File: sfw_pony.jpg (2.27 MB, 2664x3456)
2.27 MB
2.27 MB JPG
skill issue. I was trying to temp it and it just works.

my prompt:
score_9,score_8_up,score_7_up,score_6_up,source_anime,rating_safe,nature scene,national geographic,
BREAK
a night sky showing the galaxy arm of the milky way,
BREAK
planets that look like boobs,moons that look like asses,asteroid crash,nature,background,
>>
Does BLIP work for tagging images for Flux as well, or should I use a different tagger to train a Flux LoRA in kohya?
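For reference, a minimal sketch of what a BLIP tagging pass looks like via transformers (the Salesforce checkpoint is the standard one; the dataset path and the kohya-style .txt sidecar captions are assumptions on my part):

# Minimal local captioning loop with BLIP via transformers.
# BLIP gives short generic captions; whether that's enough for a Flux LoRA is the open question.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to("cuda")

for path in Path("dataset").glob("*.png"):           # hypothetical dataset folder
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=75)
    caption = processor.decode(out[0], skip_special_tokens=True)
    path.with_suffix(".txt").write_text(caption)      # kohya-style sidecar caption file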
>>
File: 2024-08-23_00008_.jpg (243 KB, 1280x1280)
243 KB
243 KB JPG
>>102033918
thank you kindly baker
>>
>>102033966
My interpretation is that if Auraflow turns out to be shit then he will go with Flux Schnell. Auraflow is an unfinished proof of concept, there are many unknowns.
>>
first i ever heard about auraflow. how does it compare to flux?
>>
>>102033993
I've never heard of it either
it's not looking good is it
>>
File: FD_00087_.png (651 KB, 1024x1024)
651 KB
651 KB PNG
>>
>>102033988
>Flux Schnell
ugh, everything this guy does is like a monkey's paw wish.
>>
>>102034003
maybe he can make schnell better than dev
you gotta believe!
>>
>>102033993
The prompt following is as good as flux but the image quality is worse than SD 1.5

It's really more of a personal project. It's also trained exclusively on Ideogram outputs (aka AI-generated images), which is retarded and will likely propagate many issues down the line.
>>
>they dont know aura flow
>nsfw cat completely forgotten
rip
>>
>>102034019
digital inbreeding, ew
>>
>>102034019
>The prompt following is as good as flux but the image quality is worse than SD 1.5
that would be retarded. i guess he doesn't care because it's all cartoons anyways, but i'm more interested in the realistic finetunes
>>
File: FD_00090_.png (1.11 MB, 1024x1024)
1.11 MB
1.11 MB PNG
New World of Warcraft™ Classic expansion is up, with pre-order bonus sonic skin in collaboration with SEGA®
>>
File: 00022-3538182420.png (1.13 MB, 1216x832)
1.13 MB
1.13 MB PNG
>>
>>102033974
This is a blue board.
>>
>>102034046
>Sonic in hillsbrad foothills
kino
>>
>>102033974
turn down the CFG holy fuck
>>
File: 00000-AYAKON_1248176.png (1.31 MB, 1280x1280)
1.31 MB
1.31 MB PNG
sexo demon
>>
>>102033918
Kino Kollage
>>
File: 00014-AYAKON_1248182.png (1.47 MB, 1280x1280)
1.47 MB
1.47 MB PNG
>>
File: FLUX_00028_.png (1.03 MB, 896x1152)
1.03 MB
1.03 MB PNG
>>
>>102034096
sorry sir

>>102034115
it is where it should be.
>>
>>
>>102033938


>100 steps in
>>
>>
>>102034168
could look more homeless
>>
>>102033938
>>102034168
Doing god's work, make sure to dump it on civitai with a description that reads like you're a massive fan of his, maybe add in whatever that redditor wrote about toilet paper from the other thread.
>>
>>102034185
I can't change the prompt until it's done training now.
Though I assume he will only get more clean cut and less homeless as the training continues lol.
>>
>>102034200
you can edit the prompt, it doesn't get cached
>>
File: 00405-920590328.jpg (477 KB, 1024x1536)
477 KB
477 KB JPG
>>
File: 00028-492778958.png (1.09 MB, 832x1216)
1.09 MB
1.09 MB PNG
>>
Is using flux in Forge as basic as just dropping the models into a folder? Is there a guide?
>>
>>102034209
It's thrown an error for me every time I try. Something about gradients not being true.
>>
>>102034168
guys...maybe you are going too far.

Let's take a moment to think...are you sure you want to do this?
>>
>>102034222
You put the flux model file in your models/Stable-diffusion folder. Assuming it's installed correctly, that should be the start and end of it.
>>
>>102034222
https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050

Follow and copy the info here
>>
>>102034138
Can they get bigger tho?
>>
>>102034237
Oh neat. Which should I use with a 3080 and 64gb RAM?

Or is 10GB VRAM not enough?
>>
>>102034229
What are you talking about? This is just a model based off a short scrape of the reddit stable diffusion sub.
>>
>>102034266
oh.....ok....
>>
>>102034255
>Or 10GB VRAM not enough?
That's really pushing it.

I think you can use the q4 gguf models and have it run, but it won't be fast.
>>
>>102033938
Your data source is actually more extensive than his lmao

Pic related is what he trains from
>>
File: file.png (43 KB, 857x325)
43 KB
43 KB PNG
>>102034168
You better not publish this nor release any more images of him.

Like I said on reddit. You're all plebs who don't see the real value he's providing. A $5 value. Now if you don't like it, I suggest you take it up with the 4chan moderators. I will be reporting your posts for the foreseeable future.
>>
>>102034269
fucking njudea skimping on vram
>>
File: 1714843419852212.png (626 KB, 1152x896)
626 KB
626 KB PNG
>>
>>102034278
No way that screenshot isn't him.
>>
>>102034270
>no full body pictures
why
>>
>>102034270
It's no wonder his results always have the same feel based on this dataset
>>
>>102034297
He's turkish so he's probably not wearing pants.
>>
>>102034270
Even the original dataset feels like AI
>>
File: 2024-08-23T123126.445.jpg (552 KB, 1536x2688)
552 KB
552 KB JPG
V2 WoW Classic LoRA is finished and on Civit. Happy with this one.
https://civitai.com/models/673741?modelVersionId=756382
>>
>>102034278
This has to be someone from here right? lol

How old is the account?
>>
>>102034312
Great job my dude. I actually appreciate that you went out and did the thing you said you would and got results.
>>
>>102034285
unique style
>>102034312
neat
>>
File: ComfyUI_00749_.png (1.16 MB, 1024x1024)
1.16 MB
1.16 MB PNG
>>
i wish i had 128gb vram
>>
File: 00030-1289472002.png (1.04 MB, 1216x832)
1.04 MB
1.04 MB PNG
>>
>>102034327
What would you do with 128gb of vram?
Me? I'd probably just go full time fine tune grifter and collect cash for making fine tunes.
>>
File: FLUX_00620_.png (1.14 MB, 832x1216)
1.14 MB
1.14 MB PNG
Higher steps solve most of the issues a lora brings in
>>
File: 1718844078016307.png (44 KB, 1106x499)
44 KB
44 KB PNG
>update forge
>launch it
>errors out something i've never seen before
fug

How do I fix this? It was working a few weeks ago.
>>
>>102034334
procrastinate
>>
File: 00033-1289472005.png (1.01 MB, 1216x832)
1.01 MB
1.01 MB PNG
>>
>>102034339
I have seen this girl before...
>>
>300 steps in

I'm starting to wonder if 3000 steps was overkill
>>
>>102034343
you use it as an opportunity to start using comfy instead
>>
File: file.png (86 KB, 984x651)
86 KB
86 KB PNG
>>102034278
>>102034290
>>102034317

kek this guy is sucking his dick so hard
>>
>>102034348
lmao this is funny.
>>
File: 2024-08-23T124100.966.jpg (646 KB, 2048x2048)
646 KB
646 KB JPG
>>102034318
I have nothing better to do with my GPU at the moment, and never trained a style LoRA before so it was a good learning experience.
What should I do next? Any requests?
>>
>>102034367
That totally random guy sure likes to mention what a small investment 5 bucks is for Cerfukin's infinite wisdom.
>>
>>102034390
RuneScape? Or what about throwing a bunch of "old" games in there from multiple platforms
>>
>>102034367
Seems like it's a real person, and they do some pretty lewd loras.

https://civitai.com/user/gurilagardnr
>>
>>102034367
What's the issue? He's hiding his shit behind baitreon?
I could sub and leak it to everyone then charge back.
>>
>buy advice
>give it away for free
this is why monetising knowledge just doesn't work
>>
>>102034278
>>102034367
>>102034403

>I could sub and leak it to everyone then charge back.

Don't worry about it, his shit is already on kemono.

https://kemono.su/patreon/user/83632369/post/110293257
>>
>>102034390
Star Trek TNG LoRA
>>
>>102034397
like a low poly LoRA? There's a few already. I want to do something new.
>>
>>102034367
>>102034400
say thanks for the free ride since you can just grab the training settings from the metadata
>>
>>102034412
Second >>102034411 but it would be cool to do all the older Treks instead of just TNG
>>
>>102034403
He's not even hiding anything of value, he's just taking publicly available knowledge and selling it to the ignorant.
>>
File: 00418-3145738446.jpg (795 KB, 1536x2304)
795 KB
795 KB JPG
sad obsessive troll existence
>>
every
fucking
thread
>>
>>102034424
>it would be cool to do all the older Treks instead of just TNG
Would be cool but they'd probably have to be separate because of how visually distinct most of the shows are. Maybe voyager, DS9 and TNG can go together, but classic trek is a bit too visually distinct.
>>
>>102034408
Please remove this before I sue you all, also you CANNOT using my face and maked lora. Why? BECAUSE YOU ARE WORTHLESS.

Get ot my level before you EVEN THINK you can use my face for anything.

If you want to get to my level I can helping you on my Patreon. So please considering this.
>>
>>102034436
>voyager, DS9 and TNG
maybe the enterprise too but that might be too far off the others idk
>>
>>102033809
>>102033823
thanks again got it running locally now
>>
>>102034438
Only $5! Amazing value!
>>
why is it telling me my 3080 doesn't support torch 2.4.0?
>>
>>102034452
https://pastebin.com/p2PWgJZK

I made a custom app.py script that can do batch jobs for you. It also has a second text editing tab so you can go back and fix the ones you don't like.
The tagging mechanism is kind of bunk though, I might go back and fix it one day.
>>
>risk bumping lora LR up to 5e-4 from 3e-4
>train for 16 hours
>don't check it once during this time
pray for me bros, for all I know e1 is garbled noise and colors right off the bat
>>
>>102034477
when was the last time you updated your nvidia drivers? that's my best guess
>>
>>102034411
Please do t'pol

I fapped to her a million times when I was a teenager watching the Next Generation.

https://www.youtube.com/watch?v=_5qJ5t2ABVE

https://tenor.com/en-GB/view/jolene-blalock-t-pol-linda-park-hoshi-sato-enterprise-gif-27656579
>>
>>102034493
how did the 5e-4 one turn out?
asking for a friend
>>
>>102034012
Schnell is doomed to always be a turbo model with 4-step generation, and I don't see how you can extend it to not act like one, because that would be a novel thing no one has done before; on top of that, practically no work has been done on training it vs dev. Given the guy doesn't know his shit and is a script monkey with resources, I don't have that much hope, but at least we can see how far Schnell can go. Auraflow shouldn't be an option until 1.0 releases in my opinion, as it's way too underbaked, but I can easily see the guy doing it on some prerelease version like 0.4 or 0.5 with access before the public and then building something on it, effectively branching off AuraFlow. Personally, with the data he has, he should be making his own model from the ground up, but that requires more than what a person doing this kind of training can handle.
>>
>>102034506
I mean Enterprise, sorry my dick was thinking.
>>
File: BuyMyCourse.png (373 KB, 613x387)
373 KB
373 KB PNG
Some more great images for your dataset here anon
https://github.com/Nerogar/OneTrainer/issues/116
>>
>>102034511
*3e-4, the one you did
>>
>>102034506
She still looks good in 2024, I just checked.
>>
>>102034482
NTA, looks good, although it would be cool if someone figured out a way to add this to taggui
>>
>>102034411
>>102034424
>>102034436
What about a generic 70s scifi LoRA, primarily trained on OG star trek?
>>
>>102034541
tag gui isn't python, is it?

Anyway, my biggest issue with Joy caption is the model is REALLY REALLY bad at taking in context for the dataset. In the prompt editing text box if you add more than a single sentence the model flips out and dies.

I changed the model in the UI to intern, and it worked better. I was able to include a description of the characters in the dataset and write a prompt that they may be in these images and it did a better job, but even on my 3090, it oomed regularly even at only 8B parameters and had all sorts of 8B jank.

It's looking grim for local image tagging.
>>
https://civitai.com/articles/6309
>I'll be training on AuraFlow first, with FLUX as a secondary backup.
>>auraflow is 16.4 GB
vramlets BTFO.
>V7 will offer generalized styles without direct artist style copying.
>>without direct artist style copying
new PDXL gonna be trash.
>>
>>102034594
no shit
he won't use the best tool for the job because then he'd have to rely solely on patreonbux
>>
File: 00037-1548067257.png (1.3 MB, 832x1216)
1.3 MB
1.3 MB PNG
>>
>>102034511
>>102034531
https://desuarchive.org/g/thread/102013088/#102013347
https://desuarchive.org/g/thread/102013088/#102014650

this was the 3e-4 one trained at 512x512. the 5e-4 one is trained at 1024x1024 instead; the goal was to try to reduce the number of epochs needed to reach success when using 1024x1024

if you're not a vramlet and not using 8bit training I think you don't have to raise the LR as much, which is why 24gb vram loras are working at 1e-4 but vramlets weren't having success with the same settings
>>
>>102034521
You're late. He already revealed he'll do Auraflow and then Flux as backup in a CivitAI post, in case you think he was revealing it on Reddit only.
https://civitai.com/articles/6309
>>
>>102034619
>Flux as backup
Praying this is not schnell.
>>
>install forge completely from scratch
>it loads
>add in launch args to enable cuda
>get a novel of an error message

Holy fuck how am I having this much bad luck with this
>>
>>102034635
install comfy instead
>>
>>102034613
>not using 8bit training
i thought the fp8 option was required for flux?
do you have a link for 24gb training without it?
>>
>>102034634
it is, because he can't monetize dev. I don't know how any of you have/had hopes for astralite, he's a retard grifter who got lucky with a single bake because he funneled resources into it when no one else cared. even if he trained on flux proper, he'd ruin it with his shitty pruning, fucked up captions, jpeg compression system, burning it to all hell, etc

where the fuck were you people when he was having a bitch fight with lykon and said he didn't know what 'caption dropout' meant
>>
>>102034613
>if you're not a vramlet and not using 8bit training
I think even 24gb (king of vramlets) need 8bit training?
>>
screw you guys, I'm making my own UI
>>
>>102034652
oh my bad, it is, I have no idea what the fuck I was thinking of then. sorry for the confusion on my part
>>
>>102034679
Making a UI is easy, it's all the weird optimization stuff you can't see that's hard.
>>
File: 1709897250523855.webm (2.33 MB, 1280x720)
2.33 MB
2.33 MB WEBM
did it get easier and faster to train sdxl over time or can we expect flux to stay this vram expensive?
>>
>>102034671
yeah, I'm retarded. I thought I'd read something on github about how 8bit training forced lower vram users to require higher LR, but that makes no sense since you guys are right, you do need 8bit for 24gb vram. I must have misunderstood something somewhere
>>
>>102034683
No biggie, was confused too when I saw that flag being true while training, but yeah, flux is too big.
In the end though, do you think there is much of a reason to train at 1024x1024 over 512? I feel like the model and buckets do most of the heavy lifting.
>>
>>102034692
>or can we expect flux to stay this vram expensive?
Maybe, but first you need to understand that the current optimizations we have for training Flux at home are literal technological miracles.
>>
File: ComfyUI_00850_.png (2.53 MB, 1152x1536)
2.53 MB
2.53 MB PNG
baked at multi-res, [512;768;1024] and used t5 attention mask (no idea what that really affects in terms of lora training, but saw it got added as a feature).
One thing I've noticed is training at 1024 produces a darker image, especially in contrast to 512. The multi-res has felt the most 'lifelike' to me
>>
>>102034619
Oh okay, that does suck but understandable given he wants to monetize as soon as possible. I just hope it doesn't dissuade a 1.0 release from AuraFlow. Also interesting was a discussion about VLM captioning, and it seems like the anon who talked about InternVL2 was right, it is SOTA.
>Fortunately, I was introduced to a smaller alternative, InternVL2, specifically the 40B variant (there is also a 76B version of InternVL2 but it was not a noticeable improvement in my tests) . This model proved to be even better, reaching GPT-4 levels of captioning with superior prompt understanding, better OCR, more domain knowledge, and no censorship. As a result of this evaluation, InternVL2 is currently the primary captioning model.
I don't trust the rest of his methodology and his inefficient way of going about training, but I do trust he did enough research to land on InternVL2 for good reason after testing a bunch of models. The rest of everything seems solid too, with the exception of his base model choice as already noted.
>>
>>102034719
prompt?
>>
>>102034721
>InternVL2 was right, it is SOTA
It's really good. Just good luck getting it to run locally without ooming.
>>
File: 2024-08-23_00048_.png (1.34 MB, 1024x1024)
1.34 MB
1.34 MB PNG
>>
>>102034702
I'm not sure yet, part of my 1024x1024 run was to test that (and I kind of messed up the 1:1 comparison potential by increasing the LR). lots of people have said 512x512 has done really well, but I'm noticing a lot of details being messed up on my 512x512 run that I'm almost certain is from bucketing, so it probably depends on your dataset/subject matter
>>102034692
I think they already pulled out all the stops and then some for what was used to optimize SDXL. am certainly praying someone pulls something more out, though, as I'm not willing to suffer linux dual boot again to try to utilize my multi gpus for what might not even amount to a speed increase
>>
I'm hoping these are growing pains or I may have fucked up the LoRA.
>>
File: FD_00102_.png (1.45 MB, 1024x1024)
1.45 MB
1.45 MB PNG
>>
>>102034754
who is this
>>
>>102034780
Better you don't know, once you do, you see him everywhere.
>>
>>102034754
I almost feel like it'd be better if you released a fucked up lora, say you followed his patreon tutorial to do it
>>
>>102034754
You should subscribe to my patreon so that you can actually do this lora properly, bitch.
>>
>>102034783
thats debo?
>>
>>102034749
>I'm not sure yet
The fact that nobody is really sure is a pretty good sign that 512 training is probably "okay"
>>
File: ComfyUI_00621_.png (1.99 MB, 896x1344)
1.99 MB
1.99 MB PNG
>>
>>102034803
realistically, I think worst case you could probably just add cropped areas of what you want to retain more detail to your 512x512 dataset and probably have things turn out ok. at least, that worked well in SDXL loras (albeit 1024x1024 training with crops of larger detailed areas like faces or eyes)
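A minimal PIL sketch of that idea, in case anyone wants to try it on a Flux set (paths and crop boxes are made up; in practice you'd pick the face/detail regions by hand or with a detector):

# Add tight crops of detail regions as extra 512x512 training images.
from pathlib import Path
from PIL import Image

src = Path("dataset/full")     # original training images
dst = Path("dataset/crops")    # extra crops go here, captioned like the rest
dst.mkdir(parents=True, exist_ok=True)

# (left, upper, right, lower) boxes per image; purely illustrative values
crops = {
    "portrait_001.png": [(300, 100, 812, 612)],   # e.g. a face region
}

for name, boxes in crops.items():
    img = Image.open(src / name).convert("RGB")
    for i, box in enumerate(boxes):
        crop = img.crop(box).resize((512, 512), Image.LANCZOS)
        crop.save(dst / f"{Path(name).stem}_crop{i}.png")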
>>
File: ComfyUI_00852_.png (2.03 MB, 1280x1600)
2.03 MB
2.03 MB PNG
>>
>>102034833
Huh, never really considered adding new stuff into a completed LoRA. Does that mean I could in theory add new characters to a LoRA?
>>
>>102034855
What....
>>
>>102034855
Same I always just re-trained a new one from scratch, usually the same dataset with some adjustments.
Are we retards?
>>
>>102034868
As in I could load the weights of a LoRA and plug in a different dataset of another character and add that character to the LoRA.

For example, I could take a star wars aesthetic LoRA and then train in Darth Maul on top of that because he wasn't tagged in the first time.
>>
>>102034884
>Same I always just re-trained a new one from scratch

Same, I might give it a whirl when I get home from work.
>>
>>102034855
I meant add to your dataset as in pre-training, but you could try that too. not sure how you'd balance not overcooking or overwriting the existing epochs but that's not to say it's impossible, in theory --save_state, --resume and load weights ARGs could be used for that goal
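A rough sketch of how that could be wired up with kohya's scripts, for anyone curious. --network_weights, --save_state and --resume are real sd-scripts options, but the script/module names assume the flux branch and every path and value here is a placeholder, not a tested recipe:

# Hypothetical: continue from a finished LoRA with a dataset containing only the new character.
import subprocess

subprocess.run([
    "accelerate", "launch", "flux_train_network.py",
    "--pretrained_model_name_or_path", "flux1-dev.safetensors",
    "--network_module", "networks.lora_flux",
    "--network_weights", "starwars_style.safetensors",   # load the existing LoRA as the starting point
    "--dataset_config", "darth_maul_only.toml",          # only the new character's images
    "--learning_rate", "1e-4",
    "--max_train_epochs", "2",                           # keep it short to avoid overcooking the old weights
    "--save_state",                                      # checkpoint state so --resume can continue later
    "--output_name", "starwars_style_plus_maul",
], check=True)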
>>
File: ComfyUI_00855_.png (1.39 MB, 1024x1280)
1.39 MB
1.39 MB PNG
>>
>>102034921
>>102034907
if you guys test it out post results even if they're horrible, I'm curious
>>
>>102034926
In my head, I wouldn't be cooking over the old LoRA, I'd just add a few steps over the top.
I know best practice is to just include it in the dataset from the beginning but in my mind, injecting characters after the fact into a more general LoRA "feels" pretty elegant.
>>
>>102034157
hell yeah
>>
>>102034950
report back if you try it, it's an interesting experiment
>>
File: ComfyUI_00656_.png (1.74 MB, 1024x1024)
1.74 MB
1.74 MB PNG
>>
File: 2024-08-23_00060_.png (1.23 MB, 1280x720)
1.23 MB
1.23 MB PNG
>>
>>102034907
I see I didn't know that was possible.
>>
>>102034992
Freshest fart around
>>
>>102034995
I don't know if it is possible, but I also don't see why it wouldn't work if you went about it lightly.
>>
>>102035011
Lmao, can't unsee
>>
>>102034784
lmao, do this
>>
File: file.png (1.97 MB, 1568x896)
1.97 MB
1.97 MB PNG
>>102034692
there's a lot of work that can be done on the optimizers end, they haven't even implemented Adam Mini yet and that's a free lunch for fp16 training at fp8 VRAM requirements
>>
File: MarkuryFLUX_00174_.png (980 KB, 832x1216)
980 KB
980 KB PNG
>>
File: 2024-08-23_00061_.png (1.26 MB, 1280x720)
1.26 MB
1.26 MB PNG
>>102035011
lol, made you one where this fits better
>>
File: 2024-08-23_00055_.png (679 KB, 1280x720)
679 KB
679 KB PNG
>>
File: 1.png (490 KB, 744x4078)
490 KB
490 KB PNG
Towards Pony Diffusion V7
https://civitai.com/articles/6309
>>
>>102035101
lol perfect
>>
It's uh... not looking good. I may have to pull the plug.
>>
>>102035114
Just train on schnell retard
>>
>>102035121
>I a Man?
>>
File: 2024-08-23_00072_.png (890 KB, 720x1280)
890 KB
890 KB PNG
this gen made me remember when I tried for days to even get some simple skeletons done on SD1.5 .. we've come far
>>
File: GGUF_CLIP.png (383 KB, 1555x579)
383 KB
383 KB PNG
That new clip model should work again with the gguf loader.
>>
>>102035191
Also the LoRA reload bullshit may be faster?
>>
>>102035191
Thanks City
>>
File: fluxmotorcycle.jpg (2.03 MB, 5170x1581)
2.03 MB
2.03 MB JPG
Here is my Dehya flux lora if you want to give it a go, I wasn't really able to get it to make anything that good as it seems all the digital art changes flux style too much

https://civitai.com/models/676229?modelVersionId=756985
>>
>>102035191
>Meh
>>102035200
>I pull
>>
>>102035191
where do you get the clip?
>>
>>102035206
Beware that comfy messed with something and until I write a custom file reader it'll randomly take 2x system RAM on loading if you update Comfy.
>>102035221
https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main
>>
>>102035114
the guy really has been doing nothing for months
>>
>>102035235
He was actually smart to hold off on SD3 but skipping Flux is a certified retard moment
>>
>>102035205
oh, I'll add that this was trained using both joycaption captions and booru tags in one file.
It may just be a skill issue in prompting, so if any anons want to try to make anything nice, thanks
>>
File: 00051-2960813742.png (1.27 MB, 832x1216)
1.27 MB
1.27 MB PNG
night night
>>
>>102034343
Update torch
>>
>>102035248
He didn't hold off on SD3 because it was shit, it was because of the license, same deal here.
>>
>>102035266
lil bro thinks his discord gens are gonna run into million dollar territory
>>
>>102035248
he wasn't smart because he could've trained his own full model in that time, a model whose license he controls
the dumb fuck doesn't realize that he's already doing that, SDXL is basically a ground up model
if you have 10 million images and you train for 20+ epochs, congratulations, you did a full fucking model
>>
File: 2024-08-23_00069_.png (575 KB, 1280x720)
575 KB
575 KB PNG
>>
>>102035266
Schnell has a fine license, he doesn't like it because it's distilled
>>102035286
His homebrew model would have gotten BTFO by Flux
>>
>>102035297
A homebrew 4B Pony model would be quite good. And you ignore my main point, NO MATTER WHAT MODEL HE CHOOSES HE IS DOING THE WORK OF A FULL MODEL. When you fine tune 10 million images for dozens of epochs, it's no different than doing a model from scratch.
>>
>>102035324
And he doesn't have the firepower to train a 12B model unless he somehow procured a $500k computer cluster.
>>
File: 2024-08-23_00078_.png (1.42 MB, 720x1280)
1.42 MB
1.42 MB PNG
>>
>>102035324
>>102035352
Agree to disagree, a Flux model trained on 1 epoch of his images would BTFO a homebrew trained on 12 epochs and would presumably cost less to train
>>
File: 1703059274271764.png (11 KB, 267x215)
11 KB
11 KB PNG
what the fuc
>>
>>102035370
Want to do some math anon? Let's say he manages 5s/it training the model with a batch size of 2 (he has 40 GB GPUs so that's a generous assumption), with 10 million images that's 25 million seconds.
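Spelled out with the same numbers, nothing else assumed:

images = 10_000_000
batch_size = 2
sec_per_it = 5

iterations = images / batch_size      # 5,000,000 iterations for one epoch
seconds = iterations * sec_per_it     # 25,000,000 seconds
print(seconds / 86_400)               # ~289 days for a single epoch on one card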
>>
File: ComfyUI_01349_.png (300 KB, 1216x832)
300 KB
300 KB PNG
Training has concluded.
>>
File: 2024-08-23_00073_.png (1.72 MB, 720x1280)
1.72 MB
1.72 MB PNG
>>102035403
yay nice
>>
>>102035403
very cool style
>>
>>102035399
Ok, and how long for the homebrew?
>>
File: ComfyUI_02077_.png (1.3 MB, 1024x768)
1.3 MB
1.3 MB PNG
I think the AI inbreeding was too much for the model to handle.
>>
>>102035430
Well, with a homebrew on a much smaller model he could do batch size 64+ and reach double or triple the quality of SDXL. Keep in mind that Pixart is a 600M parameter model. He could do a 4B Pixart model with a 16ch VAE, in one tenth the training time.
>>
File: 2024-08-23_00084_.png (1.35 MB, 1024x1024)
1.35 MB
1.35 MB PNG
>>102035442
ow my.. that's nightmare fodder
>>
File: ComfyUI_01358_.png (275 KB, 1216x832)
275 KB
275 KB PNG
>>102035428
>>102035424
Thanks! I'm going through the different checkpoints right now and after I pick out the best one I will share the lora.
>>
>>102035452
Yeah it really picked up the grid pattern in flux and it ended up taking over everything.
>>
loool .. no its not Llama .. I need a better LLM for JoyCaption! this is NOT Link sitting cross legged
>>
>>102035476
You could've just followed his tutorial on patreon but no. Now look where we are.
>>
>>102034739
If it gets GGUF support, that won't ever be a problem, but there has been a lack of focus on VLMs in llama.cpp, and the only response to the issue requesting support has been some guy who has touched the codebase in the past saying he has no time. So I doubt it will get done unless the model makers do it themselves, like what happened with CogVLM support.
>>
>>102035493
I kneel
>>
>>102035447
What's the point? Würstchen introduced a new architecture, this hypothetical model would just be another trash low parameter model with no architectural improvements and no general knowledge because it was trained on very few images
>>
>>102035504
???
Pony has no general knowledge outside of its tags and when he trains his dataset for 20+ epochs all previous knowledge is gone.
Anyways, keep holding your breath for the grifter.
>>
File: ComfyUI_02081_.png (1.18 MB, 1024x768)
1.18 MB
1.18 MB PNG
The absolute earliest epochs I have turned out okay. This is 20 epochs.
>>
>>102035513
>Pony has no general knowledge outside of its tags and when he trains his dataset for 20+ epochs all previous knowledge is gone.
KEK I just spit out my drink, you actually got gaslit into believing that!? How did you let a midwit brony gaslight you over the internet?
>>
>>102035517
Holy kek
>>
File: 2024-08-23_00085_.png (389 KB, 1024x1024)
389 KB
389 KB PNG
the result of the prompt is pretty good tho ..
>>
don't reply to the thread schizo bait of the day: >102035525
>>
>>102035525
You clearly can't do math because you think someone with a couple A6000 Adas can train a massive model. In 4 months maybe you'll get a "I promise I'm doing something haha" update just like the last one.
>>
File: file.png (1.06 MB, 1216x832)
1.06 MB
1.06 MB PNG
>>
>>102035545
>>102035546
Do you actually believe that Pony models wiped out the knowledge of SDXL? Respond with a yes or no.
>>
File: extra networks.png (7 KB, 816x38)
7 KB
7 KB PNG
how do i get my extra networks more organized?
they changed how extra network searching works in forge, it's now searching through txt files, and this doesn't seem to do anything anymore.
>>
>>102035564
You clearly can't do math because you think someone with a couple A6000 Adas can train a massive model. In 4 months maybe you'll get a "I promise I'm doing something haha" update just like the last one.
>>
>>102035571
I know you're probably sick of hearing this, but the best solution is to just use comfy.
>>
>>102035577
Thanks for not responding with a yes or no. I can see I've changed your mind. Take some time to reflect on why you were so easily persuaded to believe that in the first place.
>>
Again, do not take the bait: >102035564
>>
>>102035552
tiiight
like uzumaki
>>
>>102035594
feel free to do a general knowledge prompt on base pony on a concept that isn't in the pony dataset
>>
File: 00030-1917393080.png (2.5 MB, 1024x1440)
2.5 MB
2.5 MB PNG
>>
File: file.png (656 KB, 1024x1024)
656 KB
656 KB PNG
>>
>>102035642
very nice
>>
>>102035642
>fangs
nice
>>
>>102035642
Well now you have to drop a link to the lora
>>
File: ComfyUI_02092_.png (1.06 MB, 768x1024)
1.06 MB
1.06 MB PNG
>>
File: ComfyUI_02093_.png (796 KB, 1024x768)
796 KB
796 KB PNG
>>
>>102035683
patreon link to the lora?
>>
>>102035200
>>102035191
Works, thanks
>>
File: ComfyUI_02098_.png (921 KB, 1024x768)
921 KB
921 KB PNG
>>102035756
I'm still not sure I want to release it.
>>
File: FD_00123_.png (1.08 MB, 1024x1024)
1.08 MB
1.08 MB PNG
>>
File: ComfyUI_02100_.png (1.18 MB, 1024x1024)
1.18 MB
1.18 MB PNG
>>
If you download my lora please add your images to it so I can get buzz because I am a f2p whore
>>
>>102035766
Did the LoRA reload get any faster? Technically it should no longer be copying the quantized weights.
>>
>>102035835
Seems to be. Could be placebo because my cat sat on my power button and reset my PC.
>>
Is there a translation to English for this?
https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md#llama-3-8b-scoreboard

Even better, maybe a resource so I can get smarter. I know I can ignore it, but I don't want to.
>>
File: file.png (1.11 MB, 1024x1024)
1.11 MB
1.11 MB PNG
>>102035668
I'll do that in a bit fren
>>
File: Capture.png (56 KB, 947x729)
56 KB
56 KB PNG
>>102035803
Fucking guy is back in github threads begging for help again, of course this was preceded by 5 more images of his ugly fucking face and a link to his patreon
>>
>>102035593
oh, did they improve the lora and model browsing in comfyui? havent given it another shot for 6 months.
>>
File: ComfyUI_01381_.png (718 KB, 1216x832)
718 KB
718 KB PNG
>>
>>102035923
I stay out of the bmaltais repo for exactly this reason. thankfully he's too fucking stupid to use sd-scripts without bmaltais' brainlet GUI
>>
File: ComfyUI_02108_.png (1.31 MB, 1248x768)
1.31 MB
1.31 MB PNG
>>102035923
Do you think the repo owners roll their eyes whenever he appears?
>>
>>102035923
Unless he is running a server core version of Windows how the fuck could anyone troubleshoot that. Windows be crazy, that is why it turns on without too much fuckery.
>>
>>102035923
the worst part is that the images aren't hidden so I have to scroll past them maxing out my giant monitor seeing every detail and pore of his ugly mug
this alone is enough that if I saw he'd been stabbed in an alley I wouldn't call for help. I might even stab him again for good measure to make sure the first guy didn't fuck up
>>
File: file.png (470 KB, 1216x832)
470 KB
470 KB PNG
I thank everyone for helping me train my first flux lora

Here is the lora itself: https://litter.catbox.moe/imodbt.safetensors

If there is more interest I will upload it somewhere more permanent

Keywords for prompt: Black and white, digital, pixel art

No I will not be uploading this to CivitAI for coomers.
>>
>>102035997
>not be uploading this to CivitAI for coomers.
yes because we are all distinguished gentlemen who never make coomer slop
>>
>>102035997
>No I will not be uploading this to CivitAI for coomers.
basado
>>
>>102035923
isn't bmaltais a literal brainlet who knows nothing about this shit, whose only contribution is throwing a gradio interface over kohya's scripts and then crying to kohya when something errors in his gradio instead of troubleshooting it? why the fuck would he have the answers to this?
>>
>>102036007
Anon, there is a time for cooming, and there is a time for WORLD OF HORROR(TM.)
>>
File: 1700722932141147.png (663 KB, 1152x768)
663 KB
663 KB PNG
Anyone know a good lora for character design sheets?
https://civitai.com/models/285203/turnaround-model-sheet-character-sheet-xl
I saw this one but I'm wondering if there's anything better out there.
>>
File: ComfyUI_02112_.png (2.29 MB, 1248x1248)
2.29 MB
2.29 MB PNG
>>
>>102036061
Please send patreon link sir
>>
>>102036052
you could use a character sheet as a base with controlnet or use IP adapter to get better results in general
>>
File: ComfyUI_00869_.png (2.1 MB, 1280x1280)
2.1 MB
2.1 MB PNG
>>
should I be downloading the Q[4-6]_0.gguf or Q[4-6]_1.gguf or Q[4-6]_K-S.gguf?
>>
>>102036104
should get a job, get a 4090 and use fp16 like a real man
>>
>>102035954
Is that with my wow LoRA or just proompting?
>>
File: ComfyUI_00872_.png (1.93 MB, 1280x1280)
1.93 MB
1.93 MB PNG
>>
File: ComfyUI_02109_.png (1.54 MB, 1248x768)
1.54 MB
1.54 MB PNG
>>102036115
LoRA, here it was without the LoRA
>>
>>102036104
K
>>
>>102036140
Interesting, I didn't try combining LoRAs with it. Works pretty good. There's a v2 btw that's better
>>
File: delux_bc_00012_.png (1.42 MB, 1536x968)
1.42 MB
1.42 MB PNG
>>102036052
a lot of models can just be prompted to do charsheets. if you want super specific poses, you can slap on a pose controlnet
>>
File: ComfyUI_01402_.png (627 KB, 1216x832)
627 KB
627 KB PNG
>>
>>102035489
>". This image is a photograph of a video game screen, specifically from a pixelated, early 1990s game. The background is completely black, emphasizing the text and image. At the center of the image, there is a pixelated statue of the Buddha, depicted in a meditative posture, sitting cross-legged on a cushion with a serene expression. The statue is rendered in white pixels, contrasting with the black background. Above the statue, in large, bold, white text, the words \"GAME OVER\" are displayed in a classic arcade font. The text is slightly transparent, with a gradient effect that transitions from white to blue, giving it a faded, retro feel. The overall style of the image is reminiscent of classic video games from the era, with a low-resolution, pixelated aesthetic. The image captures the essence of a classic arcade game, with its simple, yet iconic design and nostalgic feel."

Seems to work fine for me
>>
File: ComfyUI_02117_.png (1.4 MB, 1376x800)
1.4 MB
1.4 MB PNG
>>
>>102036222
Kinda funny how when other people train him for aislop the results are considerably better than his spammed images...
>>
>>102036104
https://huggingface.co/city96/FLUX.1-schnell-gguf/blob/main/flux1-schnell-Q4_K_S.gguf
this one is probably what 90% of people should be using from my understanding.
>>
>>102035633
How come the text is coherent? That's not from the AI, is it?
>>
>>102036244
>How come the text is coherent?
When was the last time you came here?
>>
>>102036234
I consider the results of the LoRA extremely scuffed too.
I admit it's therapeutic just genning this man in dumb situations though.
>>
>>102036250
The bigger model tends to give better text outputs compared to the low-end ones, although can't say for sure about that one.
>>
>>102036262
>>102036250
sorry meant to quote this >>102036244
>>
File: ComfyUI_01408_.png (420 KB, 1216x832)
420 KB
420 KB PNG
Kino...
>>
File: 00011-4250565314.png (1.02 MB, 1024x1024)
1.02 MB
1.02 MB PNG
>>102036244
>>
File: FD_00040_.png (1.16 MB, 768x1024)
1.16 MB
1.16 MB PNG
>>102036308
How come text is colemenet indeed
>>102036244
>>
12gb lora anon checking back in

1024x1024 with 5e-4 LR fried it, by e2 it already looks like an abstract painting/mess. RIP
going to see if I can get any speedups from using my dual GPUs now that kohya supports them for flux lora training, but from my understanding using multi-GPU doesn't actually increase speed much, it's more to shard the workload to prevent OOM? don't know, kind of a brainlet, might not even be able to do that kek
>>
>>102036147
>>102036235
thanks
>>
>>102036338
She's retarded please understand
>>
>>102036349
Doesn't matter, look at those titchka
>>
>>102035771
Of course you want to release it.
>>
File: 1710539411338414.png (924 KB, 768x1024)
924 KB
924 KB PNG
>>102036244
text gen is mostly just adding or removing steps from the same seed until the AI gets it just right
>>
>>102036222
lmao
>>
can't believe it's been a whole week and no one's made a good NSFW Flux model yet. coomers are slacking.
>>
File: ComfyUI_13482_.png (859 KB, 1024x1024)
859 KB
859 KB PNG
>>102035997
Pretty solid even with other LoRAs on top, good job anon.
Surprised it keeps the dithering/pixelart this well in most cases.
>>
#[...]comfy\ldm\modules\attention.py
if model_management.xformers_enabled():
    logging.info("Using xformers cross attention")
    optimized_attention = attention_xformers
elif model_management.pytorch_attention_enabled():
    logging.info("Using pytorch cross attention")
    optimized_attention = attention_pytorch


when launching comfy:
>pytorch version: 2.3.1+cu121
>xformers version: 0.0.27
>Using xformers cross attention

however, when I actually gen something:
[...]comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. 
(Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)



why would that even happen if xformers is active?
>>
File: ComfyUI_01412_.png (366 KB, 1216x832)
366 KB
366 KB PNG
>>102036415
Thank
I used almost the entire game's sprites (1300+) and trained it at 5k steps. I was wondering whether more training is needed, but I am quite satisfied with it for now

Training is surprisingly easy so I think I might do more later
>>
File: FD_00137_.png (944 KB, 1024x1024)
944 KB
944 KB PNG
Combined my wow classic lora with a nude lora and got modern wow
>>
File: ComfyUI_02121_.png (1.76 MB, 1024x1328)
1.76 MB
1.76 MB PNG
>>102036356
Oh yeah, I WANT to release it.
>>
>>102036428
I think the default for small tensors still uses pytorch since it's faster, i.e. in the T5/CLIP.
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/modules/attention.py#L432-L437
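Roughly the shape of that dispatch, as a self-contained sketch rather than ComfyUI's actual code (the threshold value and function names are invented for illustration):

# Conceptual sketch of size-based attention dispatch; NOT ComfyUI's real implementation.
import torch.nn.functional as F

def attention_pytorch(q, k, v, mask=None):
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

def attention_optimized(q, k, v, mask=None):
    # stand-in for the xformers/flash path; plain SDPA keeps this sketch runnable anywhere
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

def attention(q, k, v, mask=None, small_seq_threshold=1024):
    # short text-encoder sequences (CLIP/T5) take the plain PyTorch path,
    # long image-token sequences take the optimized backend chosen at startup
    if q.shape[-2] <= small_seq_threshold:
        return attention_pytorch(q, k, v, mask)
    return attention_optimized(q, k, v, mask)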
>>
>>102036443
I noticed your LoRA file was tiny btw, what did you do to get it so small?
>>
>>102036459
Yeah, seems like it. thanks.

fugg. This means I need to downgrade my torch and break xformers.
How much faster could it really get?
>>
>>102036480
That's just a warning, you can ignore it in most cases. Flash attention is Linux-only; older versions just didn't print that text AFAIK.
>>
>>102036471
network_dim determines it.
LoRAs really shouldn't be much bigger than 20-50mb.
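Back-of-the-envelope for why dim drives the file size. The 3072 width matches Flux's hidden size, but the layer count below is just a placeholder, not the model's real module list:

# A rank-r LoRA on an (out x in) linear stores an (r x in) and an (out x r) matrix.
def lora_params(in_features, out_features, rank):
    return rank * (in_features + out_features)

for rank in (4, 16, 64, 128):
    params = 200 * lora_params(3072, 3072, rank)     # "200 adapted linears" is illustrative only
    print(f"dim {rank}: ~{params * 2 / 1e6:.0f} MB at fp16")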
>>
>>102036453
Patreon link sir?
>>
File: ComfyUI_13483_.png (1.19 MB, 1024x1024)
1.19 MB
1.19 MB PNG
>>102036437
>Training is surprisingly easy so I think I might do more later
The model adapts super quickly too, even to something dumb like picrel which was just random control screenshots with booru tags and bullshit like "orange color scheme" kek.
>>
>>102034594
>Fortunately, I was introduced to a smaller alternative, InternVL2, specifically the 40B variant (there is also a 76B version of InternVL2 but it was not a noticeable improvement in my tests) . This model proved to be even better, reaching GPT-4 levels of captioning with superior prompt understanding, better OCR, more domain knowledge, and no censorship. As a result of this evaluation, InternVL2 is currently the primary captioning model.

Ponyfags decided to go with InternVL2. What did I tell you all?

https://huggingface.co/spaces/opencompass/open_vlm_leaderboard

Though not starting out with Schnell right away is a mistake. Dev is a distilled model too and it isn't shit.
>>
where can i download a greg rutkowski lora for flux
>>
File: file.png (1.66 MB, 1216x832)
1.66 MB
1.66 MB PNG
>>102036510
Wanna thank that anon with the Joycaption script a bit earlier (he might have gone to sleep by now)

Made flux captioning super easy and all

Where should I upload the lora to? Litterbox has deleted it already I think. I don't want to upload it to civitAI though
>>
>>102035191
I can't load any flux LoRAs at all after doing a fresh install of both comfy and gguf-loader btw.

>ERROR lora diffusion_model.double_blocks.10.img_attn.proj.weight
>The size of tensor a (1728) must match the size of tensor b (3072) at non-singleton dimension 1
and so on, for every single layer, lmao
>>
>>102036486
Great, thank you. I'll suppress the warning.
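One way to silence just that message, assuming it really is emitted as a normal Python UserWarning:

import warnings
# matches the "1Torch was not compiled with flash attention" UserWarning and nothing else
warnings.filterwarnings("ignore", message=".*was not compiled with flash attention.*")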
>>
>>102036559
Comfy broke that earlier today, already pushed an update that should fix it a few hours ago. When did you install the node?
>>
>>102035997
If you don't, then someone else will do it anon
>>
File: ComfyUI_temp_zalpk_00064_.png (3.68 MB, 1600x1360)
3.68 MB
3.68 MB PNG
>>
>>102036583
It's so niche anyways. I'll do it if I feel like it I guess
>>
>>102036539
in OP the only other sites are huggingface and aitracker.art
>>
File: ComfyUI_13486_.png (982 KB, 1024x1024)
982 KB
982 KB PNG
>>102036539
Huggingface and just dump it in a new model repo. Go to settings and disable community contributions to keep the people begging for LoRAs out.
>>
>>102036492
What dim did you train at?
>>
>>102036539
there were 2 versions of that script, right?
it's good to have a space for these, you can probably make similar loras for many other games. most of them have their assets packed in an easy to unpack file.
>>
Get your next loaf right here...
>>102036630
>>102036630
>>102036630
>>
>>102036580
Wow, it seems I had amazing timing. I reinstalled it all 6 hours ago.
Just saw your 2-hour-old fixes for it. I'll just throw in your new code snippets because I've changed all the files a bit.
great work, thank you
>>
>>102036639
I tried to keep it backwards compatible to fuck over as few people as possible kek.
>>
>>102036653
top man
>>
Still getting the lora reload issue if you're still here, City. Didn't pull comfy
>>
>>102036752
works with the latest comfy
>>
>>102035997
Muy al dente
>>
>>102036752
It still reloads it, but there's a chance it reloads it marginally faster. I didn't fix the actual reload part sorry.
>>
>>
>>102033988
>Flux Schnell
DOA
>>
>>102034278
what the fuck... is he in love with the Turkish man or something? That level of glazing...
>>
>>102035114
>Auraflow or Flux-Schnell
lol, not touching either of these; guess the pony dev era is over, someone will take over for sure, like Flux took over SAI


