/g/ - Technology


File: the longest dick general.jpg (3.47 MB, 2589x3264)
Discussion of free and open source text-to-image models

Machine Learning Edition

Previous /ldg/ bred : >>102687080

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://replicate.com/black-forest-labs/flux-1.1-pro
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
the official pixart bigma....hmmrmmmmhfgmmafmamAAAAAAAAAAAAAAAAAAAAAAAAAAAA!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
Blessed thread of Machine Learning
>>
File: file.png (1.28 MB, 1024x1024)
>>
5090 waiting room
>>
File: file.png (1.54 MB, 1024x1024)
PuLID works ok on female characters, but for male celebrities it's kinda bad, it's supposed to be Robin Williams lol
>>
>>102700004
>Idk man, they managed to make pro 1.1 way faster than pro 1.0 while keeping its quality, my theory is that they managed to make it work on a smaller model, if we had the dev version it means we won't have to deal with a giant 12b model anymore, would make everything easier for everyone involved
Let it go, anon. Corpos don't care about us.
>>
>>102700003
I'll be happy with just a 20GB card around $500
>>
File: file.png (2.98 MB, 2188x1612)
>>102700020
dunno why it made her hair pink but that looks cool so I don't mind lol
>>
File: file.png (1.76 MB, 1024x1024)
>>102700090
so it can work with real life humans and 3d characters, now we need it to work on anime characters and the possibilities would be endless, no more Lora shit, just imagine that
>>
File: file.png (3.61 MB, 2310x1773)
that's really interesting, didn't get the blindfold part but it made a cool haircut anyway
>>
To demoralized coomer JAV anon.
You may not have a place to showcase your work, but you should be proud of the work you made. A true artist doesn't get his jollies from having his shit in a museum, he enjoys the art, the production. You are a true artist, and a master of your craft.
Do it for yourself, king.
>>
File: FLUX_00100_.png (1.46 MB, 1120x1440)
can't believe it captured a likeness from blurry 2010 era 360p videos so well
>>
Hey anons why isn't Flux fine-tunable? High costs? T5 fuckery?
>>
>>102700323
Bare minimum VRAM requirement is above 24 GB for fine tuning. Maybe we'll be able to squeeze it on a 5090 (32 GB) or Titan AI (40+ GB)
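Rough napkin math on why 24 GB doesn't come close for a full finetune (a sketch with assumed precisions: bf16 weights/grads, fp32 AdamW moments, activations ignored entirely):

# Back-of-the-envelope VRAM estimate for a full finetune of a ~12B param model.
# Assumptions: bf16 weights and gradients, fp32 AdamW moments, activations ignored.
params = 12e9
weights_gb = params * 2 / 1e9        # bf16 weights
grads_gb   = params * 2 / 1e9        # bf16 gradients
adam_gb    = params * (4 + 4) / 1e9  # fp32 first + second moments
total_gb = weights_gb + grads_gb + adam_gb
print(f"{weights_gb:.0f} + {grads_gb:.0f} + {adam_gb:.0f} = ~{total_gb:.0f} GB before activations")
# ~144 GB, which is why people lean on LoRA, quantized optimizers, or offloading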
>>
>>102700323
People too poor to finetune it.
>>
>>102700323
shit distilled model. there's an experimental undistilled version a guy made so we'll see how that goes
>>
File: file.png (129 KB, 1205x845)
>>102700323
The first reason was that it's distilled, but now that it's been un-distilled the only limit is the fact that it's a fat motherfucker, so you need a lot of money to do it. Some people are up to the challenge though, like the guy who made Realistic Vision
https://civitai.com/models/788550/realflux-10b
>>
>>102700455
Is it possible to contribute to the dataset by volunteering to manually caption a few images?
>>
File: 00013-3830987892.png (1.45 MB, 1024x1024)
another potentially great image ruined by a tiny error.
damn...
>>
>>102700492
.....upscale and inpaint??
>>
File: 00015-3830987894.png (1.47 MB, 1024x1024)
>>102700513
yeah I should really stop being a lazy little faggot and learn how inpainting works but I like to just type my prompt, set the batch generation between 5-10 and come back later to see what I got.
I do quite like that image though.
>>
>>102700529
give it a try, it's not as hard as you think, pretty much essential for sdxl. gens look way better with a bit of upscaling and elbow grease
>>
>>102700474
Idk I guess? You can send him a message and see if he's interested
>>
File: fs_0042.jpg (135 KB, 1536x1152)
>>
File: fs_0048.jpg (158 KB, 1536x1152)
last one of the night was a blooper, this bitch eatin beans
>>
>>102700604
give her armpit hair
>>
I finally figured out how to maintain a consistent artstyle in Flux. Sorry if this is common knowledge.

Loras right? Yes, but it's a bit more complicated than that. You shouldn't mention the art style AT ALL in your prompt (other than trigger words). Any style you write in your prompt will have a fucking randomizer put on it. So even writing "anime" in your prompt will give you slightly different anime styles from seed to seed.

The more words you use to attempt to describe your art style, the less consistent your art style will be!
>>
>>102700604
using a lora?
>>
>>102700759
Post a grid to demonstrate
>>
>>102700759
What's your style: painting, anime, abstract, other? Can you describe your dataset?
>>
>>102700759
I mean yeah that's common sense, I only use trigger word + I describe the scene and that's it for a lora
>>
>>102700759
>will have a fucking randomizer put on it
If this were true you would get different results from the same noise seed each time. Does that actually happen?
>>
>>102701022
Not him but I've seen that
>>
>>102701022
>Does that actually happen?
it can't happen, all our gens are deterministic
>>
>>102701037
>>102700759
Well I haven't, and I'm testing it now and I'm not seeing it happening. I think he's confused.

Base models vary a lot in their outputs from one seed to another. That's because they're trained on a huge volume of images. Think of how many different images could be tagged "painting of a woman", and how different they could be. The randomness of the noise scheduling helps tease out the wealth of different ways FLUX can understand this prompt. The prompt is not being tampered with like what Dall-E does. Variation in results is just a natural result of a vague prompt having broad potential that plays out differently each time.

The reason you don't want to say anything about your style is because those tokens are giving T5 more opportunities to misunderstand and you're fighting against your LoRA rather than aiding it. My layman's understanding of it is that some prompts offer more control to the LoRA, and some prompts give more control to the base model. That's not exactly what's happening, but for all intents and purposes that's how the prompter experiences it.

I used to use model mixes and I reliably found that some things tended to weaken the finetunes' effect, e.g. using non-1024 size conditionings in SDXL, or using lots of vague verbiage; whereas other words really 'activated' the finetunes—the worst was "nerdy", which was really overtrained on at least one model in the mix to mean a particular sexy nerd with glasses look. If I slipped that word in the gens got noticeably more cooked and samefacey; it felt like I was just using the finetune unmixed.
>>
>>102701188
I'm not necessarily endorsing his theory about prompt length etc., just saying that I've prompted images with the exact same prompt and seed and received variety, specifically in Flux. For example

https://files.catbox.moe/xlz0dv.png
https://files.catbox.moe/8spe8f.png
>>
>>102701219
I remember there was some variation back in the day with xformers, whatever faster math it did meant the gens were slightly non-deterministic. This reminds me of that.
>>
>>102701238
This one is also the exact same prompt/seed, and came out with even more variety

https://files.catbox.moe/7ryvmo.png
>>
>>102701254
Small differences in how early steps are resolved can push a gen over a 'tipping point' where the next step comes out radically different, because the state of the latents feeding into that step is slightly different. None of these changes look like the kind of difference we would expect from Dall-E-style prompt manipulation for forced variety. All of these look like artefacts of some fast math being done with small random errors.
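If anyone wants to test it instead of arguing, the cheap experiment is: same prompt, same seed, two runs, diff the pixels. A rough sketch (the pipeline/model ID here are placeholders, swap in whatever you actually run; note that fast kernels like xformers can still give tiny run-to-run drift):

# Reproducibility check: same prompt + seed twice, then a pixel diff.
# Placeholder model/pipeline; use whatever you're actually genning with.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def gen(seed):
    g = torch.Generator(device="cuda").manual_seed(seed)
    return pipe("painting of a woman", generator=g, num_inference_steps=20).images[0]

a = np.asarray(gen(1234), dtype=np.int16)
b = np.asarray(gen(1234), dtype=np.int16)
print("max pixel difference:", np.abs(a - b).max())  # ~0 = deterministic, big = something is injecting variation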
>>
Was prompting some "van gogh anime painting" type slop to test anon's theory about flux changing prompts. Remember when this sort of anime hybrid stuff was what the average /sdg/ gen looked like? Those were great times. Lots of fun. Of course the breasts were usually much larger.
>>
File: 1716997481410650.png (2.76 MB, 1024x1536)
>>102701313
>Remember when this sort of anime hybrid stuff was what the average /sdg/ gen looked like?
Whoever it was that made picrel pictures was basically the reason I bought a computer for the first time in several years
>>
ok I'm getting closer to the old /sdg/ house style
>>
File: 57953.png (3.06 MB, 1440x3120)
>>102701313
>he can no longer make big titty anime slop, due in large part to his slavish adoption of the "latest greatest thing".
ngmi... fluxfags get the rope
>>
>>102701350
I am not primarily interested in big titty anime slop. Flux frustrates me but so did SDXL. Right now I am more willing to put up with Flux's quirks than SDXL's
>>
File: 57955.png (3.14 MB, 1440x3120)
alright, here’s the thing—you had the opportunity for something profound, maybe even transcendent, but the energy in here? nah, it's off. so now, without even needing to say much, let’s just say the vibe has shifted... downward. and in that shift, what’s being offered has, uh, changed. expect nothing but shartys moving forward. not by choice, but by a natural progression. sometimes the cosmos gives, and sometimes? well, it just... sharts.
>>
>>102701433
yeah the whole problem is that these new models are so good at prompt adherence that they neglect to be even slightly creative, so what you end up with is exactly what you said, and that's a bad thing.
>>
if only we knew that creative models were the real deal, but instead we chased after perfect prompt adherence and ended up in a soulless hell of our own making
>>
>>102701480
nothing's stopping you from using the old models, you're talking as if they've taken SD1.5 away from you lol
>>
>>102701473
>these new models are so good at prompt adherence that they neglect to be even slightly creative, so what you end up with is exactly what you said, and that's a bad thing.
I feel that, when I tried the undistilled dev I thought it would be like the SD models, low cfg = lots of creativity, high cfg = good prompt adherence, but it's not the case at all, I blame the flow architecture for that
>>
>>102701473
it's not even exactly that, although I suspect we mean the same thing, but I'd say it's more that words are overtrained to have exact effects which are not broad enough, this isn't really T5 being more correct it's just more 'exact'

you can kind of break it out of this by using img2img creatively. by putting unusual constraints on it you can reveal some of its image knowledge that T5 almost hides from you
>>
>>102701491
ah, my sweet summer child! i am, of course, using 1.5 models. meanwhile, i'm watching the thread shid and fard itself into irrelevance chasing the latest greatest thing, and failing pathetically. i offer as term of art Debo, and his pathetic adherence to flux gens, even if they're vastly inferior to everything he ever produced before. like pure rote dogshit. and i like debo!
>>
File: 9354.png (1.15 MB, 1024x1024)
>>102700291
>>
File: 57957.png (2.62 MB, 1440x3120)
i have a dream that one day, no one will post in this wretched splinter thread, that all anons will post in /sdg/ and /sdg/ alone, and that the dissidents will beg forgiveness for their hubris in believing that an /g/ based imggen thread could be sustained anywhere else other than the one true diffusion thread, /sdg/!
>>
>>102701559
I'm not ready to see another debo post yet
>>
File: 57958.png (2.71 MB, 1440x3120)
>>102701579
>he puts the worst model ever in file name...
abandon your false models! stability perfected imggen and you're out here drowning in alt-models like some kind of retard. oh ho! the ai did exactly what i said! such dopamine, wow!
>>
but as we all know the AI doing what we least expected is, in fact, the thing we were chasing all along. so a model that does, specifically, what we tell it is bland, and boring.
"make three hot chicks showing titties at the trump convention" and it does it, like an ignorant slave. what you don't know is what you're missing out on with the misfit outputs.
>>
the desire for pure rationality is so great, but so misguided... the world of pure reason is antihuman in ways i can't even begin to describe.
>>
first of all, humans aren't purely rational
>>
this doesn't seem like a controversial statement, you know it to be true. you, yourself, are irrational in ways you have difficulty describing. that's the thing about irrationality, you can't trace it, it just IS.
>>
"i believe things i know cannot be true"
this is all religion
>>
File: 1726990399177602.png (3.71 MB, 2304x1152)
>>102701634
>>102701640
>>102701644
This is why 1.4 is my favorite
>>
ir- and otherwise
>>
>>102701670
i liek it, desolate and empty. rusted, wasting away.
>>
Can FLUX draw hentai yet? The penis in vagina variant.
>>
File: 1708557363642692.png (3.58 MB, 1536x1152)
>>102701684
I've never seen a 1.5 or SDXL prompt that can make something like it
>>
>>102701715
i don't recall that much difference between 1.4 and 1.5. 2.0, 2.1, xl, etc, different story. failed abortions, the lot of them. then there are the fluxfags, who don't even know they're jacking their own cock and sucking their own cum. sad but true
>>
"i longed so much for realism, for reasons i never understood, that when it came to pass I was intoxicated but, in the end... i should have been happy enough with what i had before."
>>
"i must have made maybe 50, or 100 '''realistic''' images, and it felt so good but in the back of my mind i knew what i was doing was as fake as fake can be.... but i hoped to fool someone, somewhere..."
>>
it now seems to me that these supposedly realistic models are soulless trash. i don't know why it took so long, nor why we made this new thread to realize this obvious fact, but, we did. and it was a massive mistake
>>
that's why, I, and the rest of you, should be considering reuniting with the true thread, aka where you all belonged all along. this fever dream was nice, for a time, but ultimately illogical.
>>
>>102699906
great collage
>>
File: ComfyUI_temp_kphtt_00054_.jpg (1004 KB, 1776x1296)
>>102701313
What does changing prompts refer to? It has very little knowledge about artists names or style names. Can you demonstrate that it changes the prompt rather than ignoring what it doesn't know about?
>>
>>102701473
Change guidance
>>
>>102701921
>Can you demonstrate that it changes the prompt
It doesn't. Some anon asserted it, I was 99% sure he was wrong, so I tested something.
>>
>>102701473
add more words to your prompt and flux would start schizo hallucinating
>>
What kind of values do you guys use with euler_p /cfg++ ?
>>
any news about new Pony model? it was supposed to be in training, i'm not joining their discord
>>
>>102702352
the guy is still wasting their time in captioning hell.
and v7 is going to be based on auraflow.
it's going to be shit.
>>
>>102702445
dunno why he didn't go for Schnell? It's still better than that Auraflow piece of shit
>>
>>102702464
auraflow is a piece of shit but it's going to be significantly more trainable than the garbagepile known as flux.
it also, unironically, has better prompt comprehension.
https://cloneofsimo.github.io/compare_aura_sd3/

the only thing flux has going for it is a 16ch vae but it's fucking useless because it can't create anything other than corporate grade slop.
>>
>>102702480
>it can't create anything other than corporate grade slop.
that's why I'm having my fun with Loras, if loras never existed flux wouldn't survive lol
>>
>>102702480
16ch vae is huge though, unfortunately.
>>
File: 00017-3217233598.png (1.39 MB, 1024x1024)
>>
File: rb123dance1.webm (3.91 MB, 1280x720)
Booba good
>>
File: 00302-3068335178.jpg (730 KB, 1392x1736)
>>102702579
yes, agreed
>>
i consider sd 1.5 jeets above flux "realism" niggers
>>
>>102702579
Now take off their clothes.
>>
File: rb123frame1.webm (3.07 MB, 720x1280)
>>102703599
You know the rules, and so do I
>>
File: 00020-3778449240.png (2.98 MB, 1920x1080)
Been out of the loop for a while. What's this Flux thing, and why doesn't it work with A1111?
>>
File: doorframein2.webm (3.03 MB, 720x1280)
>>102704016
Welp, I meant to send this "interesting" one
>>
File: spoon.png (270 KB, 557x559)
>>102704128
>>
File: 00023-3108705150.png (2.94 MB, 1920x1080)
>>102704217
Considering how many people I've spoonfed in /sdg/ before, I fully expect to be that now, yes.
>>
>>102704153
her face doesnt change at all lel
>>
File: rbblouse1.webm (2.64 MB, 720x1280)
>>102704311
It's hard to get them to be very expressive so I start with an already expressive face and accept that it's better than the 1000 yard AI stare
>>
>>102704483
the "no blinking" thing is creepy tho lel
>>
File: spoona.png (446 KB, 619x623)
>>
File: 0.jpg (572 KB, 2048x1024)
>>
File: spoonime.png (501 KB, 627x627)
>>
File: sporg.png (500 KB, 627x627)
>>
File: 1715647556295011.jpg (106 KB, 800x1170)
>>102699906
got prompt
!!! Exception during processing !!! ERROR: Could not detect model type of: /content/drive/MyDrive/ComfyUI/models/checkpoints/Stable-diffusion/flux1-dev.safetensors
Traceback (most recent call last):
File "/content/drive/MyDrive/ComfyUI/execution.py", line 323, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/content/drive/MyDrive/ComfyUI/execution.py", line 198, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/content/drive/MyDrive/ComfyUI/execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "/content/drive/MyDrive/ComfyUI/execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "/content/drive/MyDrive/ComfyUI/nodes.py", line 540, in load_checkpoint
out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
File "/content/drive/MyDrive/ComfyUI/comfy/sd.py", line 557, in load_checkpoint_guess_config
raise RuntimeError("ERROR: Could not detect model type of: {}".format(ckpt_path))
RuntimeError: ERROR: Could not detect model type of: /content/drive/MyDrive/ComfyUI/models/checkpoints/Stable-diffusion/flux1-dev.safetensors


Wat?
>>
>>102705629
Is your ComfyUI like 10 months old?
>>
File: file.png (3.93 MB, 1887x1257)
i want to generate something like this
>>
>>102705645
I'll try uprooting it and see if that helps
>>
>>102705690
what's a model that can put a subject on a wide background?
>>
>>102700529
you can even use paint to do it,
maybe throw this through your first ksampler at 0.35-0.4 denoise and it will fix things
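for the non-comfy people, the same trick in diffusers terms is just an img2img pass at low strength over your rough fix; a sketch (the checkpoint is a placeholder, use whatever you genned with):

# Low-denoise img2img "fix-up" pass, same idea as the 0.35-0.4 ksampler advice above.
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init = Image.open("00013-3830987892.png").convert("RGB")  # your gen, optionally with the paint-over
fixed = pipe(
    prompt="same prompt as the original gen",
    image=init,
    strength=0.38,  # low denoise: keeps composition, repaints the broken detail
).images[0]
fixed.save("00013-fixed.png")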
>>
>>102703472
You consider wrong.
>>
>>102705690
realism, 4k, realistic, (bokeh, blurry background),
(fumo, fumo doll, sitting, wide view, countryside background, small figure)
>>
>>102706204
>realism, 4k, realistic, (bokeh, blurry background),
>(fumo, fumo doll, sitting, wide view, countryside background, small figure)
checkpoint, model?
>>
>>102706214
mmh, maybe look for a fumo doll lora, and I think DreamshaperXL will work? I don't gen 3d realism myself
Post what rolls out of the online generator you're using
>>
>>102706204
3d, sketch, render to negative if using sdxl
>>
>>102706256
i don't care much for the realistic look, i just want a similar looking fumo in a nice environment, but most of what i find just has the model taking up 80% of the screen
>>
File: anakin anger1.gif (1.03 MB, 480x270)
is comfyui supposed to randomly send nodes to the fucking ether? been using it consistently for a good few months now, got my workflow down to a reasonable level with face and hand detailers, and now my hand detailer suddenly doesn't exist anymore. importing images that used this workflow shows every part of my node setup except the hand detailer
what in the fuck, is there some magical way i can recover this?
>>
>>102706291
I've figured out that NONE of my gens this past month saved this particular workflow, except ONE. What the fuck is this shit?
>appending angry anakin reaction images so i dont start sperging out here
>>
>>102706285
I found a very nice flux lora but the rest that look decent are 1.5
https://civitai.com/search/models?sortBy=models_v9&query=fumo
>>
>>102706269
ty
>>
out of last night's 1,500 1girl gens this was the best failed gen
>>
File: fs_0010.jpg (133 KB, 1536x1152)
leaned into it
>>
>>102706291
>>102706354
Open your images in a text editor and see if the nodes are there in the text
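or pull it out with a couple of lines, since stock ComfyUI stores the graph as PNG text chunks (the keys are "workflow" and "prompt"; custom save nodes may differ):

# Dump the ComfyUI workflow/prompt JSON embedded in a PNG's text chunks.
import json
import sys
from PIL import Image

img = Image.open(sys.argv[1])
for key in ("workflow", "prompt"):
    raw = img.info.get(key)
    if raw is None:
        print(key, "not found")
        continue
    with open(key + ".json", "w") as f:
        json.dump(json.loads(raw), f, indent=2)
    print("wrote", key + ".json")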
>>
>>
>>102706438
Pretty good
>>
File: 1716646632624698.png (1.71 MB, 1152x896)
>>
File: 1722272816131406.png (1.61 MB, 1152x896)
>>
I hate sunk cost fallacy
>>
File: 1707734967808234.png (1.92 MB, 1152x896)
>>
File: 0.jpg (237 KB, 2048x1024)
>>
>>102706754
>>102706655
>>102706650
Tell me about bigma, did they already finish the paper
>>
File: 1718432814893948.png (1.91 MB, 1152x896)
>>
>>102706889
bigma balls
>>
>>102706889
Yeah you didn't see the link posted earlier?
>>
>>102706940
No
>>
File: 1709607605301542.png (1.79 MB, 1152x896)
>>
File: 1709972065102448.png (1.66 MB, 1152x896)
>>
great lora btw
https://civitai.com/models/718812
>>
>>102707089
Isn't it better to just use a CRT filter?
>>
File: 26.png (3.77 MB, 1160x1576)
something something big bara pits
>>
>>102707239
https://youtu.be/s3PV9hat814?t=11
>>
https://us.download.nvidia.com/Windows/565.90/565.90-win11-win10-release-notes.pdf
>What’s New in Release 565
> Support for CUDA 12.7.
Is there a pytorch compatible with such a recent cuda though?
>>
>>102707414
the 12.7 toolkit isnt even out
>>
File: file.png (176 KB, 367x720)
back to the lab again
>>
>>102707831
what are you training anon?
>>
>>102707845
Pixart Sigma 1.3B (2 and a half months of training)
I didn't have the heart to do it again for the 16 channel although the code is all working. I'll probably do it when the 5090 comes out.
>>
>>102707862
Cool man. How many hours total trained?
>>
>>102707882
It would be like 2000-3000 hours give or take using just 4090s
>>
>>102707900
Did you try training a smaller finetune first to test output quality?
>>
>>102707946
Yeah I've done some finetunes, but the goal was learning how to do it from scratch and showing off the capability of the Pixart architecture, which I think is underappreciated/underutilized, especially when it comes to grassroots models.
>>
>>102707831
I think these are the first faces I've seen from it. Nice. Crazy it's already been two and a half months.
>>
>>102707831
how many pictures did you use so far to get this result?
>>
File: file.png (514 KB, 1059x1037)
A lot of people are downloading OpenFlux right now, prepare yourself for some shitty finetune of this shitty Schnell model, un-distill dev isn't even on the list bruh
>>
>>102707986
if it were bigger, didn't use the sdxl vae and didn't use a huge text encoder, there would probably be much more interest
hopefully the next pixart comes soon and improves enough, because at least on the anime side i think people are sick of sdxl
>>
File: 1723702585589.jpg (3.84 MB, 7961x2897)
Have there been any updates to this image?
>>
>>102708795
you don't really need an update, we know exactly how the Q quants behave and which ones are better than others; that picture only exists so we could compare nf4 against the fp formats against the Q quants
>>
>>102708816
>you don't really need an update, we know exactly how the Q quants behave and which one is better than the other
We do? I haven't tested them.
>>
File: file.png (195 KB, 825x850)
>>102708829
like I said, we objectively know which Q quant is better than the others
https://reddit.com/r/LocalLLaMA/comments/197mip0/some_perplexities_for_the_new_2_bit_sota_gguf/
>>
Is there any significant difference between the following?

>train 1000 steps for 1 epoch
>train 100 steps for 10 epochs
>train 100 steps for 1 epoch and use the resulting safetensors file as network weights for the subsequent training session, 10 training sessions total
>>
>>102707089
Thanks for bringing it to my attention, time to see if Flux can now do what DALLE-3 can
>>
>>102708842
What about the speed? Q5_0 in particular seems to be a bit slower than the others for some reason.
>>
>>102708842
Can anyone explain why Stablediffusion.cpp doesn't allow conversion to Q6?
>>
>>102708945
https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/quantization_and_gguf.md
it doesn't even do the K-quants, there's probably another repo that does it all
>>
>>102706655
this one is cute
>>
>>102708964
ty
It's photoshopped tho, I could've thrown it through again at 0.30 for the blend; there are surprisingly few fumo loras available
>>
>>102708590
the text encoder is the magic no matter how much you want CLIP
>>
>>102709294
i dont want clip, but its either too small (clip) or too huge (t5), why not something in the middle that can also be trained if necessary
>>
>>102709319
you still couldn't run something in the middle bud, if the T5 is squeezing you too hard already you're not going to get rescued by something that requires a GB less of VRAM
>>
File: 1723073414935100.png (2.06 MB, 1280x1280)
>>102707089
What settings? It just makes my prompts border-heavy with 1.0 strength and a prompt beginning containing

"crt style

Close shot of a crt screen, over saturated."
>>
>>102708271
I took a bit of a break exploring 16 channel VAEs and stuff, including an exploratory training session. I'll wait for when I can just get a computer with a 5090, put it in a corner, and set it and forget it for training. My current setup is too intrusive, especially if I want to play games or whatever.

>>102708337
~3 million images.
>>
>>102709403
you try some of the sample image prompts?
>>
what are brown/black asians called?
>>
>>102709731
Neo-noir chinamen
>>
File: 1711738035922703.png (1.54 MB, 1024x1024)
>>102709716
Seems like it's very prompt sensitive and works better with vibrant video game/anime like prompts, and it still looks a bit iffy
>>
File: 1703496815437418.png (1.59 MB, 1024x1024)
>>102710083
>>
>>102710117
looks cool even if nothing like a crt
>>
File: 1724964471773980.png (2.42 MB, 1024x1024)
>>102710195
It's a Joy Caption based on a (much better) October 2023 DALL-E 3 prompt:

https://litter.catbox.moe/mx39mw.jpg

Still waiting for any model that has the same color and style adherence as peak DALL-E 3
>>
File: 1712544310549937.png (1.85 MB, 1024x1024)
kek
>>
fuck flux
>>
>>102710560
With what?
>>
>You want it, you get it ... but 500 up to 2000 downloads but only 3 users give me buzz, hey so i must use next one early access with buzz... sry
I hate that civit added this buzz bullshit. It is devolving the whole site into civitards begging for buzz, releasing shitty overfit loras solely for buzz, farming buzz, etc. Pretty soon they're gonna add an option to completely paywall loras and models for buzz.
>>
>>102710656
zero moderation on quality is what killed the site
>>
>>102710564
pencil penis
>>
>waiting room
>>
>>102710780
what are we waiting for?
>>
File: 0.jpg (261 KB, 2048x1024)
>>
File: paganilow.png (1.6 MB, 1016x1016)
>>
>>102710232
Catbox looks nothing like a crt tho
>>
File: bComfyUI_126389_.jpg (828 KB, 1440x1024)
>>
File: bComfyUI_126443_.jpg (1.01 MB, 1440x1024)
>>
I like Flux
>>
https://www.reddit.com/r/StableDiffusion/comments/1fxdm1n/i_tested_the_use_of_a_random_filename_1234jpg_to/
https://www.reddit.com/r/StableDiffusion/comments/1fxkt3p/common_camera_and_smartphone_file_naming_formats/
>>
You're welcome
>>
File: ComfyUI_00222_.png (1.74 MB, 770x1126)
>>102699906
I heard that undistilled versions of flux are either in the works or already available. Have people been successful in training finetunes of those models? What about LoRAs?
>>
File: ComfyUI_00783_.png (1.25 MB, 896x1160)
>>
File: 1707345841893108.jpg (77 KB, 800x1170)
>>102711603
>>102711764
Neat
>>
File: 0.jpg (203 KB, 2048x1024)
>>
>>102711022
Cool
>>
File: ComfyUI_00786_.png (1.28 MB, 896x1160)
>>
File: ComfyUI_temp_kasmf_00040_.png (1.78 MB, 1696x1160)
>>
i hate how detailed pony shading is
currently using prefect xl v3
anyone have recommendations for a more "anime" model?
>>
>>102712226
Catbox? Looks legit
>>
>>102710656
come home white man https://aitracker.art
>>
just genned 800 more fat slags to have a pick through
>>
kept 73 of em but wasn't being particularly selective
>>
>>102712505
>>102712644
p attractive for slags
>>
BUILT
>>
>>102712661
no worries, we kept some uggos too
>>
>>102712688
still would
>>
File: ComfyUI_temp_kasmf_00059_.png (1.89 MB, 1160x1696)
>>102712350
https://files.catbox.moe/hv86a7.png
>>
>>102701542
give catbox
>>
>>102712769
Danke bro
>>
File: ComfyUI_temp_kasmf_00062_.png (1.78 MB, 1696x1160)
>>
>>102712709
ever since I heard that large breasts are considered "grandmotherly" in Asian countries I have changed my view of them. I find them a bit comic and grotesque now. To still be turned on by them in spite of this—humiliating

>>102712854
lookin' good. lol @ minimap that's actually just another similar scene
>>
>>102711022
SpaceEngine
>>
diffusers and huggingface are pure fucking aids
>>
File: ComfyUI_temp_kasmf_00071_.png (1.89 MB, 1696x1160)
>>102712874
ty
>>
File: 0.jpg (282 KB, 2048x1024)
>>102712093
thanks
>>
File: 00299-3850448666.jpg (383 KB, 2200x1467)
Pour one out for the nigga HellKaiza. No idea what happened but the account must've been nuked not that long ago. There goes 90% of the good Dragonball loras.
>>
GOOD MORNING SAARS.
https://www.reddit.com/r/StableDiffusion/s/r1lrfbbKyn
>>
>>102713605
>"OMG GUYS, FINETUNE FLUX WITH 6 FPS"
>Doesn't mention it would take 2 years to finish the training
god I hate this turkish fuck so much
>>
>>102712253
use "anime screencap" in prompt
>>
File: ComfyUI_02760_.png (2.5 MB, 1920x1152)
3 months before Flux becomes decent at anime?

Have a Fuuka
>>
>>102713815
fantastic
>>
https://xcancel.com/Lykon4072/status/1842678140092461557#m
>Flux is not really in target for what SAI does. Looks to me like SDXL is still the king.
Holy copium
>>
>>102713815
>3 months before Flux becomes decent at anime?
Now that we got the undistilled version, we can finally finetune Flux, so probably yeah, we'll get something interesting soon enough

What model did you use to make that picture btw?
>>
>>102713829
>Flux is a very good model, but it's more in the turf of Ideogram and Midjourney.
>
>>
>>102713829
I'll bite. What do you think SAI is trying to do?
>>
>>102713846
Haven't been here since Flux came out. What do you mean by undistilled version? Did they release Pro?
>>
>>102713879
>What do you mean by undistilled version? Did they release Pro?
No, someone undistilled flux-dev, and that's important because you can't really finetune distilled models in the first place, so we unlocked that
https://huggingface.co/nyanko7/flux-dev-de-distill
>>
>>102713886
Weird. I feel like I remember there being fine tunes. Or were they fake/suboptimal or something?
>>
>>102713918
>I feel like I remember there being fine tunes. Or were they fake/suboptimal or something?
We can make loras out of flux, but that's it; if you try a real training run with a lot of pictures it collapses hard, which is why we needed the undistilled version to take the next step
https://github.com/bmaltais/kohya_ss/issues/2885#issuecomment-2395679834
>The latest commit includes the latest sd-scripts updates with flux dev de-distilled model support. Give it a try.
Nice, I believe it's working now on the sd3-flux.1 branch
https://github.com/bmaltais/kohya_ss/tree/sd3-flux.1
>>
>>102713928
Oh I see, so basically only shallow trains were possible, but now this allows long continued pretraining. Is there anyone that plans to make a big one for Flux or are there concerns with the license?
>>
>>102713964
>Is there anyone that plans to make a big one for Flux or are there concerns with the license?
the guy who made Realistic Vision is up to the task, that's the only guy I know of who's gonna train the un-distilled dev so far
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>
>>102713972
Interesting. Looking forward to see if it works out.
>>
>>102713928
>Nice, I believe it's on the SD3-flux.1 branch it's working now

No wonder my attempts didn't work earlier. What a waste of time.
>>
>>102713605
>>102713689
I actually tried this briefly after seeing it, just to see the speed.

It's not bad. On a 4090 it averages around 150W power usage on the GPU (flux loras are just over 300W). So swapping the layers around between GPU and CPU cuts utilization in half, roughly. Shame it doesn't work with multi GPU, else I'd do some finetunes on my 4x4090 system. Maybe I'll try to fix it for multi GPU, theoretically it should be possible but currently it fails with weird Accelerate errors.
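For anyone wondering what the layer swapping actually is, the core idea is just this (a toy sketch, not the actual script from the post): park the blocks in system RAM and pull one at a time onto the GPU for its forward pass.

# Toy sketch of CPU<->GPU block swapping; illustrates the idea, nothing more.
import torch
import torch.nn as nn

device = "cuda"
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(24)]
)  # stand-in for the big model's transformer blocks, kept on CPU

def forward_with_swapping(x):
    x = x.to(device)
    for block in blocks:
        block.to(device)   # pull this block onto the GPU
        x = block(x)
        block.to("cpu")    # push it back out to make room for the next one
    return x

out = forward_with_swapping(torch.randn(1, 77, 512))
print(out.shape)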
>>
>>102713928
Ok, looks like there wasn't anything special done. I still couldn't get training for flux to work properly. The samples are all fried regardless of how high or how low I set the learning rate, guidance, and discrete flow shift. I would like to know if anyone has managed to train a lora successfully with de-distilled flux.
>>
>
>>
>>102713736
I did, it still gives me way too "realistic" shading.
>>
File: teatime.jpg (53 KB, 900x900)
10.10
>>
>>102713523
Damn that sucks. I wonder why people are nuking their civitai accounts
>>
File: file.png (130 KB, 1214x869)
>>102714864
>The samples are all fried regardless of how high or how low I set the learning rate, guidance, and discrete flow shift.
by "guidance" you mean CFG? because there's no distilled guidance anymore on that model

>I would like to know if anyone managed to train a lora successfully with dedistilled flux.
Maybe you should message that guy, he found a way to get better results from the undistilled model
https://civitai.com/models/788550/realflux-10b
>>
>>102715531
i wonder if it's because i am using "orc", when i use other tags and use the same negative prompt(3D, photorealistic, monochrome) it gives me good results.
I am looking for something more animey tho, illustrious looks good.
>>
do people still use 1.5? why?
>>
>>102715950
Quick lora training, great for inpainting.
>>
File: 02023-540061691.png (995 KB, 896x1152)
>>
File: 02015-3638303554.png (851 KB, 896x1152)
>>102716220
>>
File: 02017-3638303558.png (891 KB, 896x1152)
>>102716237
>>
File: 02041-2385432971.png (912 KB, 896x1152)
>>102716247
>>
File: 02049-574698730.png (1.03 MB, 896x1152)
>>102716259
>>
File: 02060-574698747.png (1.03 MB, 896x1152)
>>102716274
>>
File: IMG_0461.jpg (256 KB, 1024x1024)
>update the cockarm lora with women
>it’s still very overcooked melting bodies etc
>but now the cover image is a spread open woman instead of a skateboarding cock
>gets more DLs and likes in a few hours than the original even though there are a dozen other naked women loras
Why are straightfags like this
>>
https://huggingface.co/RareConcepts/FluxDev-LoKr-beavisandbutthead-nf4
>This is a LyCORIS adapter derived from black-forest-labs/FLUX.1-dev.
What's a LyCORIS? Is it better than a Lora?
>>
File: bComfyUI_126496_.jpg (805 KB, 1440x1024)
>>
File: bComfyUI_126491_.jpg (725 KB, 1440x1024)
>>
File: VPX-104.jpg (2.3 MB, 2608x2080)
>>102716946
>>102716901
These are DOPE
>>
File: ComfyUI_32216_.png (1.47 MB, 1024x1024)
>>102716901
>>102716946
Very nice.
>>
File: ComfyUI_31767_.png (1.47 MB, 1024x1024)
>>
File: ComfyUI_31761_.png (1.12 MB, 1024x1024)
>>
File: ComfyUI_31627_.png (1.45 MB, 1024x1024)
>>
File: ComfyUI_31155_.png (1.56 MB, 1024x1024)
>>
>>102715595
?
>>
>>102713972
Wake me up when he (or someone) has a safetensors to share
>>
File: fs_0042.jpg (162 KB, 1536x1152)
>>
>>102716360
>straightfags
coomers**
>>
https://www.reddit.com/r/StableDiffusion/comments/1fy2riz/cusersyour_prompt_herepicturesphotos_also_works/
>>
>>102716360
it's like there is a percentage of the population that is straight and a percentage that is gay. One might guess that it is a 10 to 1 ratio.
>>
>>102713523
There needs to be an archive for these things. I know I have a ton of DBZ Loras from him saved somewhere
>>
File: 00005-182415007.png (1.04 MB, 896x1152)
Flux could be so cool if it was even somewhat more reliable. The level of work you need to do to get even 1 image to look semi realistic, not have plastic skin, not have 20 fingers, and look like who you want it to is insane
>>
In comfy, when using a segs filter do you feed it the normal conditioning, or should you be setting up a custom one for the object? I am tired of switching to A1111 (or a variant) to get things not to be shit.

Example: if you are segging a face, should your prompt be
1girl, red dress, blonde hair, strong makeup
or
blonde hair, heavy makeup, attractive face

>>102719170
mirror on HF is always a possibility. I haven't seen much for loras though.
>>
>>102719182
>not have 20 fingers
It gets fingers right like 90% of the time which is way more than XL or 1.5
>>
>>102719182
Welcome to Bigma waiting room
>>
>>102719471
I apparently gotta spend more time in here, this is the first I'm hearing of it
>>
>>102719521
chinamen making the next version of PixArt with nvidia sponsor
>>
>>102719760
>nvidia sponsor
wut. I thought the US government would have squashed that. Still pissed about the super VRAM cards they have access to.
>>
I want to make VQ-VAEs work so bad. I'm shrinking down the network because the next-token prediction was so fucking slow, like 0.2 s/token, but you need like 2048 tokens to make an image.
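the arithmetic is brutal at that speed:

# Why 0.2 s/token next-token prediction hurts for image gen (anon's own numbers).
tokens_per_image = 2048
seconds_per_token = 0.2
total = tokens_per_image * seconds_per_token
print(f"~{total:.0f} s (~{total / 60:.1f} min) per image")  # ~410 s, roughly 7 minutes each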
>>
>>102719760
Oh lord, so we're likely months out then. Might as well wait for Pony V7 while we're at it
>>
>>102719837
apparently training is done, they wanna release the paper before publishing the model
>>
>>102719864
Let's hope this comes sooner rather than later. We haven't gotten any great models in a while outside of Flux, and that has its own issues
>>
File: 1726075449183071.jpg (7 KB, 249x250)
Does image quality and prompt adherence degrade the more generations you make? I've noticed it using A1111 and Forge; I can't quite put a finger on it exactly, but it feels like anatomy, composition etc. become worse the longer you gen. After closing and restarting I get better gens for a while again
>>
>>102720058
It's confirmation bias. Most people will check settings and tweak something when they restart. Keep doing it periodically, it really is good for fixing things.
>>
File: catbox_tqb029.png (1.68 MB, 1152x1632)
>>
>>102719986
>Let's hope this comes sooner than later. We haven't gotten any great models in a while outside of Flux, and that has it's own issues
I was excited about using Flux for a few hours. I couldn't get used to it because it doesn't seem to know basic art styles, terms, artists etc. Generation times feel bloated. Prompting requires writing purple prose, which makes me fucking sick.
>>
>>102720298
>Prompting requires writing purple prose which makes me fucking sick
You're not lying. Looking at the prompts other people write to get even decent images is insane compared to other models, where you can throw in a few keywords and get what you want
>>
>>102720112
nice
>>
>>102720434
I used a local bot to turn simple tags into longer sentences, but it just doesn't feel like it's worth the effort. It's fun to go back to 1.5 finetunes just for a few "landscape, by artist" gens
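if anyone wants to copy the "local bot" setup, it's one system prompt on any small instruct model; a sketch (the model ID is just an assumption, use whatever you already have loaded, and it needs a recent transformers where the text-generation pipeline accepts chat messages):

# Tag -> prose prompt expansion with a small local instruct model.
from transformers import pipeline

chat = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct", device_map="auto")

tags = "1girl, red dress, rooftop, night, city lights, rain"
messages = [
    {"role": "system", "content": "Rewrite comma-separated image tags as one short, concrete prose prompt for a text-to-image model."},
    {"role": "user", "content": tags},
]
out = chat(messages, max_new_tokens=120)
print(out[0]["generated_text"][-1]["content"])  # the expanded prompt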
>>
So Flux 1.1 is pro only?
Are those jews seriously not release a 1.1 dev?
>>
>>102721114
maybe next week
>>
>>102721114
No one likes Flux anymore
>>
>>102721232
No one likes Flux because it's cucked and impossible to finetune, but maybe that could change with the de-distilled version?
>>
>>102721294
I'll believe it when I see it
>>
What's the best option to caption NSFW images? Joycaption is really bad at it outside of 1girl, standing scenarios.
>>
>>102719087
Gross
>>
>>102719201
I’ve been mass testing loras and I would say 70% of them, even when testing only highly rated ones, completely destroy the model’s ability to make text and fingers.
>>
>>102721535
Unfortunately still joycaption
Maybe pixtral
>>
>>102716901
>>102716946
Are you using a lora?
>>
>>102721620
Fucking rip...
>>
File: fs_0144.jpg (139 KB, 1240x984)
>>
>>102720058
I can confirm that this happens with A1111 and probably Forge too since it's a derivative. The more often you change your prompt, the more it deviates from your prompt. Using an old prompt and seed that worked seems to do a reset and fix the issue until you change the prompt again. But, even if you use the original seed and settings, you won't be able to reproduce the same exact image even if you use CPU as the random seed generator.
>>
>>102721584
Because they're super overcooked.
>>
File: ayowhat.webm (1.49 MB, 720x1280)
Just want to make dancing redhead girls, what the fuck is this
>>
>>102722174
You got the Netflix version
>>
>>102722174
she got DEI'ed kek
>>
>>102720298
Use the Schnell version if you only want to do short prompts.
>>
>>102720298
>Prompting requires writing purple prose
I suggest you try the undistilled version; you can raise the real CFG on it and the prompt understanding gets better, no need to go for boomer prompting anymore, it understands simple prompts better
https://huggingface.co/nyanko7/flux-dev-de-distill/discussions/3
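for context, "real CFG" just means you're back to the normal two-pass guidance, which is where negative prompts come from; schematically:

# Schematic classifier-free guidance step - what a de-distilled model lets you
# do again. `model` and the embeddings here are placeholders, not a real API.
def cfg_step(model, latents, t, cond_emb, neg_emb, cfg_scale=3.5):
    noise_cond = model(latents, t, cond_emb)   # pass with your prompt
    noise_neg = model(latents, t, neg_emb)     # pass with the negative / empty prompt
    # push away from the negative, toward the positive
    return noise_neg + cfg_scale * (noise_cond - noise_neg)

# distilled flux-dev instead takes a single "guidance" number as an input and does
# one pass, so there's nothing for a negative prompt to push against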
>>
>>102711828
>>
>>102722486
that's pretty cool
>>
>>102722486
good fight, good night
>>
>>102722674
>>102722691
Gened using this

https://civitai.com/models/549967/crisppyboat-ponyxl-or-flux1
>>
>>102722174
White t-shirt activates the dove commercial dataset
>>
File: AllowIt.gif (675 KB, 320x240)
>>
File: IMG_0467.png (1.77 MB, 1024x1024)
>>102722107
No this is raw
>>
>>102721535
https://huggingface.co/allenai/Molmo-7B-D-0924
This is IMO better than joycaption and almost completely uncensored. If you have tons of VRAM, molmo 72b is even better.
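A rough captioning sketch, adapted from memory of the Molmo model card, so double-check the card for the exact calls (`processor.process` and `generate_from_batch` are how it does it there):

# Caption a folder of images with Molmo; verify against the model card before trusting this.
from pathlib import Path
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")

for path in Path("dataset").glob("*.png"):
    inputs = processor.process(images=[Image.open(path)], text="Describe this image in detail.")
    inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
    output = model.generate_from_batch(
        inputs, GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"), tokenizer=processor.tokenizer
    )
    caption = processor.tokenizer.decode(output[0, inputs["input_ids"].size(1):], skip_special_tokens=True)
    path.with_suffix(".txt").write_text(caption.strip())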
>>
>>102722932
NTA and that image rocks, but 90% of Flux loras ARE overcooked. Two or three anons post links to some nicely trained flux loras quite often.
>>
>>102722412
This working on Forge yet?
>>
>>102723148
normally it should, it's the same flux architecture except that you just have to crank the CFG up and put the distilled guidance at 0 I guess
>>
>>102722412
>using CFG
Won't that make gens even slower than they already are? Doesn't really feel like a win. I can ask chatgpt to expand my prompts just fine.
>>
Honestly Forge is such a piece of shit.
>slow
>wax skin
>butt chin
>doesn't just work without writing novels
>cant control what LM understands in your novel (no weights)
>>
Next Bred

>>102723260
>>102723260
>>102723260
>>
>>102723253
>Won't that make gens even slower than they already are?
it does, CFG allows you to go for negative prompts though
>>
>>102722943
thank you, I will give it a shot!
>>
File: file.png (1.9 MB, 1024x1024)
man
is pony literally fucking unable to do shit that isnt so detailed?
I've tried any amount of tags, this is all it outputs
aaaaaaaaaaaaaaaaaaa
please recommend a more 2D model please please
>>
>>102725348
>>102723271


