File: tmp.jpg (967 KB, 3264x3264)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>102405949

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out

>Model Ranking

>Models, LoRAs & training


>Pixart Sigma & Hunyuan DIT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality

>Related boards
Blessed thread of frenship
File: Sigma_14651_.png (2.25 MB, 1024x1024)
File: Sigma_14660_.png (1.61 MB, 1024x1024)
File: Sigma_14670_.png (954 KB, 1024x1024)
File: Sigma_14629_.png (764 KB, 1024x1024)
how come percentile_clipping and block_wise won't work with regular Lion? Is PagedLion8bit really that different?
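For context, the bitsandbytes docs describe both options as part of their optimizer machinery. percentile_clipping tracks the last 100 gradient norms and clips the gradient at a chosen percentile of them. block_wise quantizes optimizer state in independent blocks to limit outlier effects. If "regular Lion" means an implementation from another package, that would explain why it doesn't accept these arguments.

Below is a minimal pure-Python sketch of the percentile idea only. The class name PercentileClipper and the "zero the update on nan/inf" policy are hypothetical, and this is not the bitsandbytes implementation.

```python
import math
from collections import deque


class PercentileClipper:
    """Hypothetical helper: clip a gradient norm to a percentile of recent norms."""

    def __init__(self, percentile: int = 95, history: int = 100):
        if not 1 <= percentile <= 100:
            raise ValueError("percentile must be between 1 and 100")
        self.percentile = percentile
        self.norms = deque(maxlen=history)  # rolling window of recent gradient norms

    def scale(self, grad_norm: float) -> float:
        """Return the factor to multiply the gradients by (1.0 means no clipping)."""
        if not math.isfinite(grad_norm):
            # Hypothetical policy: a nan/inf norm can't be rescaled meaningfully,
            # so zero the update and keep the bad value out of the history.
            return 0.0
        self.norms.append(grad_norm)
        ranked = sorted(self.norms)
        # Nearest-rank percentile over the current window (integer ceil, no float error).
        idx = max(0, (self.percentile * len(ranked) + 99) // 100 - 1)
        threshold = ranked[idx]
        return threshold / grad_norm if grad_norm > threshold else 1.0
```

In a training loop you would compute the total gradient norm each step, multiply the gradients by scale(norm), and then call the optimizer step.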
Open source tools most likely plus custom loras.
File: 00060-2457933165.png (2.05 MB, 1152x1536)
File: 00003-48613435.jpg (173 KB, 1080x1280)
When is this shit going to be good enough to generate doujins for works that somehow don’t have any despite practically crying out for them?
File: 00005-4205403894.jpg (140 KB, 1080x1280)
File: 00006-2321852529.jpg (138 KB, 1080x1280)
my slowpoke self only just got around to looking up the full spec sheet of the 4070 compared to the 3090
how the fuck does it have half the cuda cores of the 3090? the ((marketing)) claims it's faster anyway thanks to the newer architecture and other things, but is that even true in reality? i can't find any fucking direct comparisons, and the data is all over the place because of people's wildly different setups. the general consensus rates the 3090 as better value purely for its vram, but how does that stack up against a card that might be faster despite fewer cuda cores and less vram?

>people still using sd 1.5 in 2024 at 512x as if that helps get an idea of the performance differences

>vergil is MFW this entire rabbithole
File: 00081-2457933167.png (2.54 MB, 1152x1536)
uncanny stuff
File: 00011-2545065371.jpg (105 KB, 1080x1280)
I bet the inference speed would be pretty close between a 4070 Super and a 3090. The 4000 series clocks are a lot higher than the 3000 series'. 12GB of VRAM is tough though if you're actually going to gen a lot.
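The clock-speed point can be checked with back-of-envelope arithmetic. Peak FP32 throughput is roughly CUDA cores × 2 FLOPs per clock (one fused multiply-add) × boost clock.

The core counts and boost clocks below are the reference-card figures and are worth double-checking against NVIDIA's spec pages. Peak FP32 also ignores tensor cores and memory bandwidth, both of which matter a lot for diffusion. Treat this as a sketch, not a benchmark.

```python
def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Theoretical peak FP32: 2 FLOPs (one FMA) per core per clock."""
    return cuda_cores * 2 * boost_ghz / 1000  # GFLOPS -> TFLOPS


# Reference-card specs (assumed; verify before relying on them): (CUDA cores, boost GHz)
CARDS = {
    "RTX 3090": (10496, 1.695),
    "RTX 4070": (5888, 2.475),
    "RTX 4070 Super": (7168, 2.475),
}

for name, (cores, ghz) in CARDS.items():
    print(f"{name:15s} {cores:6d} cores @ {ghz} GHz -> ~{peak_fp32_tflops(cores, ghz):.1f} TFLOPS")
```

On these numbers the 4070 Super lands within about 1% of the 3090 on paper. The plain 4070 sits around 29 TFLOPS. Neither figure captures the 3090's extra 12GB of VRAM.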
I've noticed something interesting with flux for anime images. The resolution you generate at influences the style. High resolutions give a more visually appealing, higher-quality style overall. And I don't mean that higher res just captures more fine detail; obviously that's true. I mean the actual style the image is drawn in changes. Generating at lower res doesn't just give you a scaled-down version of a higher res gen. But for photos, I don't notice any difference.

I think this is because flux was trained on images at whatever resolution they natively had. A 512 res anime image on the internet is likely to be kinda shitty and low quality, while a 1024 image, on average, had more effort put into it. For photos this isn't true: a lower res photo might have just been scaled down to fit on a webpage.

It's kind of annoying. If you're using lower res to quickly iterate on prompts, you're not really seeing what flux is capable of with anime.
File: 0.jpg (274 KB, 1016x1024)
File: ComfyUI_01586_.png (1.49 MB, 1024x1024)
File: ComfyUI_01472_.png (1.98 MB, 1024x1024)
so what's a good resolution for anime images?
896x1152 seems good for portrait aspect ratio. I've barely tried higher than that, mainly because it's so fucking slow.
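Since the drawing style seems to shift with resolution, a small helper can pick a size near a chosen pixel budget for any aspect ratio.

The ~1MP budget (1024×1024) and the rounding to multiples of 64 are assumptions based on common SD/Flux practice, not hard requirements. The function name is made up. With these defaults, a 7:9 portrait comes out at 896x1152, the portrait size suggested in the reply.

```python
import math


def snap_resolution(aspect_w: int, aspect_h: int,
                    target_pixels: int = 1024 * 1024, multiple: int = 64) -> tuple:
    """Pick (width, height) near target_pixels for the given aspect ratio,
    rounding each side to a multiple of `multiple` (hypothetical helper)."""
    ratio = aspect_w / aspect_h
    # Solve w * h = target_pixels and w / h = ratio.
    width = math.sqrt(target_pixels * ratio)
    height = math.sqrt(target_pixels / ratio)

    def snap(side: float) -> int:
        return max(multiple, round(side / multiple) * multiple)

    return snap(width), snap(height)


print(snap_resolution(7, 9))  # portrait
print(snap_resolution(1, 1))  # square
```

Raising target_pixels gives the higher-res gens the post says look better, at the cost of speed.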
File: ComfyUI_01338_.png (1.42 MB, 1024x1024)
File: 00056-377886493.jpg (186 KB, 1080x1280)
File: 00059-4231916271.jpg (153 KB, 1080x1280)
On A1111, is there some way to send a pic to inpaint sketch? It doesn't seem to be a default option and I can't find an extension for it, so I feel like I'm missing something obvious. I don't want to always save and reopen a pic to send it there; I'd like to do it directly like with regular inpainting
File: 00065-4095912729.jpg (189 KB, 1080x1280)
File: 0.jpg (251 KB, 1024x1024)
from txt2img there is a button, you need to hover over it.

in img2img the buttons are around the input image.
File: 1726543760.png (1.28 MB, 1024x1024)
File: ComfyUI_33677_.png (1.29 MB, 768x1024)
File: 1726543790.png (1 MB, 1024x1024)
Isn't it supposed to be roughly one generation = one tier (one decimal) of difference, two generations = two tiers? So 3090 ~ 4080 ~ 5070
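Here is the rule of thumb written out. Each newer generation adds 1000 to the model number and drops one tier (10), so the same performance class shifts by 990 per generation.

This is only the arithmetic of the heuristic, not a performance claim, and the function name is made up.

```python
def equivalent_tier(model: int, generations_forward: int) -> int:
    """Heuristic from the post: +1 generation (+1000) is worth one tier step (-10)."""
    return model + generations_forward * (1000 - 10)


print([equivalent_tier(3090, g) for g in range(3)])  # 3090, then 1 and 2 generations later
```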
File: file.png (17 KB, 1968x43)
4090 noob here, is this the model I need for Flux LoRA training?
Epoch [1/100], Step [10], Loss: 2770.3979, Importance Loss: 150.7213, Perceptual Loss: 1.0613, Total Loss: 2785.4805, EMA Loss: 3282.3598, LR: 0.00000039, Grad Norm: nan, Max Grad: inf, Param Norm: 982.0561, Time/Step: 8.18s

Patches with importance and perceptual loss bb.
You need the fp8 version or else the training is going to be a lot slower.
File: dino_00061_.png (977 KB, 1024x1024)
OK, thank you
My queen
someone please make a krill myself meme with Flux where the krill breaks his handcuff chains
hibernation mode
Why would that matter?
because one image isn't enough to draw a conclusion. To be sure something is worse than the other, it has to be consistently worse, not just once. I guess. nta btw
It's cool that we're technically able to run PuLID on ComfyUI, but I'm seeing a requirements.txt + requirements_fp8.txt, so I guess it doesn't work with GGUF quants, right?
looks like anon did multiple tests https://desuarchive.org/g/thread/102253191/#q102257955
still trying to figure out which xformers works with torch 2.3.1 tho
>still trying to figure out which xformers works with torch 2.3.1 tho
If I remember correctly, the newest versions of torch and CUDA, especially CUDA 12.1, aren't compatible with xformers anymore
File: file.png (1.22 MB, 3826x1782)
doesn't seem to be working for me
Odd in a good way
nuking the venv and manually installing deps seems to have done it
I highly suggest updating ComfyUI now; the new commit lets you change a LoRA or a LoRA's strength without any unload/reload
my images aren't the same as before but also different than when using the newest torch
the only difference should be missing xformers, but that shouldn't change the output
File: file.png (3.45 MB, 3185x1612)
>my images aren't the same as before but also different than when using the newest torch
that's normal, every new version of torch means different outputs
File: file.png (90 KB, 1404x438)
if you use the GGUF node it keeps unloading though, to prevent that you can use the Force device
i mean my NEW outputs using 2.3.1 don't match my OLD outputs that also used 2.3.1. perhaps there's something i'm missing, but i'm done fucking around with my venv for today
oh, well if you're using xformers, this shit isn't deterministic so you get different pictures each time even on the same seed
>this shit isn't deterministic
xformers has been deterministic for quite a while now; they fixed the thing from the old 2022 versions where the image changed on every gen

anyway, output is the same with or without xformers but still not the same as my previous install with 2.3.1
well, even if it's deterministic now, that doesn't mean output + xformers = output; maybe xformers changes the picture anyway
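One way to check whether two gens with the same seed really differ, without eyeballing them, is to hash only the pixel data and ignore metadata chunks. ComfyUI embeds the prompt/workflow as PNG text chunks, and those can differ between otherwise identical images.

Below is a stdlib-only sketch; the function names are hypothetical. Matching digests mean identical pixels. Differing digests are not conclusive on their own, because two encoders can choose different PNG row filters for the same pixels, so confirm with a real image decoder if they differ.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) for each chunk in raw PNG bytes (CRCs are not checked)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + payload + CRC
        if chunk_type == b"IEND":
            break


def pixel_digest(data: bytes) -> str:
    """SHA-256 of the image header plus decompressed pixel stream, ignoring metadata."""
    digest = hashlib.sha256()
    idat = bytearray()
    for chunk_type, payload in iter_chunks(data):
        if chunk_type == b"IHDR":
            digest.update(payload)  # width, height, bit depth, color type, ...
        elif chunk_type == b"IDAT":
            idat += payload  # image data may be split across several IDAT chunks
    digest.update(zlib.decompress(bytes(idat)))
    return digest.hexdigest()


def text_chunks(data: bytes) -> dict:
    """Plain tEXt entries (keyword -> text); zTXt/iTXt chunks are skipped."""
    entries = {}
    for chunk_type, payload in iter_chunks(data):
        if chunk_type == b"tEXt":
            keyword, _, text = payload.partition(b"\x00")
            entries[keyword.decode("latin-1")] = text.decode("latin-1")
    return entries
```

To use it, read both files with open(path, "rb").read() and compare their pixel_digest values. text_chunks(...).get("prompt") lets you diff the embedded prompts as well.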
File: file.png (1.87 MB, 1024x1024)
What if George Costanza and Tidus have a son?
Tidus laugh scene or we're not BACK baby
I generate 100% women.
Can't I just train a base model with 20TB of women images?
Do I need pictures of hubble space, swiss cheese, caves in thailand, chinese wall etc. in my dataset?
Does my model need this knowledge to generate boobs? Pls smart anon, answer me.
Asking for a friend
Only if you want such things in the background of your tits model. The cost of training a full model isn't really worth just making some Loras though
>be me
>presumably using torch 2.3.1
>gen image, call it output A
>wow this looks really good
>ff to august
>gen image with output A's workflow
>receive entirely different image, call it output B
>oh it looks like the problem is a torch update
>ff to today
>finally have the urge to downgrade torch to 2.3.1
>gen image with output A's workflow, receive another completely different image to output A or B, call it output C
>maybe xformers has something to do with it
>install xformers
>receive same image as output C
this idea is likely retarded as it only treats the symptom and not the root cause, but i wish metadata included version numbers of pip packages used so i can click a button that goes "one of the updates messed with the output, please revert everything to what it was when i generated this image"
>i wish metadata included version numbers of pip packages used so i can click a button that goes "one of the updates messed with the output, please revert everything to what it was when i generated this image"
that would be a horrible idea, not everyone has the same version of each package
What % of flux training is data that I don't need? 99%? So if I only need 1% of the data, wouldn't the model be much smaller and therefore much cheaper?
Would still be very expensive, but not millions?
maybe prompt the user if they'd like to match their venv/install with the one used for the loaded output/workflow if it's not the same? i did say it was a retarded idea... regardless, that won't solve my current problem :( i just want to be able to generate the same images that i was back in july :(
Wake up, find out CogX img2vid is pretty much released (there's a pth file that needs to be converted to safetensors). Not a word in the thread.
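The "snapshot the venv alongside the image" idea is easy to sketch with the standard library:

1. Record name → version for every installed distribution.
2. Store the result, for example as JSON in the image's metadata or in a sidecar file.
3. Later, diff it against the current venv.

The helper names are hypothetical. This only covers pip packages; ComfyUI's own git commit, custom nodes, GPU drivers and CUDA can also change outputs and are not captured.

```python
from importlib import metadata


def snapshot_env() -> dict:
    """Installed distribution name (lower-cased) -> version for the running interpreter."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            snapshot[name.lower()] = dist.version
    return snapshot


def diff_env(saved: dict, current: dict) -> dict:
    """Packages whose versions differ; None marks a package missing on that side."""
    names = sorted(saved.keys() | current.keys())
    return {n: (saved.get(n), current.get(n)) for n in names if saved.get(n) != current.get(n)}


def revert_commands(diff: dict) -> list:
    """pip pins that would restore the saved versions (review before running)."""
    return [f"pip install {name}=={old}" for name, (old, _new) in diff.items() if old is not None]
```

For example, a saved {"torch": "2.3.1"} compared against a current {"torch": "2.4.1"} yields ["pip install torch==2.3.1"].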
*shocked pikachu face*
Wait until you find out what lousy quality the thing generates. Then you'll never talk about it again. That's probably why you don't read anything here.
>i just want to be able to generate the same images that i was back in july :(
I suggest reinstalling ComfyUI and then going back to 2.3.1 (because the new default is 2.4.1); maybe it'll work
wake me up when there's a decent video inpainting workflow
>Not a word in the thread.
the last thread talked about it anon >>102415243
File: 1726474928926460.webm (754 KB, 1280x720)
I mean, now that we tasted MiniMax, everything else looks like a fucking toy not gonna lie
i changed requirements.txt to use 2.3.1 (and that's what pip list shows as well), which leads me to believe it's something other than torch or xformers, since in theory everything is the same as it was in july. but there's really no way to be sure it's the same, is there? did something in comfy itself change outputs?
Pytorch updated?
The new versions generate significantly poorer quality. There was a thread with comparative images in /r/stablediffusion.
The difference is huge, no idea what they fucked up.
One day you will be able to easily impregnate anyone you wish
Well, that'll teach me for not reading old threads while i'm drinking my morning brew, ty.
Fair comment, going to give it a whirl anyway just so i can get the full depressive experience.
File: bolter_.png (3.25 MB, 1696x1696)
File: Flux.1_00010.png (1.49 MB, 896x1152)
follow the reply chain, everything should be as it was before >>102422406
>The difference is huge, no idea what they fucked up.
I saw the same complaints a year ago with even older versions of PyTorch. Feels like the more time passes, the shittier our images get; that's fucking grim
File: 1720697341987443.png (491 KB, 500x500)
if someone made a comfyui to aftereffects bridge that day could be TODAY
it's my fault for not backing up before updating desu
Uh, I was writing, you bastard.
Karma that fucks up your fucking comfy, hope it stays that way fucker. ;)
go eat brekky, anon
/ldg/.. my sweet home..
At least say which UI goofed up again
File: Flux.1_00012.png (1.93 MB, 896x1152)
Anon says flux can't into dynamic poses but these are pretty decent
Anon says a lot of things
I can't seem to get the lettering tho
anon says the darndest things
File: ComfyUI_00336_.png (1.53 MB, 1024x1024)
>Anon says flux can't into dynamic poses
dunno why they say that, flux is way better than anything we got so far at poses
I hope you figure it out anon this thread is severely lacking jlaw imgs.
>flux is way better than anything we got so far at poses
arguable; flux isn't great because of its aesthetics but because of its adherence to complex prompts
File: 1726510282143.png (2.51 MB, 1024x1024)
just set guidance to 1 and cfg to 5
Using a lora?
>just set guidance to 1 and cfg to 5
it was guidance 1 + cfg 1 for the ultimate kino kek
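Here is standard classifier-free guidance, for reference. The cfg scale s mixes the model's conditional prediction with its unconditional (negative prompt) prediction, where v is the noise or velocity prediction depending on the model.

Flux dev's "guidance" value is a separate distilled-guidance input fed to the model, not this formula. At s = 1 the unconditional term cancels, so the negative prompt has no effect. That is why UIs can skip the negative pass entirely at cfg 1.

```latex
\hat{v} = v_{\text{uncond}} + s\,\bigl(v_{\text{cond}} - v_{\text{uncond}}\bigr),
\qquad s = 1 \;\Longrightarrow\; \hat{v} = v_{\text{cond}}
```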
yeah, is it coz of that?
>implying I can run flux
how much vram do you have, anon? there are GGUF quants now; you can run this shit on 8GB of VRAM without much quality loss
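Rough arithmetic shows why the quants matter: weight memory ≈ parameter count × bits per weight ÷ 8. This assumes the Flux.1 transformer has about 12B parameters. The GGUF figures use the Q8_0/Q4_0 block layouts (32 weights plus one fp16 scale per block, i.e. 8.5 and 4.5 bits per weight).

The estimate counts transformer weights only. The T5/CLIP text encoders, the VAE and activations come on top, and real GGUF files keep some layers at higher precision, so treat it as a lower bound. It also shows that bf16 Flux weights alone (~24 GB) would fill a 4090, which is why fp8 is suggested for training.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


FLUX_PARAMS = 12e9  # assumption: ~12B parameters in the Flux.1 transformer

FORMATS = {
    "bf16": 16,
    "fp8": 8,
    "GGUF Q8_0": 8.5,  # 32 x int8 + one fp16 scale per block
    "GGUF Q4_0": 4.5,  # 32 x 4-bit + one fp16 scale per block
}

for name, bits in FORMATS.items():
    print(f"{name:10s} ~{weight_gb(FLUX_PARAMS, bits):5.2f} GB")
```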
File: ThatFeeling.jpg (1.41 MB, 2330x1472)
Could be ever so slightly overtrained. We only just recently figured out that targeting specific blocks doesn't fuck up the hands. I dunno.
I have exactly 8GB of VRAM, but I've had no luck running quants. In my case they run on par with or slower than nf4 (due to loading).
File: Flux.1_00017_.png (1.61 MB, 896x1152)
I guess, picrel is a cherry picked example with loras
which one?
It's vintage abstract film
File: ComfyUI_temp_jaoco_00004_.png (2.8 MB, 1520x1520)
File: 20240917_104611.webm (119 KB, 720x480)
Started off ok, then got chinkyfied, bloody Chang and his "locally sourced" datasets eh.
Kind of the theme of "Uglies", but in reverse.
I'd say not too shabby either way
not too bad.. the examples on leddit are shit tho
AI is only good when you exploit what's unique to the medium.
Well said. It's funny how instead of using it to create new things you wouldn't be otherwise able to do traditionally, people are more keen on imitating what we're already familiar with.
I'd much rather remix existing stuff in a funny way than try to make something completely new. The Will Smith spaghetti meme got so popular for a reason: it's just fun to see him struggle with something as basic as eating spaghetti
File: file.png (1.74 MB, 1024x1024)
nah I can't make it work, guess the guy hasn't finished his repo yet
>it's just fun to see him struggle with something as basic as eating spaghetti
...which is unique to the medium
>people are more keen on imitating what we're already familiar with.
>Will Smith eating spaghetti
>which is unique to the medium
How is Will Smith eating spaghetti completely unique? It's a completely mundane scene

