/g/ - Technology






File: long dick general.jpg (2.04 MB, 2465x3264)
Black Forest Pt. 3: Localchads Won Edition

Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>101674851

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>AuraFlow
https://fal.ai/models/fal-ai/aura-flow
https://huggingface.co/fal/AuraFlows

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
>>
blessed thread of frenship
>>
File: 1722080365983147.jpg (86 KB, 740x1232)
Hail nigga forest . They cooked
>>
official pixart bigma, lumina 2 and hunyuan finetune waiting room, now with flux 12b fp8
>>
Anyone know what resolutions it's trained for?
>>
>>101678411
>now with flux 12b fp8
it's already here
https://huggingface.co/Kijai/flux-fp8/tree/main
>>
>>101678458
any it seems. even "long" pics in either direction work
>>
>>101678470
Still, I'm a bit suspicious it might be making the model a bit worse/dumber. I'd rather just gen at the proper resolutions.
>>
File: 806.png (1.05 MB, 1768x312)
>>101678458
all of them it seems
>>
>>101678465
For the retards here: what specs does this version run on?
>>
File: asasa.jpg (13 KB, 301x429)
What's the best scheduler for euler?
>>
>>101678529
a bit more than 12gb of vram
>>
>>101678518
I'm not sure it can go too far though, 2048x2048 gives duplication
>>
File: 8169372046.png (1.04 MB, 304x2008)
>>101678537
I think the only ones that work are simple and sgm_uniform; they're (practically?) identical
>>
>>101678271
I'm so sick of this ritual posting
are you d*bo??
>>
>>101678590
wow thats a crazy res
>>
File: file.png (1.32 MB, 1280x800)
>>
File: Capture.jpg (396 KB, 3087x1688)
>A worker gives coins to a pharmacist in the street, and a sign reads: "How ironic, don't you think?", anime style
Sometimes it has great prompt understanding, sometimes not; it lacks a bit of consistency
>>
>>101678632
boring reality btfo
>>
>>101678632
no need for boring reality
>>101678638
why only one clip
>>
File: Capture.jpg (35 KB, 922x628)
>>101678651
>why only one clip
what do you mean? I should put the text on the 2 of them?
>>
>>
File: 37.png (9 KB, 310x147)
>>101678669
>>
File: file.png (1.91 MB, 1280x800)
>>
File: Capture.jpg (339 KB, 3230x1485)
>>101678695
I already have that
>>
>>101678669
try it out and see if its better
>>
>>101678705
Crosswalk is almost perfect
>>
>>101678718
nothing changed kek
>>
>>101678529
>>101678554
Haven't tried the fp8 on disk version yet but have been loading with fp8 and running fine on 3080 10gb card (both dev and schnell). Might need --novram when launching Comfy.
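For reference, the memory flags just go on the launch command, something like this (a minimal example; flag names per comfy/cli_args.py, pick whichever your card needs):

python main.py --lowvram
python main.py --novram

Per the flag descriptions, --lowvram splits the unet into parts to use less vram and --novram is for when even that isn't enough.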
>>
File: file.png (1.65 MB, 1280x800)
>>
File: 824195.jpg (428 KB, 2144x1072)
>>101678711
shouldn't you put the prompt on both fields?
left: bottom field only
right: two fields
a man a woman holding hands
>>
File: 319.png (2.69 MB, 1072x1072)
>>101678757
top field only
LMAO
>>
File: file.png (1.2 MB, 1280x800)
>>
File: file.png (1.58 MB, 1280x800)
>>
>>
File: ComfyUI_00118_.jpg (972 KB, 3283x1504)
>>101678757
Holy fuck you're right... but that's retarded, why didn't he make a single text box that applies to both of them? it's annoying to have to copy-paste the prompt every time...
>>
File: file.png (1.7 MB, 1280x800)
>>
File: file.png (1.82 MB, 1280x800)
>>
Why does it load and unload for each gen? I have 54gb of ram and 24gb of vram and I'm running the DiT model on fp8
>>
File: 4645455630.png (18 KB, 742x172)
>>101678871
I use it like this
>>
>>101678902
yeah but you don't have the guidance scale on the classic CLIP Text Encode
>>
>>101678871
>>101678916
right click convert widget to input, then feed a text box into both
>>
>>
File: Capture.jpg (29 KB, 927x308)
>>101678923
which one?
>>
File: file.png (727 KB, 1280x800)
>>
>>101678554
Does comfy support CPU or Vulkan?
If so, anyone tried a gen on CPU? How many days did it take?
>>
File: Capture.jpg (124 KB, 2182x873)
>>101678923
that's it, it's good to go?
>>
>>101678299
ugly anime desu
>>
The only problems are the lack of knowledge of anime culture and pop culture, the mutant feet in some poses, and the censorship for NSFW. So the next step is to finetune the model; I think only the Pony guy and a couple more could do a proper finetune with their data.
>>
>>101678964
I tried cpu earlier with a 10900K and canceled. For 1x 1024x1024 it showed 80 minutes.
>>
File: file.png (499 KB, 1280x800)
>>
>>101678991
we've been waiting on that next step for quite some time now. prepare to wait quite some time longer
>>
>>101679009
Sonichu?
>>
File: Capture.jpg (137 KB, 1844x1336)
>>101678923
something like this?
>>
>>101679070
good job anon you figured it out
>>
>>101679075
It gives me different outputs though, so I'm not sure it's the right way of dealing with it
>>
File: Capture.jpg (295 KB, 2788x1526)
>>101678901
Yeah, that's twice as long because of the loading shit, how to fix that??
>>
File: file.png (633 KB, 1280x800)
>>
>>101678997
I'm a diffusion pleb. Are they compute bound or memory bandwidth bound (like LLMs)?
Trying to work out if efficiently leveraging an AMD APU could accelerate generation significantly or not.
>>
File: Capture.jpg (324 KB, 3248x1622)
For those who want those nodes (it has the negative prompt + guidance scale + a single text box for both clip and t5xxl), you can get the metadata here:
https://files.catbox.moe/imf60c.png
>>
>>
>>101678250
Holy shit I took a break while waiting for Bigma and what the fuck just happened here bros, did the Dalle weights just drop?
>>
>>101679295
Flux dropped and it's what SD3 should've been. Upper half uncensored and really good quality and prompt comprehension.
>>
>prompt goth
>instantly turns into sloppastyle
grim, but perhaps deserved
>>
>>101679172
all this anon knows is AMD has pisspoor support
srry imggen bros are mostly retarded
>>
File: ComfyUI_temp_cdifl_00027_.png (2.27 MB, 2400x1600)
>https://github.com/comfyanonymous/ComfyUI/commits/master/
>Hack to make all resolutions work on Flux models.

So I updated Comfy to get this commit from an hour ago, and now Flux can directly generate coherent 2400x1600 images apparently. Probably even higher, though I haven't tried yet. What the fuck?

Even the hands are perfect which is INSANE at this res, any other model you'd be lucky to get a coherent face at 2K without hiresfix let alone hands.
>>
>>101679317
>Flux dropped and it's what SD3 should've been.
this, I wish it wasn't a 12b model though, that's too big, 10b would've been the sweet spot
>>
>>101679295
>did the Dalle weights just drop?
It sure does look like it doesn't it
>>
>>101679222
Thanks anon, much appreciated :)
>>
File: ComfyUI_00133_.png (3.1 MB, 2048x2048)
>>101679368
Oh nice, I just made a 2048x2048 res output, it's only asking me for 15gb of vram for the fp8 DiT model, that's cool to see how much we've improved in a single day kek
>>
So going forward I guess we're only going to get models that are great at coherence but have a slopped sovlless style and can't imitate even public domain artists very well? Kind of sad desu
aesthetics were always more important to me than coherence, and it seems like we're going backwards on aesthetics
>>
>some poor anon still using 1.5
>>
i really hope this model gains traction. it's getting a lot of attention but it's still so expensive to finetune. hopefully some people come along thinking "now's our chance!" and finally get cooking. this really does feel like local's dall-e moment, it's a big step up in a lot of ways
>>
>>101679368
>>101679448
>dalle3 at home made high res obsolete
that's cool to be able to render high res images like that, it will help for details that's for sure
>>
File: file.png (1.15 MB, 1280x800)
>nogen doomposter is upset he has to describe images with words instead of using artist names
>>
>>101679368
Just got into Comfy earlier today. How do you update it? Does a simple git pull just work or are there any other commands I should do?
>>
>>101679490
holy fucking cope
>>
File: 1706405090986084.png (544 KB, 512x768)
>>
>>101679524
Say sike
>>
>>101679514
just use the "Direct link to download" build; in that zip you have an updater .bat ready to be used
https://github.com/comfyanonymous/ComfyUI?tab=readme-ov-file
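and if you went with a git clone instead of the standalone zip, the manual equivalent should just be the usual (run from the ComfyUI folder, in whatever env you launch it with; the requirements reinstall is in case they changed):

git pull
pip install -r requirements.txt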
>>
File: fluxDiffAttempt.png (39 KB, 600x851)
>>101677984
https://github.com/motexture/FluxDiff

if you get an error with fbgemm.dll either dl random libomp140.x86_64.dll on pytorch forums or get it from comfy venv

by default it'll download 16.8gb model + vae as .bin [.cache\huggingface\hub\models--motexture--FluxDiff\snapshots]
and then it'll expect a .safetensors (either fix it in code or just duplicate and rename, otherwise it'll redownload .bin again)

tested on 12g vram, you can try to cast/offload

also, reforge dev is working on implementing flux image model
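if you'd rather script the duplicate-and-rename workaround, something like this should do it (untested sketch, assumes the default HF cache path from above):

import shutil
from pathlib import Path

# snapshot dir FluxDiff downloads into, per the post above
snaps = Path.home() / ".cache/huggingface/hub/models--motexture--FluxDiff/snapshots"
for bin_file in snaps.glob("*/*.bin"):
    target = bin_file.with_suffix(".safetensors")
    if not target.exists():
        # duplicate instead of renaming so it doesn't redownload the .bin
        shutil.copy2(bin_file, target)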
>>
flux inpainting model when
>>
>>101679447
it's a pleasure o/
>>
File: file.png (927 KB, 1280x800)
>doomposter thinks I'm talking about him
>>
fp8 feels pretty good, what i was hoping for when i first tried the model. no unloading bullshit
>>
>>101679531
I use Linux...
>>
File: tadpoles.png (1.36 MB, 1024x1024)
>A bunch of tadpoles swimming in a pond
bake. again
>>
>>101679553
>no unloading bullshit
you also have that anon? is that a bug or something? that makes me so annoyed... >>101679126
>>
>>
>>101679070
How do you get the Prompt box?
When I search, I just see CLIP Text Encode (Prompt) show up
>>
>>101679618
just use this metadata you'll get everything, and I double clicked on the dot near t5xxl to make the prompt text appear >>101679222
>>
>>101679573
I had it with 24gb vram, it seems the fp16 is just slightly over. One of the nodes should have an option to switch to fp8_e4 or some shit in the model loading node; that allowed me to fit it comfortably without it reloading every prompt. If that isn't enough and you have less vram then idk, there might be another solution, i think i saw some shit about 12gb vram somewhere
>>
Any way to load the text encoder on one gpu and the image model on another for flux? I'm capping out with 24GB of VRAM.
>>
File: ComfyUI_00136_.jpg (696 KB, 2692x1488)
>>101679448
>>101679368
you lose a lot of prompt understanding if you go too far though, I guess it works great when it's a simple scene
>A concert with Donald Trump as bassist and Hatsune Miku as singer, the audience is ecstatic and all raise their hands to the sky.
>>
File: file.png (1.41 MB, 1280x800)
>>
>>101679542
Appreciate the effort, but all I could get were wonky frames like a sketchy cartoon style even with realism. Not a lot of motion.
>>
i really like that one
>>
>>101679644
I have this loading -> unloading shit for 24gb vram + 56gb ram and fp8 DiT + fp16 text encoder :(
>>
>>101679542
wait, we can already use their text to video model locally?
>>
>>101679638
Based, ty anon I was using the comfy example and there was no negative prompt on there.
>>
>>101679435
Just looking at pics in this thread, Dalle 3 is obsolete and so is MJ v6.1, this model very much looks like what ClosedAI was planning to release with GPT 4o (which never came out), this is a massive win for local.
>>
>>101679761
Issue is that it's so big 99% of people are not going to be able to train it.
>>
sieg heil danke deutscher mann
>>
File: file.png (1.25 MB, 1280x800)
>>
>>101679750
That's because this model is supposed to work with only cfg = 1, and cfg = 1 means you can't use a negative prompt. It works at higher cfg though, just be careful not to fry your picture >>101679669
>>
File: ComfyUI_00007_.png (853 KB, 1024x1024)
SAI bros... not like this...
>>
>>101679761
>so is MJ v6.1
I would like this to be true but it doesn't have anywhere near the art aesthetics or style understanding of MJ 6.1
Amazing coherence but yeah, the style and soul are just not there for art
>>
File: ComfyUI_00017_.png (945 KB, 1024x1024)
Nice night for a walk
>>
>>101679808
For realistic shit this model is easily API level, only MJ is better and not by much, we're so back
>>
>>101679783
At the end of the day the question is how trainable it is. If it's like SDXL then it's a problem. If it's like Sigma then it's not. SDXL was extremely slow and took forever to figure out new concepts. Pixart is relatively fast and learned new concepts fairly quickly. If the only problem is renting an 80 GB GPU then people will swallow it if they get their money's worth in a day or week.
>>
File: file.png (337 KB, 1280x800)
>>
>>101679722
This is something else, the name is just a coincidence I think. Flux=Flow, etc
>>
Can flux do coom yet?
>>
>>101679860
Not great, needs to be finetuned
https://files.catbox.moe/3pbilx.jpg
>>
>>101679860
it can do tasteful PG-13 / R-rated coom out of the box
>>
File: ComfyUI_00009_.png (1.45 MB, 1920x1080)
>1920x1080 works
We eating good tonight chads
Good for them to release over the weekend too
>>
>>101679860
Yes it can
https://files.catbox.moe/b09u2v.png
>>
>>101679803
That's super useful anon much obliged. What cfg do you recommend for this? Default (3.5)?
>>
>>101679885
>>101679871
>>101679873
Guess it's time to reinstall, hope the macbook can take it.
>>
>>101677984
literally malware
>>
>>101679871
>Not great
you're joking? the anatomy is almost perfect, this will be a blast to finetune
>>
File: file.png (1.43 MB, 1280x800)
I know there's going to be some gems in this dataset.
>>
>>101679892
>What cfg do you recommend for this? Default (3.5)?
You're talking about the guidance scale, that's not the CFG; if you want that it's in this metadata >>101679222
Even on the API they put CFG = 1, that's why they didn't display a negative prompt.

And desu the cfg fries the model really quickly; I'd go for the lowest value, 1.1, so that you can still get a great picture + be able to use the negative prompt
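if anyone wants the diffusers equivalent instead of Comfy nodes, the distilled guidance is just the guidance_scale argument there. A minimal sketch based on the model card usage (not a drop-in for the workflow above):

import torch
from diffusers import FluxPipeline

# load the dev checkpoint; bf16 to keep memory sane
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # offload parts to RAM to fit consumer vram

image = pipe(
    "a photo of a forest clearing",
    guidance_scale=3.5,        # distilled guidance, not true CFG
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_out.png")

note there's no negative prompt argument here at all, which matches the cfg = 1 behavior; the negative prompt trick only applies in UIs that run real CFG on top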
>>
>>101679172
I believe that diffusion is more compute bound than LLMs. Diffusion uses a few slow evaluations (~steps), while LLMs require lots of fast evaluations (one for each token ~ word).
>>
Anyone else getting unbelievably slow gens for flux? It's only using 14GB/24GB so that's not the issue, but 20 steps takes 10+ mins
>>
>>101679929
You're definitely on the CPU. On a 4090 Schnell was 20s and Dev was 50s
>>
>>101679885
Should have known the second I saw no "Safety" section in the release
>>
>>101679929
yeah that's an issue, for me the inference takes 30-40 seconds, but the problem is the loading -> unloading, why the fuck does it do that? :(
>>
>>101679871
Topless seems to work fine.
Bottomless on the other hand is proving challenging. For example:
>>101679885
Note the one on the right, model can be situationally prude.
>>
>>101679959
>Note the one on the right, model can be situationally prude.
yeah, desu it will be easily fixed with a finetune, the model doesn't seem to be that brainwashed compared to SDXL for example
>>
>>101679458
>So going forward I guess we're only going to get models that are great at coherence but have a slopped sovlless style
patience is a virtue
>>
after testing a whole afternoon I think Kolors (with automated LLM prompt translation into chinese) is much, much better than Flux for art

but Flux is going to be the new gold standard for photography and memes
>>
File: file.png (1.53 MB, 1280x800)
>>101679959
There are little to no genitals in the dataset and they certainly didn't train on those words. So if you want them you're going to have to dig deep and be clever. It's almost smart enough that you can poorman generate them by description.
>>
>>101679919
kek, nice gen anon
>>
returning oldfag here
haven't been into imagegen for a couple years, what's the best way to prompt nowadays? i'm more used to the "tag-style" prompts like "a beautiful white woman, blonde hair, blue eyes, cinematic, studio lighting, hyperrealistic, 4k uhd, award winning, kodak film" etc.
but a lot of the examples i see on newer models have full sentences, especially on "demo prompts" that often have a hilariously large amount of adjectives and fluff like "A stunning and beautiful white woman stands in the dramatic, breathtaking, pronounced cinematic lighting. Her thought-provoking expression stands in stark contrast to the plain background - an enchantingly magical pure white."
whereas some of the prompts i've seen here are more normal but still have a more "sentence-style" structure
do modern models work best with natural language sentences or is the tag style still the best method?
>>
>>101679999
Use both
>>
File: Capture.jpg (153 KB, 2796x737)
When I simply change seed, there's no unload -> reload, but when I change the prompt, the unload -> reload starts again, hmm...
>>
>>101680021
Prompt has to be encoded which is a different model. Seed just changes the color pattern the generation starts with.
>>
>(SUSPENDED:1.2)
>>
>>101679973
Agreed with everything else this model really shines, it's the last missing piece
>>101679925
Holy shit the negatives work now! Ty anon
>>
lol the devs fucked it up so bad, really how can they release the model? they will get the shitstorm of the decade in a week
>>
>>101680021
maybe when it's reusing the same prompt it just keeps the saved clip embeds that it generated earlier and gets rid of the model weights
so when the prompt changes it needs to load the clip weights again
>>
>>101680045
still, it's not normal to have this constant loading -> unloading, some anons don't have this issue though
>>
File: RTX 6090 Ti.png (1.06 MB, 768x1216)
Are you ready?
>>
>>101680053
What's wrong with it?
>>
>>101680060
omg 26.5 gb vram
sugoi!!!
>>
>>
File: ComfyUI_00015_.png (1.72 MB, 1920x1080)
Model is b*sed
>>
huh, crazy how good it is at text. better than proprietary models
>>
>>101679959
Bottomless is mostly featureless it seems, when it works
https://files.catbox.moe/lk0zai.png
>>
>>101680092
omg it looks great, maybe the nipples are a bit weird though
>>
>>101680021
Been a while since i used Comfy, but doesn't it have some strategy to save memory by unloading models? You may be hitting your memory limit. There was an option to not unload models from memory i think, but you may swap or just OOM instead of reload.
>>
File: file.png (1.2 MB, 1280x800)
>>101680053
SAI survived SD 1.5. With the Olympics and the election Twitter won't even have time to stir up the Staceys to hand wring about consensual images. Also it barely knows celebrities so you're not going to see much on that front. Actually shocked it knows politicians.
>>
>>101680061
it can generate cunny apparently
>>
>>101680092
This seems like something that could actually be fixed by loras, it isn't deliberately ruined.
>>
File: Capture.jpg (341 KB, 2675x1774)
>>101680110
No I'm not hitting any limit, I still have room to spare and it just wants to unload for some reason
>>
>>101680122
It can't >>101680092
>>
>>101679938
Bizarre, both my CPU and GPU are pinned
Not sure what's going on
>>
>>101680114
>With the Olympics and the election Twitter won't even have time to stir up the Staceys to hand wring about consensual images.
that's so true, it's the perfect moment to release that model
>>
File: 1704386294103.jpg (249 KB, 1024x1024)
How do we deal with the existential dilemma of images that are close to perfect (with respect to personal style choices), but just slightly flawed, yet correcting the flaw would take 20x more effort than simply generating 100 new images?

I just feel such a constant strange disconnect. I make a hundred images, each of which is flawed in some tiny way so that none are perfect, and none can be really chosen as "the best". They're all just different interpretations of an idea. I really wonder if this is going to cause some kind of psychological damage.
>>
>>101680155
You're playing with a gacha machine, just go with the flow.
>>
>>101680129
I think i found the option:
>https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/cli_args.py#L117
--disable-smart-memory
Give it a go anyway.
Can comfy normally split the model into multiple cards?
>>
>>101680155
nice style, what was the model and prompt to get this style?
>>
>>101680173
>--disable-smart-memory
trying this now, praying multi-card support exists because i have a handful of 12ish gig ones and im kicking myself for not just getting a 3090
>>
>>101680178
dalle
>>
tryna generate trump punching hillary in the face during a boxing match, but it keeps making them friendly
>>
>>101680173
Now it frees up the VRAM when the gen is over, it's worse lol
>>
>>101680222
>but it keeps making them friendly
I noticed this too when trying to make a gigachad laughing and pointing at a fat whore
It just made them buddy buddy
>>
File: 1.png (2.2 MB, 1232x856)
>>
>>101680226
Kek.. sorry about that. Have you ever genned with comfy and seen it use more than one card? Can it actually use more than one? it's a 24gb model...
>>
>>
>>101680222
getting closer
really impressive likenesses and body coherence either way
>>
>>101680255
I'm on a fp8 mode, so it only uses 12gb of vram, that's enough for my 3090, and Comfy doesn't have a multigpu feature so the 3060 I have is just sleeping
>>
File: Capture.jpg (315 KB, 2384x1566)
It's sad this model doesn't know much trivia about styles, I asked it to make a ps1 render and it completely ignored that, even an aesthetic like that is ignored
>>
File: 2414586.png (1.13 MB, 768x768)
>>101680286
yeah, those models are too tuned for "aesthetics" to appeal to people
>>
>>101680267
It still needs some extra buffers in memory to keep the latent and do some work.
i also found this:
>vram_group.add_argument("--highvram", action="store_true", help="By default models will be unloaded to CPU memory after being used. This option keeps them in GPU memory.")
--highvram. May as well give it a go. There's also --gpu-only, but i don't know how they interact with each other.
>>
>>101680136
>>101680122
It can generate photorealistic children with nipples, there will be a shitshow about this.
>>
>>101680328
I don't think anyone cares about that anymore
>>
>>101680319
I tried highvram, it forced the gpu to take both the image model + the text encoder into the vram, of course that's impossible that's over 24gb
>>
>>101680335
lol
LMAO
>>
>>101680346
People care about
>OMG [THING] CAN DO [NEW THING]
They already think models can do this.
>>
File: 98377712.png (734 KB, 735x873)
>>101678250
damn buddy...
>>
>>101680353
All it takes is one random person to push it into the public consciousness and all the normies will be out for blood
>>
when are we going to get a model that can actually UNDERSTAND the prompts like LLMs can?
like if i type in "a table but deflated like a balloon" i want to see a table deflated like a balloon but instead it just gives me normal tables ("table" is an example, not the actual prompt i tried)
the model can't seem to understand some foreign concepts like that, while LLMs can more or less easily grasp the concepts easily
>>
>>101680328
Why are you pretending that the SD models can't do that since 2022?
>>
>>101680367
That is true of literally everything, this is old news. Unless someone (You?) decides to push it hard, nobody cares about AI slop anymore.
>>
>>101680374
>when are we going to get a model that can actually UNDERSTAND the prompts like LLMs can?
for that, we'll need better models than t5xxl and go for llama9b for example
>>
>>101680374
LLMs don't understand stuff like that either
>>
>>101680328
Mindblowing revelation: If you train a model on X and Y, it can generate X+Y.
There is no cure for this short of shortcircuiting the laws of logic and physical reality (which some people try).
>>
File: 36.png (1.16 MB, 336x1904)
>>101680374
Give a research team 1 billion dollars and they will do it for you in 2 weeks
>>
>>101680414
lmao
>>
>>101680344
>I'm on a fp8 mode
>Comfy doesn't have a multigpu feature
>of course that's impossible that's over 24gb
You're trying to do the impossible, then. I expected fp8 would ~halve the size. Even if all the models end up being 12gb or a little under, the thing needs some working memory to do the work.
>>
>>101680414
impressive, very nice
>>
Does 3 mins an image sound about right for a p40?
>>
>>101680431
but why does it free some memory when doing a new gen? I have enough room to spare, it's not like a new gen will ask for more memory than the first one >>101680129
>>
>>101680414
im crying
>>
>>101680092
this could be fixed with inpainting using normal pony model i guess, a bit inconvenient but better than a kick in the teeth.
>>
File: 5.png (1.62 MB, 1560x520)
>>
>>101680129

Maybe you're getting screwed by Nvidia driver memory management. I'd rather OOM than have the Nvidia driver screw things up. Look up how to disable it or roll back to before Nvidia added that feature.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
>>
>>
File: ComfyUI_08693_.png (883 KB, 1368x768)
i fucking hate my life
>>
>>101680222
success
can't get hillary to look as beaten up as I'd like, but I'm declaring victory on this prompt
>>
File: ComfyUI_00154_.jpg (635 KB, 3072x1449)
Don't settle for 20 steps, that's not enough
>>
What kind of optimizations did Replicate or the black forest people do so that schnell literally takes 1 second to generate on Replicate? Dev is about 15 seconds.
>>
>>101680466
miku would never say that >>101678159
>>
>>101680499
>low step count doesnt look ass
Insane
>>
>>101680482
I'm sure that's a bug in ComfyUI, there's already an issue about it, and memory fallback isn't likely to be the culprit; it also unloads on the RAM side
https://github.com/comfyanonymous/ComfyUI/issues/2046
>>
>>101680508
20 really isn't that low for any model
>>
>>101680499
yeah any interesting or weird composition seems to benefit from cranking steps to 50

unfortunate since that means 75 seconds per image even on my 3090 in fp8
>>
whats the token count for it tho
>>
>>101680519
same, it's kinda slow but meh, quality > quantity, always
>>
>>101680129
to add to what >>101680482 said you can run
python -c "import sys; print(sys.executable)"
on cmd to get your system python path
>>
File: Image.jpg (2.6 MB, 1920x2176)
>>
>>101680496
2016 was two decades ago
>>
>>101680552
yet trump will beat up another woman for the election kek
>>
>Back to several mins per generation for larger images
Just like the good ol' days
>>
File: ComfyUI_00157_.png (1.22 MB, 1024x1024)
>>101680466
>>
>>101680563
weird
>>
>>101680592
forced meme
>>
>>101678991
Training flux finetunes is a different beast from standard SD and SDXL.
Also the smallest flux model (which is Apache 2.0) is not even that good. Sure it's better than SD 3.0 medium, but it's not good enough to dedicate tens of thousands up to hundreds of thousands of dollars to finetuning right off the bat.
Anything that prohibits commercial use is out of the question for basically every single "big" finetuner. No one will drop that amount of cash on a model they can't monetize in any way.
Pixart Sigma, HunyuanDiT, Lumina and Flux "schnell" allow commercial usage.
Kwai-Kolors and Flux "dev" do not, yet those two are the best out of the box in many aspects.

Kwai-Kolors has the best anatomy understanding; poses, hands and feet are good. It has a nice and crisp style. Negatives for it are Chinese prompting, bad NSFW out of the box, horrible anime quality, and a lack of styles and concepts.
Flux "dev" is the best model overall and has superior prompt understanding and text generation. Biggest negatives are the size of the model and the inability to train loras locally (not even an RTX 5090 will cut it).
>>
FUCK YOU REPLICATE
MY PROMPTS AREN'T NSFW
>>
>>101680576
kekd
>>
>>101680595
cope weirdo
>>
File: 38763.png (2.14 MB, 1024x1024)
>>
>>101678682
>>101678818
>>101678934
>>101679226
>>101679599
>>101679982
>>101680048
>>101680075
>>101680447
>>101680491
>>101680538
Got a catbox for any of these? Really dig the style, curious how to get these kinds of results and what the general setup would look like.
>>
>>101680592
letting a man beat up a woman in the olympics is pretty fucking weird, but when trump does it he's just taking out the trash so it's fine
>>
>>101680602
Use it through the api or telegram glowie bot that apparently some anon put up @imgfun_bot
>>
>>101680596
>Also the smallest flux model (which is Apache 2.0) is not even that good. Sure it's better than SD 3.0 medium, but it's not good enough to dedicate tens of thousands up to hundreds of thousands of dollars to finetuning right off the bat.
>Anything that prohibits commercial use is out of the question for basically every single "big" finetuner. No one will drop that amount of cash on a model they can't monetize in any way.
I'm sure a big finetune on schnell can beat flux dev, so why not go that path yeah

>Kwai-Kolors has the best anatomy understanding; poses, hands and feet are good. It has a nice and crisp style. Negatives for it are Chinese prompting, bad NSFW out of the box, horrible anime quality, and a lack of styles and concepts.
And it's not a DiT model, that makes it obsolete from the start
>>
>>101680552
I'm not even american I just think the mental image of them having a ring fight is funny

Another thing that's impressive about Flux is what it infers about body types
Like Hillary's body here is soft around the midsection like an old person's tends to be, the kind of body you'd expect a woman her age to have
if you tried to do this with SDXL she'd just have a boxer's body
>>
>>101680630
forgot the image like a retard
>>
>>101680592
the zoomer has received his new programming
>>
>>101680610
this stuff doesn't work on 4chan man
>>
>>101680618
https://files.catbox.moe/z5hiho.png
>>
File: 254344.png (1.69 MB, 1024x1024)
>naruto looks like a anime girl even when male is the first word in the prompt
waow
>>
>>101680624
>confusing women for men isnt weird
if you say so

>>101680652
why are you weirdos replying to me if it doesn't work. you're compelled to declare how not weird you are, only to attach incredible weirdness to it
>I'm not weird but let me tell you how much I think about trans people!!!!
weird
>>
>>101680655
looks like naruto if he was drawn on some shotacon doujin kek
>>
>>101680596
In LLMs there are many people finetuning 70B and 120B parameter models with licenses similar to flux dev; I think it's just a question of time until some rich nigger with 5 or more H100 NVLs trains a good finetune. As I said, the Ponyfag could be the one, since the license doesn't forbid monetizing with donations.
>>
File: le-sad.jpg (27 KB, 500x346)
I have 12GB VRAM but only 16GB system RAM till the end of the month, then I can upgrade to a max of only 32GB of system RAM. But I do have an entire 250GB SSD dedicated to swap with discard enabled. Will it run or nah?
>>
>>101680667
anon, I'm afraid I must again draw your attention to the fact that you are on 4chan
>>
>>101680674
>But I do have an entire 250GB SSD dedicated to swap
Why?
>>
>>
>>101680685
because it prevents the system from hanging when i run out of ram? Anyway it helps if you have a low ram system and are doing stable diffusion.
>>
is flux 8b that much worse than 16b?
>>
>>101680685
go on say something dumb about how i should care about wearing out the SSD that i paid £15 for second hand...
>>
>a child's crayon drawing of a house
sovl
>>
>>101680712
No I was going to say most recommend the max amount of swap being double system RAM. 250GB sounds like so much.
>>
what are best prompts for that amateur photo look
>>
File: ComfyUI_01158_.png (1.39 MB, 1024x1024)
>>101680642
hey anon this is actually kind of fun
>>
>>101680740
kek nice
this one is really technically impressive since other models generally can't do upside down people without mangling them
>>
>>101680740
oh also, tip: starting the prompt with "espn footage of" seems to be better for getting that slightly lofi television camera look
>>
>>101680725
it's not much, I've seen it use up to 100 GB of swap when I did a video through animatediff that was about 2 minutes long. If the swap wasn't that big it would have failed for sure; also one time my GPU crashed, and when checking journalctl the last thing that happened was memory pressure flushing caches, then seconds later the GPU died.

So yeah, if you have memory issues with stable diffusion try increasing your swap file/partition. It works because there is more virtual memory available, albeit slower.
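on linux growing swap is just a few commands, something like this (example size, adjust to your disk):

sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

and add it to /etc/fstab if you want it to survive reboots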
>>
File: Image.jpg (1.18 MB, 1920x1088)
>>
>>101680642
>>101680740
kek
>>
File: ComfyUI_01159_.png (1 MB, 1024x1024)
>>101680757
anon i kneel, thanks.
>>
>>101680762
>its not much, i've saw it used up to 100 GB swap when i done a video through animatediff that was about 2 minutes long
You might just wanna ask someone for a short term loan so you can get that ram ASAP, your hard drives are taking a beating
>>
>>101680725
>most recommend the max amount of swap being double system RAM.
this is just general copypasta from every know-it-all online since like forever. You can have as much swap space as you like. The general assumption of double your ram is no different than general assumptions about how big a boot partition should be; it will always depend on what you actually plan on doing and how much you will actually need.

>>101680775
nah i can wait. I don't do loans.
>>
I can't seem to make a girl stab a beast and there be blood and gore. Also can't make her give a middle finger while holding a can of pepsi. Sadly, that means it's censored, unlike Dalle. Still, this is an interesting model.
>>
>some rich nigger with 5 or more H100 NVLs trains a good finetune
It takes way more effort than just being rich. You need a good dataset. You need to curate that dataset. You need to label that dataset. Then you need to know what the fuck you are doing too.

https://www.reddit.com/r/StableDiffusion/comments/1dbasvx/the_gory_details_of_finetuning_sdxl_for_30m/
>>
>>101680801
from what dalle red team testers said the dataset wasn't actually censored at all and during the testing phase they could generate extreme gore and disturbing shit
the 'safety' is all in having GPT-4 act as the middleman between you and the API and cockblock disallowed prompts, without that the model is actually capable of really dark stuff
I guess that's something you can do when you're closed source and not sharing the weights
>>
>>101680826
Sad. There is nothing wrong with being able to gen extreme gore and disturbing shit as long as it's not super photorealistic and just 80s movie or anime style.
>>
>>101680817
/lmg/ here, first time?
>>
>>101680724
>>
>>101680826
Yes, and on Azure you can disable the NSFW and prompt filters (only the basic filter remains) and see the raw dalle3 dataset power: https://catbox.moe/c/lfnwjt
>>
>>
>>101680826
That is the "secret sauce". You train it with everything and then you just perform post-filtering with a multimodal vision model that looks at the prompt+image and estimates the level of "harm". Then you just set a cutoff point for what level of "harm" you tolerate and call it a day.

If you go back to the time before the SD 2.0 release, you can read Emad's and other SD employees' messages or listen to the public discord calls (still on youtube I think).
They were grappling with the issue of CP. The issue was that if a model is capable of doing nudity and also children, then it is always capable of combining those into nude children, even if the training data has never seen a nude child. That is the only reason they and everyone else prune all nudity from the dataset for models that they give out.
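The filtering logic itself is trivial, something like this (a toy sketch of the idea only; the scorer is a hypothetical stand-in, nobody outside those labs knows the actual classifier):

from PIL import Image

HARM_CUTOFF = 0.5  # wherever the provider decides to draw the line

def harm_score(prompt: str, image: Image.Image) -> float:
    # stand-in for the multimodal vision model that rates prompt+image
    return 0.0  # stub

def moderate(prompt: str, image: Image.Image):
    # train on everything, filter after the fact
    if harm_score(prompt, image) >= HARM_CUTOFF:
        return None  # blocked, the user never sees it
    return image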
>>
>>101680852
cute
looks like it's good at the crayon style, has a nice texture and doesn't feel slopped like the digital art style it tries to do
>>
>>101680859
you have the prompts for any of these? curious how they'd transfer to flux
>>
>>101680866
Anon, you didn't get it, the prompt for ALL of those is literally just DeviantArt + a jailbreak so that the API doesn't rewrite it; it just shows how depraved unchained dalle is, and deviantart specifically is represented like that in their dataset.
>>
>>101680863
Hopefully eventually one of the real models will leak so we can finally have a not-shit local model
>>
similar to the anons' findings above with crayon prompts, using "beatrix potter drawing of" seems to produce a hand drawn looking art style
doesn't really look like beatrix potter at all but it's nice and not slopped looking

>beatrix potter drawing of a cozy stone cottage in the forest
>>
>>101680859
wow why have i never seen these before
>>
>>101680859
neat
even ignoring the content, there's some sovl some of the drawing styles there that's quite hard to get when using it on chatgpt
>>
>>101680849
That said, I did get a result that somewhat resembles what I was after. I guess the key to a good result is being less precise
https://files.catbox.moe/ve7hxh.png
>>
>>101680912
*some sovl in
>>
>>101680859
I wouldn't say that's "DeviantArt". It's more like dalle trying to gen "deviant" "art"
>>
>>101680921
this works with filtered llms too when you're trying to generate smut but they have a filter on your prompts
you take advantage of the model's intelligence by having it infer what you want rather than stating it outright
>>
>>101680897
It's a shame that stylization is inferior to SDXL. Is that by choice? Or is it by training with AI images that have super generic styles?
>>
File: ComfyUI_00019_.png (1.09 MB, 1024x1024)
>>101680775
>>101680725
>>101680674
Of course it was able to do it :P took a long time to load the initial models though
>>
>>101680960
AuraFlow has the problem too, I think it's an artifact of AI dataset captioning. Gonna have to stick to Kolors for art gens for now unfortunately
>>
>>101680960
More neutral / less style-biased dataset most likely. And I'm sure dalle did DPO training.
>>
File: Image.jpg (838 KB, 2880x2176)
>>
>>101680962
are you drunk?
>>
i like how it drew a face on this house unprompted. it's cute.
>>
>>101680960
I think the vision models people are using to caption their dataset are only describing the content of the image and not going into detail about the style at all

like the model will describe the composition of the image incredibly accurately but won't say much about the art style except to note that it's art and not a photograph, and it likely won't make any guesses as to the name of the artist either

so then the resulting model trained on those captions has amazing understanding of the content of an image, but it doesn't really know much about style other than "photograph/not a photograph"
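you can picture the difference in the captioning instruction itself, something like this (hypothetical prompts, not any team's actual pipeline):

# the kind of instruction that produces style-blind captions
CONTENT_ONLY = "Describe everything in this image in detail."

# one that would actually preserve style information
STYLE_AWARE = (
    "Describe everything in this image in detail, then describe the medium "
    "and art style (e.g. oil painting, PS1-era render, 90s anime cel)."
)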
>>
The image quality is pretty great at 10 steps
>>
>>
SAI releases a 2B model with miserable quality, so everyone can see for themselves how shitty it is
BFL publishes a 12B model with extreme quality that only a few can use, and they advertise it for free with their enthusiasm

too bad for the vram poor, but simply smarter
>>
File: ComfyUI_00025_.png (1.19 MB, 1024x1024)
>>101681011
go fuck a tree dork.
>>
For those struggling with image quality, with these types of models it often helps to add "aesthetic" at the end of your prompt if that's what you're going for, for instance https://files.catbox.moe/5xbcfw.png
>>
>>101681052
up the steps to 50, it makes the images even better imo
>>
>>101681131
isn't pixart really small? so there's groups working on stuff for the vram poor as well
>>
>>101681153
It also takes several minutes.
>>
really impressed with how it follows prompts, it gives exactly what I tell it.
>>
>>101681166
but you also don't need 20 tries for a good picture with crippled hands - like pre flux
>>
>>
>>101681131
even if you have a 24GB vram card you are still too vram poor to make loras on this thing.
>>
>>101681140
your guidance is too high
>>
>>101681141
>https://files.catbox.moe/5xbcfw.png
She's about to become the next phineas gage with that umbrella
>>
>>101681232
I get pretty good hands at 10. Might step up later when things get faster, but for now I'm happy.
>>
>>101681141
I loaded your catbox image into comfy and your prompt was:
>1girl. anime, holding an umbrella, glitch art
there's no 'aesthetic' in there at all
>>
>>101681250
he might be using schnell, its images look overcooked like that even at low guidance
all 'turbo' type models are like that, I can't stand them
>>
Even with the style issue, all it needs is thousands of LoRAs or a very clever solution, so it's not over just because it can't copy styles right away; a capable model is the first step, refinement can happen later.
>>
>>101681297
I switched it up. It was
>1girl. anime, holding an umbrella, aesthetic
>>
>>101681311
>>101681250
I'm just using the example workflow, switching things around now, I did a little sharpening on that last image also at about 0.20 to see the effect as the first image was a little blurry. I'm trying words like ultra sharp and aesthetic like that other anon said.
>>
bros, is it over for the faggots over at SAI???
Unironic question
>>
been trying to wrangle it to give me PC-98 type graphics. so close but so far.
>>
>>101681328
thanks
>>
>>101681340
what percentage of consumer GPUs can run flux?
unironic question
>>
>>101681350
I can

Sent from my GeForce RTX 4090
>>
>>101681340
pixart stabbed the knife
kolors twisted it
flux ran it through SAIs asshole
>>
File: Untitled.jpg (1.98 MB, 3760x2216)
Top images: 2560x1440
Bottom images: 1280x720

It's really interesting how the quality deteriorates at higher resolutions. They just "look worse" by being less interesting.
>>
>>101681350
12GB can run it, although a bit slowly. So a lot of them. I bet it will get faster and more efficient in the coming days as well.
>>
>>101681367
>12GB can run it
this isn't true
>>
The oven stays hot and the bread just keeps coming...

>>101681353
>>101681353
>>101681353
>>
>>101681367
>I bet it will get faster and more efficient in the coming days
just like pixart? and hunyan? right?
spoiler: they never got more efficient
if you just wanna dream, then dream whatever fantastic dreams you want. don't come here asking for conversation to validate your dreaming though.
>>
>>101681372
ty baker
>>
can this thing only do euler or something?
>>
>>101681394
Anon, those are already small, and aren't huge leaps. There's no reason to make them run more efficiently. Remember the early days of local diffusion?
>>
>>101681371
i'm using 3060 12GB and only 16GB ram and it runs aka works on my machine, now stfu.
>>
>>101681421
dpmpp2pm (non-sde) also works
looks worse though imo, overcooked
>>
File: Image.jpg (1.43 MB, 2880x1088)
>>101681372
Nice bake
>>
Sigma is still the king of smokes, guns and cigars but this is damn nice
https://files.catbox.moe/gse06h.png
>>
>>101681436
listen, I'm responding to someone asking "is it over for SAI". if you don't want to acknowledge the problem of accessibility, then you don't need to involve yourself in this conversation. for everyone else, SAI still has a large market of GPUs to reach that don't run these 13gb+ models
>>
>>101681476
Problems of accessibility? What are you talking about? It will get more efficient to run: layer specific quants, finetunes, optimizations for the architecture, more. You are simply wrong.
>>
>>101681497
>What are you talking about?
>>101681350
>It will get more efficient to run
>>101681394
>>
>>101681514
Anon, you are retarded or poor and in denial. The majority of people genning images locally have 12 or more GB of VRAM. There is no accessibility issue already, and it's only going to become more accessible. People are going to focus on this model far more than any model since the NAI leak, since it's an actual, definitive jump in quality. Nobody will realistically be using anything but derivatives of this model in 4 months, unless something better comes out.
>>
>>101681394
Anon why are you getting so triggered and hasty? The cost to run at home is only $500-700, well within the allowance of most households. If you can't afford then just rent a GPU. Also, ever heard of a distilled model?
>>
>>101681542
>The majority of people genning images locally have 12 or more GB of VRAM
just wanna quote this incase you delete your post out of embarrassment at some point
>>
>>101681542
Last time I checked (end of last year) people still recommended GPUs with 8GB.
>>
>>101681556
people have been saying things will not improve for like months, including that we will never get local text to video; by the end of the year you will probably be the one that looks like a retard. Within 5 years home owned GPUs will probably have 1TB VRAM, you think that's impossible? Look at how computers evolved over the last 20 years you tard. We now have SSDs that have write speeds of 5 GB/s, that is miles away from the old SSD tech where you'd be lucky to get 480 MB/s
>>
>>101681686
>Within 5 years home owned GPU's will probably have 1TB VRAM
It's more likely that AI shit won't be done on GPUs anymore than that.
>>
>>101681686
>by the end of the year you will probably be the one that looks like a retard
how much VRAM will consumer hardware have by the end of the year? please just give a specific number
>>
>>101678321
total janny death
>>
>>101679146
>male Pikachu tail


