Black Forest Pt. 3: Localchads Won Edition

Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>101674851

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Pixart Sigma & Hunyuan DiT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>AuraFlow
https://fal.ai/models/fal-ai/aura-flow
https://huggingface.co/fal/AuraFlow

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>GPU performance
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
blessed thread of frenship
Hail nigga forest . They cooked
official pixart bigma, lumina 2 and hunyuan finetune waiting room, now with flux 12b fp8
Anyone know what resolutions it's trained for?
>>101678411
>now with flux 12b fp8
it's already here https://huggingface.co/Kijai/flux-fp8/tree/main
>>101678458
any, it seems. even "long" pics in either direction work
>>101678470
Still, I'm a bit suspicious it might be making the model a bit worse/dumber. I'd rather just gen at the proper resolutions.
>>101678458
all of them, it seems
>>101678465
For the retards here: what specs does this version run on?
What's the best scheduler for euler?
>>101678529
a bit more than 12gb of vram
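For anyone wondering where that number comes from, a back-of-the-envelope sketch (weights only, ignoring activations, the text encoder and the VAE, so real usage is higher):

```python
# Rough VRAM estimate for the ~12B-parameter Flux transformer at different
# weight precisions. This is weights-only; activations/encoders add more.
params = 12e9
for dtype, bytes_per_param in {"fp16": 2, "fp8": 1}.items():
    gib = params * bytes_per_param / 1024**3
    print(f"{dtype}: ~{gib:.1f} GiB of weights")  # fp16 ≈ 22.4, fp8 ≈ 11.2
```

which matches why fp16 is "just slightly over" a 24gb card while fp8 fits with room to spare.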
>>101678518
I'm not sure it can go too far though, 2048x2048 gives duplication
>>101678537
I think the only ones that work are simple and sgm_uniform, and they are (practically?) identical
>>101678271
I'm so sick of this ritual posting
are you d*bo??
>>101678590
wow that's a crazy res
>A worker gives coins to a pharmacist in the street, and a sign reads: "How ironic, don't you think?", anime style
Sometimes it has great prompt understanding, sometimes not; it lacks a bit of consistency
>>101678632
boring reality btfo
>>101678632
no need for boring reality
>>101678638
why only one clip
>>101678651
>why only one clip
what do you mean? I should put the text on the 2 of them?
>>101678669
>>101678695
I already have that
>>101678669
try it out and see if it's better
>>101678705
Crosswalk is almost perfect
>>101678718
nothing changed kek
>>101678529
>>101678554
Haven't tried the fp8-on-disk version yet but have been loading with fp8 and running fine on a 3080 10gb card (both dev and schnell). Might need --novram when launching Comfy.
>>101678711
shouldn't you put the prompt on both fields?
left: bottom field only
right: two fields
>a man a woman holding hands
>>101678757
top field only
LMAO
>>101678757
Holy fuck you're right... but that's retarded, why didn't he make a single text box that applies to both of them? it's annoying to have to copy-paste the prompt every time...
Why does it load and unload for each gen? I have 54gb of ram and 24gb of vram and I'm running the DiT model in fp8
>>101678871
I use it like this
>>101678902
yeah but you don't have the guidance scale on the classic CLIP Text Encode
>>101678871
>>101678916
right click, convert widget to input, then feed a text box into both
>>101678923
which one?
>>101678554
Does comfy support CPU or Vulkan? If so, anyone tried a gen on CPU? How many days did it take?
>>101678923
that's it? it's good to go?
>>101678299
ugly anime desu
The only problems are the lack of knowledge of anime culture and pop culture, the mutant feet in some poses, and the censorship for NSFW. So the next step is finetuning the model; I think only the Pony guy and a couple of others could do a proper finetune with their data.
>>101678964
I tried cpu earlier with a 10900K and canceled. For 1x 1024x1024 it showed 80 minutes.
>>101678991
we've been waiting on that next step for quite some time now. prepare to wait quite some time longer
>>101679009
Sonichu?
>>101678923
something like this?
>>101679070
good job anon, you figured it out
>>101679075
It gives me different outputs though, so I'm not sure it's the right way of dealing with it
>>101678901
Yeah, that's twice as long because of the loading shit, how do I fix that??
>>101678997
I'm a diffusion pleb. Are they compute bound or memory bandwidth bound (like LLMs)? Trying to work out if efficiently leveraging an AMD APU could accelerate generation significantly or not.
For those who want those nodes (it has the negative prompt + guidance scale + a single text box for both clip and t5xxl), you can get the metadata here:
https://files.catbox.moe/imf60c.png
>>101678250
Holy shit, I took a break while waiting for Bigma and what the fuck just happened here bros, did the Dalle weights just drop?
>>101679295
Flux dropped and it's what SD3 should've been. Upper half uncensored and really good quality and prompt comprehension.
>prompt goth
>instantly turns into sloppastyle
grim, but perhaps deserved
>>101679172
all this anon knows is AMD has pisspoor support, srry, imggen bros are mostly retarded
>https://github.com/comfyanonymous/ComfyUI/commits/master/
>Hack to make all resolutions work on Flux models.
So I updated Comfy to get this commit from an hour ago, and now Flux can directly generate coherent 2400x1600 images apparently. Probably even higher, though I haven't tried yet. What the fuck? Even the hands are perfect, which is INSANE at this res; with any other model you'd be lucky to get a coherent face at 2K without hiresfix, let alone hands.
>>101679317
>Flux dropped and it's what SD3 should've been.
this, I wish it wasn't a 12b model though, that's too big, 10b would've been the sweet spot
>>101679295
>did the Dalle weights just drop?
It sure does look like it, doesn't it
>>101679222
Thanks anon, much appreciated :)
>>101679368
Oh nice, I just made a 2048x2048 res output, it's only asking me for 15gb of vram for the fp8 DiT model, that's cool to see how much we've improved in a single day kek
So going forward I guess we're only going to get models that are great at coherence but have a slopped, sovlless style and can't imitate even public domain artists very well? Kind of sad desu
aesthetics were always more important to me than coherence, and it seems like we're going backwards on aesthetics
>some poor anon still using 1.5
i really hope this model gains traction. it's getting a lot of attention but it's still so expensive to finetune. hopefully some people come along thinking "now's our chance!" and finally get cooking. this really does feel like local's dall-e moment, it's a big step up in a lot of ways
>>101679368
>>101679448
>dalle3 at home made high res obsolete
that's cool to be able to render high res images like that, it will help for details that's for sure
>nogen doomposter is upset he has to describe images with words instead of using artist names
>>101679368
Just got into Comfy earlier today. How do you update it? Does a simple git pull just work or are there any other commands I should do?
>>101679490
holy fucking cope
>>101679524
Say sike
>>101679531
just use the "Direct click to download"; in that zip you have an updater.bat ready to be used
https://github.com/comfyanonymous/ComfyUI?tab=readme-ov-file
>>101677984
https://github.com/motexture/FluxDiff
if you get an error with fbgemm.dll, either dl a random libomp140.x86_64.dll from the pytorch forums or get it from the comfy venv
by default it'll download a 16.8gb model + vae as .bin [.cache\huggingface\hub\models--motexture--FluxDiff\snapshots]
and then it'll expect a .safetensors (either fix it in code or just duplicate and rename, otherwise it'll redownload the .bin again)
tested on 12gb vram, you can try to cast/offload
also, the reforge dev is working on implementing the flux image model
flux inpainting model when
>>101679447
it's a pleasure o/
>doomposter thinks I'm talking about him
fp8 feels pretty good, what i was hoping for when i first tried the model. no unloading bullshit
>>101679531
I use Linux...
>A bunch of tadpoles swimming in a pond
bake. again
>>101679553
>no unloading bullshit
you also have that, anon? is that a bug or something? that makes me so annoyed... >>101679126
>>101679070
How do you get the Prompt box? When I search, I just see CLIP Text Encode (Prompt) show up
>>101679618
just use this metadata, you'll get everything, and I double clicked on the dot near t5xxl to make the prompt text appear >>101679222
>>101679573
I had it with 24gb vram, it seems the fp16 is just slightly over. One of the nodes should have an option to switch to fp8_e4 or some shit in the model loading node; that allowed me to fit it comfortably without it reloading every prompt. If that isn't enough and you have less vram then idk, there might be another solution, i think i saw some shit about 12gb vram somewhere
Any way to load the text encoder on one gpu and the image model on another for flux? I'm capping out with 24GB of VRAM.
>>101679448
>>101679368
you lose a lot of prompt understanding if you go too far though, I guess it works great when it's a simple scene
>A concert with Donald Trump as bassist and Hatsune Miku as singer, the audience is ecstatic and all raise their hands to the sky.
>>101679542
Appreciate the effort, but all I could get were wonky frames in a sketchy cartoon style even with realism. Not a lot of motion.
i really like that one
>>101679644
I have this loading -> unloading shit with 24gb vram + 56gb ram and fp8 DiT + fp16 text encoder :(
>>101679542
wait, we can already use their text to video model locally?
>>101679638
Based, ty anon, I was using the comfy example and there was no negative prompt on there.
>>101679435
Just looking at pics in this thread, Dalle 3 is obsolete and so is MJ v6.1; this model very much looks like what ClosedAI was planning to release with GPT-4o (which never came out), this is a massive win for local.
>>101679761
Issue is that it's so big 99% of people are not going to be able to train it.
sieg heil danke deutscher mann
>>101679750
That's because this model is supposed to work with only cfg = 1, and cfg = 1 means you can't use a negative prompt; it works at higher cfg though, just be careful not to fry your picture >>101679669
SAI bros... not like this...
>>101679808
>so is MJ v6.1
I would like this to be true but it doesn't have anywhere near the art aesthetics or style understanding of MJ 6.1
Amazing coherence but yeah, the style and soul are just not there for art
Nice night for a walk
>>101679808
For realistic shit this model is easily API level, only MJ is better and not by much, we're so back
>>101679783
At the end of the day the question is how trainable it is. If it's like SDXL then it's a problem. If it's like Sigma then it's not. SDXL was extremely slow and took forever to figure out new concepts. Pixart is relatively fast and learned new concepts fairly quickly. If the only problem is renting an 80 GB GPU then people will swallow it if they get their money's worth in a day or week.
>>101679722
This is something else, the name is just a coincidence I think. Flux = Flow, etc
Can flux do coom yet?
>>101679860
Not great, needs to be finetuned
https://files.catbox.moe/3pbilx.jpg (embed)
>>101679860
it can do tasteful PG-13 / R-rated coom out of the box
>1920x1080 works
We eating good tonight chads
Good for them to release over the weekend too
>>101679860
Yes it can
https://files.catbox.moe/b09u2v.png
>>101679892
That's super useful anon, much obliged. What cfg do you recommend for this? Default (3.5)?
>>101679885
>>101679871
>>101679873
Guess it's time to reinstall, hope the macbook can take it.
>>101677984
literally malware
>>101679871
>Not great
you're joking? the anatomy is almost perfect, this will be a blast to finetune
I know there's going to be some gems in this dataset.
>>101679892
>What cfg do you recommend for this? Default (3.5)?
You're talking about the guidance scale, that's not the CFG; if you want that, it's in this metadata >>101679222
Even on the API they set CFG = 1, that's why they didn't display a negative prompt.
And desu higher CFG values fry the model really quickly; I'd go for the lowest value, 1.1, so that you can still get a great picture while being able to use the negative prompt
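For anons confused by the CFG vs guidance scale distinction: a minimal sketch of classifier-free guidance (the toy arrays are illustrative), which shows why cfg = 1 makes the negative prompt do nothing:

```python
import numpy as np

# Classifier-free guidance blends the conditional prediction with the
# unconditional/negative one: out = uncond + cfg * (cond - uncond).
def cfg_combine(cond, uncond, cfg):
    return uncond + cfg * (cond - uncond)

cond = np.array([1.0, 2.0])    # denoiser output with the positive prompt
uncond = np.array([0.2, 0.5])  # denoiser output with the negative prompt
print(cfg_combine(cond, uncond, 1.0))  # equals cond: negative prompt ignored
print(cfg_combine(cond, uncond, 1.1))  # slight push away from the negative
```

Flux's "guidance scale" is a separate, distilled-in conditioning input, which is why it can stay at 3.5 while CFG stays at 1.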
>>101679172
I believe diffusion is more compute bound than LLMs. Diffusion uses few slow evaluations (~steps), while LLMs require lots of fast evaluations (one for each token ~ word).
Anyone else getting unbelievably slow gens for flux? It's only using 14GB/24GB so that's not the issue, but 20 steps takes 10+ mins
>>101679929
You're definitely on the CPU. On a 4090, Schnell was 20s and Dev was 50s
>>101679885
Should have known the second I saw no "Safety" section in the release
>>101679929
yeah that's an issue; for me the inference takes 30-40 seconds, but the problem is the loading -> unloading, why the fuck does it do that? :(
>>101679871
Topless seems to work fine. Bottomless on the other hand is proving challenging. For example:
>>101679885
Note the one on the right, the model can be situationally prude.
>>101679959
>Note the one on the right, model can be situationally prude.
yeah, desu it will be easily fixed with a finetune, the model doesn't seem to be that brainwashed compared to SDXL for example
>>101679458
>So going forward I guess we're only going get models that are great at coherence but have a slopped sovlless style
patience is a virtue
after testing a whole afternoon I think Kolors (with automated LLM prompt translation into chinese) is much, much better than Flux for art
but Flux is going to be the new gold standard for photography and memes
>>101679959
There are little to no genitals in the dataset and they certainly didn't train on those words. So if you want them you're going to have to dig deep and be clever. It's almost smart enough that you can poorman-generate them by description.
>>101679919
kek, nice gen anon
returning oldfag here
haven't been into imagegen for a couple years, what's the best way to prompt nowadays? i'm more used to the "tag-style" prompts like "a beautiful white woman, blonde hair, blue eyes, cinematic, studio lighting, hyperrealistic, 4k uhd, award winning, kodak film" etc.
but a lot of the examples i see on newer models have full sentences, especially on "demo prompts" that often have a hilariously large amount of adjectives and fluff like "A stunning and beautiful white woman stands in the dramatic, breathtaking, pronounced cinematic lighting. Her thought-provoking expression stands in stark contrast to the plain background - an enchantingly magical pure white."
whereas some of the prompts i've seen here are more normal but still have a more "sentence-style" structure
do modern models work best with natural language sentences or is the tag style still the best method?
>>101679999
Use both
When I simply change seed, there's no unload -> reload, but when I change the prompt, the unload -> reload starts again, hmm...
>>101680021
The prompt has to be encoded, which is a different model. The seed just changes the noise pattern the generation starts with.
>(SUSPENDED:1.2)
>>101679973
Agreed, with everything else this model really shines, it's the last missing piece
>>101679925
Holy shit, the negatives work now! Ty anon
lol the devs fucked it up so bad, really how can they release the model? they will get the shitstorm of the decade in a week
>>101680021
maybe when it's reusing the same prompt it just keeps the saved clip embeds that it generated earlier and gets rid of the model weights
so when the prompt changes it needs to load the clip weights again
>>101680045
still, it's not normal to have this constant loading -> unloading; some anons don't have this issue though
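The caching behaviour that anon describes can be sketched like this (names are hypothetical, this is just the idea, not Comfy's actual code):

```python
# Sketch: conditioning is cached per prompt, so a new seed reuses the cached
# embeds, while a new prompt forces the text encoder to run (and be loaded).
cache = {}
encoder_runs = []

def fake_encode(prompt):
    # stands in for loading + running the CLIP/T5 text encoders
    encoder_runs.append(prompt)
    return f"embed({prompt})"

def get_conditioning(prompt):
    if prompt not in cache:      # prompt changed -> encoder needed
        cache[prompt] = fake_encode(prompt)
    return cache[prompt]         # same prompt, new seed -> cache hit

get_conditioning("a cat")
get_conditioning("a cat")        # cache hit, encoder not run again
get_conditioning("a dog")        # new prompt, encoder runs
print(len(encoder_runs))  # 2
```

which is consistent with seed changes being free but prompt changes triggering the reload.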
Are you ready?
>>101680053
What's wrong with it?
>>101680060
omg 26.5 gb vram
sugoi!!!
Model is b*sed
huh, crazy how good it is at text. better than proprietary models
>>101679959
Bottomless is mostly featureless it seems, when it works
https://files.catbox.moe/lk0zai.png
>>101680092
omg it looks great, maybe the nipples are a bit weird though
>>101680021
Been a while since i used Comfy, but doesn't it have some strategy to save memory by unloading models? You may be hitting your memory limit. There was an option to not unload models from memory i think, but you may swap or just OOM instead of reloading.
>>101680053
SAI survived SD 1.5. With the Olympics and the election, Twitter won't even have time to stir up the Staceys to hand-wring about consensual images. Also it barely knows celebrities so you're not going to see much on that front. Actually shocked it knows politicians.
>>101680061it can generate cunny apparently
>>101680092
This seems like something that could actually be fixed by loras, it isn't deliberately ruined.
>>101680110
No, I'm not hitting any limit, I still have room to spare and it just wants to unload for some reason
>>101680122
It can't >>101680092
>>101679938
Bizarre, both my CPU and GPU are pinned
Not sure what's going on
>>101680114
>With the Olympics and the election Twitter won't even have time to stir up the Staceys to hand wring about consensual images.
that's so true, it's the perfect moment to release that model
How do we deal with the existential dilemma of images that are close to perfect (with respect to personal style choices), but just slightly flawed, yet correcting the flaw would take 20x more effort than simply generating 100 new images?I just feel such a constant strange disconnect. I make a hundred images, each of which is flawed in some tiny way so that none are perfect, and none can be really chosen as "the best". They're all just different interpretations of an idea. I really wonder if this is going to cause some kind of psychological damage.
>>101680155
You're playing with a gacha machine, just go with the flow.
>>101680129
I think i found the option:
>https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/cli_args.py#L117
--disable-smart-memory
Give it a go anyway. Can comfy normally split the model across multiple cards?
>>101680155
nice style, what were the model and prompt to get this style?
>>101680173
>--disable-smart-memory
trying this now, praying multi-card support exists because i have a handful of 12-ish gig ones and i'm kicking myself for not just getting a 3090
>>101680178
dalle
tryna generate trump punching hillary in the face during a boxing match, but it keeps making them friendly
>>101680173
Now it frees up the VRAM when the gen is over, it's worse lol
>>101680222
>but it keeps making them friendly
I noticed this too when trying to make a gigachad laughing and pointing at a fat whore
It just made them buddy-buddy
>>101680226
Kek.. sorry about that. Have you ever genned with comfy and seen it use more than one card? Can it actually use more than one? it's a 24gb model...
>>101680222
getting closer
really impressive likenesses and body coherence either way
>>101680255
I'm in fp8 mode, so it only uses 12gb of vram, that's enough for my 3090, and Comfy doesn't have a multigpu feature so the 3060 I have is just sleeping
It's sad this model doesn't know much trivia about styles; I ask it to make a ps1 render and it completely ignores that, even the general aesthetic is ignored
>>101680286
yeah, those models are too tuned for "aesthetics" to appeal to people
>>101680267
It still needs some extra buffers in memory to keep the latent and do some work.
i also found this:
>vram_group.add_argument("--highvram", action="store_true", help="By default models will be unloaded to CPU memory after being used. This option keeps them in GPU memory.")
--highvram. May as well give it a go. There's also --gpu-only, but i don't know how they interact with each other.
>>101680328
>>101680122
It can generate photorealistic children with nipples, there will be a shitshow about this.
>>101680328
I don't think anyone cares about that anymore
>>101680319
I tried highvram, it forced the gpu to take both the image model + the text encoder into vram; of course that's impossible, that's over 24gb
>>101680335
lol
LMAO
>>101680353
People care about
>OMG [THING] CAN DO [NEW THING]
They already think models can do this.
>>101678250
damn buddy...
>>101680367
All it takes is one random person to push it into the public consciousness and all the normies will be out for blood
when are we going to get a model that can actually UNDERSTAND the prompts like LLMs can?
like if i type in "a table but deflated like a balloon" i want to see a table deflated like a balloon but instead it just gives me normal tables ("table" is an example, not the actual prompt i tried)
the model can't seem to understand some foreign concepts like that, while LLMs can more or less grasp such concepts easily
>>101680328
Why are you pretending that the SD models haven't been able to do that since 2022?
>>101680367
That is true of literally everything, this is old news. Unless someone (You?) decides to push it hard, nobody cares about AI slop anymore.
>>101680374
>when are we going to get a model that can actually UNDERSTAND the prompts like LLMs can?
for that, we'll need better models than t5xxl, going for llama9b for example
>>101680374
LLMs don't understand stuff like that either
>>101680328
Mindblowing revelation: If you train a model on X and Y, it can generate X+Y. There is no cure for this short of short-circuiting the laws of logic and physical reality (which some people try).
>>101680374
Give a research team 1 billion dollars and they will do it for you in 2 weeks
>>101680414
lmao
>>101680344
>I'm on a fp8 mode
>Comfy doesn't have a multigpu feature
>of course that's impossible that's over 24gb
You're trying to do the impossible, then. I expected fp8 would roughly halve the size. Even if all the models end up being 12gb or a little under, the thing needs some working memory to do the work.
>>101680414
impressive, very nice
Does 3 mins an image sound about right for a p40?
>>101680431
but why does it free some memory when doing a new gen? I have enough room to spare, it's not like a new gen will ask for more memory than the first one >>101680129
>>101680414
im crying
>>101680092
this could be fixed with inpainting using a normal pony model i guess, a bit inconvenient but better than a kick in the teeth.
>>101680129
Maybe you're getting screwed by Nvidia driver memory management. I'd rather OOM than have the Nvidia driver screw things up. Look up how to disable it or roll back to before Nvidia added that feature.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
i fucking hate my life
>>101680222
success
can't get hillary to look as beaten up as I'd like, but I'm declaring victory on this prompt
Don't settle for 20 steps, that's not enough
What kind of optimizations did Replicate or the black forest people do so that schnell literally takes 1 second to generate on Replicate? Dev is about 15 seconds.
>>101680466
miku would never say that >>101678159
>>101680499
>low step count doesn't look ass
Insane
>>101680482
I'm sure that's a bug in ComfyUI, there's already an issue about it, and memory fallback isn't likely to be the culprit, it also unloads on the RAM side
https://github.com/comfyanonymous/ComfyUI/issues/2046
>>101680508
20 really isn't that low for any model
>>101680499
yeah, any interesting or weird composition seems to benefit from cranking steps to 50
unfortunate since that means 75 seconds per image even on my 3090 in fp8
what's the token count for it tho
>>101680519
same, it's kinda slow but meh, quality > speed, always
>>101680129
to add to what >>101680482 said, you can run python -c "import sys; print(sys.executable)" on cmd to get your system python path
python -c "import sys; print(sys.executable)"
>>101680496
2016 was two decades ago
>>101680552
yet trump will beat up another woman for the election kek
>Back to several mins per generation for larger images
Just like the good ol' days
>>101680466
>>101680563
weird
>>101680592
forced meme
>>101678991
Training flux finetunes is a different beast from standard SD and SDXL.
Also the smallest flux model (which is Apache 2.0) is not even that good. Sure it's better than SD 3.0 medium, but it's not good enough to dedicate tens of thousands up to hundreds of thousands of dollars to finetuning right off the bat.
Anything that prohibits commercial use is out of the question for basically every single "big" finetuner. No one will drop that amount of cash on a model they can't monetize in any way.
Pixart Sigma, HunyuanDiT, Lumina and Flux "schnell" allow commercial usage. Kwai-Kolors and Flux "dev" do not, yet those two are the best out of the box in many aspects.
Kwai-Kolors has the best anatomy understanding; poses, hands, feet are good. It has a nice and crisp style. Negatives for it are Chinese prompting, bad NSFW out of the box, horrible anime quality and a lack of styles and concepts.
Flux "dev" is the best model of all overall and has superior prompt understanding and text generation. Its biggest negatives are the size of the model and the inability to train loras locally (not even an RTX 5090 will cut it).
FUCK YOU REPLICATE
MY PROMPTS AREN'T NSFW
>>101680576
kekd
>>101680595
cope weirdo
>>101678682
>>101678818
>>101678934
>>101679226
>>101679599
>>101679982
>>101680048
>>101680075
>>101680447
>>101680491
>>101680538
Got a catbox for any of these? Really dig the style, curious how to get these kinds of results and what the general setup would look like.
>>101680592
letting a man beat up a woman in the olympics is pretty fucking weird, but when trump does it he's just taking out the trash so it's fine
>>101680602
Use it through the api or the telegram glowie bot that apparently some anon put up @imgfun_bot
>>101680596
>Also the smallest flux model (which is Apache 2.0) is not even that good. Sure it's better than SD 3.0 medium, but it's not good enough to dedicate tens of thousands up to hundreds of thousands of dollars for finetuning right from the bat.
>Anything that prohibits commercial use is out of question for basically every single "big" finetuner. No one will drop that amount of cash for a model they can't monetize in any way.
I'm sure a big finetune on schnell can beat flux dev, so why not go down that path, yeah
>Kwai-Kolors has the best anatomy understanding, poses, hands, feet are good. It has nice and crisp style. Negatives for it are Chinese prompting, bad NSFW out of the box, horrible anime quality and lack of styles and concepts.
And it's not a DiT model, that makes it obsolete from the start
>>101680552
I'm not even american, I just think the mental image of them having a ring fight is funny
Another thing that's impressive about Flux is what it infers about body types
Like Hillary's body here is soft around the midsection like an old person's tends to be, the kind of body you'd expect a woman her age to have
if you tried to do this with SDXL she'd just have a boxer's body
>>101680630
forgot the image like a retard
>>101680592
the zoomer has received his new programming
>>101680610
this stuff doesn't work on 4chan man
>>101680618
https://files.catbox.moe/z5hiho.png
>naruto looks like an anime girl even when male is the first word in the prompt
waow
>>101680624
>confusing women for men isnt weird
if you say so
>>101680652
why are you weirdos replying to me if it doesn't work. you're compelled to declare how not weird you are, only to attach incredible weirdness to it
>I'm not weird but let me tell you how much I think about trans people!!!!
weird
>>101680655
looks like naruto if he was drawn in some shotacon doujin kek
>>101680596
In LLMs there are many people finetuning 70 and 120B parameter models with licenses similar to flux dev. I think it's just a question of time until someone rich with 5 or more H100 NVLs trains a good finetune. As I said, the Ponyfag could be that guy, since the license doesn't forbid monetizing through donations.
I have 12GB VRAM but only 16GB system RAM till the end of the month, then I can upgrade to a max of only 32GB of system RAM. But I do have an entire 250GB SSD dedicated to swap with discard enabled. Will it run or nah?
>>101680667anon, I'm afraid I must again draw your attention to the fact that you are on 4chan
>>101680674>But I do have an entire 250GB SSD dedicated to swapWhy?
>>101680685because it prevents the system from hanging when i run out of ram? Anyway it helps if you have a low ram system and are doing stable diffusion.
is flux 8b that much worse than 16b?
>>101680685go on say something dumb about how i should care about wearing out the SSD that i paid £15 for second hand...
>a child's crayon drawing of a house
sovl
>>101680712No I was going to say most recommend the max amount of swap being double system RAM. 250GB sounds like so much.
what are best prompts for that amateur photo look
>>101680642
hey anon this is actually kind of fun
>>101680740
kek nice
this one is really technically impressive since other models generally can't do upside down people without mangling them
>>101680740
oh also, tip: starting the prompt with "espn footage of" seems to be better for getting that slightly lofi television camera look
>>101680725
it's not much, i've seen it use up to 100GB of swap when i did a video through animatediff that was about 2 minutes long. If the swap wasn't that big it would have failed for sure. also one time my GPU crashed and when checking journalctl the last thing that happened was memory pressure flushing caches, then seconds later the GPU died. So yeah, if you have memory issues with stable diffusion try increasing your swap file/partition. It works because there is more virtual memory available, albeit slower.
>>101680642
>>101680740
kek
>>101680757
anon i kneel, thanks.
>>101680762
>its not much, i've saw it used up to 100 GB swap when i done a video through animatediff that was about 2 minutes long
You might just wanna ask someone for a short-term loan so you can get that ram ASAP, your hard drives are taking a beating
>>101680725
>most recommend the max amount of swap being double system RAM.
this is just general copypasta from every know-it-all online since forever. You can have as much swap space as you like. This general assumption of double your ram is no different from general assumptions about how big a boot partition should be; it will always depend on what you actually plan on doing and how much you will actually need.
>>101680775
nah i can wait. I don't do loans.
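For anons who want to try the same setup, a minimal sketch of adding a swap file on Linux (size and path are just examples, needs root, won't work on some filesystems like btrfs without extra steps):

```shell
# create and enable a 64G swap file (example size/path)
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show   # verify it's active
```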
I can't seem to make a girl stab a beast with blood and gore. Also can't make her give the middle finger while holding a can of pepsi. Sadly, that means it's censored, unlike Dalle. Still, this is an interesting model.
>some rich anon with 5 or more H100 NVL, train a good fine tune
It takes way more effort than just being rich. You need a good dataset. You need to curate that dataset. You need to label that dataset. Then you need to know what the fuck you are doing too.
https://www.reddit.com/r/StableDiffusion/comments/1dbasvx/the_gory_details_of_finetuning_sdxl_for_30m/
>>101680801from what dalle red team testers said the dataset wasn't actually censored at all and during the testing phase they could generate extreme gore and disturbing shitthe 'safety' is all in having GPT-4 act as the middleman between you and the API and cockblock disallowed prompts, without that the model is actually capable of really dark stuffI guess that's something you can do when you're closed source and not sharing the weights
>>101680826
Sad. There is nothing wrong with being able to gen extreme gore and disturbing shit, as long as it's not super photorealistic and just 80s movie or anime style.
>>101680817
/lmg/ here, first time?
>>101680724
>>101680826
Yes, and on Azure you can disable the NSFW and prompt filters (only the basic filter remains) and see the raw dalle3 dataset power: https://catbox.moe/c/lfnwjt
>>101680826
That is the "secret sauce". You train on everything, then perform post-filtering with a multimodal vision model that looks at the prompt+image and estimates the level of "harm". Then you just set a cutoff point for what level of "harm" you tolerate and call it a day.
If you go back to the time before the SD 2.0 release, you can read Emad's and other Stability employees' messages, or listen to the public Discord calls (still on YouTube, I think). They were grappling with the issue of CP. The problem was that if a model is capable of doing nudity and is also capable of doing children, then it is always capable of combining those into nude children, even if the training data has never contained a nude child. That is the only reason they and everyone else prune all nudity from the dataset of any model they give out.
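The post-filtering setup described here can be sketched in a few lines; the harm scorer below is a hypothetical stand-in (a real deployment would call a multimodal vision model on each prompt+image pair):

```python
from typing import Callable, List, Tuple

def post_filter(
    outputs: List[Tuple[str, bytes]],            # (prompt, image) pairs
    harm_score: Callable[[str, bytes], float],   # hypothetical scorer returning 0..1
    cutoff: float = 0.3,
) -> List[Tuple[str, bytes]]:
    """Keep only generations whose estimated 'harm' falls under the cutoff."""
    return [(p, img) for p, img in outputs if harm_score(p, img) < cutoff]
```

The model itself stays uncensored; only the cutoff decides what the user ever sees, which is why this only works when the weights stay behind an API.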
>>101680852
cute
Looks like it's good at the crayon style; it has a nice texture and doesn't feel slopped like the digital art style it tries to do.
>>101680866
You have the prompts for any of these? Curious how they'd transfer to flux.
>>101680866
Anon, you didn't get it: the prompt for ALL of those is literally just "DeviantArt" + a jailbreak so that the API doesn't rewrite it. It just shows how depraved unchained dalle is, and that DeviantArt specifically is represented like that in their dataset.
>>101680863
Hopefully one of the real models will eventually leak so we can finally have a non-shit local model.
Similar to the anons' findings above with crayon prompts, using "beatrix potter drawing of" seems to produce a hand-drawn looking art style. It doesn't really look like Beatrix Potter at all, but it's nice and not slopped looking.
>beatrix potter drawing of a cozy stone cottage in the forest
>>101680859
wow, why have I never seen these before
>>101680859
neat
even ignoring the content, there's some sovl some of the drawing styles there that's quite hard to get when using it on chatgpt
>>101680849
That said, I did get a result that somewhat resembles what I was after. I guess the key to a good result is being less precise.
https://files.catbox.moe/ve7hxh.png
>>101680912
*some sovl in
>>101680859
I wouldn't say that's "DeviantArt". It's more like dalle trying to gen "deviant" "art".
>>101680921
This works with filtered LLMs too, when you're trying to generate smut but they have a filter on your prompts: you take advantage of the model's intelligence by having it infer what you want rather than stating it outright.
>>101680897
It's a shame that stylization is inferior to SDXL. Is that by choice? Or is it from training on AI images that have super generic styles?
>>101680775
>>101680725
>>101680674
Of course it was able to do it :P Took a long time to load the initial models, though.
>>101680960
AuraFlow has the problem too; I think it's an artifact of AI dataset captioning. Gonna have to stick to Kolors for art gens for now, unfortunately.
>>101680960
A more neutral, less style-biased dataset, most likely. And I'm sure dalle did DPO training.
>>101680962
are you drunk?
i like how it drew a face on this house unprompted. it's cute.
>>101680960
I think the vision models people are using to caption their datasets are only describing the content of the image and not going into detail about the style at all. The model will describe the composition of the image incredibly accurately, but it won't say much about the art style except to note that it's art and not a photograph, and it likely won't make any guesses as to the name of the artist either. So the resulting model trained on those captions has amazing understanding of the content of an image, but it doesn't really know much about style other than "photograph / not a photograph".
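One illustration of the fix: if the captioning instruction never asks about style, style never reaches the training captions. The template below is a hypothetical example of an instruction that does ask, not any lab's actual captioning prompt:

```python
def build_caption_instruction(ask_style: bool = True) -> str:
    """Build a VLM captioning instruction; optionally demand a style description."""
    base = "Describe the content and composition of this image in detail."
    if ask_style:
        # Hypothetical style-aware addendum; the exact wording is illustrative.
        base += (
            " Then describe the artistic style: medium (photo, oil paint,"
            " crayon, pixel art), line quality, color palette, and era,"
            " naming a comparable artist or movement if one fits."
        )
    return base
```

A model trained on captions produced with the second instruction would at least have style vocabulary to condition on, instead of just "photograph / not a photograph".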
The image quality is pretty great at 10 steps
SAI releases a 2B model with quality so miserable that everyone can see for themselves how shitty it is. BFL publishes a 12B model with extreme quality that only a few can use, and those few advertise it for free with their enthusiasm. Too bad for the vram poor, but simply smarter.
>>101681011
go fuck a tree, dork.
For those struggling with image quality: with these types of models it often helps to add "aesthetic" at the end of your prompt, if that's what you're going for. For instance:
https://files.catbox.moe/5xbcfw.png
>>101681052
Up the steps to 50; it makes the images even better imo.
>>101681131
Isn't PixArt really small? So there are groups working on stuff for the vram poor as well.
>>101681153
It also takes several minutes.
Really impressed with how it follows prompts; it gives exactly what I tell it.
>>101681166
But you also don't need 20 tries to get a picture without crippled hands, like pre-Flux.
>>101681131
Even if you have a 24GB vram card, you are still vram poor when it comes to making LoRAs on this thing.
>>101681140
your guidance is too high
>>101681141
>https://files.catbox.moe/5xbcfw.png
She's about to become the next Phineas Gage with that umbrella.
>>101681232
I get pretty good hands at 10 steps. I might step it up later when things get faster, but for now I'm happy.
>>101681141
I loaded your catbox image into Comfy and your prompt was:
>1girl. anime, holding an umbrella, glitch art
There's no "aesthetic" in there at all.
>>101681250
He might be using schnell; its images look overcooked like that even at low guidance. All "turbo"-type models are like that; I can't stand them.
Even with the style issue, all it needs is thousands of LoRAs or one very clever solution, so it's not over just because it can't copy styles right away. A capable model is the first step; refinement can happen later.
>>101681297
I switched it up. It was:
>1girl. anime, holding an umbrella, aesthetic
>>101681311
>>101681250
I'm just using the example workflow, switching things around now. I did a little sharpening on that last image, at about 0.20, to see the effect, as the first image was a little blurry. I'm trying words like "ultra sharp" and "aesthetic" like that other anon said.
bros, is it over for the faggots over at SAI???
Unironic question
Been trying to wrangle it into giving me PC-98 type graphics. So close, yet so far.
>>101681328
thanks
>>101681340
What percentage of consumer GPUs can run Flux?
Unironic question
>>101681350
I can
Sent from my GeForce RTX 4090
>>101681340
PixArt stabbed the knife in, Kolors twisted it, and Flux ran it through SAI's asshole.
Top images: 2560x1440
Bottom images: 1280x720
It's really interesting how the quality deteriorates at higher resolutions. They just "look worse" by being less interesting.
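The rough arithmetic behind that, assuming the model mostly saw images around 1-2 megapixels during training (an assumption, not a published number): 2560x1440 lands far outside that range, while 1280x720 sits inside it.

```python
def megapixels(width: int, height: int) -> float:
    """Image area in megapixels."""
    return width * height / 1e6

# Assumed training sweet spot: roughly 1-2 MP (not a published figure).
for w, h in [(2560, 1440), (1280, 720)]:
    print(f"{w}x{h}: {megapixels(w, h):.2f} MP")
```

On that arithmetic, the top images are about 3.7 MP and the bottom about 0.9 MP, so the top set is well out of distribution.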
>>101681350
12GB can run it, although a bit slowly. So a lot of them. I bet it will get faster and more efficient in the coming days as well.
>>101681367
>12GB can run it
this isn't true
The oven stays hot and the bread just keeps coming...
>>101681353
>>101681353
>>101681353
>>101681367
>I bet it will get faster and more efficient in the coming days
Just like PixArt? And Hunyuan? Right? Spoiler: they never got more efficient. If you just wanna dream, dream whatever fantastic dreams you want; don't come here asking for conversation to validate your dreaming, though.
>>101681372
ty baker
can this thing only do euler or something?
>>101681436
Anon, those are already small, and they weren't huge leaps. There's no reason to make them run more efficiently. Remember the early days of local diffusion?
>>101681371
I'm using a 3060 12GB with only 16GB of RAM and it runs, aka works on my machine. Now stfu.
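A back-of-envelope sketch of why a 12B model is tight on a 12GB card: parameter count times bytes per weight, ignoring activations, the text encoders, and the VAE (which is why system-RAM offload still ends up involved on cards like that):

```python
def weight_gb(params_billion: float, bytes_per_weight: float) -> float:
    """GiB needed just for the weights, ignoring activations and other models."""
    return params_billion * 1e9 * bytes_per_weight / 1024 ** 3

# fp16 = 2 bytes/weight, fp8 = 1 byte/weight
for name, bpw in [("fp16", 2.0), ("fp8", 1.0)]:
    print(f"12B @ {name}: ~{weight_gb(12, bpw):.1f} GiB of weights")
```

On those assumptions, fp16 weights alone come to roughly 22 GiB while fp8 drops to roughly 11 GiB, so fp8 just squeezes the weights under 12 GiB with nothing left over, which is consistent with both the "12GB can run it, slowly" and the "this isn't true without offloading" camps.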
>>101681421
dpmpp_2m (non-SDE) also works; it looks worse though imo, overcooked.
>>101681372
Nice bake
Sigma is still the king of smokes, guns and cigars, but this is damn nice.
https://files.catbox.moe/gse06h.png
>>101681436
Listen, I'm responding to someone asking "is it over for SAI". If you don't want to acknowledge the problem of accessibility, you don't need to involve yourself in this conversation. For everyone else: SAI still has a large market of GPUs to reach that don't run these 13GB+ models.
>>101681476
Problems of accessibility? What are you talking about? It will get more efficient to run: layer-specific quants, finetunes, optimizations for the architecture, and more. You are simply wrong.
>>101681497
>What are you talking about?
>>101681350
>It will get more efficient to run
>>101681394
>>101681514
Anon, you are either retarded or poor and in denial. The majority of people genning images locally have 12 or more GB of VRAM. There is no accessibility issue as it stands, and it's only going to become more accessible. People are going to focus on this model more than on any model since the NAI leak, since it's an actual, definitive jump in quality. Realistically, nobody will be using anything but derivatives of this model in 4 months, unless something better comes out.
>>101681394
Anon, why are you getting so triggered and hasty? The cost to run it at home is only $500-700, well within the budget of most households. If you can't afford that, just rent a GPU. Also, ever heard of a distilled model?
>>101681542
>The majority of people genning images locally have 12 or more GB of VRAM
Just wanna quote this in case you delete your post out of embarrassment at some point.
>>101681542
Last time I checked (end of last year), people were still recommending GPUs with 8GB.
>>101681556
People have been saying things will not improve for months now, including that we will never get local text-to-video; by the end of the year, you will probably be the one who looks like a retard. Within 5 years, home GPUs will probably have 1TB of VRAM. You think that's impossible? Look at how computers evolved over the last 20 years, you tard. We now have SSDs with write speeds of 5 GB/s; that's miles away from the old SSD tech, where you'd be lucky to get 480 MB/s.
>>101681686
>Within 5 years, home GPUs will probably have 1TB of VRAM
It's more likely that AI shit won't be done on GPUs at all by then than that.
>>101681686
>by the end of the year you will probably be the one who looks like a retard
How much VRAM will consumer hardware have by the end of the year? Please just give a specific number.
>>101678321
total janny death
>>101679146
>male Pikachu tail