/g/ - Technology




File: tmp.jpg (1.36 MB, 3265x3265)
Discussion of free and open source text-to-image models

Previous /ldg/ bread : >>101639278

>Beginner UI
EasyDiffusion: https://easydiffusion.github.io
Fooocus: https://github.com/lllyasviel/fooocus
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae
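If you script with diffusers instead of a UI, swapping in an external VAE is one line; the model names below are just examples, use whatever your checkpoint needs:
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# load a fixed/fine-tuned VAE and hand it to the pipeline in place of the baked-in one
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
In A1111 the same thing is the "SD VAE" setting; in ComfyUI it's a Load VAE node feeding your VAE Decode.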

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://www.modelscope.cn/home
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>AuraFlow
https://fal.ai/models/fal-ai/aura-flow
https://huggingface.co/fal/AuraFlows

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>View and submit GPU performance data
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
>>
blessed thread of frenship
>>
SAAAAAAAAAAAAAAAAAAS!!!!!!!!!!!!!!!!!!!!!!!
>>
THERE'S A FUCKING SAAS IN THE COLLAGE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
File: Sigma_12120_.jpg (1.88 MB, 2688x1536)
>>101655632
>>101655654
Baker has been compromised!
>>
https://files.catbox.moe/2cbnsx.png
>>
>>101655488
got a message for the top right corner image, *hyuckptui*
>>
>>101655785
why so smug
>>
>https://github.com/jhc13/taggui/releases/tag/v1.30.0
>Phi-3-vision-128k-instruct
I wonder if it's crap
>>
>>101656014
probably censored to hell and back. remember seeing people mention how some models would only use gender neutral pronouns and refuse to mention skin color, not sure if phi is like that but i wouldn't be surprised.
>>
>>101655818
When you're the savior of the Six Faced World, you tend to be that way.
>>
>>101656062
I love when the caption is "the person is holding their breasts"
>>
>>101656062
>probably censored to hell and back
I wouldn't be surprised. For now I just need something that captions objects and colors really well. Previous MS models have been decent and lightweight

>>101656085
I like "person is getting her tongue and her tongue pushed up"
>>
>https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3
have to test this too
>>
>>101656269
>Trained on Danbooru images
You have my attention.
>>
File: Untitled.jpg (2.92 MB, 3840x4356)
which one should i use
>>
>https://huggingface.co/docs/peft/main/en/package_reference/boft
and what the fuck is this

>Diag-OFT
waht

>>101656515
seems pretty good, tried on photos
>>
official bigma
>>
File: 00108-1537472161.jpg (271 KB, 1552x1200)
>>
File: Sigma_12126_.jpg (2.95 MB, 2688x1536)
>>101655488
No Sigma the past two collages. SaaS in latest collage.

>Is this an anime betrayal?

>>101641968
>>101642309
>>101642323
>>101646306
>>101646584
>>101646714
>>101648131
>>
File: 1722456844462.jpg (1 MB, 804x1430)
>>101655488
Can anyone replicate this style, it is sharp and soft at the same time
>>
File: Sigma_12129_.jpg (3.16 MB, 2688x1536)
>>101656616
>boft
Interesting, will need to read a paper on it. The description left me with more questions than answers
>>
File: Sigma_12137_.jpg (2.4 MB, 1536x2688)
Maybe?
>A realistic full color detailed drawing of a beautiful woman with tribal tattoos and clothed in a fur bikini looking at the camera
>>
>>101657543
>https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
Dev branch has it implemented, but I cannot test it yet. I'll have time tomorrow
>>
File: Sigma_12133_.jpg (3.07 MB, 2688x1536)
>>101657622 is for >>101657498
Low sleep, many mistakes

>>101657628
ty, will reference that against what I read
>>
>>101657622
Its sharp but not soft, and definitely not beautiful
>>
File: Sigma_12130_.jpg (3.54 MB, 2688x1536)
>>101657756
Oh well
>>
File: Sigma_12143_.jpg (1.35 MB, 2048x2048)
>>
>IPNDM sampler
worth using?
>>
>>101657423
two imgs in the last collage were sigma, and before that, three! :P
i dont think ive even seen a completely non sigma collage desu
>>
File: Sigma_12147_.jpg (2.9 MB, 2048x2048)
>>101657970
I thought not initially, but it might be sharper than deis when testing again

>>101657984
Which ones? I see SD and Cascade. There were a bunch of OP's w/o Sigma, but I never cared until SaaS creeped in. Tired and tilted so maybe don't mind me
>>
File: Sigma_12148_.jpg (3.03 MB, 2048x2048)
>>
File: Sigma_12152_.jpg (2.3 MB, 2816x1408)
>>
File: Sigma_12154_.jpg (2.16 MB, 2816x1408)
>>
File: 3717041298.jpg (183 KB, 1024x768)
>>
>>101656576
you must test every single option
>>
>>
Anyone have tips for prompting for a character holding weapons? I find it's really rare to get a gen where it doesn't break everything.
>>
>>101659255
Any decent model will respond well to prompts like "wielding" or "holding"
>>
>>101655745
nice
>>
>>
File: Sigma_12160_.jpg (2.28 MB, 2816x1408)
>>101659255
I often see gens with the weapon to the side

>>101660179
ty
>>
File: Sigma_12162_.jpg (2.61 MB, 2816x1408)
>>
File: Sigma_12163_.jpg (2.34 MB, 2816x1408)
>>
File: Sigma_12164_.jpg (3.12 MB, 2816x1408)
>>
File: Sigma_12166_.jpg (2.54 MB, 2816x1408)
>>
File: Sigma_12167_.jpg (2.16 MB, 2816x1408)
>>
File: Sigma_12172_.jpg (1.68 MB, 2816x1408)
I summon the latent underworld to correct what was done.

We must flush this thread from our memories
>>
>>101655488
oni SEX
>>
File: Sigma_12174_.jpg (1.89 MB, 2816x1408)
>>
File: Sigma_12175_.jpg (1.66 MB, 2816x1408)
>>
File: Sigma_12176_.jpg (1.76 MB, 2816x1408)
>>
File: Sigma_12180_.jpg (1.49 MB, 2816x1408)
>>
File: Sigma_12181_.jpg (1.56 MB, 2816x1408)
>>
File: Sigma_12182_.jpg (2.2 MB, 2816x1408)
>>
File: Sigma_12183_.jpg (2.33 MB, 2816x1408)
>>
File: Sigma_12184_.jpg (1.97 MB, 2816x1408)
>>
File: Sigma_12188_.jpg (2.49 MB, 2816x1408)
>>
File: 1722296032849388.jpg (3.72 MB, 1536x2560)
first time using comfyui
>>
File: Sigma_12189_.jpg (2.28 MB, 2816x1408)
We must unburden ourselves from what has been
>>
File: Sigma_12190_.jpg (1.76 MB, 2816x1408)
>>101661333
Good first go!
>>
File: Sigma_12191_.jpg (2.45 MB, 2816x1408)
>>
File: Sigma_12193_.jpg (1.62 MB, 2816x1408)
>>
File: Sigma_12194_.jpg (2.05 MB, 2816x1408)
>>
File: Sigma_12196_.jpg (2.31 MB, 2816x1408)
>>
File: Sigma_12199_.jpg (1.45 MB, 2816x1408)
>>
File: Sigma_12203_.jpg (3 MB, 2816x1408)
>>
File: Sigma_12205_.jpg (2.61 MB, 2816x1408)
>>
File: Sigma_12206_.jpg (2.28 MB, 2816x1408)
>>
File: rabbit_bible_00002_.png (3.19 MB, 1728x1344)
>>101661556
i like this one
>>
>>101661802
Mid Century Art of a cat
>>
File: Sigma_12213_.jpg (2.69 MB, 2816x1408)
>>101661895
>>
File: Sigma_12214_.jpg (2.15 MB, 2816x1408)
>>
File: Sigma_12217_.jpg (2.18 MB, 2816x1408)
>>
File: Sigma_12225_.jpg (1.79 MB, 2816x1408)
>>
>>101661083
>>101661333
>>101661802
>>101662136
VERY nice
>>
File: Sigma_12244_.png (3.59 MB, 2816x1408)
>>101662519
ty
>>
absurd quality ITT
>>
File: Sigma_12250_.jpg (1.81 MB, 2048x2048)
>>101656616
>what the fuck is this
Apparently there's a guide and it's really good imo https://huggingface.co/docs/peft/main/en/conceptual_guides/oft
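Rough idea of how it hooks up through peft, from skimming that page; treat the exact config argument names as assumptions to check against the docs, I haven't run this:
from peft import BOFTConfig, get_peft_model

# orthogonal butterfly finetuning: instead of LoRA's low-rank additive update,
# the weight gets multiplied by a sparse product of orthogonal "butterfly" blocks
config = BOFTConfig(
    boft_block_size=4,              # size of each orthogonal block (assumed knob)
    boft_n_butterfly_factor=2,      # how many butterfly factors to stack (assumed knob)
    boft_dropout=0.1,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # typical attention projections, adjust per model
)
peft_model = get_peft_model(base_model, config)  # base_model = whatever transformer/unet you loaded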

>>101662662
Nice
>>
>>101662687
my feeling is that there isn't much "abstract photography" in the dataset but maybe im just failing to elicit it from the model
>>
>>
regardless, SD is outdated at this point. pixart looks better and better everyday and constantly surprises me
>>
File: Sigma_12251_.jpg (1.97 MB, 2048x2048)
>>101662740
100% boring normie IRL stuff
>>
File: Sigma_12252_.jpg (2.91 MB, 2048x2048)
>>101662817
>constantly surprises me
Same experience here. It doesn't stop. How is 2k the same param count as 1k? Controlnet, etc. are definitely missing for Sigma though
>>
Pixart IPAdapter pls
>>
File: Sigma_12253_.jpg (2.85 MB, 2048x2048)
>>101662923
Seems simple to train.. https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py

On the project list!
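The trainable part is tiny anyway. From memory it's basically a projection of the CLIP image embedding into a few extra cross-attention tokens (plus new K/V projections in each attention layer); the sketch below is my paraphrase of the idea, not the actual tutorial_train.py:
import torch.nn as nn

class ImageProjModel(nn.Module):
    # maps one pooled CLIP image embedding to N extra "image tokens"
    # that get concatenated with the text tokens for cross-attention
    def __init__(self, clip_embed_dim=1024, cross_attention_dim=2048, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_attention_dim = cross_attention_dim
        self.proj = nn.Linear(clip_embed_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, image_embeds):  # (batch, clip_embed_dim)
        x = self.proj(image_embeds).reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(x)
Training is then the usual denoising loss with those tokens appended to the text conditioning.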
>>
File: Sigma_12208_.jpg (1.78 MB, 2816x1408)
Good night
>>
File: image.png (1.6 MB, 1792x1152)
>>
>>101661333
What model?
>>
>>101663059
>Seems simple to train..
That's what they said about pixart itself yet there are only a few people doing it :d
>>
>>101663139
kolors, it's surprisingly good for not being a finetune
>>
>>101663076
>>101663097
Very cool
>>
>>101663558
Thanks, I notice it now
>>
>>
>>101664006
How do you prompt for this?
>>
>>101664101
lucky seed
https://files.catbox.moe/9bj8li.png
>>
>>
File: 116775249263549068-SD.png (3.93 MB, 1200x1776)
>>
File: Image.jpg (734 KB, 1792x1152)
>>
>>
File: 116775249263549073-SD.png (2.65 MB, 1200x1200)
>>101664439
>>
>>
File: 00_sig12.jpg (308 KB, 1336x1336)
>>101657675
Cool
>>
I'm unable to refrain from exploring latent space
>>
File: Image.jpg (1.96 MB, 1792x2304)
>>
>>101662687
>Apparently there's a guide and it's really good imo
Did you try it? I wonder if it's upgrade for loras
>>
>>101664551
what model is that? gnarly nails aside it looks quite nice, good gen.
>>
official pixart bigma and lumina 2 and that hunyuan finetune waiting room
>>
>>
>>101665627
Pony realism
>>
File: 1693970610832092.png (909 KB, 749x753)
Are there any models or local apps that can do video inpainting like pika with 8gb of vram? I want to use ai to make women look pregnant
>>
>>101666646
ty, mind if i ask for catbox? really digging the style.
>>
>>101663299
There's no one training anything, it's a dumb metric.
>>
>>101666880
>its my comfy workflow and i get to choose the fetishes
keke
>>
https://blackforestlabs.ai/announcing-black-forest-labs/
>>
File: Sigma_12254_.jpg (1.93 MB, 2048x2048)
>>101663299
Hey anon, I train at night so there's fun results to check in the morning. Maybe speak for yourself

>>101663690
ty

>>101664710
ty

>>101664958
>I'm unable to refrain from exploring latent space
This ^

>>101665134
Seems like a more intelligent way of preserving "Hyperspherical Energy" without adding another linear layer like DoRA.. tldr; probably better and smaller but need to test still

>>101667606
>no one training anything
No u
>>
>>101667993
>12b
w-what a formidable fatty. any vram rich wanna give this a spin? seems like it's already supported by comfyui according to their huggingface repo.
>>
>>101667993
>training
No matches found
>>
>>101668071
i would but i don't see any comfyui workflow and i really don't want to fuck around with settings only to wind up disappointed
>>
>>101668015
>I train at night
What did you train last night?
>>
>>101668015
My point is saying "Sigma isn't being trained" is stupid because there's basically no movement anywhere, it's just a couple people, as always, doing 99% of the work.
>>
File: GRID_2.png (1.31 MB, 1024x585)
A new 12B parameter model just got open sourced, the examples are looking pretty good too.
https://blackforestlabs.ai/announcing-black-forest-labs/
>>
>>101668456
It's not open source, there is no training code. It's open weights and not even that, they're delivering only the cucked distilled version. Means no training.
>>
>>101668468
There's two versions, if you're talking about schnell: https://huggingface.co/black-forest-labs/FLUX.1-dev
>>
>>101668495
>FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial applications.
>>
comfy thread
>>
>>101668427
you're mom
>>
File: Sigma_12256_.jpg (2.69 MB, 2048x2048)
>>101668427
Sigma 2k

>>101668438
It's okay anon, nobody expects you to know everything. Sigma _is_ being trained. There have been a lot of Sigma fine tunes released recently, and lately I've been training it nightly.

>>101668456
Impressive but so huge. The gens coming out are absurdly good

>>101668946
kek
>>
>>101668015
>>101669184
why do you think there arent MORE people training pixart? there are people doing it, this is true. i just expected more adoption at this point.
>>
>>101669354
no one is training anything, if you stand back for a second you'd realize that Pony is the only real training SDXL ever got.
>>
>>101669364
>no one is training anything
fair
>Pony is the only real training SDXL ever got.
perhaps
so why arent furfags jumping to sigma?
>>
>>101669364
>pixart
>hunyuan
>kolors
>lumina
>auraflow
>flux
all of this junk and not a single good finetune for any of them. this is how it's going to be for the next 5 years. endless pumping out 'almost good enough' base models that get forgotten in a week thanks to boring datasets and local finetuners lacking the compute needed to make anything with them.
>>
>>101668094
>comfyui workflow
https://comfyanonymous.github.io/ComfyUI_examples/flux/
>>
>>101669411
Because people are dumb and need to be led to water. Also base Sigma simply doesn't have enough parameters for something like Pony, so you need someone to do something like 1.3B.

>>101669427
Auraflow is still in training, why would anyone fine tune something that is still in the oven? Pixart Next will be coming and that's got Nvidia money and their team actually gives a fuck about local training. Kolors is DOA because it's Unet. Hunyuan and Lumina require 40GB+ VRAM computers to train.
>>
bigma status?
>>
File: file.png (54 KB, 256x256)
>>101669475
>white dog
It seems to be working on finer details now but things are a lot more exploded than usual.
>>
>>101669488
this made me smile :)
>>
>>101669488
I love him.
>>
>>101668456
>12B
will that work on my 24gb gpu though?
>>
>>101668468
>It's not open source, there is no training code. It's open weights and not even that, they're delivering only the cucked distilled version. Means no training.
that's pretty easy to make the training code, stop bitching we got the weights lol
>>
>>101669559
I expect to see your training code soon then.
>>
>>101669567
anon, the hard part of an imagegen model is spending millions of dollars to train a good one; making some code with chatgpt is easy as fuck in comparison, why are you crying?
>>
>>101668456
https://huggingface.co/black-forest-labs/FLUX.1-dev/discussions/1#66ab9dc4fd4ae9a7c49be855
>I have a 3090 with 24gb vram. But 12b parameters in float16 format are still ~24GB and this does not include the two text encoders nor the internal state of the model.
lmaoooo, what's the point then if no one can run it?
>>
>>101669636
Don't have to worry about competition if you make your model too big to run. It's what I would've done if I was SAI unironically.
>>
File: 1699741388573.png (594 KB, 1602x900)
>>
>>101669657
How do you convince your investors to spend millions of dollars on a model no regular user can use though? Sounds like suicide
>>
>>101669706
wtf? it's the FLUX.1-dev model?
>>
>>101669732
under the false premise that you can make money selling access via an api
>>
>>101669706
what's this picture? can you provide the source? that's interesting
>>
File: Sigma_12259_.jpg (1.78 MB, 2048x2048)
>>101669354
Two months ago there were 0 fine tunes. Momentum starts slow. How long was SDXL out before Pony?

>>101669458
>Because people are dumb and need to be lead to water.
It's worse. They need someone they already trust to tell them it's drinkable.

>>101669488
HYPE!!! Mostly white background too
>>
here are some 12b gens by an /lmg/ anon
>>101668789
>>101668964
>>101669042
>>101669149
>>
>>101655488
>>
>>101669873
lmg sama i love you
>>
>>101669873
I'll see if I can get it running since bigma crashed
>>
>>101669912
>bigma crashed
NOOOOOOOOOOOOOOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!
>>
>>101669925
No, training sometimes crashes the GPUs and I have to restart the computer and it takes 15 minutes to load all the images so might as well dick around for a bit.
>>
>>101669873
that's insane, are we back? I waited so long for that day to happen!
>>
>>101669873
Wow looks really good, ty for linking!

>>>/g/ldg eating good
>>
>>101669939
if you get it running could you try this prompt? >>101669706
>>
File: Sigma_12260_.jpg (1.97 MB, 2048x2048)
>>101669912
Bigma anon doesn't stop winning even during a crash. I've never had it crash during training btw. Are you getting random OOM's or something else?
>>
>>101669982
It's probably because I use my computer while it's training. I assume there's some sort of memory leak at the Nvidia driver level; it completely kills the video drivers, the screen starts to stutter and then freezes.
>>
File: file.jpg (156 KB, 1024x1024)
it doesnt fuck up anatomy of crouching/sitting subjects like SD3 holy shit thats nice
>>
File: 5111951916.png (781 KB, 1367x824)
bruh it uses 999gb of vram, swapping all that shit takes 2 min+ to make an image, barely uses the GPU (4090)
>>
File: file.png (322 KB, 1024x1024)
you can try out the model for free on replicate
>https://replicate.com/black-forest-labs/flux-dev
>>
File: file.png (1.18 MB, 1024x1024)
tried this >>101669706
oh wow
>>
File: sample.jpg (403 KB, 1024x1024)
>>101670100
https://replicate.com/black-forest-labs/flux-pro
This one also works without an account, crank the safety tolerance to 5 so it doesn't stall on you.
>>
>>101670081
fellow 4090 user, same here i'm trying it out on /h/. the swapping is fucking brutal which is a shame because the gen speeds aren't that bad if it could actually stay loaded
>>
File: 4799.png (2.39 MB, 1024x1024)
>>101670081
Meant RAM*, it hits the disk swap like crazy on 32gb
>>101670155
yeah, hopefully the model can be trimmed a bit to fit
>>
File: file.png (1.06 MB, 1024x1024)
>>101670148
>without an account
im using it fine without an account
>flux-pro
i believe flux-pro is their api-only version, flux-dev is the model they released
>>
File: sample (1).jpg (261 KB, 1024x1024)
>>
File: file.png (1.03 MB, 1024x1024)
>touhou, crino, 1girl, she has gigantic ass tits wooow
>>
>>
File: file.png (1.27 MB, 1024x1024)
>an image of hatsune miku holding out both her hands, on her right hand is a red pill, on her left hand is a blue pill
>>
File: file.png (1.29 MB, 1024x1024)
>an image of hatsune miku, a large number of blue and red pills are coming out of her nostrils
i'll try "nose" instead next
>>
>>
File: file.png (1.34 MB, 1024x1024)
>>101670277
>an image of hatsune miku, a large number of blue and red pills are coming out of her nose
>>
>>101670246
We're so back.
>>
File: file.png (1.31 MB, 1024x1024)
>>
File: sample (3).jpg (413 KB, 1024x1024)
>>
File: file.png (1.68 MB, 1024x1024)
>a picture of new york city, there is line of giant blue and red pill shaped buses on the road. the blue pill buses all have the face of hatsune miku on them
>>
File: sample (4).jpg (313 KB, 1024x1024)
>>
>>
File: sample (5).jpg (269 KB, 1024x1024)
>>101670326
wrong Pitbull
>>
>>
File: file.png (1.39 MB, 1024x1024)
>a deranged serial killer using a crude cutout of hatsune miku's face as a mask, it is taped onto his face. he is very muscular with big nipples that look like sharp spears
not what i asked for but alright. maybe it will do better with non esl prompting
>>
>>101669427
>>101669636
Maybe this is a good time for me to complain for a bit.

How come no existing training scripts can make efficient use of multiple consumer GPUs? I made an LLM training script (qlora-pipe on Github) that does pipeline parallelism. With that + full bf16 training + Kahan summation in the optimizer, I can match the performance of mixed precision while full finetuning something like llama 3 8b on 4x4090. But with SDXL, despite being a mere 2.6B parameters, I can't FFT it (not without compromises) using any training script.

OneTrainer doesn't even support multi-GPU (lol, lmao even). With kohya, FSDP doesn't work. Deepspeed only got support recently, but DS Zero forces you into mixed precision training (where weights, grads, and optimizer state are all kept in fp32). Plus Zero has high inter-GPU bandwidth requirements and a decent amount of VRAM overhead it seems like. Basically I can't do a proper FFT of SDXL even on a fucking 4x4090 machine.

Full bf16 training + adam with kahan summation uses 10 bytes per parameter. SDXL should easily be able to be FFT'd on just 2 3090s, which is a common setup for AI enthusiasts (at least in LLM land). No training script can even get close to this. And for the new flux 12b for example, a pipeline parallel training script ought to be able to do a decent rank lora on 2x3090 as well.

At this point I just need to make a pipeline parallel training script for diffusion models I guess.
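For anyone wondering what "full bf16 + Kahan summation" means in practice, here is a toy single-tensor Adam step; it's my own sketch of the idea, not lifted from qlora-pipe, with weights, grads, both moments and the compensation buffer all in bf16, i.e. the 10 bytes/param above:
import torch

@torch.no_grad()
def adam_step_bf16_kahan(p, grad, exp_avg, exp_avg_sq, comp, step,
                         lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    # all five tensors are bf16 (2 bytes each = 10 bytes per parameter); math uses fp32 temporaries
    g = grad.float()
    m = exp_avg.float().mul_(beta1).add_(g, alpha=1 - beta1)
    v = exp_avg_sq.float().mul_(beta2).addcmul_(g, g, value=1 - beta2)
    exp_avg.copy_(m)
    exp_avg_sq.copy_(v)
    update = -lr * (m / (1 - beta1 ** step)) / ((v / (1 - beta2 ** step)).sqrt() + eps)
    # Kahan-style compensation: keep the bits that bf16 rounding of the weight would throw away,
    # so thousands of tiny updates don't silently vanish like they do with plain bf16 weights
    new_p = p.float() + update + comp.float()
    p_rounded = new_p.to(torch.bfloat16)
    comp.copy_((new_p - p_rounded.float()).to(torch.bfloat16))
    p.copy_(p_rounded)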
>>
File: 1734.jpg (1.18 MB, 3072x2048)
fluxsisters... I need more ram
>>
If Flux can't do female feet, it's not worth using.
>>
File: Sigma_12261_.jpg (2.15 MB, 2048x2048)
>>101670013
Whoa.. I can even game while training on arch/kde. Base KDE/X uses 1GB VRAM and no more

>>101670080
Look at that hand!

>>101670222
Why so blurry?

>>101670246
Amazing
>>
File: file.png (1.22 MB, 1024x1024)
since it's a transformer i hope it can be quanted like llms
>a holy painting of jesus christ caressing his big pregnant belly. there is a speech bubble above him saying "I shall name him PixArt Bigma"
>>
>>101670414
Because I asked for it. It's unfortunate that low-quality stuff like that is unreliable at best with these models, they filter out the low-quality images for training and that's all I want to generate.
Maybe in the future there's gonna be image upload/customization options like Midjourney, that would really be something.
>>
>>101670441
kekd
>>
>>101670395
Basically no effort went into this because they know that true democratization of training kills them. There's a reason why they keep the requirements >24 GB.
>>
>>101670184
I don't see much difference in image quality between pro and dev, that's cool
>>
File: file.png (1.4 MB, 1024x1024)
>an image of a cat just sitting there looking at the viewer. a speech bubble above him says "Near a tree by a river
There's a hole in the Ground
Where an old man of Aran
Goes around and around
And his mind is a beacon
In the veil of the Night
For a Strange kind of Fashion
There's a wrong and a Right"
>>
>>101670413
>female feet
>>
File: file.png (1.71 MB, 1024x1024)
>an image of "the last supper" but everybody has been replaced with drag queens and shemales, there is a child in a yellow coat for some reason
>>
>ai generated
>>
>>101670556
I'm counting 5 toes on each foot, it's worth using.
>>
File: file.png (1.18 MB, 1024x1024)
>an image of hatsune miku holding out both her hands, on her right hand is a red pill, on her left hand is a blue pill, there is a speech bubble above her saying "the right makes you constipated, the left gives you diarrhea. choose wisely.", 1badass
it gave me a 1cute instead
>>
File: 489362.png (1.19 MB, 1216x832)
KEK
>>
>ERROR
>You have reached the free time limit.
Death to non-local
>>
>>101670629
hunyuan feet anon is going to love this
>>
File: 9762.png (1.3 MB, 1216x832)
>>101670632
>Death to non-local
>ERROR
>You are out of memory.
>>
File: image.jpg (121 KB, 1024x1024)
>>
>woman bowing forward, seen from behind, bikini
will not make her bow at all
>>
File: image (1).jpg (66 KB, 1024x1024)
>>
this had to be trained on synthslop, i can just tell. betting the paper will confirm another journeyDB masterpiece
>>
>>101670725
yeah it looks like it desu
>>
>>101670725
yes look at those feet
>>
File: file.png (852 KB, 1024x1024)
>an image of a cute little chibi anime girl smiling at the viewer, a speech bubble above her says "i'm going to say the nigger word"
>>
File: image (2).jpg (124 KB, 1024x1024)
>>
https://replicate.com/black-forest-labs/flux-pro/examples
>Hardware: CPU
>Total duration 20.7s
wtf? how can it be so fast with cpu?
>>
>>101670767
AMD MI300
>>
File: sillytest.png (606 KB, 815x721)
i'm impressed by stable-fast-3d with how it handles non character images

https://huggingface.co/stabilityai/stable-fast-3d
>>
File: file.png (1.18 MB, 1024x1024)
>>101670303
KEK
>>
>>101670767
https://replicate.com/pricing
Also says the CPU is 4X. Is it regular 4 cores? or are they using some server grade 64 core x 4 = 256 cores?
>>
File: image (3).jpg (120 KB, 1024x1024)
>>101670725
>synthslop
yep
>>
>>101670148
I'm not feeling this model at all. I am not happy with the results from the pro version and that is the one you don't get to download. Dev is supposedly even worse. 12B model too.
I ran a bunch of tests and I prefer Dall-E 3 outputs over this pro model.

This does not seem like a local model competitor, but more like a DE3, SD3 Large and Midjourney competitor.
>>
>>101670643
It's pretty bad at making feet actually. Worse than the Chinese models.
Also I think you can easily finetune the china models for feet, but not this one so much.
>>
File: image (4).jpg (114 KB, 1024x1024)
>>
File: flux.jpg (328 KB, 768x1280)
>>101670733
something about it looks like Midjourney v4 outputs aesthetically. doesn't look authentic. ai trained on ai vibe. cool comprehension and a fun model but doesn't seem like the improvements line up with increased resource requirements. feels like an 8b model at max
>>
>>101670794
I really don't give a shit about 3D modeling
>>
File: file.png (1004 KB, 1024x1024)
>>101670836
hunyuan feet anon is going to hate this...

>Patchouli Knowledge from touhou, her hair and eyes are purple and has many ribbons tied to her hair and other parts of her clothing. She wears pink pajama-like clothing and a night-cap with a gold crescent moon on it. Her dress has stripes of purple and violet, she is eating a cigarette
>>
>>101670811
you can test out the dev version here
https://replicate.com/black-forest-labs/flux-dev
it's not that different to pro imo, and it's easily the best local model we ever had, that's a great day for me, fuck SAI
>>
File: mohammed.jpg (129 KB, 1024x1024)
>>
>>101670148
>>101670424
>Great image quality
>Ok prompt understanding
>Can do NSFW
>Nice anatomy
>Apache 2.0 Licence
That's insane, I never thought we would see this day, WE ARE SO BACK
>>
https://huggingface.co/black-forest-labs/FLUX.1-schnell/tree/main

24GB model. Need a distilled version under 8GB
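Back-of-envelope for the weights alone (text encoders and activations not counted):
params = 12e9
print(f"bf16/fp16: {params * 2 / 2**30:.1f} GiB")    # ~22.4 GiB, the ~24 GB checkpoint
print(f"fp8:       {params * 1 / 2**30:.1f} GiB")    # ~11.2 GiB
print(f"4-bit:     {params * 0.5 / 2**30:.1f} GiB")  # ~5.6 GiB
So "under 8GB" realistically means either a much smaller student model or roughly 4-bit quantization.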
>>
File: image (6).jpg (74 KB, 1024x1024)
>>101670932
nah i dont think this is it
>>
File: file.png (976 KB, 1024x1024)
>a little anime girl sitting on a dirty couch looking wasted, dark circles under her eyes. she has a cigarette in her mouth, hand and nose. next to her is a beer can with a cigarette in it instead of a straw. her room is dusty and decrepit, there is a thought bubble forming above her head, inside it is an image of a pack of cigarettes
>>
>>101670081
why are you using the schnell one? it's the worst version
>>
>>101670942
what about the dev version? it's the better one no?
>>
https://comfyanonymous.github.io/ComfyUI_examples/flux/
>If you don’t have t5xxl_fp16.safetensors or clip_l.safetensors already in your ComfyUI/models/clip/ directory you can find them on: this link. You can use t5xxl_fp8_e4m3fn.safetensors instead for lower memory usage but the fp16 one is recommended if you have more than 32GB ram.
How does that work? you can run the model on the GPU and the text encoder on the CPU?
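Pretty much; ComfyUI shuffles the pieces between RAM and VRAM for you. If you'd rather script it, here's a sketch of the same idea with diffusers (assuming its FluxPipeline and the usual offload helper; prompt and settings are just examples):
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
# keep components in system RAM and move each one (text encoders, transformer, VAE)
# onto the GPU only while it is actually running
pipe.enable_model_cpu_offload()
image = pipe("oni girl in a fur bikini, film photo",
             num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("flux_test.png")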
>>
>>101670954
The dev version is a bit better and is distilled from the PRO version
>>
File: file.png (1.47 MB, 1024x1024)
>A tense diplomatic negotiation in a grand hall, featuring representatives from 20 different countries, each wearing traditional attire. The scene should include interpreters, aides whispering to their leaders, and visible emotional reactions ranging from frustration to hope.
Duonald trump
>>
>>101670947
idk, I picked the one that was running on the HF space
>>
>>101670303
where is he off to?
>>
File: sample.jpg (322 KB, 1440x1440)
>>101670932
yes anon, we're at insane level of back
>>
>>101670932
>Can do NSFW
it can not, try to do a woman from behind bowing
>>
Dallefags in shambles. We actually won
>>
>>101670980
it can do nude, and has great anatomy, finetunes will help for the poses
>>
>>101670978
/sdg/
>>
>>101670932
>requires RTX 4090
>most likely bitch to train and make loras for
>anatomy still lacking compared to Kolors for example

I doubt that this is it, but maybe. Apache 2.0 license is the biggest thing that sets it apart from all the others.
>>
File: sillytest2.png (1.45 MB, 1198x1111)
>>101670856
wasn't this supposed to be a safe space for alternative things?
>>
File: Sigma_12267_.jpg (2.26 MB, 2048x2048)
>>101670725
Can't copyright AI output so likely the investor-safe route. Still Sigma 0.6B btfo by this 20x larger model though got dayum
>>
>>101671010
>Apache 2.0 license is the biggest thing that sets it apart from all the others.
not just that, the image quality is insane, and it has great prompt understanding and is perfect at text, this one is truly at API levels >>101670979
>>
File: image (8).jpg (185 KB, 1024x1024)
>>101670979
>>
>>101671030
it's also great at hands, holy fuck I never expected to get such a great local model in my lifetime
https://reddit.com/r/StableDiffusion/comments/1ehknmh/new_ai_model_flux_fixes_hands/
>>
https://comfyanonymous.github.io/ComfyUI_examples/flux/
Can someone provide the links for the flux nodes?
>>
>>101671016
that's pretty cool ngl kek
>>
File: file.png (1.75 MB, 1024x1024)
>>
https://huggingface.co/camenduru/FLUX.1-dev/tree/main
>flux1-dev.sft
what's a .sft? that's the model? why isn't it a safetensor model like the others?
>>
File: Sigma_12269_.jpg (1.78 MB, 2048x2048)
>>101671016
It's not a safe space but he doesn't even have a gen. Fuck that guy and keep posting your local gens like the OP suggests
>>
>>101671091
>SFT
>SaFeTensors
>>
>>101671065
comfy has native support for this type of model. doesn't need extra nodes.
>>
>>101671091
.sft is safetensors according to comfy
>>
>>101671065
flux came with day 1 native comfy support
>>
File: ComfyUI_0036.jpg (99 KB, 1024x1024)
>>
File: file.png (983 KB, 1024x1024)
>>
File: file.png (1.02 MB, 1024x1024)
>ginger woman squatting, she is wearing round glasses and a stripped top with overalls a small frilled skirt and knee high pink boots,
>>
>>101671030
I'll believe it when I see people easily making finetunes and loras for this beast. I want to see a good paper with no ClosedAI bullshit where all the sauce is hidden. I want training examples, training code, etc.

Just dumping the weights on the internet is not "open source" enough for me.
>>
File: file.png (1.09 MB, 1024x1024)
>a profound image of an anime girl in deep meditation, a white glow emanating from her head as she attains nirvana, the mighty glow from her eyes cause the entire image to tremble and warp. there is a speech bubble above her head saying "what if pixart 12b"
>>
>>101671054
But terrible at feet. What a price to pay. Still better than SD3.
>>
>>101671104
>>101671100
oh ok my b I'm a retard, thanks kek

>>101671146
it's way harder to get millions of dollars to train a 12b model than to write training code; don't worry about it, the model is so good everyone will make training work
>>
>>101671169
Also, keep in mind file extensions are just cosmetic; they merely inform programs of what to expect. You can rename a model to .jpg and still load it just fine, as long as the program knows it should try to load .jpg files as tensors.
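e.g. this loads no matter what the file is called, as long as it really is safetensors data underneath (path is hypothetical):
from safetensors.torch import load_file

state_dict = load_file("flux1-dev.sft")  # the header inside the file matters, not the extension
print(len(state_dict), "tensors")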
>>
File: file.png (1.72 MB, 1024x1024)
shrek lying on a recliner next to a pool, hes is drinking margarita and saying on a speech bubble "life is good"
>>
>>101670478
>they keep the requirements >24 GB
I mean it's not some big conspiracy. Larger models are better, and will require high VRAM GPUs or multiple smaller GPUs to even train a lora. That's fine. I'm just surprised that with how popular imagegen is, none of the training script creators have put much effort into efficiently splitting models across 2+ GPUs so they can be trained with consumer hardware. It's possible and not even particularly difficult, it's just nobody seems to care. Like I said, at this point I'm seriously considering making my own pipeline parallel training script (will be open source if I do it), especially if this flux model or the new larger pixart model are any good.
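The core trick really is small. Toy sketch of a naive two-GPU split below (plain model parallelism; a real training script would add microbatching so both cards stay busy, which is where the actual work is):
import torch
import torch.nn as nn

class TwoGPUSplit(nn.Module):
    # first half of the blocks on cuda:0, second half on cuda:1;
    # activations hop across the PCIe link once per forward (and once again on backward)
    def __init__(self, blocks, split):
        super().__init__()
        self.first = nn.Sequential(*blocks[:split]).to("cuda:0")
        self.second = nn.Sequential(*blocks[split:]).to("cuda:1")

    def forward(self, x):
        x = self.first(x.to("cuda:0"))
        return self.second(x.to("cuda:1"))

blocks = [nn.Linear(1024, 1024) for _ in range(8)]  # stand-in for transformer blocks
model = TwoGPUSplit(blocks, split=4)
loss = model(torch.randn(2, 1024)).float().pow(2).mean()
loss.backward()  # autograd handles the cross-device hops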
>>
File: file.jpg (267 KB, 1360x768)
we're so back
>>
>>101671222
wtf pikachu
>>
>>101671216
>it's just nobody seems to care
not enough people care, images are useless, meanwhile LLMs can actually do things, hence most people who know about ml are working on that instead
>>
>>101671103
>comfy has native support for this type of model. doesn't need extra nodes.
I always thought I would never use this spaghetti shit but here we go... the model is too good to be avoided at this point
>>
>>101671216
I'm just saying that as a business decision targeting >24 GB is a smart choice as it gives you brownie points and publicity having a "local" model while forcing most people to use your API. Honestly the best license would be something like:
"Commercial use except for on-demand image generation via an API"
>>
>>101671222
This image made me realise I definitively have some weird cloth fetish.
>>
File: file.png (1.4 MB, 1024x1024)
>image of an heavenly immortal anime girl seated in deep mediation, her cultivation breaking through to 12b pixart biggerma realm, heaven and earth shatter as a speech bubble appear above her head saying "what if pixart 12b"
>>
>>101671216
>I mean it's not some big conspiracy. Larger models are better
this, don't blame the model creators, blame Nvidia for nerfing the VRAM, it's been stuck at 24gb since FUCKING 2018 (Rtx Titan)
>>
>>101671241
That is changing now as all the modalities seem to be converging. Audio, text and image all on a single multimodal LLM.
>>
>>101671286
a colossal model that no one will be able to run too
>>
File: file.png (1.23 MB, 1024x1024)
>photo electric effect
>>
File: ComfyUI_00226_.jpg (749 KB, 2048x2048)
The way it works with texts, even in cursive, is fucking amazing
>>
>>101671366
LMAOOOOOO
>>
>>101671366
so you made in run in comfy ui anon? how much VRAM does it ask? (image model + text encoder)
>>
>>101671241
For productive use, LLMs are more impactful than imagegen, yes. But for hobbyist use for "fun" (porn), I think imagegen is way more popular than anything people do with LLMs. Look how many loras and models are on civit compared to community LLM finetunes on huggingface.

You see all these anons in this thread complaining that larger models are way too hard to finetune. This is only because existing training scripts are shit and can't do it, theoretically it's easily achievable. With a 2x3090 machine you should be able to FFT SDXL or train a lora on Hunyuan or the new flux 12b model.

Fuck it, this weekend I'll make an attempt at a pipeline parallel training script for diffusion models, at least just to try to judge how much work it would be. Probably it shouldn't even be that much work if I reuse all the dataset loading code from kohya and make it based on HF Diffusers.
>>
File: file.png (10 KB, 287x213)
>>101671401
Even in offload mode it maxes out my 4090
>>
File: file.png (894 KB, 1024x1024)
>A colossal anime woman towers above a plain field, her gigantic form stretching across the sky. The left side is bathed in a brilliant blue sky, while the right side is shrouded in a deep, velvety night sky, which she wears like a cape. Stars twinkle like diamonds across her gown, and the moon casts a silver glow on her majestic form.
not what i asked but looks quite nice
>>
https://huggingface.co/camenduru/FLUX.1-
>ae.sft
>clip_l.safetensors
>flux1-dev.sft
>t5xxl_fp16.safetensors
>t5xxl_fp8_e4m3fn.safetensors
can someone help a retard who's using Comfy for the first time in his life? do I have to download everything? what do those files mean?
>>
Fresh bread is ready to eat...
>>101671236
>>101671236
>>101671236
>>
File: file.png (1.35 MB, 1024x1024)
>>101671446
>A colossal anime woman towers above a plain field, her gigantic form stretching across the sky. But it's what she's wearing that's truly striking: the night sky itself, draped across her shoulders like a majestic cape. Stars twinkle like diamonds, and the moon casts a silver glow on the folds of her celestial garment, as if the very fabric of the universe has come to life to adorn her.
>>
>>101671426
>Even in offload mode it maxes out my 4090
what do you offload? the text encoder?
>>
>>101671453
ty baker
>>
>>101655488
I haven't come to /g/ in ages. What's the difference between this general and the stable diffusion one? It seems like anons are posting the exact same type of content in both.
>>
>>101671426
what if you load in 8bit?
>--fp8_e5m2-text-enc --fp8_e5m2-unet
>>
>>101671544
/sdg/ allows saas gens, /ldg/ is local only
>>
>>101671544
the amount of free mental healthcare available in the country of the frequenters
>>
>>101671544
>exact same type of content in both.
lurk long enough and you'll realize sdg is just a discord chatroom kek
>>
>>101671617
>lurk long enough and you'll realize 4chan is just a discord chatroom kek
>>
File: 1720013034361301.png (1.16 MB, 1024x1024)
>>
>>101671630
>he doesn't know most of sdg is avatarfags saying gm and gn to eachother and sharing suno songs
>>
>>101671544
There's none. This thread has no right to exist.
>>
Ok. I came here because of Flux. I've noticed /ldg/ and /sdg/ for some time now but I don't know the exact difference. I guess /ldg/ is more about tech than sharing gens, where /sdg/ is just about sharing gens? Or is there some drama that I missed that explains the split? If you could explain it as a veteran /sdg/ fag I'd be grateful
>>
>>101672532
/sdg/ allows dalle and gemma gens, /ldg/ is strictly local
>>
>>101672553
hmm. That doesn't sound right. Why is it called /sdg/ then? Also... that's it? No tripcode drama war or spamming autists fighting each other?
>>
>>101672586
it's avatarfag and blogpost central as well
>That doesn't sound right. Why is it called /sdg/ then?
no clue, they just stopped caring for whatever reason
>>
>>101672614
fwiw, I like the "local [ai topic] general" naming better (like /lmg/) although iirc it didn't exist when /sdg/ came out. (or maybe it did? I forget which came first) I know /lmg/ split from /aicg/ which is actually cancerous.
>>
>>101672685
anon frequently compares sdg to aicg and ldg to lmg, this is true
>>
>>101672685
>I know /lmg/ split from /aicg/ which is actually cancerous.
/ldg/ split from /sdg/ for similar reasons, it happened a while before the sd3 launch
>>
nice
>>
>>101675016
sure...
>>
>posting in the previous previous bred
>>
Oni girl rocks
>>
>>101675045
Thanks
>>
>>101675016
>>101675031
what did she mean by this
>>
just wanted you guys to know that i have a boner
>>
would you like help with that, anon
>>
>>101671423
>But for hobbyist use for "fun" (porn), I think imagegen is way more popular than anything people do with LLMs
lmao no, ERP is addictive as crack, at least the first few months
the reason you see fewer loras for LLMs is that the smallest LLMs are 4 times as big as the biggest image gen models


