/g/ - Technology




Discussion of free and open source text-to-image models

Previous /ldg/ bred : >>102908985

SANA Round Two Edition

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
ComfyUI: https://github.com/comfyanonymous/ComfyUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
SD.Next: https://github.com/vladmandic/automatic
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://aitracker.art
https://huggingface.co
https://civitai.com
https://tensor.art/models
https://liblib.art
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/sd-scripts/tree/sd3

>Flux
https://huggingface.co/spaces/black-forest-labs/FLUX.1-schnell
https://comfyanonymous.github.io/ComfyUI_examples/flux
DeDistilled Quants: https://huggingface.co/TheYuriLover/flux-dev-de-distill-GGUF/tree/main

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/comfyanonymous/hunyuan_dit_comfyui
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Maintain thread quality
https://rentry.org/debo

>Related boards
>>>/aco/sdg
>>>/aco/aivg
>>>/b/degen
>>>/c/kdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/tg/slop
>>>/trash/sdg
>>>/u/udg
>>>/vt/vtai
>>
this thread is for mourning the death of bigma
>>
File: file.png (1.44 MB, 1248x1248)
Once I get over the VAE compression I think I can accept this model for dicking around.
>>
Now for drugs and sleep, good night anons.
>>
File: file.png (160 KB, 346x261)
>>102919465
the fuck is wrong with their eyes? if we can change the VAE maybe it'll be saved idk
>>
File: file.png (1.95 MB, 1248x1248)
>>102919480
It's hard to tell what's the VAE and what's from the model being undercooked. There's a reason why they haven't released the weights (it's not done).
>>
>>102919516
>There's a reason why they haven't released the weights (it's not done).
I really don't get what they're doing, what's the point of releasing an uncooked demo in the first place? They wanted to be clowned on?
>>
cascade anon has been on suicide watch for so long someone check up on him
>>
>>102919537
>schizo anon is talking to himself again
>>
>>102919530
Who knows, they made some weird decisions, like going way too hard on the VAE compression. Even 16x would've been impressive and achieved their goal. Same with switching to Gemma saving some headroom on the text encoder. If it were me I would've figured the requirements to train the model on a 24 GB VRAM GPU at 1024px then size the model for that, either 2B or 3B with 16x compression. The model is too experimental like Cascade.
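Rough numbers for the compression argument, as a sketch only. It assumes the transformer's token count for a square image is (image side / (AE downsample x patch size)) squared. The 8x/patch-2 row is the usual SD-style layout and the 32x/patch-1 row is what Sana reportedly uses; both are assumptions, not something stated in this thread, and `latent_tokens` is a made-up helper name.

```python
def latent_tokens(image_px: int, ae_downsample: int, patch_size: int = 1) -> int:
    """Tokens the diffusion transformer sees for a square image."""
    side = image_px // (ae_downsample * patch_size)
    return side * side

# (label, autoencoder downsample factor, transformer patch size): assumed layouts
layouts = [
    ("8x AE, patch 2", 8, 2),
    ("16x AE, patch 1", 16, 1),
    ("32x AE, patch 1", 32, 1),
]

for label, downsample, patch in layouts:
    print(f"{label}: {latent_tokens(1024, downsample, patch)} tokens at 1024px")
```

Under these assumptions 16x with patch 1 lands on the same token count as the 8x/patch-2 setup, while 32x cuts it to a quarter. That is where the speed comes from, and also why each latent position has to carry so much more detail.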
>>
>>102919576
yep, they made a serious mistake there, no one cares about small models that produce bad images, we want quality first, and they could've achieved that with a 5b model + a normal VAE
>>
>>102919638
3B for SD3 barely fits at 768px for training on a 4090 at batch 1 with all the optimization tricks. 5B is a dream.
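Back-of-envelope for why this is tight, assuming plain mixed-precision Adam at roughly 16 bytes per parameter (fp16 weights and grads, an fp32 master copy, two fp32 Adam moments). It ignores activations, the text encoder and the VAE, so real usage is higher, while tricks like 8-bit optimizers or LoRA push it lower. `training_state_gb` is a hypothetical helper, not from any trainer.

```python
# Assumed per-parameter cost for mixed-precision Adam training:
# 2 (fp16 weights) + 2 (fp16 grads) + 4 (fp32 master) + 4 + 4 (Adam moments)
BYTES_PER_PARAM = 16

def training_state_gb(params_billion: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Weights + gradients + optimizer state in decimal GB, activations excluded."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (1.6, 2.0, 3.0, 5.0):
    print(f"{size}B params: ~{training_state_gb(size):.0f} GB before activations")
```

Even the 2B case is over a 24 GB card before a single activation is stored, which is why full finetunes at this size lean on every optimization trick available.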
>>
>>102919645
>5B is a dream.
the 5090 (32gb) will be there soon, so it won't be a dream anymore
>>
what is 8x8?
is it better than fp8?
>>
>>102919668
8x8 is 4x4 doubled.
>>
>>102919668
>what is 8x8?
the what? where did you find this?
>>
>>102919652
That might mean 1024px for SD3, maybe batch size 2. 5B would still be a dream. 1.6B wouldn't have been bad if the AE wasn't so extreme. But I won't poo poo it until I can train it myself. Their training methods are questionable.
>>
>>102919689
>That might mean 1024px for SD3
I don't get it, SD3 is a 2b model, yet we managed to train SDXL (2.7b) on a 3090 though?
>>
File: file.png (18 KB, 1503x80)
>>102919686
>>
>>102919699
Transformers use more VRAM.
>>
>>102919638
>5b model + a normal VAE
Then where is the research? That's just another SD clone then. At the end of the day I'm glad they tried something new. Later zhangs will read their paper and create the ultimate 1girl generator with 128x compression VAE
>>
anyone using juggernaut v11?
>>
>>102919711
>That's just another SD clone then.
it's not, SAI tried to make DiT models, and they suck ass, I'd say it's a smaller version of Flux, we don't need a 12b model, I'm sure we can reach the same quality with 5b, but definitely not with 1.6b, which is my point
>>
So is Illustrious shit or promising? I wasn't around when it was fresh. Seems like a step down from my playing with it, even for coomers.
>>
>>102919427
forgot SANA links, retard

https://github.com/NVlabs/Sana

https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b

https://ea13ab4f5bd9c74f93.gradio.live/

>>102919768
try out NoobAI which is a derivative of illustrious
>>
>>102919771
>forgot SANA links, retard
sana doesn't deserve to be in the OP, it sucks ass
>>
File: file.png (902 KB, 1248x896)
>>102919757
You're such a size queen, you would think after seeing Florence2 that size isn't the be-all-end-all of models. Pixart at 600m was just fine. 1.6B would be fine especially for niche models with 100,000 image datasets. Sana could very well be *the* porn model.
>>
>>102919789
>1.6B would be fine
sana proved it's not, stop coping, they tried all the tricks that existed on earth for that one, and it still looks like ass
>>
>>102919775
>>102919802
chill out anon we're just having fun with a new model it's not that deep
>>
>>102919802
Sana didn't prove anything. I already just said Pixart is fine with 600m. You don't know the final result of Sana, you just know what an undertrained alpha model looks like. Honestly you're exactly why no one ever posts anything because you're incapable of abstract thought. Congrats anon, you're a faggot. You're someone that sees the ingredients of a cake and say "I CANNOT EAT THIS SHIT"
>>
I'm looking to upgrade my video card to something a little bit more fitting for AI image generation than my current GTX 970. I've got a Ryzen 5 5600 and a 550 watt power supply. I'm going to guess I'm not going to be able to support the latest and greatest of 4090's on that, so what's a non-AMD card with a comfortable amount of VRAM that my current hardware can support?
>>
>>102919832
>you just know what an undertrained alpha model looks like.
that's your assumption, what makes you believe it's undertrained and it's not the final result?
>>
>>102919757
>it's not, SAI tried to but it sucked ass
>implying SAI is competent at all
>>
File: file.png (1.14 MB, 896x1152)
>>102919771
Thanks anon, taking a look at the gallery, still seems to have that sketchy, uncertain quality to the images.
>>102919832
Is the reason pixart hasn't been adopted because of anti-hype?
>>
>>102919825
>>102919832
let me guess, you were SD3 shills back in the day as well?
>"Just trust Lykon bro, SD3 will be the best model ever"
you can't stop taking L's can you? kek
>>
File: file.png (2.34 MB, 1024x1408)
>>
>>102919851
>Is the reason pixart hasn't been adopted because of anti-hype?
it's because it hadn't beaten the previous local sota model which was SDXL, simple as that
>>
>>102919851
>gallery
researcher gens are always ass but i wont pretend it's not wonky
>>
>>102919422
Because the prompt results are the same as in the paper, which uses prompt rewriting. For example try

portrait photo of a girl, photograph, highly detailed face, depth of field

It looks basically the same
>>
File: file.png (1.51 MB, 1248x896)
>>102919851
600m is fine but is restricting and definitely at the niche size category and not what people are looking for for a base model. But comparatively 600m for Pixart is the same overall quality of say SD 1.5.
>>
Excuse me, this is the pixart bigma thread, chud.
>>
>>102919889
I'm glad I decided to become an engineer rather than a researcher, looks like a field of literal retards
>Make a paper about an unfinished product
>Put shitty pictures as their cherry picked pictures
>Add a demo of their undertrained turd
I swear to god if I was working with such dumbasses I would end my life
>>
>>102919916
Maybe because researchers can see the bigger picture while you sit there saying "I can't eat raw flour"
>>
>>102919427
Another bake that deliberately snubs all 1girl in favor of low effort slop. Do I need to make the real collage myself?
>>
>>102919932
>Maybe because researchers can see the bigger picture
nah, look at the SD3 researchers, they are actual retards, and the sana team will join that list of retards
>>
>>102919938
There are three 1girls in the collage anon
>>
File: file.png (1.31 MB, 896x1152)
>>102919851
Best of 4 with the same prompt (with replaced prompt conditioning), illustrious definitely has better prompt adherence than pony. There's at least some feeling of Ornifex there. But the quality is so much better, and I heaped on the retarded sd1.5 prompt that seems recommended. Can't say base pony was too much better. As with all these things, we'll just have to see how things shake out.
>>
>>102919947
SD3 took way too long and had no results. Sana is coming out after 5 months from Pixart Sigma. Completely different story.
>>
>>102919958
its different for that anon because he is incapable of abstract thought or nuance
>>
>>102919958
>Sana is coming out after 5 months from Pixart Sigma.
what's the point? I'd prefer they took 1 year and made a diamond rather than taking 5 months to shit out a turd
>>
>>102919950
No, there are zero.
>>
>>102919966
**its not different
>>
>>102919966
>>102919978
you lost the nuance in your grammar anon kek
>>
File: file.png (1.77 MB, 1140x1137)
>>102919970
Because it's 3girls? BRAIN BLAST
>>
>>102919969
I know you're incapable of thought, but maybe the more important part is they managed to reduce the resources required to train a model 8x? That means someone can make your precious 5B model 8 times faster. Do you know what it means to work smarter not harder?
>>
>>102919993
>they managed to reduce the resources required to train a model 8x?
again, what's the point? it's a little turd, you won't make it great with more training, we've been training SDXL (2.7b) for a year and a half at this point, and it'll never reach Flux, when you're small you're small, cope with it
>>
>>102920004
What's the point of making better tools for training models that makes training future models 8 times faster? Surely you're not this stupid, right?
>>
>>102920004
I don't know why you're here, obviously AI is way too experimental for you. Maybe you should come back 10 years when things are more like your iPhone 14.
>>
>>102920019
>Surely you're not this stupid, right?
tell that to the sana team who decided to go for a little turd even though they would've proven to everyone how great their technique was if we had a big model in our hands
>>
Sana is the biggest leap for local models since sd 1.0. I've heard rumors that sana 2 started training recently and the team is already shocked by the results. Apparently it is beating flux on most benchmarks after just 50 H100 hours of training time.
>>
>>102920034
I accept your concession
>>
File: file.png (1.17 MB, 1248x896)
>>
nothing like a new model to bring out the retards
>>
This general was founded on an irrational millennarian enthusiasm around the release of Pixart Sigma; today, while we continue to use models finetuned from Stability models or made by former Stability employees, Pixart drops another dud. The thread is, predictably, in shambles. And we absolutely deserve it.
>>
>>102920049
>nothing like a new model to bring out the retards
true, that's why you're here right now
>>
File: ComfyUI_00836_.png (1.67 MB, 1280x1024)
>>
>>102920054
Weird because you're an sdg faggot that prefers to be here for some reason.
>>
>>102920058
yeah
>>
>>102920065
"some reason"... you know full well what his name is
>>
File: file.png (209 KB, 746x512)
>>102920067
I was just joking, and you were supposed to call me a nigger faggot, now I feel bad saying it, sorry anon
>>
has anyone done a direct comparison to sigma? to see how much it improved if at all?
>>
File: file.png (1.15 MB, 1248x896)
The issue is technology has been infested with tech illiterate retards that wouldn't be here if smartphones didn't have web browsers. Something about AI brings the 90 IQ retards.
>>
>>102920088
>The issue is technology has been infested with tech illiterate retards that wouldn't be here if smartphones didn't have web browsers.
sana was literally made so that it could be run on a smartphone, it's written on their paper and I hate every line of it
>>
>>102920087
Absolutely no point in doing it until the weights are dropped, every couple seconds there's another H100 batch size 1024 being dropped on it.
>>
>>102920096
And? It's still a prototype, it's like going to alpha-tech trade show showing a prototype 1-inch OLED and saying "duurrrr dat is too small"
>>
File: file.png (1.24 MB, 1152x896)
>>102920038
>>
>>102920123
>And? It's still a prototype
the classic "prototype cope"
>Hurdur, Pixart Sigma is a prototype, the next iteration will destroy everything
>next iteration (sana) comes
>looks like shit
>well... it's a prototype duh! It's not like we were waiting for this or something, 2 weeks anon!
of course with this method I'll never win right?
>>
>>102920104
so true, i think ive heard the researchers drop another turd on it
>>
>>102920125
My sources are people close to the sana team. I can't reveal their names or exact positions because NvLabs may pull funding.
>>
File: file.png (1.29 MB, 1248x896)
>game developer posts pre-alpha footage for nerds and techies
>retard gamer with smart phone: durr da game has bad grafix
>>
>>102920151
Me too I have sources, and they said that they won't do anymore models after sana, it's over. I can't reveal their names or exact positions but trust me bro it's true
>>
File: file.png (891 KB, 1248x896)
I do think Gemma is too retarded for prompt rewriting.
>>
>>102920153
did you seriously make an analogy between image models and video games? Can't believe you're this retarded, image models are all about graphics, if it looks like shit no one will give a fuck, that's the only goal of an image model, to produce good looking pictures that are accurate to your prompts
>>
It's funny because any criticism of Flux is met with "but dey gave it tou you fer FREE!"
You got Sana, for free. Calm down.
>>
File: file.png (1.01 MB, 1248x896)
>>102920184
did you seriously make an analogy between video games and image models? Can't believe you're this retarded, video games are all about graphics, if it looks like shit no one will give a fuck, that's the only goal for video games, to produce good looking graphics.
>>
>>102920166
Don't come crawling back when Sana 2 releases. Apparently it will be ready in a month or even weeks.
>>
>>102920211
>Apparently it will be ready in a month or even weeks.
2 weeks?
>>
>>102920211
I heard maybe even two weeks. They had a breakthrough in quantum shrinkflation and after the lead researcher got shrinkflated he was able to design a hyperbolic time training algorithm, I heard it from the engineers
>>
>>102920211
>Don't come crawling back when Sana 2 releases.
took them 5 months to make this piece of turd that is sana, and you're expecting us to believe they'll make a better model in a month?
>>
>>102920220
I can't give further details but I can see it will be at least more than a week. I'm already risking my source's identity from what I've revealed so far
>>
>>102920232
oh my god he's retarded
>>
>>102920241
I accept your concession.
>>
>>102920245
(we're all roleplaying anon)
>>
File: file.png (956 KB, 1248x896)
>>
File: file.png (250 KB, 1024x1408)
>>
>>102920240
Oh dear sana (((alledged))) employee anon, why do sana 1 looks so bad?
>>
at least black forrest labs had the balls to send a peon into the thread to answer questions
chang where are you?
>>
>>102920269
no you dont understand hes been here the whole time, be careful to not speak badly of the model or you will summon him
>>
File: file.png (1.31 MB, 1248x896)
>>
>>102919771
>demo saves as webp
>>
File: file.png (1.74 MB, 1248x896)
>>
File: file.png (741 KB, 1024x1408)
>>
File: image.jpg (286 KB, 1024x1024)
Sana is the best!
>>
File: file.png (1.55 MB, 768x1280)
>>
>>102920371
turn PAG guidance down pls
>>
>>102920362
True anon, Sanoa iis the Hhe Sanao the but gest
>>
Rip. Application is busy.
>>
can it gen sanna marin?
>>
>>102920375
what's the difference between PAG guidance and CFG? why isn't it working with just CFG like every normal model?
>>
>>102920442
i remember when PAG came out but can't recall what its purpose is. i think it looks bad desu.
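For reference, a minimal sketch of how the two guidance terms are usually combined. This is from memory of the PAG paper and common implementations, so treat the exact form as an assumption. CFG pushes the prediction away from the unconditional one; PAG pushes it away from a prediction made with the self-attention maps deliberately degraded. The function names are illustrative, and the inputs can be floats or NumPy arrays.

```python
def cfg(eps_uncond, eps_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)

def cfg_with_pag(eps_uncond, eps_cond, eps_perturbed, cfg_scale, pag_scale):
    """CFG plus a PAG term pushing away from the attention-perturbed prediction."""
    return cfg(eps_uncond, eps_cond, cfg_scale) + pag_scale * (eps_cond - eps_perturbed)

# Toy scalar "noise predictions" just to show the arithmetic.
print(cfg(0.0, 1.0, cfg_scale=4.0))
print(cfg_with_pag(0.0, 1.0, 0.5, cfg_scale=4.0, pag_scale=2.0))
```

With `pag_scale=0` the second function reduces to plain CFG, so turning PAG down in the demo just removes the extra push.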
>>
File: file.png (790 KB, 1024x1024)
trying out old sigma prompts on sana, i plan on killing myself soon
>a cute, chubby little raccoon in a mystical forest full of glowing creatures and fauna, the image is in a low poly style
>>
>>102920462
Base sigma could do lowpoly?
>>
>>102920462
>i plan on killing myself soon
why? because it looks worse than old sigma?
>>
>>102920468
Without seeing the Gemma prompt Gemma could've gobbled up the low poly style part of the prompt.
>>
File: file.png (1.17 MB, 1024x1024)
>>102920462
for comparison this is what sigma gave
>>102920468
>>102920470
sigma was pure soul in safetensor form
>>
File: file.png (1.05 MB, 1024x1024)
>>
>>102920462
>>102920474
yikes, it looks way worse than its predecessor, how could they have fucked it up this bad?
>>
i like to cope and think that maybe, just maybe, sana is not part of the pixart family of models
>>
>single flux gen ITT
>>
File: file.png (1.44 MB, 1024x1024)
>>
File: 1704406258862936.png (967 KB, 1024x1024)
it can do low poly but the prompt adherence is awful
>low poly style render of an old rusted robot wearing pants and a jacket riding skis in a supermarket
>>
THIS IS WHY YOU DONT USE SYNTHETIC MIDJOURNEY IMAGES YOU STUPID FUCKING CHINKS
WHY IS LOCAL SO FUCKING INCOMPETENT
every fucking model since sdxl has been trained on dogshit synthetic data, we could've had local midjourney or dalle already if these faggot bakers didn't cuck their shit.
>>
>>102920531
From my basic tests the Gemma 2 prompt expander is very ass, it will change your prompt for the worse and miss the primary intent and style.
>>
>>102920503
no wonder they just released the demo, they knew it was shit enough to be clowned on, I hope they'll improve it now, if you're reading this sana employee, get back to work!
>>
File: file.png (1.9 MB, 1024x1024)
>>102920532
amen
>>
File: file.png (1.5 MB, 1024x1024)
4chang detects my post as spam if i try to post the prompt
>sana
https://pastebin.com/5CcFUbGh
>>
>>102920562
>pixart sigma output
>>
I think it's easily worse than SD 1.5
>>
File: 1729546641457839.jpg (1.45 MB, 2120x2488)
Let's talk about CtrLoRA again.

https://github.com/comfyanonymous/ComfyUI/issues/5314
https://github.com/xyfJASON/ctrlora

>ControlNet (Zhang et al., 2023) adds an extra network that accepts a condition image, turning a T2I model into an image-to-image (I2I) model. In this manner, ControlNet is able to generate images according to a specific kind of condition image such as canny edge, significantly improving the controllability. However, for each condition type, an independent ControlNet needs to be trained from scratch with a large amount of data and computational resources. For example, the ControlNet conditioned on canny edge is trained on 3 million images for around 600 A100 GPU hours.
>To address this problem we propose a CtrLoRA framework that allows users to conveniently and efficiently establish a ControlNet for a customized type of condition image. As illustrated in Fig. 2(a), we first train a Base ControlNet on a large-scale dataset across multiple base condition-to-image tasks such as canny-to-image, depth-to-image, and skeleton-to-image, where the network parameters are shared by all these base conditions. Meanwhile, for each base condition, we add a condition-specific LoRA to the Base ControlNet. In this manner, the condition-specific LoRAs capture the unique characteristics of the corresponding conditions, allowing the Base ControlNet to focus on learning the common knowledge of image-to-image (I2I) generation from multiple conditions simultaneously. With our framework, in most scenarios, we can learn a customized type of condition with as few as 1,000 training data and less than one hour of training on a single GPU. Moreover, our method requires only 37 million LoRA parameters per new condition, a significant reduction compared to the 361 million parameters required by the original ControlNet for each condition.
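
Quick sanity check on the quoted numbers, plus a sketch of where the saving comes from. The layer width and rank below are made-up illustrative values, not the paper's configuration; only the 37M and 361M figures come from the quote. The helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_in x d_out linear layer (bias ignored)."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in a LoRA update: a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)

d, rank = 1024, 64  # illustrative only
share = lora_params(d, d, rank) / full_params(d, d)
print(f"one {d}x{d} layer at rank {rank}: {share:.1%} of the dense weights")

# Ratio of the figures quoted from the paper.
print(f"CtrLoRA per condition vs full ControlNet: {37 / 361:.1%}")
```

So each new condition costs about a tenth of the parameters of a separately trained ControlNet, on top of the shared Base ControlNet.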
>>
>>102920586
I don't think you realize how fucking bad SD 1.5 is. Let's get those rose tinted glasses off buddy.
>>
>>102920596
>I don't think you realize how fucking bad SD 1.5 is.
this, I played with base SD1.5 a month ago, it was horrible, we really improved our shit since then
>>
File: 1714875072433212.jpg (1.85 MB, 2054x4106)
>low poly render of a man wearing glasses with a sign that says "IT'S OVER"
> Image Style: 3D Model
>>
File: file.png (569 KB, 1024x1024)
>>102920609
>hatsune miku with her tits out
>>
>>102920587
>Let's talk about CtrLoRA again.
there's a model that'll work on flux?
>>
File: IMG-20241022-WA0009.jpg (135 KB, 1280x1280)
Meta seems to be the best text to image Gen AI for me and it works best in WhatsApp. Even Instagram one sucks despite sharing same Llama version.

Is there any Android app that lets me create unlimited images every day for free like WhatsApp does?
>>
>>102920630
this is a local thread anon
>>
File: 1704855265718923.jpg (1.42 MB, 2054x4106)
>low poly render of a man wearing glasses with a sign that says "IT'S OVER"
> Image Style: (No style)
>>
>>102920639
they took away its soul... give it back.... GIVE IT BAAAAAAAAACKKKKKKK!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>102920633
My bad, I thought Llama based txt to img would be considered on topic considering it's open source
>>
nvidia will pay
>>
>>102920653
not only did it remove its soul, it also looks worse, Flux doesn't have much soul but at least the images look consistently good
>>
>>102920657
>it's open source
it isn't. meta doesn't release their image models unfortunately
>>
>>102920623
Seems like not
>5. CONCLUSION AND LIMITATIONS
>We speculate this issue might originate from the capabilities of the network architectures, specifically the architectures of VAE, UNet-based Stable Diffusion, and ControlNet. To enhance the capabilities of our framework, it is worth developing our CtrLoRA using more advanced DiT-based (Peebles & Xie, 2023) backbones such as Stable Diffusion V3 (Esser et al., 2024) and Flux.1, which we leave for future work.
>>
>>102920671
>meta doesn't release their image models unfortunately
and their video model (that one looks amazing, goddam I hate it)
>>
File: file.png (368 KB, 500x500)
>>102920678
ok so that's a nothingburger
>>
File: 1710118777278704.jpg (1.18 MB, 2054x4106)
>low poly render of hatsune miku
>>
>>102920692
Did you not read
>which we leave for future work.
?
Time to start putting those mikus to use and port it to flux
>>
File: file.png (1.32 MB, 1024x1024)
I think they're overly obsessed with using numbers to guide their training and just like overly using aesthetics scores, using scoring to determine prompt adherence probably obliterates concepts.
>>
What is better for genning: a 4060 Ti 16GB VRAM card or a 4070 12GB VRAM card?
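Not an answer on speed, but a quick sketch of the VRAM side of that question. It only counts model weights at a given precision (no text encoder, VAE, or activations) and takes the 12B figure for Flux mentioned earlier in the thread. The bytes-per-parameter values are the nominal ones for each format, and both helpers are hypothetical.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Model weights only, in decimal GB."""
    return params_billion * bytes_per_param

def fits(params_billion: float, bytes_per_param: float, vram_gb: float) -> bool:
    """True if the weights alone leave at least some VRAM free."""
    return weights_gb(params_billion, bytes_per_param) < vram_gb

formats = {"fp16": 2.0, "fp8": 1.0, "4-bit": 0.5}
for name, nbytes in formats.items():
    size = weights_gb(12, nbytes)
    print(f"12B at {name}: {size:.0f} GB | 12 GB card: {fits(12, nbytes, 12)} | 16 GB card: {fits(12, nbytes, 16)}")
```

On these assumptions a 12B model at fp8 already fills a 12 GB card with weights alone, while a 16 GB card has some room left for everything else.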
>>
>>102920704
not my problem, if they want to prove it works on flux, they have to do it
>>
File: 1726397625582719.jpg (1.07 MB, 2054x4106)
>low poly render of donald trump
>>
>>102920716
Lmao not going to happen because like everything else, you need an H100. No one is going to train even Control Net for Flux without a hefty grant or access to H100s laying around.
>>
>>102920725
that's why I said it's a nothingburger
>>
File: ComfyUI_00845_.png (1.06 MB, 1280x1024)
>>
File: 1705521549275545.jpg (1.22 MB, 2054x4106)
>low poly render of the solar system
>>
File: file.png (1.44 MB, 1024x1024)
>>
>>102920562
nice
>>
File: file.png (1.2 MB, 1024x1024)
this one isn't too bad, i feel like it beats sigma here
>sana
https://pastebin.com/F7mZzSvB
>>
File: 1700954526636367.jpg (1.47 MB, 2054x4106)
>nvidia geforce rtx 5090 gpu
>>
File: file.png (1.55 MB, 1024x1024)
>>102920759
sigma
>>
File: file.png (1.05 MB, 1024x1408)
>>
>>102920759
hmmm... on second thought... no visible brushstrokes and the hands are worse...
>>
>>102920781
Aesthetics are easy to fix.
>>
File: file.png (761 KB, 1024x1024)
>>102920791
i hope so, sigma had some of the best aesthetics for a local model back then
>sana
>a candle that looks like a cute cat
>>
File: file.png (1.17 MB, 1024x1024)
>>102920807
I think they did a worse job with their captioning.
>>
>This application is currently busy. Please try again.
STOP HOGGING IT ANON
>>
File: file.png (1.03 MB, 1024x1024)
>>102920807
>pixart soulma
>>
>>102920820
It's cute they let us use their office 4090.
>>
demo queue has been stuck on 1 guy for a while... did he request 8 billion steps or something
>>
>>102920856
>their office 4090
I hope that's not a 4090, they claimed it would be ultra fast to make a picture, but when I'm at the top of the queue and the generation is starting, it takes more than a minute
>>
>>102920864
they sort through naughty prompts by hand and are confused by mine
>>
>>102920867
No it doesn't. It's like 2 seconds. Actually watch the queue, it's a couple of seconds when it's processing your prompt.

>>102920864
4K with a negative prompt takes a bit but I think it might have crashed or something.
>>
File: file.png (1.12 MB, 1024x1024)
Definitely has Warhammer 40K in the dataset.
>>
>>102920791
lol no they're not. aesthetics/style are a key part of comprehension. which is why fluxjeets can't even do this simple midjourney prompt despite being able to do complex text on signs
https://www.reddit.com/r/StableDiffusion/comments/1g6q1x3/whats_the_process_to_create_this/
the flux results look like dogshit in comparison, a fundamental misunderstanding of aesthetic construction thanks to butchered training data.
>>
File: file.png (1.65 MB, 1024x1024)
>>102920867
takes like 2 seconds for each gen, it's just that there's a lot of people in queue
>sana
https://pastebin.com/8YbwfDkX
>>
>>102920890
I know you're stupid, but the reason why this problem comes up is because AI captioning is extremely bad at using style keywords in the prompts. The only way to avoid this problem is hand written prompts or including meta information.
>>
>>102920902
>I know you're stupid
projection
>>
Sorry anon I've been prompting "1girl" with 40 steps on random seeds this whole time. I'll stop.
>>
File: file.png (2.04 MB, 1024x1024)
>>102920900
>pixart sigma
>>
>>102920913
I just explained to you why it happens. If you did anything productive with your time you would've known this yourself. You bitch about training but never have you captioned 100k images.
>>
File: file.png (2.37 MB, 1024x1408)
>>
>>102920900
>takes like 2 seconds for each gen
I fucking hate the current era we're in, either we get a giant model (Flux) that takes minutes for a single image, or we get some small little shit that produces turds in 2 seconds, why are they so wary of going for the sweet spot? Something big but not too big
>>
File: file.png (91 KB, 793x729)
>we will try our best to...
Fuck
>>
>>102920932
im assuming they plan on using sana as a base for something else like a video model or whatever, explains why they minmaxxed speed and efficiency so much
>>
>>102920939
>Model zoo
that's the model weights right?
>>
>>102920932
Sounds simple enough anon, I'm glad you're spearheading this. Oh wait, you want other people to spend thousands of dollars so you can call their work a turd.
>>
>>102920948
>I'm glad you're spearheading this.
thanks anon, it sure needs to be talked about
>>
File: file.png (849 KB, 1024x1024)
>sana
>origami figure of a cute girl with cyan hair and long twintails, the girl's name is hatsune miku
sana still has a bit of soul remaining
>>
>>102920944
Yes that would be every model. The demo is the 1024px model. But I'd expect there to be a 512, 1024, 2K and 4K model. For Pixart they spent the most time on the 2K model.
>>
File: file.png (1.27 MB, 1024x1024)
>>102920963
sigmo
>>
File: file.png (1.92 MB, 1024x1024)
For the record this is pretty aligned with Gustav Klimt's work.
>>
>>102920986
why's she looking at me like that?
>>
>>102920932
>why are they so wary of going for the sweet spot? Something big but not too big
SAI wanted to make SD3 4b for the sweet spot but it got canceled for (((whatever))) reason
>>
File: file.png (2 MB, 1024x1024)
>sana
>a line of pill shaped buses with hatsune miku's face on it in new york city, honk honk
it's not too bad, i feel like sana's salvageable
>>
File: file.png (1.06 MB, 1024x1024)
>>102921012
The big question is if it delivers on the trainability aspect.
>>
does anyone have the flux masterchief prompt?
>>
>>102920741
What is this?
>>
File: file.png (1.18 MB, 1024x1024)
Knows Master Chief
>>
>>102921041
>does anyone have the flux masterchief prompt?
I have
>Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The reda arrow is from a Red circle which has an image of Halo Master Chief in it.
>>
>>102921058
the single best flux image. no others come close.
>>
File: file.png (1.35 MB, 1024x1024)
>a painting by leonardo davinci of a pregnant jesus lovingly carrasing his belly, a speech bubble above him has a blurry screenshot of donald trump badly photoshopped onto it
3rd try, not what i asked for but eehhhehrhehhh
>>102921058
thank you anon
>>
File: file.png (1.32 MB, 1024x1024)
>Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The reda arrow is from a Red circle which has an image of Halo Master Chief in it.
>>
File: file.png (1.27 MB, 768x1248)
>>102921062
It's impressive as a concept but practically isn't how anyone prompts or wants to prompt. Especially given how most people just 1girl prompt.
>>
File: ComfyUI_00007_.png (1013 KB, 1024x1024)
>>102921062
>the single best flux image. no others come close.
it was way too sophisticated a prompt to have been figured out on the very first day of Flux's release, it was probably made by some BFL employee, and it sure did have the wow effect he was expecting, it's such a good prompt to show the strengths of Flux
>>
File: collage.jpg (1.55 MB, 2163x2000)
This is what the collage should have been. Bakers shouldn't be allowed to be allergic to 1girl.
>>
>>102921073
how come yours was much better?
>>
File: file.png (21 KB, 622x281)
>>102921081
I'm sure we haven't discovered the optimal settings
>>
>>102921075
>it was probably made by some BFL employee,
I've been wondering, for a while now, how many of those really good early Flux images were BFL employees using Pro.
>It's time to counter the strike
>Zoom call
>powerpoint presentation
etc...
>>
>>102921092
if it was only pro pictures it wouldn't have had the impact it had, it was such a big deal because we were able to replicate those with dev as well
>>
>>102921092
I mean I used their prompt and still got an amazing result. It's a good prompt, Flux is a good model, it's just impossible to train and BFL has ghosted us. I mean I guess we have a year to solve the Flux or SD3 problem until the next massive model comes. The best thing Flux will do is make a target and something we often see in tech is people like to smash targets.
>>
>>102921109
>it's just impossible to train
with the undistilled models we have now, not anymore, but yeah still hard because it's a big ass motherfucker
>>
>>102921113
The undistilled models are as alpha as Sana.
>>
File: file.png (396 KB, 1024x1024)
>blurry cctv footage of donald trumpy menacingly floating in the night sky, full moon behind his back
i remember having to fiddle a lot more to get this image on flux, i feel like sana 'gets' what i'm going for more
>>
>>102921116
absolutely not, dev dedistill has the same quality as vanilla dev and has all the guidance removed, so it's ready to be trained on, and someone is already up to the task
https://huggingface.co/SG161222/Verus_Vision_1.0b
>>
File: file.png (1.72 MB, 1280x864)
>>
You said Sana is shit but yet the demo queue is fucking long
>>
File: file.png (2.8 MB, 1568x1024)
>>
File: file.png (2.14 MB, 1280x864)
>>102921137
There's some gold to dig in there
>>
>>102921137
it's probably someone trying to get past the nsfw heart image filter and gen some sana tits
>>
>>102921137
it's like watching a car crash, it's horrible but a lot of people are gathering around to see the damage done
>>
>>102921155
Is there a reason you don't post images, are you poor or something?
>>
File: file.png (2.92 MB, 1568x1024)
>>
>>102921165
>he says, while not posting an image
>>
File: file.png (2.06 MB, 1280x864)
>he never posts images
>>
STOP HOGGING THE DEMO, FUCK OFF!!
>>
File: file.png (967 KB, 1000x822)
>>102921175
good goy, I asked you to post an image and you did!
>>
What are the best 1.5 models at this point? I never moved on from yuzu.
>>
>>102921214
sana
>>
The more I use Sana the less I hate it.
>>
>>102921269
>The more I use Sana the less I hate it.
Show us some pictures that made you love Sana more
>>
File: file.png (1.08 MB, 1024x1024)
>>102921269
yeah i feel the same, it's salvageable
>an anime screenshot of a wide open field, a gargantuan celestial anime girl towers up into the sky, to the left is a bright blue sky and to the right of the girl is a starry night sky, which she wears like a cape
didn't get what i wanted exactly but that's more of a bad prompt issue
>>
>>102921278
No you're a contrarian faggot, but it would funny if I posted some Flux images and had you pretend you thought they were shit.
>>
>>102921291
>No
Concession Accepted.
>>
>>102921269
im slowly losing interest in using the demo. gimmie local damnit!
>>
CFG vs PAG... what's that about?
>>
File: image.jpg (335 KB, 1024x1024)
My first Sana gen. I'll have to tweak some settings. Too early to tell whether this is promising or not.
>>
File: file.png (2.45 MB, 1568x1024)
>>102921269
>the less I hate it.
I never hated it nor did I love it. It simply is, and I simply am.
>>
>>102921319
i know pag is a new thing that helps sd1.5 with anatomy and image coherence, haven't used it myself
>>102921329
try this anon's settings >>102921086
>>
File: file.png (1.03 MB, 1280x864)
With 1.6B parameters the first thing to do is split the extreme styles apart into separate models.
>>
File: file.png (1.69 MB, 1024x1024)
>celestial princess hatsune miku, her face is replaced by a spiraling galaxy, armpit hair
prompt understanding can be a bit hit or miss at times, or maybe it's because im esl
>>
>>102921358
I doubt "replaced with" shows up much if at all in the captions. And fetishes like "armpit hair" is never in the captions.
>>
>>102921377
>And fetishes like "armpit hair" is never in the captions.
i'm killing myself
>>
File: file.png (3.47 MB, 1568x1024)
>>
File: file.png (1.4 MB, 1024x1024)
>sailor moon eating the moon
>>
>>102921397
I wonder if it knows more characters than Migu and Sailor Moon kek
>>
>>102921404
It's the same basic cast that Flux knows.
>>
>>102921397
delicious lunar crisp
>>
>>102921404
we could know if some demon wasn't hoarding the demo doing 4k batches. LEAVE CHANG'S OFFICE GPU ALONE!
>>
File: file.png (2.44 MB, 1280x864)
>>
>>102921319
CFG - classifier-free guidance: how strongly the model adheres to the prompt
PAG - I could give the real definition, but I would rather describe it as the amount of ritalin the model does. Occasionally something brilliant will come out of it. Usually, it will screw it up if your dose is too high.
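For anyone who wants the non-ritalin version, here is a rough sketch of how the two scales are commonly combined in a single denoising step. It assumes the convention where the CFG term pushes the prediction away from the unconditional output and the PAG term pushes it away from a pass with perturbed self-attention. The function name and the plain-list inputs are made up for illustration, and the Sana demo's sliders may use a different scale convention, so treat this as a sketch and not as what the demo actually runs.

```python
from typing import List, Sequence


def guided_prediction(
    uncond: Sequence[float],
    cond: Sequence[float],
    perturbed: Sequence[float],
    cfg_scale: float,
    pag_scale: float,
) -> List[float]:
    """Combine three noise predictions into one guided prediction.

    uncond    -- prediction with an empty prompt
    cond      -- prediction with the real prompt
    perturbed -- prediction with the prompt but perturbed self-attention
    Assumed convention: cfg_scale=1.0 and pag_scale=0.0 mean "no guidance".
    """
    return [
        u + cfg_scale * (c - u) + pag_scale * (c - p)
        for u, c, p in zip(uncond, cond, perturbed)
    ]


# Toy numbers standing in for real model outputs.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
perturbed = [0.5, 2.0]

no_guidance = guided_prediction(uncond, cond, perturbed, 1.0, 0.0)
cfg_only = guided_prediction(uncond, cond, perturbed, 2.0, 0.0)
pag_only = guided_prediction(uncond, cond, perturbed, 1.0, 1.0)
```

Under this convention a CFG of 1.0 with PAG at 0.0 just returns the prompt-conditioned prediction, and raising either scale exaggerates the difference from its reference pass, which is why high values tend to overcook the image.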
>>
File: file.png (1.18 MB, 1024x1024)
>sailor moon eating the moon, armpit hair
not sure if that's a tooth brush or a strange leek
>>
>>102921423
Honestly shocked the AE manages that fine of details. Maybe it can be saved.
>>
>>102921436
clearly Luna's femur
>>
likely to be non commercial license OH NONONO
https://github.com/NVlabs/Sana/commit/7d32332055abbcacc97d00918d43eabe0af950f9#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R13
>>
File: file.png (1.15 MB, 1568x1024)
>>
File: file.png (16 KB, 570x243)
it's coming bbbbbbbbs
>>
>>102921475
LMAOOOOOOO, it's fucking DOA, Schnell has an apache 2.0 licence and it's way better than this small piece of shit
>>
>>102921443
It's nice to have an alternative model with high channel VAE
>>
i guess we'll be stuck with sdxl forever huh
>>
File: file.png (256 KB, 1024x1344)
style wise it feels like 1.5 in a good way
>>
>>102921494
License doesn't matter when training uses consumer hardware.
>>
>>102921622
we already have SDXL for that
>>
File: image (8).jpg (87 KB, 512x512)
Sana does not work at 512x512, can confirm. Does anyone know their actual buckets?
>>
>>102921670
Looks fine to me, anon
>>
>>102921677
kek
>>
File: file.png (947 KB, 1024x1024)
remember omnigen? they have a demo out now, github also says they plan on releasing the model. china save us from china?
>https://github.com/VectorSpaceLab/OmniGen
>https://arxiv.org/abs/2409.11340
demo
>https://huggingface.co/spaces/Shitao/OmniGen
prompt
>a cute cat holding a sign saying "china hello china cheeenaaaa lalalala", ultra high definition
>>
File: actually shit.png (1.27 MB, 1024x1168)
>>102921687
Get this non pixanasexual shit out of here
>>
File: file.png (448 KB, 1024x1344)
>>
File: file.png (1.71 MB, 1024x1024)
>>102921687
ehh...
>>
>>102921723
why did you make miku a n*gger?
>>
>>102921708
the sana-samas have failed us, the age of the pixart sexual is over. it's the dawn of the planet of the omnigenders
>>
>>102921743
why not?
>>
>>102921755
that's not a fucking answer bitch
>>
>>102921762
it is nigger
>>
File: 1720618472117538.png (1.88 MB, 1152x896)
trying to recreate Breezewood, Pennsylvania
>>
>>102921775
is that sana?
>>
>>102921778
Flux
>>
>>102921783
o
>>
File: file.png (473 KB, 1024x1344)
>>
>>102921073
>but practically isn't how anyone prompts or wants to prompt
It's short, simple, and to the point. What do you mean?
>>
File: file.png (1.23 MB, 1024x1024)
>>102921687
kek
>>
>>102921847
yep... it's china time. i just hope the thing isn't so damn slow locally though
>>
File: image (10).jpg (342 KB, 768x1280)
I don't understand. This is 1.01 cfg, 1.01 pag. Why does it look like I didn't touch the settings at all?
>>
>>102921687
>they have a demo out now, github also says they plan on releasing the model.
so that's the new meta now? releasing the demo before the model? I mean it makes sense but I hate it being teased like that kek
>>
>>102921891
what's your prompt
>>
>>102921847
looks better than Sana, maybe this shit is the real deal
>>
>>102921904
you funny guy
>>
File: image.jpg (170 KB, 1024x1024)
>>102921687
>miku holding a sign that says "omnigen > sana"
uhhh sanabros? omnigen bros? what does this mean? And now I'm out of credits so I can't try other variations
>>
>>102921904
there's also that whole built in controlnet thing but it keeps erroring out when i try it in the demo
>>
>>102921916
china promotes equality
>>
File: 1700159250618347.png (2.01 MB, 1152x896)
should I just train a lora? not sure if I can with my 8GB of vram
>>
>ani is making a sdcpp gui
how do we get him over here to be our guy?
>>
>demo erroring out
It's over
>>
>>102921959
which one? the sana one or the omnigen one?
>>
>>102921938
source?
>>
>>102921968
>>102921169
>>
File: file.png (595 KB, 1024x1024)
>blurry cctv footage of donald trumpy menacingly floating in the night sky, full moon behind his back
images look a lot cooler at cfg 1.1 and pag 1.1
>>
>>102921966
sana
>>
>two new models and a new gui coming
we eating good /ldg/
>>
File: file.png (684 KB, 1024x1344)
>>102921974
>cfg 1.1 and pag 1.1
yep. CFG can go up to about three as well. the only reason for a higher PAG is if you're doing text.
>>
>>102921893
It could be to garner free publicity. If they gain enough attention and some investor believes in the potential evolution of their alpha version then they will release nothing to the open source community and shift the project towards a SAAS by scaling, optimizing, tuning, etc. If they gain no publicity and no investment then the alpha will be released as a last resort, again, to get free publicity.
>>
>>102921966
Both
>>
so what now?
>>
>>102922034
we gen
>>
>>102922038
im depressed
>>
>>102922044
its okay anon im here for you
>>
>>102922049
*sob* *sob... uwaaaaahhhhhh... *hic*
>>
>>102922034
Back to 1girl for you. Meanwhile I will go back to my drew threads and when that's done I will go back to jacking off to /v/ butt threads while you play pretend with the local models. I only show up for big events like flux, sana etc.
>>
File: file.png (996 KB, 1024x1344)
>>
>>102922060
thanks for stopping by
>>
>>102921938
about a half year ago when he found out that C++ is hard. If he hasn't released he won't.
>>
File: file.png (892 KB, 1024x1344)
>>
https://huggingface.co/rhymes-ai/Allegro

babe wake up new text to video model just dropped
>>
>>102922188
>Single GPU Memory Usage 9.3G BF16 (with cpu_offload)
>check downloads folder
text encoder is 19GB
>>
>>102922217
>text encoder is 19GB
"architectures": [
"T5EncoderModel"
It's the classic T5_XXL, so we'll only be using the encoder part, which is roughly 9.2gb of vram
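Back-of-the-envelope version of that estimate, as a sketch: it assumes the 19GB download is the encoder stored in fp32 (not confirmed in the repo listing quoted above) and only counts weights, not activations or any offloading overhead. The helper names are made up for illustration.

```python
# Bytes used per parameter for common storage dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def estimate_params(checkpoint_bytes: float, dtype: str) -> float:
    """Estimate the parameter count from checkpoint size and storage dtype."""
    return checkpoint_bytes / BYTES_PER_PARAM[dtype]


def weight_bytes(n_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone in the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype]


on_disk = 19e9  # the 19GB download from the post above; fp32 is an assumption
params = estimate_params(on_disk, "fp32")
bf16_gb = weight_bytes(params, "bf16") / 1e9

print(f"~{params / 1e9:.2f}B params, ~{bf16_gb:.1f} GB of weights in bf16")
```

If the fp32 assumption holds, loading in bf16 halves the footprint, which lands in the same ballpark as the figure quoted above.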
>>
File: file.png (2.73 MB, 1730x1170)
>>102922188
>Apache 2.0
>6-second videos at 15 FPS with 720x1280 resolution
>175M parameter VideoVAE and a 2.8B parameter VideoDiT model
Pretty nice, I can feel it's a new local sota
>>
New

>>102922252
>>102922252
>>102922252


