/g/ - Technology


File: tmp.jpg (1.28 MB, 3264x3264)
General dedicated to creative use of free and open source text-to-image models

Previous /ldg/ bread : >>101301739

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
ComfyUI: https://github.com/comfyanonymous/ComfyUI

>Auto1111 forks
SD.Next: https://github.com/vladmandic/automatic
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux

>Kolors
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>Pixart Sigma & Hunyuan DIT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Use a VAE if your images look washed out

>Models, LoRAs & training


>Index of guides and other tools

>View and submit GPU performance data

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Share image prompt info

>Related boards
Blessed thread
official pixart bigma and lumina 2 waiting room, now with kolors
File: 1690080318567.jpg (823 KB, 2048x1024)
Pretty damn accurate to prompt, and as far as character traits go honestly exactly how I would have rendered this as a "photorealistic", bing's text-to-style advantage still hard to comp though
oh no i typoed that image text haha
me so chinee
Damn, dall-e 3 got sovl
I tried that infamous prompt with kolors, it's cheating a bit by hiding hands and only showing a portrait, but it's not cursed, best result so far of any base model
File: 1693783440149.jpg (224 KB, 1024x1024)
yeah it does
i'm pretty much just longing for the day I can recreate all the things it's given me locally
How long ago did the Kolors team start their project? Feels odd to not release a DiT model desu.
adding the prompt as text would be nice
>i'm pretty much just longing for the day I can recreate all the things it's given me locally
I mean, you could train a lora on its output. I guess someone might've even done that for you already.
File: PA_0025.jpg (642 KB, 2560x1536)
yeah, I'm really impressed by the anatomy, it's not perfect but for a base model it destroys everything that existed so far
that's awesome. could i have the prompt if you don't mind?
I've been thinking I should get into experimenting with that.

pastoral fantasy cottage,Fierce woman napping getting a tan,pale gold hair,long straight hair,blunt bangs,straight bangs,hime haircut,straight fringe,gold eyes,fierce eyes,leonine eyes,narrow face,toned triceps,NOT nude athletic female,fullfigured,curvy,great figure,body positive,ivory skin,bare arms,bare back,mature flat cartoon illustration,imperious,soft fields,rolling hills,tall grass,hedges,wildflowers,white walls,hazy mountains,watercolor painting muted wispy colors
it's the same team that made Kling AI
How did that Bing image not get dogged?
File: file.png (2.1 MB, 1024x1024)
the model even added a bit of cleavage in by itself, how immoral!
It can do literally full nudity, unlike the prude SAI christians!
indeed, second try gives even better insight
File: PA_0031.jpg (820 KB, 2560x1536)

A harbinger resembles a whirling mechanical model of a planetary system. Its sun, moons, and planets are crafted of tiny glowing jewels whose color shifts with the harbinger’s mood. Electric arcs and sparks of magical essence dance between its whirling pieces. The harbinger can dismantle itself into its component parts to squeeze through tiny gaps, its rings dissolving into glowing vapor while its star and planets remain. Harbingers shed light as brightly as a torch under normal conditions, but can dim to a faint glow if desired, and frequently spin off bits of their essence into dancing lights. A harbinger archon’s voice resembles the ringing of a dozen tiny bells, and regardless of the language the archon speaks, it retains that tinkling musical quality. Most harbinger archons form from souls that led simple lives of wisdom and piety before their deaths. They remember nothing of their mortal existences, but retain a decidedly practical, if parochial, view of worldly matters. Only rarely does a harbinger want for a parable or proverb apropos to the task at hand.

File: long dick general.jpg (3.41 MB, 3264x3264)
Doesn't it feel good when a company isn't terrified of making a fully uncensored model and ends up making a great base model at the end? The chinks have way more balls than the fucking western cucks that's for sure
never lucky
since this worked so well I tried squatting and the result is fortunately covered in the right places

i always laugh at this guy's funny ass voice
Because their idea is to make it to the market first and stay on top defeating the competition. China must be #1
superseded by chang.
oh wow, long time no see/hear from em'
god I was such a noob back when I watched his stuff
File: 1695843471467.jpg (191 KB, 1024x1024)
It's a First Month image, I assume that same prompt would get blocked now

I haven't touched bing at all in months, I had set up scripts to hammer it and just autogen images all day long and retry dogs. With enough brute force and delicate promptsmithing you could get it to pop some tiddies out
File: PA_0034.jpg (697 KB, 2560x1536)
It does and I always said it, if the western cucks don't want to do the job proprely, the chinks will do it at their place and take all the praise, welp, fuck them, now we got a great base model and I hope great finetunes will come out from it
its kooler timeeeee
File: fsedrgfth.png (1.97 MB, 1691x646)
i'm sorry pixart sexuals, but I unironically like the red sd3 cow
>it can do a ballpoint sketch of a cow
>great, can it do anything else?
pixart bigma will win you back
File: 351tt50563bd1.png (1.51 MB, 1344x768)
Really nice
They can release ChatGPT-4o with Scarlett Johansson voice and name it China#1FUSA as a model and app name and people would download it.

It will take ages for Scarlett to sue China.
Exactly, they don't give a fuck about celebrities fee fees, they just want to make a great product, that's how it should be in the first place, the west definitely has fallen
File: PA_0037.jpg (1.06 MB, 3328x1152)
They can't do the opposite where it will shit on China in the same model. Just different laws
you can clown on China's president with the model though, and still, it's way less censored than SAI's model
fuck off poseidon i'm not sacrificing any more bulls to you!!
File: PA_0040.jpg (905 KB, 2560x1536)
File: kolors1.jpg (388 KB, 1024x1024)
File: kolors2.jpg (397 KB, 1024x1024)
File: ComfyUI_KolorsXL_0045.jpg (745 KB, 1792x2304)
very good, damn
I never thought we would finally find an alternative to SAI but here we go, thank you chinks ;-;
how about that, huh
16ch VAE when?
Okay, this one is impressive
we already have a MiT 16ch VAE, it needs a bit of training to be adapted to Kolors though
i know im just surprised that it seems like only one dude is working on that
bless him
>bless him
yeah, we need more heroes like him, 16ch VAE anon if you read this, thanks a lot for your work
what is a 16ch vae
16 channel VAE
ok what does it do? every time I googled "vae" it says "makes your colors better" but I haven't noticed any difference between using one or not
i just don't get it
it's why sd3 (outside everything else) is so detailed and "crisp"
more channels = higher quality output
File: file.png (1.71 MB, 967x1511)
I'm a simple layman, but from my understanding VAE is responsible for decoding whatever is being generated. I guess you could say it's kind of like decompressing the size of a file, except here it's decompressing an image's visual detail? More channels could imply it has more neuron layers, meaning more complex decompression/decoding process, or being able to translate more visual detail from smaller amount of data. See >>101313339
it's the thing that turns the latent image (math and shit) into pixels you can see. i think the vae compresses the latent image and more channels mean it's less compressed and therefore more fine details are retained? someone correct me if im wrong
Wake me up when Kolors can run on my system desu
*suffocates you with pillow*
I think latents are the compressed state, and VAE is the decompression process. Picrel is supposedly an example visualization of how SDXL works.
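To put rough numbers on the compression idea above, here is a minimal sketch. It assumes an 8x spatial downscale, with 4 latent channels for an SDXL-style VAE and 16 for an SD3-style one. The function names are made up for illustration, and it only counts values; it says nothing about how good a given decoder actually is.

```python
def latent_shape(height, width, channels, downscale=8):
    """Shape (C, H, W) of the latent a VAE encoder would produce."""
    return (channels, height // downscale, width // downscale)


def compression_ratio(height, width, channels, downscale=8):
    """How many RGB pixel values are squeezed into each latent value."""
    c, h, w = latent_shape(height, width, channels, downscale)
    return (3 * height * width) / (c * h * w)


for name, ch in (("4ch (SDXL-style)", 4), ("16ch (SD3-style)", 16)):
    shape = latent_shape(1024, 1024, ch)
    ratio = compression_ratio(1024, 1024, ch)
    print(f"{name}: latent {shape}, {ratio:.0f}x fewer values than the image")
```

Under these assumptions a 1024x1024 image goes through a 48x squeeze with 4 channels and a 12x squeeze with 16, which fits the "less compressed, more fine detail retained" explanation.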
Me too please.
>VAE is the decompression process
oh i see, thank you anon. *suffocates you with a pillow*
suffocate everyone who trains style loras with a trigger word
intredasting, thanks. I've been using sdxl-vae-fp16-fix.safetensors, I'm thinking of ways to finish this sentence but I can't
Protip, finish with a full stop. Simple as.
Is this censored or is it indicative of what I'd get with a local install of kolors? (it's not giving me booba)
it's kinda censored, but easy to break. sometimes it gives u fucked up looking nips though.
I'm using fixFP16ErrorsSDXLLowerMemoryUse_v10, and I don't even remember where I got it from.
File: ComfyUI_KolorsXL_0093.jpg (720 KB, 1792x2304)
with Kolors upscale pass instead of XL. I prefer the XL upscale of a Kolors base gen
Can you give an example of something non-realistic? Anime, cartoon, anything stylized. Whatever you're doing with realism seems to work well, so I wonder if your approach improves on other styles.
>pixart_every person has two lives
but I've not a single one
File: ComfyUI_KolorsXL_0107.jpg (945 KB, 1792x2304)
there is nothing special im doing though

havent tested too much anime on it but it can do backgrounds it seems?
pretty good
File: file.png (3.48 MB, 1920x2176)
looks very NFT pilled and pretentious enough for a cybermonkey collector to buy it
File: ComfyUI_KolorsXL_0113.jpg (602 KB, 1792x2304)
>an illustration of Son Goku
File: ComfyUI_KolorsXL_0115.jpg (770 KB, 1792x2304)
>an illustration of Mnokey D Luffy
File: tmpdiec36pt.png (2.93 MB, 4032x932)
File: file.jpg (1.03 MB, 1920x2176)
precisely what im going for, nice
will likely switch up styles soon tho
File: tmpdiec36pt.png (2.8 MB, 2352x2304)
File: tmpdiec36pt.png (2.54 MB, 2352x2304)
>1girl and a girl are retarded
>girl is fine
this amuses me
File: tmpdiec36pt.png (3.17 MB, 2352x2304)
File: file.png (3.2 MB, 1920x2176)
a guy propositioned me once to mint a gen but it seemed like a scam
not minting as a whole but his specific thing
interesting find i would say even the lens flare looks better as well
File: ComfyUI_temp_auxcf_00006_.png (2.23 MB, 1120x1472)
testing kolors
File: ComfyUI_temp_auxcf_00009_.png (1.95 MB, 1120x1472)
What a weird mix of Kid Goku and Goku.
File: ComfyUI_temp_auxcf_00020_.png (2.08 MB, 1120x1472)
File: file.jpg (629 KB, 1920x2176)
qrd on kolors? it works in comfy-ui only, right?
File: ComfyUI_temp_islql_00001_.png (2.71 MB, 1120x1472)
File: ComfyUI_temp_islql_00006_.png (2.38 MB, 1120x1408)
File: ComfyUI_temp_islql_00018_.png (3.64 MB, 1472x1472)
File: ComfyUI_temp_islql_00021_.png (3.58 MB, 1472x1472)
File: ComfyUI_temp_islql_00023_.png (3.68 MB, 1472x1472)
File: image.jpg (292 KB, 1344x768)
>ancient greek femboy in style of modern anime
kolor does indeed have a closeup bias
File: ComfyUI_temp_islql_00030_.png (3.15 MB, 1280x1536)
yeah, A1111 is always 2 years behind
I'm a simple anon, if nta. I'll be satisfied once it runs like forge.
forge is dead no? the guy making the repo said he's focusing on gradio 4 and some shit and that every extension will be broken in consequence
File: file.png (3.34 MB, 1920x2176)
How much vram do I need to run kolors, more than SD3?
20gb of vram, that's because it's using the LLM as well, but that can probably be optimised (put the LLM on the cpu + quantize)
File: kek.jpg (307 KB, 1363x935)
looks like this guy is trying to fix SD3M or something?
you can try the demo to see how good it is though
>20gb of vram
welp, It's painful being a vramlet ;_;
like I said, it'll be optimised, if you make a 4bit LLM, put that on the cpu and use the 8bit unet, it'll probably need around 6gb of vram, that would work for you but you need to be patient kek
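Back-of-envelope version of the estimate above. This is a sketch, not a measurement: the parameter counts (about 6B for the LLM text encoder, about 2.6B for an SDXL-sized unet) are assumptions. It only counts weights, so activations, the VAE and framework overhead come on top.

```python
def weight_gb(params_billion, bits):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits / 8


# Assumed sizes, not official figures.
LLM_B = 6.0    # LLM text encoder, billions of parameters
UNET_B = 2.6   # SDXL-sized unet, billions of parameters

# Everything on the GPU in 16-bit.
baseline_vram = weight_gb(LLM_B, 16) + weight_gb(UNET_B, 16)

# 4-bit LLM moved to the CPU, 8-bit unet left on the GPU.
optimised_vram = weight_gb(UNET_B, 8)
optimised_ram = weight_gb(LLM_B, 4)

print(f"fp16, all on GPU:  ~{baseline_vram:.1f} GB VRAM for weights")
print(f"8-bit unet on GPU: ~{optimised_vram:.1f} GB VRAM for weights")
print(f"4-bit LLM on CPU:  ~{optimised_ram:.1f} GB system RAM")
```

With these assumptions the 16-bit weights alone land around 17 GB, the same ballpark as the 20gb figure. The 8-bit unet needs under 3 GB of VRAM for weights, which leaves room for everything else inside a 6gb budget. The 4-bit LLM still has to live somewhere, so expect it to eat a few GB of system RAM and run slowly on the cpu.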
File: image (2).png (1.99 MB, 1152x1024)
doesnt give you many attempts unfortunately
yeah I know, still better than learning how to use spaghetti UI shit kek
Rumors say the dev branch of auto is nowhere close to forge performance, so I'm about to check it myself. If I won't be able to comfortably use Pony/SDXL, I might be forced to stick with it anyway, or switch to something else, even though none of the alternatives suit me.

Why is it so fucking hard for ui devs to get inpainting right.
I don't know man... it's so frustrating as well, if only there was a mix between ComfyUI backend with A1111 frontend, that would be the best of both worlds
This one is pretty good. What was the prompt?
>ComfyUI backend with A1111 frontend, that would be the best of both worlds
Currently either Metastable or StableSwarm are our best alternatives in that regard.
yeah I know those ones but that's not exactly A1111 in the frontend, I really don't want to change, the first guy willing to make a real A1111 + ComfyUI will make bank that's for sure
File: tmp27s6obby.png (349 KB, 512x512)
>Rumors say the dev branch of auto is nowhere close to forge performance
1024x1024 took me ~13s with Forge on 8GB of VRAM
dev branch A1111 took ~1 minute.. for a 512x512

The VAE is like zipping for latent spaces. You take an image and zip it into latent space and pair it with a caption during training. During inference take a prompt, generate a latent image for that prompt and then unzip it with the VAE.
Adam Mini is my new favorite optimizer, at bfloat16 it's lighter weight compared to 8bit AdamW and it seems to have a more pleasant result.
but is it better than when anon CAME?
CAME is way heavier, especially with the 1.3B Pixart model, and Adam Mini seems to have similar quality to CAME.
have you compared the loss curves for both of them to be sure CAME and adam mini are equivalent?
I like to eyeball it and use my emotions. I like how Mini trains although I think Sophia is the best but it's very unstable.
And CAME is simply unusable if you talk strictly about performance, Mini can do more than double CAME's batch size.
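If anyone wants something slightly more rigorous than eyeballing, here is a minimal sketch of comparing two training runs by smoothing their logged losses with an exponential moving average. The loss values below are made-up placeholders, not results from CAME or Adam Mini, and `ema` / `compare_runs` are hypothetical helper names.

```python
def ema(values, beta=0.9):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    avg = None
    for v in values:
        avg = v if avg is None else beta * avg + (1 - beta) * v
        smoothed.append(avg)
    return smoothed


def compare_runs(losses_a, losses_b, beta=0.9):
    """Return the final smoothed loss of each run and their difference (a - b)."""
    final_a = ema(losses_a, beta)[-1]
    final_b = ema(losses_b, beta)[-1]
    return final_a, final_b, final_a - final_b


# Placeholder numbers only; paste in per-step losses from your own logs.
run_a = [0.90, 0.70, 0.65, 0.60, 0.62, 0.58]
run_b = [0.95, 0.72, 0.66, 0.61, 0.60, 0.59]

final_a, final_b, diff = compare_runs(run_a, run_b)
print(f"run A: {final_a:.4f}  run B: {final_b:.4f}  diff: {diff:+.4f}")
```

This only means something if both runs share the same data, seed and batch order. Matching loss doesn't guarantee the gens look the same either, so sample images are still worth checking.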
File: image (3).png (1.74 MB, 1152x1024)
File: 00079-16598489.jpg (721 KB, 2432x1664)
File: PA_0443.jpg (732 KB, 2560x1536)
File: 0.jpg (257 KB, 1024x1024)
File: file.png (2.3 MB, 1920x2176)
What about Fooocus? It's not abandoned, is it?
File: kolors_00177_.png (1.26 MB, 1024x1024)
How are we feeling about kolors? I'm genuinely undecided.
File: PA_0448.jpg (832 KB, 2560x1536)
File: kolors_00181_.png (1.26 MB, 1024x1024)
love it
    quantization_code = "RANDOMSHITGO!"

kernels = Kernel(

yep looks great to me can't wait to use it
It's okay, but it's also kind of whatever for as long as it doesn't have proper UI support, finetuning infrastructure and lower spec requirements.
File: PA_0449.jpg (670 KB, 2560x1536)
Can you elaborate for the retards
>tfw anon mentioned me
File: image7.jpg (222 KB, 1024x1024)
quite good, it does really good fireballs.
File: kolors_00187_.png (1.23 MB, 1024x1024)
Y'know I'm some what of a retard myself. Some yahoo on internet called it out and I'm his echo.
File: image.jpg (181 KB, 1344x768)
>glowing wraith made of goo, attractive, pretty, feminine, cute, surrounded by darkness
File: PA_0451.jpg (788 KB, 2560x1536)
is this one better?
File: file.png (2.07 MB, 1920x2176)
so/so but it can make nice images like >>101304716 >>101307281 >>101307693 >>101309054 >>101309754 >>101313041 >>101314782 >>101316113
we'll see what happens
both are cool but im an abstract anon myself
>no training code
>no training examples
>no info about dataset
>bullshit china-license
>bad at following prompts
>bloated with china LLM

Idk bros. I'm not that excited.
File: image (1).jpg (168 KB, 1344x768)
>female wraith made of glowing goo, fullbody artwork, stylized cartoon, dutch angle, from below, standing
Eh. Think I'll just wait for the next Pony release. Then again, it's literally:
>Effective Training of Diffusion Model for PHOTOREALISTIC Text-to-Image Synthesis
Feel you.
You're shooting yourself in the foot by prompting "full body" with a landscape aspect ratio
File: PA_0453.jpg (953 KB, 2560x1536)
get me out of monster gens. drop me in to abstract ones
But horizontals have some of the most interesting full body compositions. Pony gets it.
File: kolors_00215_.png (1.13 MB, 1216x768)
Do you have to include something like "laying" or does pony do it
Why exactly is u-net bad?
DiT is newer therefore better
File: image9.jpg (276 KB, 1024x1024)
at least it knows some copyrighted characters, i wonder if it knows any popular artists
I think you need to be mega famous to be recognized by the model. I'm pretty sure all of the tagging is done by LLM, so if the LLM doesn't know what megumin is, it won't produce megumin even if it's in the dataset.
It can probably produce old public domain artists like Picasso, but don't expect your favorite furry inflation artist to be in there.
File: file.png (1.66 MB, 1920x2176)
Jackie Chan punching Xi Jinping
File: kolors_00221_.png (1.15 MB, 1216x768)
File: PA_0457.jpg (868 KB, 2560x1536)
File: tmpwo3zfng6.png (1.1 MB, 1344x768)
>Do you have to include something like "laying" or does pony do it
File: file.png (1.5 MB, 1920x2176)
 surfeit, drawing sketch lithograph of a young woman standing in an empty field, 
File: PA_0460.jpg (955 KB, 2560x1536)
File: PA_0462.jpg (852 KB, 2560x1536)
young grill in an empty field
heheh nice
No, really. I don't understand what that specific code means.
A code to run "randomshitgo!" At kernel level
File: PA_0467.jpg (524 KB, 2560x1536)
File: PA_0468.jpg (551 KB, 2560x1536)
File: PA_0469.jpg (575 KB, 2560x1536)
File: 0.jpg (358 KB, 1024x1024)
File: image.png (3.27 MB, 2582x794)
File: PA_0536.jpg (843 KB, 2560x1536)
File: PA_0537.jpg (1.06 MB, 2560x1536)
File: PA_0538.jpg (1000 KB, 2560x1536)
>This section presents a cult with a simple if ominous motto:
>“The end of the world isn’t merely at hand—it’s at our fingertips.”
>Agenda: Destroy the world.
>Structure: A loosely organized cult of lunatics.
File: PA_0539.jpg (939 KB, 2560x1536)
Oops, ignore that one please...
File: file.png (1.24 MB, 2304x1792)
I'm guessing you are a fan of one line drawings
so artistic
look at the line
im artist
File: file.png (515 KB, 1152x896)
>as you can see from the chart behind me, it says i am real artist
File: kolors_00243_.png (1.29 MB, 1216x768)
File: kolors_00262_.png (1.36 MB, 1216x768)
Do I want to know what it would look like if you went beyond 1?
Thought that was the One Ring for a moment
I've been working on training a niche realistic fetish lora, on a variety of models. I'm just here to say that bigASP has fucking insane variety and "authenticity", for lack of a better term. A basic prompt makes each image look completely different. Different ethnicities, faces, hair, clothing. Sometimes it fucks up with anatomy or other things, the model seems a bit undertrained. But when it hits, it looks like an actual amateur photo.

Whatever this guy did, just improve it: even more training images, optimized hyperparameters, longer training. Someone get him some funding.
File: tmpjr3bqlng.png (1.32 MB, 896x1152)
Fellas, is it unhealthy to share your drink with your friend?
bigASP the what now? Sdxl?
File: PA_0543.jpg (893 KB, 3328x1152)
File: PA_0545.jpg (932 KB, 3328x1152)
File: Kolors_00011.png (1.23 MB, 832x1216)
File: file.jpg (848 KB, 1920x2048)
File: PA_0548.jpg (663 KB, 2560x1536)
https://civitai.com/models/502468/bigasp-v1 ?
File: 3974805805.jpg (68 KB, 768x768)
Since we are somewhat on the topic.

How do you test new models?

Basic prompt
Ksampler Efficient
CFG, STEPS, Scheduler
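One way to make the list above systematic: a minimal sketch that builds every CFG x steps x scheduler combination for a fixed prompt and seed, so each cell of a comparison grid differs in only one known way. The values and the `build_sweep` name are placeholders, and actually queueing these in ComfyUI is left out.

```python
from itertools import product


def build_sweep(prompt, seed, cfgs, steps, schedulers):
    """Return one settings dict per (cfg, steps, scheduler) combination."""
    return [
        {"prompt": prompt, "seed": seed, "cfg": cfg, "steps": n, "scheduler": sched}
        for cfg, n, sched in product(cfgs, steps, schedulers)
    ]


# Placeholder values; swap in whatever ranges you care about.
jobs = build_sweep(
    prompt="basic prompt",
    seed=0,
    cfgs=[3.0, 5.0, 7.0],
    steps=[20, 30],
    schedulers=["normal", "karras"],
)

for job in jobs:
    print(job)
```

Keeping the seed and prompt fixed is the point: if two cells look different, you know it was the sampler settings and not the noise.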
File: Grid.jpg (879 KB, 5760x2048)
just restart your modem, you should have a dynamic ip address so you'll get a new one every time you turn it on
Cool little guy
i desire a local model with the anatomy of kolors and the sovl plus architecture of pixart
File: montage.jpg (2.24 MB, 8294x958)

oh god
So basically Pony
>When she's on her period
Let's go even higher and further beyond
File: montage.jpg (1.59 MB, 7680x591)
this one is even funnier

> i forgive you <3 !
>was what you thought i was going to say !!!
See >>101319608 up from 0.85
That, dear anon, screams sovl to me.

Lack of sovl in Pony gens is nothing but a skill issue.
no, sorry, i like pony but that's not sovl. she has that same plastic yucky look in her eyes that's in every 2.5d pony slop gen.
>Lack of sovl in Pony gens is nothing but a skill issue.
maybe, but i'd say most of them look soulless and it's not the prompter's fault.
File: tmpg2gj3jjx3.png (565 KB, 900x1024)
File: tall2.jpg (230 KB, 1624x1120)
Gen some surrealism
File: Kolors_00006.png (1.05 MB, 832x1216)

you can use fucking sketches in img2img, what the f are you talking about
>no info about dataset
who cares about that? no one shares the dataset for the pretraining because they do it on copyrighted images
if they did the same training on DiT it would be even better, which is a shame because it's the best base model we got so far
File: Kolors_00010.png (1.23 MB, 832x1216)
what does that have to do with the sovl factor of a model? i'm talking about pure prompting text to image + refinement with inpainting. pony sloppa still lose
File: Kolors.jpg (1.58 MB, 4924x1728)
For a base model, Kolors is blowing everything out of the water, too bad it's still a unet model and the prompt understanding is not great, it's like we're having a base SDXL model if SAI were actually not cucked and more competent
>it's like we're having a base SDXL model if SAI were actually not cucked and more competent
pretty much it. there's another stable cascade based model that hasn't been released yet, so keep an eye out for that too.
File: Kolors_00009.png (1.33 MB, 832x1216)
Oh wow, finally we got models that can render high resolution pictures, that's what it actually needs, more pixels to get good details everywhere, and we can see it works well, the hands look good on that one. Do you know if they
... Do you know if they plan on releasing the model or not?
...no i don't actually
Does anyone know what model is this person using? specifically for the faces (NSFW) https://x.com/Doreiko_Ai
novelai, maybe? i think you'll have better luck asking this in the >>>/h/ and >>>/d/ generals
File: 0.jpg (611 KB, 2048x1024)
File: SDXL_0001.jpg (860 KB, 2048x2048)
The only decent image I got out of this thing
NTA but /h/ and /d/ are weird... not like the things they are into, but the way they treat diffusion models is weird as fuck. They're all about muh refiners and muh known characters
File: SDXL_0002.jpg (881 KB, 2048x2048)
>They're all about muh refiners and muh known characters
to be fair, both are pretty important for porn pictures
what vae should i use for pony?
File: 0.jpg (337 KB, 1024x1024)
File: SDXL_0006.jpg (768 KB, 2048x2048)
upscaler fuck up
sdxl vae?
File: orb1i.jpg (203 KB, 1304x1304)
dead worthless general no one ever wanted or asked for
File: 0.jpg (310 KB, 1024x1024)
zzzzzz.... mimimimi... zzzzzz.... mimimimi
Go back to your hugbox general where you can spam 1girls with a bunch of 14 year olds :)
What the fuck is this?
this thread has better images
What do you think it is?
4bit quantization for Kolors
How far along is your model, anon?
quantizing a unet architecture to 4bit is a really bad idea, it has been tried before and it was a disaster
File: 00330-3467391478.jpg (920 KB, 1260x1680)
Good job it only applies to the large language model then
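For anons asking what 4bit quantization even does to the weights, here is a toy sketch of symmetric absmax quantization with two 4-bit values packed per byte. Nothing here is taken from the quantization snippet quoted earlier in the thread or from any real kernel. The function names are made up, and real implementations work block-wise on the GPU rather than on Python lists.

```python
def quantize_4bit(weights):
    """Map floats to signed ints in [-7, 7] and pack two of them per byte."""
    # One shared scale for the whole list; fall back to 1.0 if all weights are 0.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    if len(q) % 2:
        q.append(0)  # pad so the values pair up into whole bytes
    packed = bytes(((a & 0xF) << 4) | (b & 0xF) for a, b in zip(q[::2], q[1::2]))
    return packed, scale


def dequantize_4bit(packed, scale, count):
    """Unpack the nibbles, restore the sign and multiply by the scale."""
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0xF):
            if nibble >= 8:  # two's complement: 8..15 encode -8..-1
                nibble -= 16
            out.append(nibble * scale)
    return out[:count]


weights = [0.7, -0.3, 0.12, 0.0, -0.7]
packed, scale = quantize_4bit(weights)
restored = dequantize_4bit(packed, scale, len(weights))
print(f"{len(weights)} floats -> {len(packed)} bytes")
print([round(x, 3) for x in restored])
```

Each weight gets rounded to one of 15 levels, so the error per weight is at most half a step. That kind of rounding error is the concern raised above about doing this to the unet, while here it is only applied to the LLM.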
File: 01582-710939302.jpg (840 KB, 1814x1210)
File: 00325-3467391480.jpg (501 KB, 1260x1680)
I like this one better
ty dudes
Interesting how .85 looks odd compared to the numbers around it
File: 0.jpg (503 KB, 1024x1024)
based reddit gen
File: orb6i.jpg (267 KB, 1304x1304)
anon sleeping
VAE designed by members of 16chan
Why aren't we doing 32 channels already, since it clearly looks better than 16? VRAM requirement issues?

Hope you had your normal amount of coffee today
You can do it on T5 in PixArt.
People are already saying 16 channels is harder to train and converge with.
Forget what my retarded ass said. That's why I don't like to get up from my chair.
higher vae channels makes it harder to train the model i believe
could you not
try right click copy image to clipboard and then paste it onto the reply box
File: ComfyUI_Kolors_00645_.png (1.31 MB, 1216x832)
What are you fellows drinking today?
File: file.png (1.6 MB, 1024x1024)
water (no orange slices)
Hibernation mode
Nothing to talk about and we are all out of fresh prompts
File: ComfyUI_Kolors_00679_.png (1.52 MB, 1216x832)
half-assed prompt for kolors trying to get it to make samus aran since i'm playing super metroid.
Looks good, got any more?
File: file.png (1.42 MB, 1920x2048)
File: ComfyUI_Kolors_00688_.png (1.62 MB, 1216x832)
A few but they're basically all the same sort of thing.
File: file.png (2.41 MB, 1920x2048)
do you guys think you'll be able to run the new lumina model when it releases?
Biden's signature after his stroke.
i imagine not
File: file.png (1.45 MB, 1920x2048)
Don't think so. Damn these 4090 prices
Get AMD code up to speed and we can live rich with two 7900xtx
Need a miracle for it.
ran out of those in 2020
File: CLD.20241.jpg (181 KB, 1056x1400)
File: ComfyUI_00004_.png (325 KB, 512x512)
File: PA_0001.jpg (499 KB, 2560x1536)
File: Kolors_00012.png (1.36 MB, 1024x1024)
File: Kolors_00014.png (1.34 MB, 1024x1024)
Is there anything I can run in a Docker Container that uses CPU?
Also, does anyone know if using OpenCL requires any special kind of Docker configuration to access that hardware?
I've got an RK3588 that I'd like to try running Diffusion on.
Yes, I know it'll be slow as balls, but if there's software that incorporates upscaling low-res images, it'd be nice to be able to host that on my RK3588 homeserver as opposed to getting queued on online services (which generally don't provide you with a lot of control).
File: Kolors_00018.png (1.34 MB, 1024x1024)
SD models will be around 1~2 min for 512x512
Pixart 3~5 min for 1024x1024
Straight from the oven...
So I'm using easy diffusion, my stable diffusion install took a shit when KDE took a shit and I tried to update everything. No other GUI will work (I got used to Automatic but it was six months ago). Now with Easy it's...easy, and does what I want, but I don't know how to integrate something like adetailer to fix faces, and its built-in face "fix" is a literal horror show

wat do

I'm dumb, so...
File: prompt.png (2.39 MB, 1377x688)
Time to start learning Chinese. The meh prompt adherence seems to stem from the model's understanding of English being inferior to Chinese.

Left is English prompt, right is chinese.

A dog next to a cybernetic cat, the dog has a coat with the number "1" written on it |
File: pro.png (1.68 MB, 1139x574)
Jesus Christ riding a stegosaurus while smoking a cigar and wielding a thor's hammer. He is calling down lighting in the chaotic scene around him |
Can I just feed it to Google translate?
File: pro.png (1.44 MB, 1031x515)
I'm afraid not. You must now learn Mandarin chinese.

sonic the hedgehog riding in a new york taxi while giving a peace sign |
heh heh time to see if these yellow devils have a word for vagina
why are they downscaled?
I just screen capped the output because I couldn't be bothered to save and stitch the image together and I didn't have an output grid.
Shadows make no sense
Welcome to AI
