/g/ - Technology

Thread archived.

File: tmp.jpg (1.12 MB, 3264x3264)
General dedicated to creative use of free and open source text-to-image models

Previous /ldg/ bread : >>101329150

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
ComfyUI: https://github.com/comfyanonymous/ComfyUI

>Auto1111 forks
SD.Next: https://github.com/vladmandic/automatic
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux

>Kolors
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>Pixart Sigma & Hunyuan DIT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Use a VAE if your images look washed out

>Models, LoRAs & training


>Index of guides and other tools

>View and submit GPU performance data

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Share image prompt info

>Related boards
Blessed thread
File: PA_0071.jpg (923 KB, 2560x1536)
Post https://imgsys.org/rankings
next time.
File: 0.jpg (708 KB, 2048x1024)

>Someone creates model that makes 30 second videos that fake steps to drawing an image
>Artists pissing farting and cumming with rage because they used the steps in making the art as a way to certify the image was not AI generated
File: PA_0001.jpg (792 KB, 3328x1152)
File: file.png (3.86 MB, 1408x2816)
>cumming with rage
File: file.jpg (1.43 MB, 1408x2816)
File: PA_0007.jpg (757 KB, 3328x1152)
Which one is better. Type 1
true art
File: PA_0005.jpg (663 KB, 3328x1152)

Type 2
type 1 but you should combine elements of both in post to get the best of both worlds
whats the difference workflow wise?
File: PA_0008.jpg (627 KB, 3328x1152)
about 40 seconds extra for type 1
File: PA_0009.jpg (590 KB, 3328x1152)
I mean Type 2
File: PA_0016.jpg (466 KB, 2560x1536)
Type 1
File: PA_0017.jpg (443 KB, 2560x1536)
Type 2
This one
File: PA_0029.jpg (584 KB, 2560x1536)
File: PA_0028.jpg (526 KB, 2560x1536)
Not going to specify which one is what. Choose one please.
kek first one is overly detailed to the point of being wonkier
File: kolors_00532_.png (1.49 MB, 1024x1024)
File: PA_0031.jpg (523 KB, 2560x1536)
Well then, now I need to figure out how to shave off 40 seconds and fix this problem every 2nd image
File: file.png (108 KB, 256x256)
We simply just wait for other models, AI art models aren't going out of fashion. I do believe Pixart is going to be the 3B/7B model of the local AI art space, it has a ton of potential and is attainable to train. Things are progressing with my 1.3B from scratch model even though it's slow. But keep in mind with the 50 series coming soon it will be even easier to train with local hardware. We used to wait years for improvement in AI stuff. I still remember when the first art models came out and how crude they were.
File: Grid.jpg (2.01 MB, 4096x1920)
File: PA_0033.jpg (573 KB, 2560x1536)
fixed lost 3~5% quality gained 10 seconds
File: PA_0036.jpg (561 KB, 2560x1536)
File: PA_0038.jpg (861 KB, 3328x1152)
File: PA_0040.jpg (950 KB, 3328x1152)
Night everybody!
File: ComfyUI_00808_.png (3.36 MB, 1096x2392)
>get bad hands
>add negative prompts to help
>get good hands
>with 6 fingers
It's like it's mocking me.
File: ComfyUI_00815_.png (3.48 MB, 1096x2392)
Type 1 feels more ethereal, eldritch, otherworldly, etc. Type 2 feels like a cringy band t-shirt or something.
I see you kek
Remove the extra with gimp or IOPaint
File: file.jpg (1.27 MB, 1920x2048)
they finetuned the image gen capabilities back to meta's chameleon model. i am not sure if these models can be quanted like llms.

>We have provided open-source model weights, code, and detailed tutorials below to ensure that each of you can reproduce these results, and even fine-tune the model to create your own stylistic variations.
Love it
small dataset maybe 30-40 images, high res images, sometimes very wide or tall ratios.

for Lora training, is it worth manually zoom/cropping 1:1, 3:4, and 4:3 to capture more detail and artificially increase dataset? If so, any other ratios I should consider cropping to?
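For the cropping question above, a minimal sketch of how the crop boxes could be computed. The function names and the choice of centered crops are assumptions for illustration, not something a trainer requires; the boxes use the (left, top, right, bottom) order that Pillow's `Image.crop` accepts.

```python
def center_crop_box(width, height, ratio_w, ratio_h):
    """Largest centered crop of a width x height image with aspect ratio_w:ratio_h."""
    target = ratio_w / ratio_h
    if width / height > target:
        # Image is wider than the target ratio: keep full height, trim the sides.
        crop_h = height
        crop_w = round(height * target)
    else:
        # Image is taller than the target ratio: keep full width, trim top and bottom.
        crop_w = width
        crop_h = round(width / target)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# The three ratios mentioned in the post (hypothetical default list).
RATIOS = [(1, 1), (3, 4), (4, 3)]


def crop_plan(width, height, ratios=RATIOS):
    """Map each ratio label to its crop box for one source image."""
    return {f"{rw}:{rh}": center_crop_box(width, height, rw, rh) for rw, rh in ratios}


if __name__ == "__main__":
    # A very wide 4000x1000 source image.
    for label, box in crop_plan(4000, 1000).items():
        print(label, box)
```

A centered crop only covers the middle of a very wide or tall image, so sliding the box along the long edge would be the next step if the detail sits off-center.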
official pixart bigma and lumina 2 waiting room
the fact this is possible at all is incredible. good stuff
I heard a rumor about the 5090 only having 28GB VRAM, hope it has more but I don't expect good things.
File: file.jpg (1.12 MB, 2688x1536)
It's best to have most or all of the images in the same resolution. Another way to artificially increase the dataset is to do img2img at low denoise. The more dissimilar the images are, the more training time is required in order for the concept to be learned correctly.
File: file.jpg (932 KB, 2688x1536)
File: stablediffusion12.jpg (323 KB, 1552x1200)
File: file.jpg (1.12 MB, 1920x2048)
Anyone tested it yet?
can it finally make good hentai?
When I was little I was told I was bad at drawing. Of course I was never told you could learn. Anyway, I often daydream about a machine that would read my thoughts and generate images. And now that dream is basically reality.
It's so strange when I think about it. It felt impossible but some people were able to make it possible.
been able to for awhile now
>Specifically, Anole-7b-v0.1 was developed using a small amount of image data (5,859 images, approximately 6 million image tokens) and was fine-tuned on just a few parameters (less than 40M) in a short time (around 30 minutes on 8 A100 GPUs). Despite this, Anole-7b-v0.1 expresses impressive image generation capabilities.
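The figures in that quote can be sanity-checked with a little arithmetic. This sketch only uses the numbers quoted above (5,859 images, roughly 6 million image tokens, around 30 minutes on 8 A100s); the helper names are made up for illustration.

```python
def tokens_per_image(total_tokens, images):
    """Average number of image tokens spent on each training image."""
    return total_tokens / images


def gpu_hours(gpus, hours):
    """Total accelerator time, counted in GPU-hours."""
    return gpus * hours


if __name__ == "__main__":
    # "approximately 6 million image tokens" over 5,859 images
    print(round(tokens_per_image(6_000_000, 5859)))
    # "around 30 minutes on 8 A100 GPUs"
    print(gpu_hours(8, 0.5))
```

The average comes out at about 1024 tokens per image, which would fit a fixed token budget per image, and the whole run is about 4 GPU-hours.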
They made a LoRA with ~6k images in 30 mins + merged it? lul
File: 0.jpg (506 KB, 1024x1024)
File: PA_0001.jpg (637 KB, 2048x2048)
The other rumor is it's 50% faster than the 4090. The 4090 is significantly faster than the 3090. If they have the same leap, that alone is insane. It's not all about VRAM and nowadays you can shard your extremely large models but no one really does that and certainly not faggots like SAI that want to ensure there's an 80GB minimum on fine tuning.
File: PA_0007.jpg (1002 KB, 3328x1152)
File: PA_0016.jpg (907 KB, 2432x1664)
File: PA_0021.jpg (1024 KB, 2432x1664)
File: PA_0027.jpg (582 KB, 2432x1664)
File: 00006-1977336930.jpg (255 KB, 1576x1120)
File: PA_0031.jpg (744 KB, 2432x1664)

Might be good news
AMD could, you know, pay several developers $5 million and just get CUDA compatibility.
Does anyone have a good rotoscoping tutorial? I want to understand it rather than the minimum to get it to work.

>not all about VRAM and nowadays
>you can shard your extremely large models
>SAI that want to ensure there's an 80GB minimum

I feel like you wanted to have 3 conversations here.

nvidia would hire them away when the project was at 70%
I can have a multifaceted point about this.

Large models are unnecessary, and it has been shown multiple times now that a small model can beat a large model in a single domain/discipline. Large models excel at being jacks of all trades but masters of none, and they get there with huge diminishing returns. A 2B model trained only on anime is going to be better at anime than Dalle-3.

Sharding is efficiency, you don't need to make something bigger just because you introduce efficiency. This means you could train your smaller model faster with the same hardware.

SAI is a shit company that purposely hamstringed local AI training.
>nvidia would hire them away when the project was at 70%
Not everyone is as unscrupulous as you and quit projects just because they got an opposing bid. I realize you'll sell your mom for a dollar but that's just projection.
>Large models are unnecessary and has been shown multiple times now, a small model can beat a large model in a single domain/discipline......
Agreed. Too bad investors can't get behind the idea of doing one thing well, instead of adapting the mega model to do the one thing.

>Sharding is efficiency, you don't need to make something...
you have a very specific definition of efficiency. Sharding decreases hardware efficiency to save you on human efficiency.

>SAI is a shit company
relax dude, we all know, there is nothing new that can be said at this point.

Your entire post was projection. It's okay anon, life will get better.
>Your entire post was projection. It's okay anon, life will get better.
Life must suck as a nihilist assuming the worst of people all the time.
Your entire post was projection. It's okay anon, life will get better.
I like
what image model do you guys suggest for western fantasy art, I'm talking like mtg art. I want to generate goblins and dragons.
Just use tensorrt for comfyui on your favourite model for a 50% increase in gen speed. The current version has a memory leak but it's not terrible and will get more dev time allocated if more people use it.
No one cares about generation speed.
So you're rocking along using your cpu, good for you.
>The powerful LI-DiT-10B will be available through the online platform and API after further optimization and security checks.
This conversation is not about inference speed, but it's cute you use some distilled model anon to generate corpo approved images.
I don't want to have to juggle a mess of different models and loras to synthesize multiple disparate concepts. No future there.
>wastes time training a model that has no other purpose than to flood the internet with fake process videos
work, sleep, work, sleep.. No time to even gen anything
Generating just for the sake of generating is not good practice
The girl in the middle looks hot

Any more gens of her?
You solved a captcha to post this
File: 0.jpg (369 KB, 832x1216)
File: oeoeoo_00348_.png (2.18 MB, 1280x1280)
Really frustrating that SD3 basically refuses to produce figures that aren't in a neutral standing position facing forward or backward. It can produce really interesting detail and actually does know a lot of significant artists' names/styles.
File: PA_0036.jpg (513 KB, 2560x1536)
This one?
File: 00000-3064829863.png (2.45 MB, 1248x1824)
File: 00001-1477610561.png (1.77 MB, 1216x1216)
File: PA_0038.jpg (741 KB, 1664x2432)
File: PA_0039.jpg (778 KB, 1664x2432)
File: PA_0041.jpg (725 KB, 1664x2432)
File: 00003-3792186198.png (2.4 MB, 1248x1824)
Kinda reminds me of Game of Thrones, only missing the purple eyes, but in the TV series they're all blue-eyed anyway kek
File: SDXL_0004.jpg (754 KB, 1664x2432)
We'll never see winds of winter release date.
File: 00006-3089850714.png (1.09 MB, 832x1216)
I dunno, depending on grrm we could at least see what he wrote for Winds of Winter. A Dream of Spring is a pipedream though, so I gave up on that. At least House of the Dragon is kinda nice even if I disagree with many things, the Velaryons being one of them but far from the main one.
File: SDXL_0008.jpg (399 KB, 1664x2432)
Refuse to watch that on the grounds of how GoT ended
You Generate for fame and glory?
File: SDXL_0009.jpg (623 KB, 1664x2432)
File: SDXL_0013.jpg (661 KB, 1664x2432)
File: SDXL_0015.jpg (697 KB, 1664x2432)
File: SDXL_0016.jpg (648 KB, 1664x2432)
File: SDXL_0017.jpg (873 KB, 1664x2432)
File: SDXL_0018.jpg (860 KB, 1664x2432)
File: SDXL_0019.jpg (801 KB, 1664x2432)
File: 00014-2354484914.png (1.4 MB, 1024x1024)
File: SDXL_0021.jpg (722 KB, 1664x2432)
File: SDXL_0025.jpg (1.01 MB, 1664x2432)
File: SDXL_0026.jpg (610 KB, 1664x2432)
File: SDXL_0027.jpg (677 KB, 1664x2432)
File: PA_0053.jpg (961 KB, 2560x1536)
File: PA_0055.jpg (942 KB, 2560x1536)
File: PA_0056.jpg (1.13 MB, 2560x1536)
File: PA_0058.jpg (1017 KB, 2560x1536)
Descending into madness
File: PA_0064.jpg (927 KB, 2560x1536)
I've become convinced that "synthetic data" is how SD3 ended up fucked. Especially the CogVLM captions.

I've been fucking around with image-to-text, and all the "best" models have the same kinds of problems. It's always "The image depicts X. The overall composition of Y suggests a tone of Z." That's not a damn caption, it's a conversation with a digital assistant trying to hit a word count, and typically making shit up to get there. Cut it all down to just the "X" part, and you'd be fine.

So I eventually just found myself using BLIP-2. It's ancient, by the pace we're going at, but it clearly wasn't trained on output from some LLM, it doesn't hallucinate, it doesn't try to hold a conversation, and it isn't overly verbose. I doubt I'll ever find another model that straight-up generates "an anime girl is getting fucked by several men".
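The "cut it all down to just the X part" idea from the post above can be sketched as a small post-processing step on captions from a chatty model. The regex and the list of preamble verbs are assumptions for illustration and will not catch every phrasing.

```python
import re

# Hypothetical list of assistant-style openers to strip from a caption.
PREAMBLE = re.compile(
    r"^(the|this)\s+(image|picture|photo)\s+(depicts|shows|features|is of)\s+",
    re.IGNORECASE,
)


def trim_caption(text):
    """Keep only the first sentence and drop the 'The image depicts' opener."""
    # Split once at the first sentence-ending punctuation followed by whitespace.
    first = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    first = PREAMBLE.sub("", first)
    return first.rstrip(".!?").strip()


if __name__ == "__main__":
    verbose = (
        "The image depicts a red fox in snow. "
        "The overall composition of muted tones suggests a tone of calm."
    )
    print(trim_caption(verbose))
```

This only removes the padding; it cannot fix details the captioner made up inside the first sentence.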
File: PA_0065.jpg (875 KB, 2560x1536)
The problem is they trained *only* on long synthetic captions when in reality it should be a mix of short, medium and long captions.
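A rough sketch of what mixing caption lengths could look like when building training samples. The 30/40/30 weights and the dictionary layout are assumptions, not values anyone in the thread reported.

```python
import random


def pick_caption(captions, weights=(0.3, 0.4, 0.3), rng=random):
    """Pick the short, medium or long caption for one training sample.

    `captions` is a dict with the keys "short", "medium" and "long".
    `weights` are hypothetical sampling probabilities in that same order.
    """
    keys = ("short", "medium", "long")
    key = rng.choices(keys, weights=weights, k=1)[0]
    return captions[key]


if __name__ == "__main__":
    example = {
        "short": "a goblin",
        "medium": "a goblin holding a lantern in a cave",
        "long": "a small green goblin holding a rusty lantern, standing in a damp cave with glowing mushrooms",
    }
    rng = random.Random(0)
    for _ in range(5):
        print(pick_caption(example, rng=rng))
```

Drawing the caption again each epoch means the same image is seen with different caption lengths over the course of training.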
If you want verbose model?

File: PA_0076.jpg (1.32 MB, 2560x1536)
I hope you enjoy this piece
File: PA_0078.jpg (1.11 MB, 2560x1536)
Good night.
File: 00000-2010508896.png (3.2 MB, 1536x1536)
Good night
Prompt challenge?
In the course of centuries, Man has devoured the Earth itself. The Machine Age has dried up the seas of oil. Industry has consumed the heartlands of coal. The Atomic Age has plundered the rare elements — uranium, cobalt, plutonium — leaving behind worthless deposits of lead and ashes. Starvation is at hand. Only here, in the void of space, is there a new source of atomic power. Above us, in the debris of the solar system, in the meteorites and asteroids, are the materials needed to drive the reactors. Yet in their distant, silent orbits, these chunks of matter are beyond the reach of man, beyond the reach of human hands, but not beyond the reach of human minds. Driving along a country road in an ordinary car is a modest man: Harold J. Finley, quiet and profound...
File: 1720670820684.jpg (83 KB, 626x626)
I'm diffoosin
What are the best prompts for benchmarking/comparisons?
synthetic data is an utter plague upon this field and has been horribly overused beyond applications where it makes sense.
File: 00064-1783398307.png (1.89 MB, 1024x1280)
This is jungle fever

testing stealth PAPA NOVEMBER GOLF info transmission on channel Alpha

please acknowledge
SD3 or Kolors?
Both lobotomized?
catbox or prompt
File: 13212313212.png (51 KB, 1321x595)
repeat - info transmission on channel alpha
File: Untitled.png (54 KB, 408x1080)
what the FUCK, the Crystools node is slowing down all my generations by 4.5 seconds.

Uninstall that shit right now if you've got it.
File: ComfyUI_Kolors_00892_.png (1.84 MB, 1216x832)
I like >>101360814 this one :D
File: zzz9_00011_.jpg (181 KB, 1200x1200)
I made a super nice Mariana Cordoba Lora but civitai won't host it because it's a real person with nudity
but she's fucking dead!
only a dozen or so people got to delete it before mods yeeted it
is there any non-scared of lawsuits from dead people places to host the file?
These ones are cool too
It's not so much the captions but the images themselves being synthetic. Pixart uses LLM captioned images and works quite well.
But SD3 is more than that, from insiders: the safety training is what fucked it up the most.
I want to have her babies
it's kinda funny how civitai essentially becomes a hentai only website because of rules like this.
Like AI just becomes completely synonymous with hentai, all the professional researchers and scientists become associated with hentai, with billions of dollars going into funding the ability to make better and better hentai.
how do you get cpu, ram, etc stats like that
Those stats are all added by Crystools. Sorry I wasn't clear. But yeah those bars reporting data slow shit down.

I've also noticed other nodes slowing shit down, like the comfyui-profiler. You don't want to be running that all the time either.
File: file.jpg (1.37 MB, 2176x1920)
>So I eventually just found myself using BLIP-2.
Same here 2.7b version + wd v3 tagger. I combine interrogations and it has been really good
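Combining the two interrogations could look roughly like this. It assumes the caption and the tag list have already been produced by the captioner and the tagger; the `combine` helper and its `max_tags` cap are made up for illustration.

```python
def combine(caption, tags, max_tags=20):
    """Join a natural-language caption with booru-style tags into one training caption."""
    caption = caption.strip().rstrip(".")
    seen = set()
    kept = []
    for tag in tags:
        # Normalise tagger output: lowercase, underscores to spaces.
        norm = tag.strip().lower().replace("_", " ")
        if not norm or norm in seen:
            continue
        seen.add(norm)
        # Skip tags the caption already states verbatim.
        if norm in caption.lower():
            continue
        kept.append(norm)
    return ", ".join([caption] + kept[:max_tags])


if __name__ == "__main__":
    print(combine("a knight riding a horse.", ["1boy", "horse", "armor", "Armor", "outdoors"]))
```

The substring check is crude (a tag like "cat" would also match "cathedral" in the caption), so a word-level comparison would be safer on a real dataset.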
File: PA_0002.jpg (1.27 MB, 2560x1536)
File: 00014-3613041320.jpg (277 KB, 1400x1400)
mind dropping a workflow?
File: 33380292.jpg (566 KB, 2048x1024)
>chink models are the future
well, so be it then, I fully embrace our new chink ai overlords
File: 0.jpg (127 KB, 1024x512)
File: 00442-3037234057.jpg (967 KB, 1440x1920)
I use this
it also supports internlm-xcomposer2-vl-7b-4bit if you have beefy gpu and want to tag porn
File: 0.jpg (672 KB, 2048x1024)
File: 1girl_marin.jpg (56 KB, 512x768)
takigawa marin
Hello anon
Dev branch is great, highly recommend
Any ideas why LDG is so comfy compared to other AI threads?
File: 1692594552804026.jpg (1.96 MB, 9367x1050)
how do i do a 2d grid for random seeds in swarmui? i did my cfg 4.5,5,..,7 and now i want to dig through say 4 random sneeds for each. ofc i can set seeds for the second field but that aint random. the image count in the default field next to cfg doesnt apply to grids.
Does swarm allow you to set grid seeds to -1?
that works but kinda defeats the purpose because it generates a random sneed every time instead of running cfg permutations of each random number.
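One workaround for the grid problem above is to draw the random seeds once, outside the UI, and then use that fixed list as the second grid axis. The sketch below only builds the list of (seed, cfg) jobs; it makes no assumption about SwarmUI's API, and the function name is hypothetical.

```python
import itertools
import random


def seed_grid(cfg_values, n_seeds, rng=None):
    """Draw n_seeds random seeds once, then pair every seed with every cfg value."""
    rng = rng or random.Random()
    # sample() guarantees the seeds are distinct.
    seeds = rng.sample(range(2**32), n_seeds)
    return [{"seed": s, "cfg": c} for s, c in itertools.product(seeds, cfg_values)]


if __name__ == "__main__":
    cfgs = [4.5, 5, 5.5, 6, 6.5, 7]
    jobs = seed_grid(cfgs, 4)
    # Comma-separated seeds, ready to paste into a grid's seed field.
    print(", ".join(str(s) for s in dict.fromkeys(j["seed"] for j in jobs)))
    print(len(jobs), "jobs")
```

Every seed gets the full cfg sweep, so each row of the grid compares the same noise across cfg values.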
Is there any promising new model with 16ch VAE?
Now that I understand the difference it makes with SD3, I don't want to go back. Too bad SD3 is so uncooperative.
Just upload only non-nude images with the LoRA. There are tons of pornstar and celebrity LoRAs on civit, and everyone knows what they're used for.
>SFW only boys haha wink wink nudge nudge
Sure, no online genning, but I'm sure local genners would still want it.
They could be sued by an estate of a dead person, I mean at least that's as likely as any other lawsuit

I thought nudity in the training data was fine, you just can't upload generations with nudity
1.5 and XL versions. Last I heard they were /close/ to being finished.
>1.5 + 16ch vae
Please I'll take it
No avatars.
Those are fake artists anyway with their undo button and unlimited colors.
I'm an oil painter and I love diffusion because it gives me painting ideas
>I'm an oil painter and I love diffusion because it gives me painting ideas
Next month I have to go photograph one dude's whole gallery and make a finetune/loras out of it. I'm free to use that dataset any way I want, which is nice
File: 00000-3904996473.jpg (1.82 MB, 1960x2620)
No scans available?
File: 00001-3904996473.jpg (1.17 MB, 1960x2620)
>No scans available?
Nope. 20+ framed paintings. Tripod + camera will do, it should be alright. Decent natural light
>2 days ago
Isn't it time to wrap up the splitting experiment?
I don't want to associate with the faggots in SDG. If you want to hang out with the middle schoolers sharing TikTok dances go ahead.
I don't frequent imagegen and I pop in these threads time to time when I feel like baking pictures. It's not really obvious to me what's wrong with sdg, and I don't see any tiktok dances even in a non literal sense. But when I post it's probably going to be in one general or the other, can you make a case why this exists? It seems like a slower general that absorbs some quality discussion. I also see charges against sdg in ldg but not the other way around.
forgot to include, not that anon. I'm obviously asking what is the reason for "the splitting experiment"
The general is where all the high IQ anons are. SDG is a bunch of faggots spamming 1girls. You wonder why their is quality discussion here and not there? Can you apply yourself?
SDG is a circle jerk around one specific image generation company (SAI). They are actively hostile to other models even though a year from now no one will remember SAI because they will be irrelevant.
>You wonder why their is quality discussion here and not there?

I said the first part of that sentence, not the second half

I also see an equal amount of butthurt even when asking polite questions in a charitable way. I think you've explained everything, thanks
File: 1717204019374258.webm (2.43 MB, 856x1280)
I've been playing around with liveportrait and it works really well as long as the head does not move. wish they could figure out incorporating head movements though.
You'd have to be pretty stupid to look at the generic 1girl slop and Discord-tier social fagging in sdg and think it's equal to here.
File: 00003-3904996473.jpg (1.44 MB, 1960x2620)
t: architect
This is not true or outdated. I've seen nothing but contempt for their censoring and ruining SD3 with it, and talent is jumping ship
Haha okay, anyways, you can go hang out in SDG now. This general isn't going away.
Nice. Very interesting.
I'd rather scroll past low quality content than see butthurt that fits on screen
You can hang out with your underaged banned buddies now having middle school tier conversations.
It's fine that this general exists, I just wanted to know if it was worth typing ldg in the catalog
>tfw memeanon isnt here to point out b8 posts
You talk about middle school a lot and have your own lunch table, don't grow up too fast
If you want actual tech discussion this is the only place where it happens. SDG is just a Discord server, that's the difference.
File: media_GSIv_PWaUAURhiQ.jpg (161 KB, 1024x1024)
Looks like this mf is making a really uncensored model based on the SD3 architecture
It's still censored because it's using censored datasets, it's also square cropped which is a shame. The white pill of all this is one guy did it.
Fuck... looks like we have to rely on the chinks (Kolors, Pixart, Hunyuan...) to move the imagegen community forward. That's how far the west has fallen.
It doesn't mean it won't be trainable or another good base model. But you're unlikely to get anything too crazy out of it out of the box.
If this model was trained on DiT instead of the SDXL architecture, it would've been midjourney tier, I'm not kidding, they know how to train their models well
File: 00006-3904996473.jpg (2.86 MB, 1960x2620)
You can't sue China
Why is this so much better than previous architectures? Explicitly please.
dunno, but Sora uses that and it's a fucking beast
File: 00007-3904996473.jpg (757 KB, 1646x2201)
File: SDXL_0002.jpg (644 KB, 1664x2432)
It's just a better architecture than unet
>We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.
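To make "a transformer that operates on latent patches" a bit more concrete, here is a small sketch of how the number of input tokens follows from image size and patch size. It assumes the usual 8x VAE downsampling of latent diffusion models; the "/2" in DiT-XL/2 is the patch size. The function name is made up.

```python
def dit_token_count(image_size, patch_size, vae_downsample=8):
    """Number of transformer tokens for a square image.

    The image is first encoded to a latent of side image_size / vae_downsample
    (assumed 8x downsampling), then cut into patch_size x patch_size patches.
    Each patch becomes one token.
    """
    latent_side = image_size // vae_downsample
    if latent_side % patch_size != 0:
        raise ValueError("latent side must be divisible by the patch size")
    per_side = latent_side // patch_size
    return per_side * per_side


if __name__ == "__main__":
    for size in (256, 512):
        for patch in (2, 4, 8):
            print(f"{size}px, patch {patch}: {dit_token_count(size, patch)} tokens")
```

Smaller patches or larger images mean more tokens and therefore more Gflops per forward pass, which is the knob the abstract says correlates with lower FID.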
File: Silverlight.jpg (616 KB, 1536x1536)
I am trying to remove a logo from a shirt via inpaint, and all I want is it to be black, like the rest of the shirt. But every time I try to inpaint it, it keeps spitting out more logos despite me putting it in the negatives, having my prompt simply be "black shirt", etc. What can I do to not be retarded and get this stupid shit fixed? Trying latent noise and latent nothing only switched the logo from red to white and made it an even bigger eyesore.
First use photoshop/gimp/whatever to crudely paint it black, and then run it through inpaint for the fine details. Inpaint does not want to change the image too much; it tries to give you a variation of already existing stuff
File: 00000-3904996473.jpg (819 KB, 1646x2201)
p h o t o s h o p
File: 00001-3904996473.jpg (943 KB, 1646x2201)
too little too late eh
File: SDXL_0008.jpg (551 KB, 2432x1664)
Anyone know if it's correct that the existing 1.5/XL models will need to be partially retrained to adapt them to the 16 channel VAE?
From my understanding that is correct but I feel like I also heard, once they figure it out, it wont be "that much work".
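A back-of-the-envelope look at why a VAE swap is not a drop-in change. The sketch assumes the denoiser's first and last layers are 3x3 convolutions between the latent channels and 320 feature channels, as in the commonly published SD 1.5 / SDXL UNet configs; treat those numbers as assumptions.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2D convolution layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


if __name__ == "__main__":
    for latent_channels in (4, 16):
        conv_in = conv_params(latent_channels, 320)   # latent -> features
        conv_out = conv_params(320, latent_channels)  # features -> latent
        print(f"{latent_channels}ch latent: conv_in={conv_in}, conv_out={conv_out}")
```

The layers whose shapes change are tiny, but every layer in between was trained on the old 4-channel latent distribution, which would explain why some retraining is expected rather than just swapping two layers.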
File: 00002-3904996473.jpg (1.04 MB, 1646x2201)
I love these girls
just a few thousand more and I'll be tired of making them
File: kek.jpg (134 KB, 965x1489)
>A unet architecture model is more trending than a DiT model
that's how bad SD3M is
>a dit model
I mean, you can train a 1 param dit model and it also wont trend, good architecture cant salvage garbage dataset, hyperparams and whatnot
of course, that's why it's insane how much they destroyed SD3M in the sake of the """safety""" religion, the chinks don't give a fuck about that, they just want to make a great product, period
File: ComfyUI_00526_.png (1.6 MB, 1024x1024)
to fluff or not to fluff
File: ComfyUI_00572_.png (1.45 MB, 1024x1024)
no, SD3 can't do this.
File: IMG_6991.jpg (738 KB, 2432x1664)
File: IMG_6996.jpg (605 KB, 2432x1664)
File: 0.jpg (657 KB, 2048x1024)
it can't do what?
File: SDXL_0013.jpg (748 KB, 2432x1664)
newfag here.

i want to use AI to make portraits for characters for my superhero campaign.

should i use a pony model or stick to 1.5? i'm looking mostly for ease of use, something i could learn a few tricks and be decent.
thanks in advance
/tg/ or /qst/?

Type of quality of image you're looking for?

Do you care about hands or feet?

Just portraits nothing else?
If you don't need any NSFW or females in revealing outfits, then I can suggest Dall-E 3.

It's still easily the best and very noob friendly.
>Local Diffusion General
>Dall-E 3
Come on man, you can't be serious with this.
1. Tg

2. essentially decent quality is good enough

3. yes, i know my players will point it out so the better looking the better

4. portraits are the most important, but i will try to work on scenery or backgrounds, tho that is of lesser importance

also, i have some ideas with CivitAi, i'm using forge, and got some idea with loras and negatives, beyond that i'm willing to look into plugins for hands and stuff like that
also should i go back to A1111? i just noticed forge is no longer in the main OP post like it used to be
You can start here

Or I can point to a basic one to start off with local that just will do the job to get you started.
forge is still fine for now and will be fine as long as you never update past the current version
the changes they're making will break compatibility with most extensions, but they still work as of now

grab the Q8 version to bounce ideas off of.
okay, i was following until this, what are those? models? add ons?
WestLake is quite familiar with Mystery Men RPG and can help you with being a good DM

as an addition for chat overall I would recommend https://sillytavernai.com/
so you can test some of your characters there instead of bare bones koboldcpp
Chat (not pictures) like ChatGPT but local no data sent to a random fag in the cloud
ohhh, i get it now, thanks
Trying that new fal model that popped up an hour ago.
File: adfasdfasdf.jpg (870 KB, 2387x1128)
>they fell for the text meme
File: 0.jpg (607 KB, 1024x1024)
A smart enough model gets text for free unless you go out of your way to remove any images with text.
File: file.png (1.82 MB, 1024x1024)
What's its token limit tho
File: file.png (2.03 MB, 1024x1024)
>sfw prompt
>random seed
>pic rel
I figured another company would take up the low hanging fruit. Obviously AF is heavily monetized but SAI was fucking retarded thinking they couldn't be displaced in the API/Server For Hire "Open" Art model space.
File: SDXL_0015.jpg (742 KB, 2432x1664)
File: PA_0007.jpg (874 KB, 2560x1536)
File: file.jpg (1.09 MB, 2176x1920)
File: AuraFlow_00020_.png (1.02 MB, 1024x1024)
I'm kind of confused, does this model just spit out images of cats if it thinks the image isnt sfw?
This wasn't a NSFW prompt at all.
File: AuraFlow_00022_.png (1.27 MB, 1024x1024)
Here's what I was trying to generate btw. If it's going to berate me for "not being safe" with a cat at every turn the model may have already sunk itself.
I presumed it was a post inference thing.
File: file.jpg (1009 KB, 1536x2560)
There are three reasons to use local over Dall-E 3.

>Uncensored, for NSFW
>Total control over every aspect of the generation process
>No logs, no logging in

If all you care about is pretty images with the least amount of effort and you have no need for NSFW and don't care if Microsoft can see what you gen, then Dall-E 3 is the best there is. I was just being honest.
I do a bunch of shit locally, but I also use DE3, because it just works.
seems fluffable to me
Since it's basically come to light recently that anyone with a dataset and a few GPUs for hire can make models on par or surpassing SD3, (Kolors, AuraFlow, Pixart Sigma (soon bigma)) etc, when is someone going to ask the question of what the fuck SAI did with the hundreds of millions of dollars it was afforded in venture capital and forgiven loans?
File: ComfyUI_00101_.png (1.23 MB, 1024x1024)
Towards the right is a cartoon dragon on top of a cliff, to the left is a anthromorphic fox wearing armor riding a horse. The horse is standing on top of a blue cube. In the background there is a flying eagle holding a sun. The sun has a angry face on it.
is there as much of a difference between seed 1 and seed 2, as there is between seed 1 and seed 1000000?
File: plinko-stake.gif (1.25 MB, 636x640)
Think of a seed being a big game of plinko and the number you choose being where the ball starts.
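The plinko picture can be checked numerically: noise drawn from neighbouring seeds is no more alike than noise from seeds far apart. This sketch uses Python's standard `random` module as a stand-in for whatever generator a given UI uses, so it illustrates the principle rather than any specific sampler.

```python
import random


def noise(seed, n=10_000):
    """Draw n Gaussian samples from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    base = noise(1)
    print("seed 1 vs seed 2:      ", round(correlation(base, noise(2)), 3))
    print("seed 1 vs seed 1000000:", round(correlation(base, noise(1_000_000)), 3))
    print("seed 1 vs seed 1:      ", round(correlation(base, noise(1)), 3))
```

Both cross-seed correlations come out close to zero, so the numeric distance between two seeds says nothing about how similar the starting noise is.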
Comment all you want about the shitty art. The model nailed the prompt pretty much flawlessly.
File: AuraFlow_00040_.png (979 KB, 1024x1024)
a shiny red ball reflecting the face of a man with a scraggly beard and glasses looking at the viewer with rage on top of a green cube, the green cube has a window on it and within the window is a sign that says "test" on it. The image is in a 3D cgi style,
how much if at all do you have to cherry pick
File: AuraFlow_00039_.png (781 KB, 1024x1024)
Not much. Here's the previous where the face kind of bled out
File: ComfyUI_00102_.png (1.25 MB, 1024x1024)
That was the first try. This was the 2nd. It got the eagle holding the sun part wrong.
File: ComfyUI_00122_.png (1.21 MB, 1024x1024)
(dramatic reptilian alien portrait photo taken on the bridge of a spaceship overlooking a alien planet:1.4), (it is wearing a translucent helmet, it's eyes are glowing:1.4), (dark background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
File: file.jpg (847 KB, 2560x1536)
File: Grid.jpg (1.03 MB, 6144x2560)
damn, it adhered to your prompt really well, wonder how sd3's prompt adherence compares
wait he released the model? where?

Best prompt following model so far.
does it work on ComfyUI? can it do NSFW?
ComfyUI workflow, make sure to update to the latest comfy and get the model from >>101375564
>can it do NSFW?
not sure but it's very underbaked so temper your expectations. it's basically a v0.1 beta
>16.4 GB
oof that's a big boy, does it run on a 24gb vram card? that's probably a 5b or a 6b right?
Kind of, the same way pixart can with careful prompting. But it's clearly super undertrained still. Will be barbie doll like, doubt it has much nudity in it if at all.
I'm running it on a 4090 so yes. 6.8B.
does he intend on finishing the pretraining or is he expecting us to finish the job?
how much vram does it need? if that's barely 24gb it means that we won't be able to run sd3-8b

>We are not done training! This model is an initial release to kickstart some community engagement. We will continue training the model and apply our learnings from this first attempt. We also noticed that smaller models or MoE’s might be more efficient for consumer GPU cards which have a limited amount of compute power, so follow closely for a mini version of model that is still as powerful yet much much faster to run. In the meantime, we encourage the community to experiment with what we are releasing today.

>Our goal is to make this model a standard backbone that other innovative work can be built on top of. We look forward to community contributions. If you want to train finetunes, IP-Adapters, or quantizations of the current model, we are happy to support you in any way we can. There is already a vibrant community around fal and Aura models in our Discord. We invite you to join if you want to get involved.
the prompt following is impressive, how did he do that? he used CogVLM or something better?
maybe he's our savior, I hope he's not on the side of the "muhhh safety" freaks though and that he'll add human nudity in the pretraining as well
I'm seeing around 16GB vram ish?
Weird that we know practically nothing about him, his methods, the price of the training so far, how much money he has nor his ultimate unstated goals.
holy fuck, it's probably even better than dalle3 on prompt understanding, please tell me the licence is good too?
It's the guy who made loras for text to image basically.
The size of the file is the size it will take up in vram
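A rough way to sanity-check that: weights alone cost parameter count times bytes per parameter, and that is roughly a floor when the whole model stays resident, since activations and any text encoder or VAE loaded alongside come on top. A minimal sketch using the 6.8B figure quoted in this thread; `weights_gb` is a made-up helper, and which components are bundled into the 16.4 GB file is not something this estimate knows.

```python
def weights_gb(params, bytes_per_param):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


PARAMS = 6.8e9  # parameter count quoted in this thread

# Common storage widths; activations and other loaded components are NOT included.
for name, width in [("fp32", 4), ("fp16/bf16", 2), ("8-bit", 1)]:
    print(f"{name:>9}: {weights_gb(PARAMS, width):.1f} GB")
```

At 2 bytes per parameter that is about 13.6 GB for the weights by themselves, so actual usage at inference time will sit somewhere above that depending on resolution and what else is loaded.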
Apache License
File: GQr-cr1W8AA6Ns-.jpg (86 KB, 1004x1004)
that sounds too good to be true!
>Great licence
>Great prompt understanding
>DiT architecture
>Big model (~6b) that can run on a 24gb card
>Not a single statement on "muhhh safety"
Maybe this is it anons, we are probably back
Okay okay, AuraFlow is cool and all. But what about training? That model is a big boy. Can it be cut down and made more manageable? Can Loras be trained on a consumer GPU? If the answer is no, it may be kind of dead.
not at all, SDXL is "only" 6gb big but when I do a 1024x1024 inference it asks for 11gb of vram (A1111)
Don't be a pussy, the /lmg/ community are training their 70b models on cloud, this model is 10 times lighter than that
You can test the demo here
This video and the comments are the funniest thing I've seen in weeks

Le Fresh Bread
I see you didn't answer my question about training it on a consumer GPU.
it won't work on a consumer GPU, it's too big, the future is the cloud, that's the price to pay if you want your model to be midjourney/dalle3 tier, you won't beat that with a small model
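For scale, a back-of-the-envelope sketch of training memory, assuming the common rule of thumb of about 16 bytes per trainable parameter for mixed-precision Adam (fp16 weights and gradients, fp32 master copy, two fp32 Adam moments). Activations are ignored, so real numbers are higher. The 1% trainable fraction for the LoRA case is a placeholder assumption, not a measured value, and both function names are made up for illustration.

```python
def full_finetune_gb(params):
    """Weights + grads + optimizer state for full fine-tuning, in GB."""
    # 2 (fp16 weights) + 2 (fp16 grads) + 4 (fp32 master copy) + 8 (Adam m and v)
    return params * 16 / 1e9


def lora_gb(params, trainable_fraction=0.01):
    """Frozen fp16 base plus fully-tracked adapter params, in GB.

    trainable_fraction is a hypothetical placeholder for the adapter size.
    """
    frozen = params * 2 / 1e9
    adapters = params * trainable_fraction * 16 / 1e9
    return frozen + adapters


PARAMS = 6.8e9  # parameter count quoted in this thread
print(f"full fine-tune: {full_finetune_gb(PARAMS):.1f} GB")
print(f"LoRA (1% trainable, assumed): {lora_gb(PARAMS):.1f} GB")
```

Under these assumptions, a full fine-tune is far out of reach of a 24gb card, while the LoRA estimate is dominated by the frozen base weights. Whether it actually fits depends on activation memory, which this sketch leaves out.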
Oh cool, another thing I have to "rent" to remain competitive. Fuck renting, fuck datacenters.
i gained absolutely nothing from clicking that link
File: file.jpg (1.16 MB, 2560x1536)
I didn't see top left when it was posted. I love it when the collage has ones I missed.
I see you ignored the part where I said that big models are required if you want to be competitive against API models. don't whine at us, whine at fucking Nvidia for not giving us enough VRAM, and word is their 5090 will only have 28gb, fuck them
File: file.jpg (1.18 MB, 2560x1536)