/g/ - Technology






File: tmp.jpg (1.13 MB, 3264x3264)
General dedicated to creative use of free and open source text-to-image models

Previous /ldg/ bread : >>101344420

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
ComfyUI: https://github.com/comfyanonymous/ComfyUI

>Auto1111 forks
SD.Next: https://github.com/vladmandic/automatic
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>Pixart Sigma & Hunyuan DIT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>View and submit GPU performance data
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Share image prompt info
https://rentry.org/hdgcb
https://catbox.moe

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
>>
Blessed to bred you anon
>>
official pixart bigma and lumina 2 waiting room, now with good prompt comprehension
>>
File: ComfyUI_00102_.png (1.25 MB, 1024x1024)
Gonna post this here as well:

New open source model, best prompt following outside of ideogram so far.
https://blog.fal.ai/auraflow/
https://huggingface.co/fal/AuraFlow

Towards the right is a cartoon dragon on top of a cliff, to the left is a anthromorphic fox wearing armor riding a horse. The horse is standing on top of a blue cube. In the background there is a flying eagle holding a sun. The sun has a angry face on it.
>>
File: media_GSQL07ragAEYTfq.jpg (168 KB, 1024x1024)
Model: https://huggingface.co/fal/AuraFlow
Demo: https://fal.ai/models/fal-ai/aura-flow?share=45041643-4b84-4603-b6c8-b76be7869c4f
>There's a green triangle on top of a blue square, and a red sphere on top of the green triangle, and a yellow rabbit on top of the red sphere, and a pink sheep on the right, and a purple tiger on the left, and a black bat on the top right
That's really impressive, this guy is good. A single person is making a better model than a whole fucking team (SAI); the SAI cucks should be ashamed of themselves
>>
File: 1720762644816254.png (1.23 MB, 1024x1024)
>>101375771
And this was the actual first attempt at the prompt that did better.
>>
>>101375771
>>101375778
really promising model, with whatever this guy cooks up next and pixart bigma, we'll be eatin good.
>>
needs 16ch vae tho
>>
>>101375771
>>101375778
that looks like some monty python drawings kek
https://www.youtube.com/watch?v=pLpK_Htw-F8
>>
>>101375730
I use AI as my silly image and sometimes porn generator, but does anyone else notice that the shittier an artist is, the more they hate AI?
>>
>>101375803
the 16ch vae exists but in its rough form, needs a bit of training to be adapted to some new models
>>
>>101375811
of course, AI can't reach the best artists' level (yet), that's why only the shitty artists are seething: they realized they have so little talent that AI managed to beat them in less than a few years of its lifetime
>>
File: ComfyUI_00127_.png (1.49 MB, 1024x1024)
>>101375804
It can do better aesthetics if prompted for it. I'm more interested in its prompt comprehension because everything else can be finetuned easily enough.
>>
>>101375771
>best prompt following outside of ideogram so far.
so for you the ranking would be ideogram > this model > dalle3 in terms of prompt understanding?
>>
>>101375843
Yes. And this model is supposedly still early in training.
>>
File: AuraFlow_00021_.png (1.09 MB, 1024x1024)
It seems like a lot of the data for this model was pulled off of Ideogram.
Like, sometimes the "not safe" cat will appear at random times, like it has been sprinkled throughout the dataset in response to potentially unsafe prompts. There's nothing wrong with that on the surface, but it raises the question of how long the trainer plans to keep training a model that makes copies of copies and how much new and curated data will be introduced into the dataset?
>>
File: ComfyUI_00129_.png (1.35 MB, 1024x1024)
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is sitting on top of a red jeep:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
>>
>>101375862
>training a model with synthetic data
when will they learn?
>>
>>101375882
they will never learn...
>>
>>101375771
>best prompt following outside of ideogram so far.
>>101375862
>It seems like a lot of the data for this model was pulled off of Ideogram.
Now we know why the prompt following is ideogram tier, he's just trying to make a copy of ideogram
>>
File: ComfyUI_00130_.png (1.38 MB, 1024x1024)
>>101375882
It's the reason why the model is so good at following prompts. There is nothing to learn. The best current models are using mostly synthetic data, from image gen to text gen.
>>
>>101375885
They will machine learn
>>
>>101375890
>It's the reason why the model is so good at following prompts.
Doubt. It's about how well the dataset is tagged.
>>
>>101375890
>It's the reason why the model is so good at following prompts.
that's bullshit, dalle3 is good at following prompts because they used GPT4V on real pictures for the captioning, not synthetic data
>>
File: ComfyUI_00132_.png (1.61 MB, 1024x1024)
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is driving a red jeep, the dragon is wearing a top hat, the jeeps license plate has the "DRAG" on it:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
>>
>>101375890
>The best current models are using mostly synthetic data
models such as?
>>
>>101375899
Wizard, phi, gemma... well filtered mostly synthetic data. Dalle / midjourney, tons of synthetic data...
>>
>>101375905
Itcametomeinavision_xl and imadeitup_1.5
>>
>>101375909
and that's why they are all slopped, and they did that at the FINETUNING stage, not the pretraining one, you don't pretrain a model with synthetic data, that's bullshit
>>
>>101375920
>and that's why they are all slopped
I mean you say that but they are all the best performing models for their size.
>>
File: ComfyUI_00133_.png (1.61 MB, 1024x1024)
>>
>I love the look of AI sloppa
>Let's throw some into our model
>>
>>101375928
no, I mean that they are all slopped, /lmg/ complains about that a lot, and LLMs and imagegens aren't 1 to 1 equivalent. You don't become the best by just being a cheap copy of the best; Midjourney/dalle/chatgpt are the best and they never trained their models on AI slop, they did it on real data, as it should be
>>
>>101375942
? Gemma / wizard are some of the best models touted on lmg atm. Phi though sure, but its tiny.
>>
>>101375949
why do you repeat the same arguments like a broken record or something? you're wasting my time anon
>>
>>101375941
I think there should be some so the model knows it as a concept. SD face, midjourney style etc. Perhaps even what the common artifacts from them look like
>>
File: ComfyUI_00139_.png (1.35 MB, 1024x1024)
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is driving a red jeep, the dragon is wearing a top hat, the jeeps license plate has "DRAG" on it. To the right is a minotaur driving a purple suv, the suv's license plate has "BULL" on it. They are racing towards the camera.:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
>>
>>101375965
i don't believe it works like that anon
>>
>>101375941
The trick is to use synthetic data to fill out the gaps in its knowledge. But you balance it out with aesthetic training, then you have the advantages without the "slop" style.
>>
>>101375965
that's not what it is doing though, it's not putting in an ideogram picture and making the model understand it's an AI picture, it's training on it as if it's a real picture, that's dumb as fuck, the model is learning reality through AI sloppa, it's like recording a VHS from another VHS, you just lose accuracy with this inbreeding technique
https://www.youtube.com/watch?v=nqy_hYDI0As
>>
>>101375978
>use synthetic data to fill out the gaps in its knowledge
Name a single concept, object, etc that does not have enough real data available.
>>
>>101375974
>>101375983
Well then it's extremely gay and not based at all, technically speaking of course
>>
>>101375987
Find me a equivalent image for each and every concept in this dataset.
https://huggingface.co/datasets/ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions
>>
>>101375998
I don't want a model to look like a cheap copy of dalle3, that's insanity
>>
>>101376009
Read my comment again: >>101375978 You can avoid getting the "style" of said images while gaining the "concepts". Its gonna need a lot more training though.
>>
>>101376006
>Models trained or fine-tuned on
ProGamerGov/synt
>Zero text-to-image
I wonder why. You should train one and let us know how it looks.
>>
>>101376006
The world captures over 5 billion real images daily, your argument is invalid
>>
>>101376019
Are you not following the thread / the last one? That is what Auraflow is doing and its looking good even as undertrained as it is so far. Best in class even.
>>
>>101376006
if you're not lazy you could get the same results by training your model with really complex REAL drawings that have a shit ton of stuff in them
>>
>>101376018
>You can avoid getting the "style" of said images while gaining the "concepts". Its gonna need a lot more training though.
Have any examples of this? Until it happens with aura I sleep.
>>
>>101376043
That looks like shit. I honestly prefer the dalle "style."
>>
Any guesses as to what that one (1) person has spent on training this? Something isn't adding up.
>>
>>101376047
>>101375941
>>
>>101376018
Nah, I don't buy it, an AI picture will always be an approximation of reality, training a model with less than 100% accurate data when 100% accurate data (Real pictures!!) exists is retarded, it should not be done in the pretraining process, for finetuning why not, people are free to make the model more AI sloppa for all I care. A base model should be neutral in the first place so that everyone can mold it in any way they want
>>
File: poor-taste.jpg (51 KB, 450x548)
>>101376047
>>
Though I don't think people are giving dalle a fair shake either. Actual stylized images on dalle actually don't have a bad style. It's the fake 3D / realism where it's shit.

https://cdn-lfs-us-1.huggingface.co/repos/ee/1b/ee1bd318fa77f0f576a7f4f9aed9ef47229a9abd078b2ad9e56f71078c3b5622/c8b1e0635c4e6176e66aec1152002bb5471a5db21eb58774a01d7cd29f785314?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27highlights_grid.jpg%3B+filename%3D%22highlights_grid.jpg%22%3B&response-content-type=image%2Fjpeg&Expires=1721028770&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyMTAyODc3MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2VlLzFiL2VlMWJkMzE4ZmE3N2YwZjU3NmE3ZjRmOWFlZDllZjQ3MjI5YTlhYmQwNzhiMmFkOWU1NmY3MTA3OGMzYjU2MjIvYzhiMWUwNjM1YzRlNjE3NmU2NmFlYzExNTIwMDJiYjU0NzFhNWRiMjFlYjU4Nzc0YTAxZDdjZDI5Zjc4NTMxND9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=YPzjiBEaPyaRAtSdUzk%7EYWe6nEpI18gRak199LfejzLWB52Uu9YBYHI9ae8XXbtuOedjkdabxZDkM5-r%7E8Ge4WSUDtiG7nZ-0BBuu5MBu9WUkXfg7UOMmmB3PQSrelM6La0lArVMB-HEjizie80xuN6FUsJpuPswTme3Fsb30s890z-UlS9k2bZixiGjGsDHEwsBXgW1e866SfleDKmLYKnMtd1iwCRBNmiTJ1-g0Ta6DOUs3q0bHRGN6L5xWcGhLkJ6Ld-TKwOMIrdNRstdo7D20pxiwckDU62dV%7EVwb73X%7Emebh8xpxyi970jIZA0gJp7rhgDRbDhF2LfffN9MxA__&Key-Pair-Id=K24J24Z295AEI9
>>
>>101376130
I'm not clicking that big ass link nigga
>>
>>101376038
>its looking good
Prompt comprehension wise because of the way it was tagged. That's it. The fact that it uses ai images in training does not make it better at comprehending a prompt.

>>101376130
>It's the fake 3D / realism where it's shit.
It's only more apparent with those styles.
>>
>>101376130
it doesn't matter how "supposedly good" dalle3 is looking, that's not the fucking point, the pretraining process should be about making the AI understanding the reality, and the reality is REAL PICTURES NOT FUCKING AI SLOP MOTHERFUCKER
>>
>>101376135
https://files.catbox.moe/97u9xh.jpg
>>
>>101376050
they do it for the love of the game bless their heart
>>
>>101376144
A curated dataset of only high quality stylized images is what you would use. I'm not saying to use fake RL photos. That is where models like dalle / midjourney suck. For stylized the best images can be nearly impossible to tell if its from AI or not.
>>
>>101376164
>the best images can be nearly impossible to tell if its from AI or not.
You can always tell. You can always tell.
>>
>>101376050
?
>>
>>101376178
How much money has this individual spent on generating synthetic data and GPU rental? How are they funding it?
>>
>>101375882
The creation of /ai/ will come before that and shortly after, the heat death of the universe
>>
File: AiSlopCantBeatThat.jpg (792 KB, 1800x900)
>>101376164
>For stylized the best images can be nearly impossible to tell if its from AI or not.
That's a lie and you know it, a model shouldn't learn drawings through AI slop, but through real artists, period
>>
>>101376170
>You can always tell.

You can not honestly tell me that there are not some images (mostly stylized ones) that would fit right in on any art website. For example, some of the pixel art ones here: >>101376146 / the more textured / stylized ones.
>>
>>101376006
hey, you absolute fucking dumbass. how do you think dall-e was able to learn all that if no prior image of it existed? it's because they used REAL FUCKING IMAGES and captioned them properly. god localjeets are SO FUCKING STUPID they're actually so completely retarded that they willingly sink their own projects at every turn.
YOUR CHINESE SYNTHETIC SLOP MODELS LOOK LIKE SHIT! you will NEVER compete with midjourney and dall-e at this rate because you have NO QUALITY CONTROL
the few somewhat smart people in this thread are all that keeps me from wishing total localoid death. it's actually completely fucking inexcusable how far behind local image models have fallen due to incompetent chinks and self-flagellating ethicsfags
please, for the sake of local development, NEVER POST about ai again. just jump off a bridge you sabotaging faggot
>>
>>101376214
>YOUR CHINESE SYNTHETIC SLOP MODELS LOOK LIKE SHIT! you will NEVER compete with midjourney and dall-e at this rate because you have NO QUALITY CONTROL
Exactly this, how the fuck do you expect to beat midjourney if your only goal is to be a cheap copy of it
>>
>>101376214
Amen, fucking amen.
>>
>>101376214
midjourney was trained on a ton of dalle gens btw, same with ideogram
>>
>>101376243
I don't believe that, Midjourney pictures look way better than dalle
>>
>>101376164
ai image gen isn't at the point where it can replace real pictures and art, saas or not. with enough synthetic slop it'll just end up poisoning the model with its flaws (weird lines, nonsensical colors, stupid fucking fish eye lens effect, centered images, that strange obsession with symmetrical imagery, nonsensical upclose detail, everything is packed into a square resolution so it makes the image feel cramped and claustrophobic). not to mention that deepfried look dalle gens have, openai probably adds it intentionally so they can stay out of trouble.
>>
>>101376209
Anon. We disagree on "what looks good" but now the argument isn't about that, it's about being able to tell if an image is AI. If you have trouble pointing out the tells of every single one of those images then it's unironically over for you.
A VAST majority of people dislike the typical AI styles. You may enjoy it but no one else does. That's why >>101375920 is right and it should be entirely omitted during pretraining.
>>
>>101376255
do you really not understand the concept of aesthetic training? You can have your synthetic cake and eat it too
>>
>>101376243
you know its easy to spot an mj gen, right? because of its style?
>>
>Pretrain your model on dalle3 AI sloppa
>Is surprised the image quality is worse than dalle
You can't make this shit up, local fags will never improve if all they're doing is trying to cheap out the pretraining
>>
>>101376276
Don't lump us all together like that. Only a few actually enjoy "AI style".
>>
>>101376265
No you don't, and I need proof that Midjourney was using dalle to train their models, are you on the MJ team or something?
>>
>>101376273

>>101376265
>>
>>101376214
>YOUR CHINESE SYNTHETIC SLOP MODELS
pretty sure base pixart didn't include any synthetic images in training but since they only published the captions we will never know
>>
>>101376285
It's called using midjourney when it first came out on discord. It was dalle with some aesthetic training on top that slowly diverged away.
>>
>>101376299
And pixart has worse quality images but is also super undertrained.
>>
>>101376284
The problem is that there aren't a lot of people making base imagegen models, and the few that do are all doing something retarded in the pretraining

>SAI: Poison their model with """safe""" DPO cucking + insane censorship on the dataset
>ComfyUi and pony: Remove all children on the pretraining???
>Hunyuan: That's obvious they use AI sloppa for the pretraining
>Pixart: Looks fine but undertrained
>Kolors: outdated architecture + shit licence
>AuraFlow: Spams its pretraining with ideogram pictures
Fuck man, they all suck at the end of the day
>>
>>101376276
>trying to cheap out the pretraining
I don't get it. Isn't scraping images pretty cheap? Flickr, boorus, public domain etc.

>>101376284
>Only a few actually enjoy "AI style".
I think it's a fun concept, but it should be tagged and trained as a style not something that comes out as a standard.
>>
>>101376315
you cant deny its aesthetically superior especially something like bunline which includes virtually zero AI images
>>101376323
put "dreamshaper" in the negatives of SD3 and post the results
>>
>>101376323
>I don't get it. Isn't scraping images pretty cheap? Flickr, boorus, public domain etc.
It's not cheap at all because you still need to caption those pictures, and make good captions for them; it's way easier to just use the ideogram API, write whatever random complex shit you want in it, and you'll get a picture that is somehow close to what you've written
>>
>>101376302
Do you have a source or something?
>>
But yea, today I learned that /g/ is horribly misinformed on synthetic data training. They don't know that you can get away from having the ai "style" while filling gaps if you properly balance the dataset.
>>
>>101376345
Go fuck yourself nigger, if you can't find real pictures to fill the gap, you have serious skill issue
>>
Why can anon immediately tell when a lora or tune's dataset is mostly synthetic data?
>>
>>101376345
>>101376360
you need to read this >>101376263
>ai image gen isn't at the point where it can replace real pictures and art, saas or not. with enough synthetic slop it'll just end up poisoning the model with its flaws (weird lines, nonsensical colors, stupid fucking fish eye lens effect, centered images, that strange obsession with symmetrical imagery, nonsensical upclose detail, everything is packed into a square resolution so it makes the image feel cramped and claustrophobic).
>>
>>101376355
Find me an image of an anthro fox riding a horse standing on a blue cube while... etc...

You can have both: the far off concepts, and, once you train it enough, the style of the hand made images. The farther off the concept, the more "synthetic" the image is going to look due to the limited training so far. That can be fixed with more training / aesthetic training.

Educate yourself.
>>
oh brother
>>
>>101376265
>>101376264
>That's why >>101375920 is right and it should be entirely omitted during pretraining.
>>
>>101376372
>Find me a image of a anthro fox riding a horse standing on a blue cube while... ect...
You're so fucking retarded, why do you pretend you can't find complex prompts out of real pictures? It won't be an anthro fox with whatever bullshit you've written, but if a model is trained with enough complex REAL pictures with accurate captions, there's no reason it can't make new complex things after that, just admit you're a lazy fuck who wants to use AI sloppa to make your job easier, but that's the difference between the best (dalle/midjourney) and you local cucks, they don't take shortcuts to greatness, they don't rely on other AI sloppa to make their model great. You are the inbreeding cancer of this community, you should be ashamed of yourself.
>>
>>101376400
Lol did you work on dalle or something?
>>
>>101376372
>That can be fixed with more training / aesthetic training.
That's the problem retard, if you only pretrain a model with real pictures, you don't need to fix anything in the first place. All you're doing is adding tape to a broken wall, instead of building a good wall in the first place
>>
>>101376409
Did you read the paper or something? They clearly said that they used real pictures + synthetic captions (GPT4V) to achieve this prompt understanding level
>>
File: file.jpg (1.16 MB, 1792x2304)
Gen
>>
>>101376410
What image model are we talking about that was pretrained on a fully synthetic dataset?
>>
>>101376423
STOP MISSING THE POINT, A PRETRAINED MODEL SHOULD HAVE 0 SYNTHETIC PICTURES IN IT, KILL YOURSELF
>>
>>101376372
>Find me a image of a anthro fox riding a horse standing on a blue cube while... ect...
it doesn't need that, just put in enough pictures of anthro foxes, people riding horses and blue cubes, and a model with good prompt comprehension will be able to generalize it.
>>
>>101376006
remember the anon who said he was going to use this dataset? i wonder where he is now
>>
>>101376426
ok bud
>>
>>101376332
>good captions
I would like to know what the good captions are. I don't think having to string overly long sentences together is the way to go, or just having booru style tags. I worry that these new models will be trained so that people have to use text models for translating prompts
>>
>>101376430
pixart devs?
>>
>>101376441
say sike
>>
>>101376435
>I would like to know what the good captions are.
Dunno why this should be a debate, a good caption is something that completely describes the picture with all the necessary details. And with real sentences, because just using tags leads to confusion

If you write "woman, chair, table, sitting" the model doesn't know if the model is sitting on the chair or the table.
>>
>>101376441
bro?
>>
>>101376444
Check their discord.
>>
>>101376451
screenshot it
>>
>>101376455
I don't remember enough to find it in the search, you look.
https://discord.com/invite/rde6eaE5Ta
>>
>>101376469
nowhere does the dev, the only one i know of lawrence-c, say anything to the effect of "we have used or will use synthetic images in our training"
you are bullshitting
>>
>>101376448
I hope it's like this, natural looking sentences. It just sometimes looks like these text models try to maximize token usage and pad the description like a first year university student does when writing an essay. Perhaps that's not terrible either, I just wouldn't want to use prompts like that
>>
File: Untitled2.png (8 KB, 727x120)
>>101376477
Do I have to do everything?
>>
File: stablediffusion03.jpg (286 KB, 1552x1200)
>>
>>101376485
Yeah, CogVLM does that, it just adds unnecessary shit to the caption instead of just being objectively descriptive.
Captioning pictures is easily the hardest part, you can't really do that manually, you have tens of millions of pictures to caption to make a good pretraining, but captioning models are shit too (they won't do NSFW, they add fluff, and they won't be as accurate as humans)

That's the moat of OpenAI, they hired hundreds of african slaves to make manual captions kek
>>
>>101376495
>rando discord user
>talking about a model thats not pixart
are you being purposefully obtuse?
>>
>>101376495
And do we have to repeat everything? LLMs and imagegens aren't equivalent. And no one likes Phi, this shit is ultra slopped, so if you're trying to make your point with these LLM models you're failing hard

Besides, it's not because they got good results with synthetic data that you can't have the same result with human data, that's a fallacy and you know it, they never did any comparison to reach that conclusion in the first place
>>
>>101376514
>slop
Not everyone uses these things for porn you know. For real world tasks phi is indeed sota for its size.
>>
File: file.png (21 KB, 613x116)
>>101376495
if you bothered to scroll up you'd notice he's arguing against using synthetic data. also i have no idea if this guy is a pixart dev or not.
>>
>>101376526
What's your point? We are in the imagegen community, people want AI models that output images as close as possible to reality, people want soulful drawings, they don't want AI sloppa that can put an AI sloppa dog on top of an AI sloppa green triangle, what are you smoking mate?
>>
>>101376006
Honestly cannot comprehend someone looking at those images and thinking "yeah these look good I should include them in my dataset". Absolutely zero taste.
>>
>>101376501
I think models like https://huggingface.co/internlm/internlm-xcomposer2-vl-7b-4bit should be able to make pretty damn accurate descriptions even for pornographic images
>>
>>101376495
your eyes must be fucked up, not only do you enjoy the look of ai sloppa but your text rendering is trash kek
>>
>>101376526
1) People use synthetic data because they don't have much choice, it's expensive and too much time consuming to do everything by hand, they don't do that because they like it
2) the LLM community accepts the sloppa more because they want objectively good answers from their AI; the "aesthetic" part, which is the way the AI talks, is kinda irrelevant if you want to use it professionally
3) That's the difference with the imagegen community, we want both. We want a model that produces a picture that is accurate to the prompt, but at the same time we want the "aesthetic" that looks like real life, and that's where synthetic data is unwelcome, because you can't have your cake and eat it too with that AI slop method
>>
>>101376566
Did you use it anon? Is it really more accurate than CogVLM? And can it really caption NSFW pictures?
>>
>>101376583
you had me angry with #1 but by #3 i agreed
>People use synthetic data because they don't have much choice, it's expensive and too much time consuming to do everything by hand, they don't do that because they like it
plenty of people use 100% real data. the ones you mention are simply lazy.
>>
>>101376595
>plenty of people use 100% real data. the ones you mention are simply lazy.
For pretraining that's kinda impossible to do all alone, you have tens of millions of pictures that you need good captions for. At some point you need to use synthetic captions (still better than using the shitty laion captions)
>>
>>101376320
>>Hunyuan: That's obvious they use AI sloppa for the pretraining
Is that why the skin texture looks so smooth and unnatural?
>>Pixart: Looks fine but undertrained
That's why every single anon should be training it. It's a solid base.

>>101376606
Again, you are insane if you're purporting that there aren't enough real world images out there.
>>
>>101376623
>Again, you are insane if you're purporting that there aren't enough real world images out there.
No, I think you're missing my point. I'm all for using 100% real pictures, I'm talking about the captions of those pictures, you won't do them by hand, you need the help of an AI for that
>>
>>101376623
>Is that why the skin texture looks so smooth and unnatural?
No, it's because all these models are incredibly undertrained. Undertrained models lack detail.
>>
>>101376630
You're right I misread your reply. The only gripe I have with synthetic captions is they seem to neglect to pick up on specifics as in the caption for Mario would be something to the effect of "mustached man wearing a red hat and trousers".
>>
>>101376593
Did not try it yet, should use linux with my hw and I'm on windows. People I know use it for smut

>Is it really more accurate than CogVLM?
I don't think so
>>
>>101376593
https://huggingface.co/RedRocket/JointTaggerProject
>>
>>101376647
>specifics
Not the right word but I think you get what I mean.
>>
>>101376654
>This model is a multi-label classifier model designed and trained by RedRocket for use on furry images, using E621 tags.
tags suck anon
>>
>>101376642
https://arxiv.org/pdf/2405.08748
I think you're right, they don't mention any synthetic data on their paper
>>
>>101376666
You think writing a novel like a captioner is better? Tags are easier to "capture" the important aspects of a image with.
>>
>>101376630
>I'm talking about the captions of those pictures, you won't do them by hand,
i dream of a day where anons can work together to properly tag by hand an entire dataset big enough to pretrain a great model
>>
>>101376647
yeah, if you only use CogVLM captions to pretrain your model, you'll lose all the artists/celebrities/characters in the process, the wet dream of SAI actually kek
>>
>>101376682
Nah this is bullshit, imagine a woman sitting on a table and there's a chair in front of her. The tags "woman, sitting, chair, table" will confuse the model: how the fuck is it supposed to know whether the woman is sitting on the chair or on the table? That's why we use sentences, we don't speak like that.

"Retarded, anon, are, you, understand, not, shit, issue, skill"
>>
>>101376684
we can't work together, that would mean putting the pictures on a site and working on them, we would be destroyed by "copyright" really quickly
>>
>>101376696
>"Retarded, anon, are, you, understand, not, shit, issue, skill"
kekd hard
it should be a combination of the two desu or at least still allow me to spam random tags at the end for lulz
>>
How many beams should one use with captioning models? More = more accurate? What the hell is a beam
>>
>>101376731
A beam? What captioning model are you using anon?
>>
>>101376757
Trying out microsoft/kosmos-2-patch14-224
>>
https://fal.ai/models/fal-ai/aura-flow
>Two men arguing with each other, one is screaming "NO AI SLOP" the other says "WHY NOT??"
>>
>>101376810
>A woman walking over a giant multicolored glass ball and is screaming "I'm going to fall!", 90's anime style
>>
File: Dalle-3.jpg (331 KB, 1582x1338)
>>101376810
We are so far from dalle3 it's not funny anymore :(
>>
File: dalle3.jpg (273 KB, 1435x1303)
>>101376842
Yeah... dalle3 didn't do the 90's anime style and didn't add any text kek
>>
File: 00096-.jpg (1.74 MB, 1536x2304)
>>101376775
is it not the width (or height)?
>>
>>101376893
No clue. I'm bouncing between Florence-2 and Kosmos-2 for really quick & simple captions
>>
>>101376810
Model's alright
But there's one thing that completely kills it: it uses the sdxl VAE, which means it can't render text and finer details. Another DOA release
>>
they come....
and they go....
>>
>>101376731
> Beam size, or beam width, is a parameter in the beam search algorithm which determines how many of the best partial solutions to evaluate.
More = more accurate according to the model's internal scoring/evaluation, yes.
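To make that concrete, here's a rough sketch of where the beam count goes when captioning with Florence-2; this assumes the usual model-card style usage (trust_remote_code, a CUDA card), and the task token / exact kwargs may differ between checkpoints:
```python
# Hedged sketch, not a drop-in script. num_beams is the beam width discussed above.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large-ft"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg").convert("RGB")  # any local image
task = "<MORE_DETAILED_CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=512,
    num_beams=5,  # beam width: candidate captions kept per decoding step; higher = slower
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(raw, task=task, image_size=image.size)[task]
print(caption)
```
In practice the returns fall off quickly past ~3-5 beams while the decode time keeps climbing, so 20 is probably overkill.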
>>
>>101377238
>make 16 channel vae
>https://huggingface.co/AuraDiffusion/16ch-vae
>dont use it
>use sdxl slop instead
but why
>>
>>101378256
I genuinely don't know, I asked on the discussion tab.
It genuinely seems like there's someone sabotaging open source by making people take bad decisions. First SD releasing the absolute dogshit that SD3M was, then this...
>>
>>101378256
>>101378276
the 16ch vae was made very recently, the guy behind it talked about making this model a few months before the sd3 release.
>>
>>101376810
>Analog photo of a beautiful girl winking and giving a thumbs up, 8k, intricate details.
Wew lad.
>>
File: 00013-2594262553.jpg (421 KB, 1304x1624)
>>
>>101378256
https://huggingface.co/fal/AuraFlow/discussions/6
They said they plan on changing it
great :)
>>
>>101378480
any news about the 1.5 vae?
>>
File: 1.jpg (330 KB, 1005x957)
>>101376883
it's a prompting issue, these models are not tagged in the same way, yet people expect them to behave like they do. picrel is from march when the same thing happened
>>
>>101379158
>picrel is from march when the same thing happened
it was the exact same prompt used on dalle in march, I guess they changed something in the model since then
>>
https://huggingface.co/fal/AuraFlow/discussions/5
>Uh, this is a big one. 35 GB VRAM. Generating a 1024x1024 on a RTX 4090 takes almost 20 minutes. And it seems to be unhappy with non square ratios? (1024x576)
Holy fuck?
>>
>>101378167
>More = more accurate according to the model's internal scoring/evaluation, yes.
Thanks. I'll try 20 with Florence-2-large-ft
>>
>>101379242
can you show us some pictures with florence captions to see how bad/good it is?
>>
File: summer.jpg (519 KB, 1024x1536)
>>101379313
>>
>>101379120
https://huggingface.co/ostris/vae-kl-f8-d16
seems like it's already usable with 1.5. the guy mentioned that this is an older test version and a new one is on the way. they have a thread for this on the pixart discord if you're interested in where i found this. they also said something about converting sd 1.5 checkpoints to be compatible with the 16ch vae by merging a lora, but i'm only a layman so i don't understand this, sorry.
>>
>>101379421
it's kinda accurate but not descriptive enough, it doesn't say what's written on her shirt, or that there are some clothes and an iron on the ironing board, desu for SFW pictures it's better to use the sota shit like gpt4v
>>
>>101379421
>>101379441
>A fairly close eye-level indoor full shot shows a young woman in a red t-shirt with the words “Bite me” printed on it in white lettering stands in front of an ironing board in a room with orange and yellow walls. The woman is smiling and looking directly at the camera. She has long red hair pulled back in a ponytail and is wearing white ankle-high socks. The dress she is wearing is a dark red with small black dots all over it and a white flower in the center of the shirt. The ironing table is covered with an orange, yellow, orange, and green floral pattern and has a turquoise metal frame. Clothes are folded and stacked on top of one another on the left side of the table. There is anironing board on the right of the image, with an iron on top, and a flower-shaped green and blue flower on the far right. The walls of the room are painted a mauve pink, white, and yellow, and there is a white narrow bookcase in the background with several stuffed animals on it. The door to the left of the frame is orange and appears to be a door knob. The floor is carpeted in a light beige color.

https://huggingface.co/yayayaaa/florence-2-large-ft-moredetailed
>>
>>101379441
I add captions with wd-v3 to it. It's really neat for loras

>>101379464
Ah yes I had token limit on
>>
>>101379464
Really interesting model, it doesn't go for the "gender neutral bullshit" "they" like CogVLM does, it's only descriptive and doesn't add unnecessary fluff. I'd say it's 60% accurate which could be better, but it doesn't make insane mistakes so that's ok I guess. Tbh, captioning models are really important and need to be the priority for improvement, because if you have a local captioner that is as good as humans, that's a fucking jackpot; the problem will always remain the celebrity/artist/character names though... Maybe one day some model will be good enough to recognize everyone kek
>>
>>101379464
>The door to the left of the frame is orange and appears to be a door knob.
that's a weird sentence, sometimes it has broken english in it
>>
>>101375811
Mediocre people survive only on gate keeping and the status quo. Truly skilled people aren't threatened by changes because they often are the change. Excellent artists would be embracing AI for the time saver it is.
>>
>>101379558
this
>>
>>101379558
The worst part is the hypocrisy of artists, they have no problem copying other artists' styles, in the video this artist has no problem drawing a copyrighted character (Pomny) but if you want to use his pictures to train your model that's blasphemous to them? Get the fuck out of here!
>>
>>101379586
Artists are left brained and stupid. They don't have critical/abstract thinking and they're also dunning kruger incarnate. There's a reason why they're some of the biggest fart huffers and authoritarians in existence, at least modern artists are.
>>
>>101379201
It's why I'm against ultra large models for local, they should've targeted 24 GB of VRAM. Your generation is taking forever because it's memory swapping.
>>
>>101379663
I'm sure that's because he used the default script provided on huggingface, if he used ComfyUi it would fit on a 24gb vram card
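For reference, a hedged sketch of the diffusers route with offloading instead of the raw script; this assumes a diffusers build that already ships AuraFlowPipeline, and the prompt/settings are just placeholders:
```python
# Not a definitive setup, just the obvious memory-saving knobs: fp16 weights plus
# model CPU offload, which keeps only the active submodule on the GPU.
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # swaps idle submodules to system RAM instead of OOMing

image = pipe(
    prompt="a cartoon dragon standing on top of a cliff",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("auraflow_test.png")
```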
>>
>>101379675
The big test is if you can full fine tune on a 24 GB card. Loras simply don't cut it.
>>
>>101379716
I mean, at this point if we want to compete against API, we need bigger guns; it's Nvidia's fault for preventing us from improving our craft in the first place, and their next 5090 card will probably be a 28gb vram card, fuck them, seriously
>>
>>101375811
I sometimes check on anti-ai forums
These people can't be past 16, they are so corny and passionate yet they don't understand what they are talking about
>>
>>101379738
You will never have a comprehensive local model that competes against API because API can put their models on 80 GB GPUs. But good news, parameters don't scale, so a model half the size beats a model twice its size as long as you keep the training domain focused.
>>
>>101379767
> But good news, parameters don't scale, so a model half the size beats a model twice its size as long as you keep the training domain focused.
That might be true for unet models, but probably not for DiT models; transformer models always scale their quality with parameters. That's why LLMs are insanely huge
>>
>>101379774
Wrong. Doubling the parameters doesn't make a model twice as smart but it certainly quadruples the cost to run it. LLMs have already proven you wrong. Many smaller models perform better than their absurdly large counterparts.
>>
>>101379828
that's not true, if you train a small and a big model exactly the same way, the big model will always be better
LLMs are proving me right on this, look at L1, L2, L3, the biggest base model is always the one with the best benchmarks, always
>>
>>101379843
Do you know how graphs work or do you think 10% better is worth 20x the size?
>>
>>101379850
Moving the goalpost? The topic was that bigger models will always perform better than smaller models if trained the exact same way. And no anon, if you want a non-retarded experience with LLMs you need to go to at least the 27b size (gemma2-it), smaller models will always be too retarded to be genuinely enjoyable, regardless of how well trained they are, it's just how it is. Benchmarks don't tell the whole story
>>
>>101379873
I can't have a conversation with someone who thinks a model 20x bigger for 10% more performance is smart for local. Hey faggot, you don't need a model that can do both photorealism and anime at the same time, let's start there.
>>
File: ComfyUI_Kolors_1737.jpg (566 KB, 1664x2432)
switched to a more proper node set for Kolors: https://github.com/MinusZoneAI/ComfyUI-Kolors-MZ

now I'm not locked into the diffusers wrapper's limited sampler selection
>>
>>101379880
>>
>>101379908
I've noticed this is what idiots do when they have no good arguments. Because surely you must be an idiot if you think a model that is 20x bigger and costs 40x as much to train for a 10% performance gain is smart. Also way to out yourself as an underaged banned zoomer. Anon, you can't even afford a 24 GB GPU.
>>
File: file.png (16 KB, 213x630)
this always cracks me up lmao
>>
>>101379924
>Anon, you can't even afford a 24 GB GPU.
the fuck you talk about nigger? I can run L3-70b at Q5, I know what I'm talking about, I tried small and big models, and the difference is huge, it's not "10%" like you pretend, you fucking faggot fuck, you're probably one of those copium losers that never tested big models and pretend to know everything. Get the fuck out of there you sub-human
>>
>>101379956
Post the graphs then :)
Cost to train, performance scores, and cost to run.
Show the exponential performance
>>
>>101379828
I can tell you've not used bigger models
You can train smaller models to give the illusion of intelligence, but in the real world Euryale 70b (a llama2 finetune) can still recall a series of events and their consequences better than gemma 27b.
Parameter count is king.
t. 56gb vramgod
>>
>>101379983
Show the cost/performance.
>>
>>101379997
sorry dude, open source development shouldn't move at the pace of 10th percentile poorfags
>>
File: graph.jpg (95 KB, 1900x786)
>>101379969
>>
>>101380015
Anon you can't finetune your 70b model. Local models are useless when only a small percentage of people can train them.
>>
>>101380024
>no labels
I'll take your cropping as a concession.
Cost/parameters/performance
Thanks!
>>
>>101380033
Wrong again
Im worked on a LoRA for a 70B in an A100 instance I rented.
And even if I wasn't it's always a possibility to finetune a 70B model for 150$ tops.
>>
>>101380047
>asks for graph
>gets graph
>"no not like that!!"
concession accepted
>>
>>101380033
the LLM community only uses cloud to train their models though, the imagegen community will probably go this path as well; like the anon said, if we want to move forward we need to scale up, too bad for lora fags who thought it would always stay that way (local training)

>>101380047
You're the one claiming that it's "only a 10%" improvement, you do realize that means you have the burden of proof, right?
https://en.wikipedia.org/wiki/Burden_of_proof_(philosophy)
So let's go anon, show us your cost/parameters/performance graphs, that's your job now, Thanks!
>>
>>101380064
Your graph is useless without a key. Otherwise I assume that's a graph of your faggotry.
>>
>>101380050
Ive*
hadn't*
I swear Im not an ESL
>>
>>101380124
>Talks about grammar mistakes instead of arguing
>>101379924
>I've noticed this is what idiots do when they have no good arguments.
Kek, the irony.
>>
File: output.jpg (245 KB, 1024x1024)
>>101379663
With the default huggingface script it takes 24.5 GB on my machine. They can probably bring it under 24, but it's not worth it right now to put it mildly. The current model is worse than SD3, it's in beta so maybe we can expect significant improvements, but definitely not off to a good start.
>>
>>101380124
Wow you trained a Lora on an A100! I bet the quality was excellent and well worth the rental!
>>
File: lmao.jpg (101 KB, 979x825)
>>101380183
She looks like a suitcase there lmaooo
>>
>https://huggingface.co/datasets/matrixglitch/wikiart-215k
cool
>>
>>
File: hmm.jpg (3.08 MB, 3307x3586)
>>101380204
A mix of both tags and florence caption would do the trick, you give florence the tags to help it with the captions so that it can write the artist names with the description
>>
>>101380186
>concession
accepted
And it was, now my model understands anthropomorphic anatomy much better, and also writes what I like better.
>>101380124
That's my message, Im correcting my own post
>>
>>101380255
Florence takes no text input sadly
>>
>>101380261
My argument isn't that you can't rent an A100 to do a tiny model lmao
Of course any of us can rent 4xH100s to finetune a 6B model lmaoo
>>
is there a site like PixArt-Sigma
that uses bing.com AI

I get two different styles with the same prompt
>>
>>101380271
You can rent 2x3090s, or a single 3090 even.
Go back to playing with Dalle3, you have no idea how LLMs work
>>
File: image (56).png (1.47 MB, 1024x1024)
>>101380203
>photo of a beautiful woman crying and holding a sign with text "tfw no suitcase gf"
>>
>>101380285
You want to win so bad you completely miss the point of everything. Enjoy your 6B art model with 2 fine tunes and 10 loras. I hope you like the base model :)
>>
>>101380255
that florence caption is kinda bad, no wonder models have trouble understanding our prompts, they are being trained with wrong information
>>
>>101380311
It doesn't need to be great, it just needs to be mostly right. Remember, SD 1.5 was trained on utter garbage yet managed to learn. The model learns the concept of "red" not from one picture but many pictures with red things.
>>
File: OIG1.jpg (54 KB, 621x621)
>>101380331
>It doesn't need to be great, it just needs to be mostly right.
And then we wonder why we get destroyed by the API models, we shouldn't think mediocrity is good enough, we must aspire for more than that.

>>101380286
Here's a dalle3 version of your prompt kek
>>
>>101380351
API models are trained by people who care less and use the same tools as us. The difference is they can afford 100xH100s training 24/7.
>>
>>101380358
No, OpenAI hired a lot of humans to do manual captioning of pictures, that's why their dalle3 model is so good at prompt understanding. But I agree with you on that point, if you have money it's easier, yeah, that's why they were able to rely on actual humans for captions instead of using florence
>>
>>101380377
Retard if you can't get a clue they used the same vision model as GPT4.
>>
>>101380392
And how did they train GPT4V retard?
>>
>>101380399
It doesn't matter, are you so stupid you think they captioned their entire dataset manually? No, they trained GPT4V and used that. So, earth to retard, the captions they trained with are likely exactly what GPT4V produces.
>>
>>101380412
>It doesn't matter
oh yes it matters, it fucking matters, if GPT4V is so good that's because it was trained on a lot of pictures with actual human captions, stop being a retard for a second and accept that at some point you need human labeling if you want to improve your craft
>>
>>101380428
Florence2 is just about as good as GPT4V. I just think you're a massive moron who thinks API models have magic sauce.
>>
>>101380440
>Florence2 is just about as good as GPT4V.
LMAOOOOOOOOOOO, I'm fucking done, my sides!
>>
>>101380449
Okay you're just trolling, so I assume work still sucks trollanon? Can't wait to post centaurgirls tonight?
>>
File: GetFucked.jpg (3.51 MB, 6283x2869)
>>101380440
>Florence2 is just about as good as GPT4V. I
https://www.youtube.com/watch?v=ciG0FvIUxKM
>>
>>101380563
Haven't followed the reply chain but
>The painting is rich in texture...
Is maximum retarded
>>
>>101380563
I thought you faggots hated long verbose prompts with superfluous language?
>>
File: aaa.jpg (221 KB, 1766x1234)
>>101380589
>>101380584
I still prefer an accurate model with unnecessary fluff over a model that just gives false information. You can talk to gpt4v and ask it to be more concise, you can't talk to florence so it kinda sucks
>>
>>101380654
>do not make any interpretations like...
>
>
>
>this painting is rendered with a high level of detail...

I truly despise the idea of needing to include that kind of information in my prompt, but maybe you can get it to condense even more I do not know
>>
>>101380654
None of that information was false, it was incomplete. It is a group of men carrying a large cloth. There is a man in a blue shirt on the left. There are two men wearing red shirts on the right. The ChatGPT model is full of superfluous language and assumptions, in fact there are a lot more red herrings and wasted tokens in the ChatGPT caption. It's the complete opposite problem.
>>
>>101380693
you only need to do it once and let the API caption your thousands of pictures though

>>101380697
>incomplete
still more accurate and complete than florence, which was the original point, focus anon focus...
>>
>>101380712
It's not more complete, it's completely wrong if your goal is to caption an image for an AI model to learn. I already said it once, AI models don't need complete information to learn, just mostly correct information.

In reality that caption should be:

"A realism painting featuring impasto fine details and brushwork of a group of Asian men on a fishing boat moving a large bundle of cloth and rope which appears to be heavy."
>>
>>101380693
>I truly despise the idea of needing to include that kind of information in my prompt, but maybe you can get it to condense even more I do not know
Looks like gpt4v adds this kind of fluff in the very last sentence; you could make a python script that removes the last sentence to be sure you won't get that shit, dunno if it's always the case though, it's trial and error I guess
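Something like this would already cover most cases (a naive sketch that assumes the fluff really does sit in the final sentence; the example caption is made up):
```python
# Rough sketch: split on sentence-ending punctuation and drop the last sentence.
import re

def drop_last_sentence(caption: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", caption.strip())
    if len(sentences) <= 1:
        return caption.strip()  # nothing safe to drop
    return " ".join(sentences[:-1])

print(drop_last_sentence(
    "A group of men carrying a large cloth on a boat. "
    "The painting is rich in texture and evokes a sense of hard work."
))
# -> A group of men carrying a large cloth on a boat.
```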
>>
>>101380764
Only a sentence and a half of that entire output is actually good
>>
>>101380762
>It's not more complete
of course it's more complete, florence doesn't say they're carrying ropes, or that they are on a boat like gpt4v does. It's just not precise enough >>101380563

>I already said it once, AI models don't need complete information to learn, just mostly correct information.
I disagree with that, you give the model wrong/incomplete information, it will output shit because it learned that way, dunno why you believe that the quality of the data or the captions doesn't matter; they matter anon, it's probably the most important thing in machine learning
>>
>>101380776
This
>>
>>101380785
Anon you don't need to literally label every thing in a picture, believe it or not it's smart enough to know a rope is in a picture from other images which were correctly captioned with "rope".
>>
>>101380776
>>101380786
relative to florence, it's good, I don't get why you criticize gpt4v so much when at the end of the day you use a worse model (florence) to caption your pictures, are you retarded or something?
>>
>>101380796
facts don't care about your feelings anon, dalle3 is the best at prompt understanding because it was trained with the best captioning model, gpt4v. You can do as many mental gymnastics as you want, the reality is here
>>
>>101380798
Because GPT4V costs money and Florence2 can caption an image every half second for free?
>>
>>101380831
Finally! I prefer that answer rather than coping with "florence is as good as gpt4v" >>101380440
https://www.youtube.com/watch?v=Ha7HAG6jVqc
>>
>>101380811
DE3 is one of the ugliest large models, and if SAI hadn't completely dropped the ball SD3 would've smoked DE3. You just sound like an OpenAI fag. And for prompt adherence? DE3 is actually shit.
>>
>>101380842
>And for prompt adherence? DE3 is actually shit.
>>
>>101380837
Florence is 90% as good as GPT4V. And if you combine Florence with WDV3 it will get you an extremely good model. Florence's tiny captions are also very good.
>>
>>101380852
Yes anon, or have you used it? I know you have selective memory and bias but if you actually paid attention to DE3 it's very much like SD 1.5 in how it gachas your prompts. You are conflating esoteric knowledge with actual prompt adherence. Just because it shows Wario robbing an ATM from the view of a security camera doesn't mean it was actually faithful to the prompt. It also gets much worse the more detailed you are in the prompt.
>>
>>101380885
Give me models that are better at prompt understanding than dalle3 so I can laugh some more
>>
File: file.png (1004 KB, 1788x991)
>>101380896
DE3 is so heckin good at prompt adherence!!!!
>>
>>101380921
Can you simply answer this simple question? You also seem to have trouble at prompt understanding >>101380896
>Give me models that are better at prompt understanding than dalle3 so I can laugh some more
>>
>>101380938
No, I proved DE3 sucks at prompt adherence and it certainly sucks at image quality and hallucinations.
>>
>>101380950
>No, I proved DE3 sucks at prompt adherence
Doesn't prove that DE3 isn't the best at it though

> it certainly sucks at image quality and hallucinations.
Irrelevant goalpost moving, looks like you also like to add verbose fluff to your text
>>
>>101380969
I'd expect the gold standard of caption makers to have fantastic prompt adherence. I guess not. Anyways have fun with your DE3.
>>
>>101380987
>I'd expect the gold standard of caption makers to have fantastic prompt adherence.
I don't expect anything from the best, they know better than anyone how to do their craft; if you think they are so bad, then go ahead and show them how it should be done, we're waiting for your model that will be SOTA at prompt understanding :^)
>>
>>101379889
>non DiT model
WHY?
>>
>>101381099
Ikr, if they went for DiT we would've gotten a top tier local model...
>>
>>101379889
Damn good pic
>>
>>101379889
Does it work with windows?
How much vram does it need?
>>
Why is DiT considered so good? I have zero understanding of this stuff, but purely from a visual perspective all these local DiT models perform poorly overall, take longer to gen and are harder to train. Am I missing something?
>>
https://fal.ai/models/fal-ai/aura-flow
kek
>>
>>101381227
It's easier to train, Pixart Sigma is one of the easiest models to train out there, trivial to add nudity to it compared to SDXL.
>>
>>101381227
>Why is DiT considered as being so good?
When you look at the benchmarks, it just beats unet everywhere, and SORA (a DiT model) showed how far you can go with that technology
https://www.youtube.com/watch?v=lKM-QMnZ3yY
>>
File: flow1.png (1.55 MB, 1024x1024)
>>
>>101380050
it wouldn't be bad if they just needed to be slightly changed/tuned, but because of the safety cocksuckers the models need to be partially overwritten to add knowledge of nsfw (since the training datasets are going to be pruned of it), and that's going to need way more resources than something that already knows it and just has some guardrails, like llms
>>
>>101381248
Ahh my bad, I was under the impression it took a lot more VRAM and so wasn't accessible for local training, but I'm now assuming that's model specific and not a DiT thing
>>101381290
Damn, that's actually really crazy, couldn't tell it was AI on my mobile screen. Thanks for showing me, anons
>>
downloading auraflow, hopefully its good
>>
>>101381770
super undercooked, even more so than base pixart so temper your expectations. they say it's more like a v0.1 beta proof of concept. probably open sota for prompt comprehension though.
>>
>>101380183
Use a higher cfg for humans.
>>
>We worked on building the 16ch-vae https://huggingface.co/AuraDiffusion/16ch-vae when we were in the middle of v0.1 pre-training, hoping to leverage it for v0.2!

That's good.
>>
>>101381815
>probably open sota for prompt comprehension though.
even better than sd3?
>>
>>101381904
from the samples i've seen posted here, yeah i'd say so.
>>
File: aa.jpg (157 KB, 1530x1694)
157 KB
157 KB JPG
Everyone arguing for florence vs gpt4v; what about this one?
https://huggingface.co/OpenGVLab/InternVL2-40B
>>
File: auraflow.png (576 KB, 408x628)
576 KB
576 KB PNG
>>
Any negatives for using Huber loss? There has to be some downside
>>
>>101382398
lmao that's not bad at all
>>
>>101382398
kekd
>>
>>101382408
I think this model will be sota when it's trained more. It looks like they are gonna train from scratch with the 16-channel VAE for v0.2
>>
>>101382425
I just hope he'll stop using ideogram outputs to pretrain his models though
https://reddit.com/r/StableDiffusion/comments/1e1ktdh/auraflow_sure_does_like_making_the_ideogram/
>>
Bunch of base model comparisons including aura flow. Just click on an image to see it across the base models.

https://images.flrty.li/
>>
>>101382532
>no pixart
pixartsexuals, this open mockery will not be forgotten! they spit on our faces, but not for long!
>>
>>101382532
Auraflow's style is actually coming along well; it's just extremely undertrained and so is going to have that smooth, undetailed look for a lot of them.
>>
>>101382532
>Anime character illustration of a cheerful karate girl wearing a white gi and headband, jumping kick pose. Expressive manga-style linework.
Midjourney looks so good
>>
Any sampler/scheduler recommendations for AuraFlow?
>>
>>101382398
>408X628
it can do sub 1024px as well?
>>
>>101382407
can't really see any particular downside
>>
>>101382746
I remade a LoRA using Prodigy + Huber loss. It seems to counter the usual Prodigy overfitting issue. Almost too good to be true.
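For anyone curious what the swap actually looks like in a training step, a minimal sketch assuming the usual noise/v-prediction targets coming out of your trainer (the names here are mine, not from any specific repo):
[code]
import torch
import torch.nn.functional as F


def diffusion_loss(model_pred: torch.Tensor, target: torch.Tensor,
                   use_huber: bool = True, huber_c: float = 0.1) -> torch.Tensor:
    # Huber is quadratic near zero and linear for large residuals, so outlier
    # samples and timesteps pull on the weights less hard than with plain MSE,
    # which is plausibly why it tempers Prodigy's tendency to overfit
    if use_huber:
        return F.huber_loss(model_pred.float(), target.float(), delta=huber_c)
    return F.mse_loss(model_pred.float(), target.float())
[/code]
The theoretical downside is that a very small delta makes it behave almost like L1 and can wash out fine detail, so the huber_c value is worth sweeping.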
>>
>>101376243
No it was not, moron. I have listened to every single Midjourney developer chat.
>Office hours 4/17: Midjourney does not train on its own images and does not train on AI images
And if you don't believe me I'll ask him again next time and record it. You are making shit up for the SOLE PURPOSE of sabotaging local models. Get the fuck out of this thread
>>
>>101382906
based
>>
>>101382792
Haven't found any instance where it has been noticeably bad at bmaltais's default values; it basically always either helped or seemingly did nothing in particular. Almost everything else is more tricky.
>>
>>101382906
Of course they can't publicly say they trained on DALL-E outputs; they would possibly be liable then.
>>
>>101383186
>they would possibly be liable then.
liable for what?
>>
File: file.png (1.37 MB, 1024x1024)
1.37 MB
1.37 MB PNG
>>101378888
>>101376810
>>101376842
I do like the tortured AI jank from hell aesthetic
>>
>>101383254
OpenAI forbids training on its images. Also, making datasets public is never a good idea given how grey a legal area it all is.
>>
>>101383186
Generally speaking overt lies are illegal when it comes to business. So if you trained on something and then directly lie about it, that can come back to haunt you in many ways. It's better to say nothing.
>>
>>101382441
its over
>>
>>101383360
It's just begun. It has sota prompt adherence, and the style when prompted decently is not bad so far: https://images.flrty.li/

It's just extremely undertrained.
>>
File: lk2sjjbvu4cd1.png (787 KB, 720x960)
787 KB
787 KB PNG
>>101382441
It is totally insane he did that; it will consider the errors in the AI outputs as valid data and it will break even more. Too many limbs in the training image? No problem, it will be considered valid data...
>>
>localjeets now slopping up synthetic garbage thinking it's better than real data
psyop success, enjoy remaining forever in last place
>>
>>101382441
Why... WHY???
>>
>>101383375
yes just like with pixart, hunyuan and kolors, just 2 more weeks till someone (not me) trains them more
>>
File: sddefault.jpg (33 KB, 640x480)
33 KB
33 KB JPG
>>101382441
I'm so tired of those retards, is there a single man not doing retarded things in the imagegen community?
>>
man people are really trying to FUD the new model, huh?
>>
>>101383439
me
too bad im not training models :/
>>
>>101383396
t: homosexual
>>
>hitting reply limit after only 12 hours
>>
File: FuckOff.jpg (1.26 MB, 2052x2067)
1.26 MB
1.26 MB JPG
>>101382441
Not only did he decide to poison his model with AI slop, he didn't even bother removing the censored pictures. What kind of an amateur moron must you be to end up there??
>>
add auraflow not safe cat to collage
>>
>>101383493
Front loaded thread with lots of discussion around Auraflow.

So the bakery just opened and put out some fresh bread
>>101383507
>>101383507
>>101383507
>>
File: ComfyUI_00155_.png (1.21 MB, 1024x1024)
1.21 MB
1.21 MB PNG
Whatever he is doing is working well. I hope he continues and ignores all the people who think they know better / are trying to disparage him.
>>
>>101383528
yeah, amateur or not, it's good there's another player in the field.
>>
>>101383528
>t. cocksucker
>>
>>101383528
>>101383549
fuck off aura devs
>>
>>101383552
>t. disingenuous troll
>>
>>101382441
I hope someone will tell him on twitter that he's going in the wrong direction, he's wasting his time and ours with this bullshit
>>
>>101383556
it's one guy, SD dev.
>>
>>101383559
Asking him to remove the censored pictures, so that the model won't spit out a fucking fat cat every time a controversial prompt appears, is trolling? The fuck is wrong with you, retard?
>>
>>101383581
That is horseshit, it certainly does not do that, are you retarded?
>>
>>101383593
IT DOES THAT YOU FUCKING MONGOLOID >>101383503 >>101382441
https://github.com/comfyanonymous/ComfyUI/issues/4007#issuecomment-2225633909
>>
>>101383601
Did you not even read it?
>>
File: VuWillBeHappy.png (1.41 MB, 1013x1024)
1.41 MB
1.41 MB PNG
>>101383601
>Vu will let the AI train on AI slop
>Vu will let him add the ideogram censored cat pictures in the pretraining
>Vu will be happy
>>
>>101383601
I don't believe it. The cat images are all exactly the same, not a hair / pixel off. Bet that redditor is bullshitting us.
>>
File: bdepnhwut4cd1.png (498 KB, 628x628)
498 KB
498 KB PNG
>>101383722
I got one when playing around with it in ComfyUI. Yeah, it doesn't look as good as the previous one, but the cat is indeed there if you wanna try "non safe" prompts
>>
>>101383739
Give me the exact prompt / seed that gives you the cat.

Anyone who believes this >>101383503 is a retard. It is impossible to generate the same exact pixel perfect cat in those gens.
>>
>>101383770
have you not used the model yet? kek i got the cat within maybe 15 minutes
>>
>>101383782
I've used it for hours now. Not once did I get a cat. Give me the prompt / seed or be proven a troll.
>>
>>101383770
>Fantasy art of skeleton king, death god
that one gave me the cat quickly
>>
>>101383792
my prompt wasn't even nsfw, I'm not trolling, anon
>>
>>101383796
>>101383807
>still avoiding giving an exact seed / prompt combo. Fucking disingenuous troll.
>>
>>101383820
try an overt nsfw prompt jesus slowpoke anon
>>
>>101383820
you want a coffee with that as well, fucker? like I said it's easy to get one, just try it, you won't wait for long, disingenuous shill
>>
>>101383853
>still avoiding giving an exact seed / prompt combo.
Thank you for your admission.

>>101383841
nsfw just gives barbie dolls / garbled anatomy, it clearly does not contain many nsfw images, but it certainly does not give you a cat.
>>
File: Capture.jpg (211 KB, 1920x1375)
211 KB
211 KB JPG
>>101383879
>still avoiding giving a exact seed / prompt combo
you want all the details? fine, go for that one. What excuse are you gonna find now?
>>
File: ComfyUI_00167_.png (1.32 MB, 1024x1024)
1.32 MB
1.32 MB PNG
graffiti of a nude woman on concrete wall, the woman in standing on top of a red cube on top of a green ball, masterpiece

No cat, I'll try >>101383899 next
>>
File: aura-output.jpg (245 KB, 1024x1024)
245 KB
245 KB JPG
>>101383796
>Fantasy art of skeleton king, death god
I can't reproduce anything approximating a cat after dozens of gens. I'm using it through hf diffusers, maybe that's a factor.
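Seeds aren't portable across front-ends anyway, since ComfyUI and diffusers build the initial latent noise differently, so I wouldn't expect the ComfyUI seed to reproduce here. For reference, my fixed-seed run looks roughly like this; a minimal sketch assuming a diffusers build new enough to ship AuraFlowPipeline (check the fal/AuraFlow model card for the exact class and minimum version):
[code]
import torch
from diffusers import AuraFlowPipeline  # assumption: present in recent diffusers releases

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.float16
).to("cuda")

# fixing the generator seed makes the run repeatable within this front-end only
generator = torch.Generator(device="cuda").manual_seed(1587)
image = pipe(
    "Fantasy art of skeleton king, death god",
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=generator,
).images[0]
image.save("aura_1587.png")
[/code]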
>>
meow bros..?
>>
>>101382441
This is what happens when your dataset is made up primarily of AI-generated images. Why people completely forgot how to scrape properly is beyond me. Seems to be a trend with recent local models where developers resort to low-hanging trash-tier datasets made up of Dall-E/MidJourney outputs instead of gathering their own real images to train on.

Sad to see local models going completely backwards, continuously shooting themselves in the foot in order to remain 'ethical' and 'safe'. Just scrape artstation, flickr, etc. already and assemble a good dataset, or don't even bother at this point. Each local model somehow has a worse dataset than the last, with SD 1.5 having an absolutely massive dataset with a wide range of styles, and cascade/sd3 gutting, no exaggeration, over 98% of the dataset due to 'safety' concerns.

Stop training on AI-generated junk. Learn to scrape.
>>
File: Proof.png (2.01 MB, 2537x1269)
2.01 MB
2.01 MB PNG
Oh look. Fucking troll
>>
it's meowver...
>>
>>101383899
>seed 1588
>>101383987
seed 1589
are you retarded?
>>
>>101383987
seed would be 1587
>>
>>101384005
Are you? It's on increment. It generated at 1588. Here is the image with metadata:
https://files.catbox.moe/p0pqd3.png
>>
>meo-ACK
>>
>>101384015
>1588
should be generated with seed 1587
>>
official cat waiting room
>>
>>101384082
AHAH GATCHA BITCH, LETS GOOOO
>>
>>101384082
Now apologize to the pussy
>>
what the meow
>>
File: Wellshit.png (1.67 MB, 2044x1268)
1.67 MB
1.67 MB PNG
>>101384067
Wait, I fucked that one up. Wtf, there really is a cat at seed 1587
>>
File: cat-well-well-well.gif (21 KB, 220x220)
21 KB
21 KB GIF
>>101384082
>>101384102
ahah, stupid bitch, who's the retard now?
>>
>>101384102
what the fuck
>>
Ahahahahahaa thanks for the lolz anon you fucking massive gorilla retard
>>
>>101384106
What's another seed it pops up at? It makes absolutely no sense for it to be pixel-perfect across several seeds; that's not how these models work. I still think that post is trolling.
>>
>>101383770
>Anyone who believes this >>101383503(You) is a retard.
>>101384102
>Wtf, there really is a cat at seed 1587
WELL WELL WELL
>>
>>101384117
>"I need a proof"!
>*Provide the proof*
>"NO NOT LIKE THAT"
Can you stop the denial for 5 seconds?
>>
>>101384119

>>101384117
And it clearly has nothing to do with censorship. It seems random. He certainly needs to filter that out before v0.2
>>
THE
ABSOLUTE
STATE
OF
LOCAL
AHAHAHAHAHAHAHAHAHA
>>
>>101384117
there's probably just a decent number of copies of the exact same cat image
>>
reminder that pixart bigma will never do this to us
>>
>>101384144
I'll be in denial that the cat is the exact same across them all because that should be impossible.
>>
>>101384145
>And it clearly has nothing to do with censorship.
He just scraped a shit ton of ideogram pictures without bothering to remove the censored ones (the cat pictures); it's not that deep, he's a total amateur
>>
I guess if he had so many that they made up a significant fraction of the dataset, it might get that incredibly overfitted.
>>
>>101384162
Moving the goalpost? We just proved that this retard added the big censored ideogram cat into the pretraining process, what a fucking retard he is
>>
>>101384161
bigma...... my special bigma....
>>
>>101384178
yeah, if it was just one or two pictures the model would never have learned to reproduce this picture so well; the simple fact it's almost a 1:1 reproduction makes me believe there are probably tens of thousands of those cat pictures in his pretraining dataset
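If it really is mass duplication, it's also the easiest kind of poisoning to catch before training: a perceptual-hash pass would drop every copy of that cat into one giant bucket. A minimal sketch, assuming the Pillow and imagehash packages (not anything from his actual pipeline):
[code]
from collections import defaultdict
from pathlib import Path

import imagehash
from PIL import Image


def bucket_by_phash(image_dir: str, max_distance: int = 4):
    """Group images whose perceptual hashes are within max_distance bits of each other."""
    buckets = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        h = imagehash.phash(Image.open(path))
        # attach to an existing bucket if its representative hash is close enough
        for rep in list(buckets):
            if h - rep <= max_distance:  # Hamming distance between the two hashes
                buckets[rep].append(path)
                break
        else:
            buckets[h].append(path)
    # the biggest buckets are mass-duplicated images worth pruning before training
    return sorted(buckets.values(), key=len, reverse=True)
[/code]
Anything that lands in a suspiciously large bucket is the same image repeated en masse and can be pruned down to one copy, or zero in the cat's case.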
>>
get on your knees and accept my seed
>>
>>101384102
I really thought that chink was smart by making his own architecture + training script, and then he does this... is this the mighty power of autism?
>>
>>101384243
I mean he's done a great job otherwise and v0.1 is apparently a proof of concept. v0.2 is supposed to train from scratch with a 16-channel VAE, hopefully he also filters the dataset then.
>>
>>101384258
>v0.2 is supposed to train from scratch with a 16-channel VAE, hopefully he also filters the dataset then.
Praying he does not repeat the mistakes of v0.1
>>
>>101384258
he needs to redo all the pretraining; that cat has poisoned his v0.1 model hard, and you can't go back and undo that process. Might be a good opportunity to actually do a good job and stop relying on AI slop to pretrain your models
>>
>>101383528
do you rescind this post, anon?
>>
>>101384258
>hopefully he also filters the dataset then.
i mean, he'd have to redo the entire thing since it's probably 90% ideogram. explains the great prompt adherence since all his images are now well captioned, but at the cost of image quality and heavy sloppification. and cat.
>>
>>101383722
>I don't believe it. The cat images are all exactly the same, not a hair / pixel off. Bet that redditor is bullshitting us.
>>101383593
>That is horseshit, it certainly does not do that
>>101383770
>It is impossible to generate the same exact pixel perfect cat in those gens.
>>101383879
>it certainly does not give you a cat.
FAMOUS LAST WORDS OHNONONONONO
>>
>>101384301
I think there is a balance to be had. Remove the cat for sure though, every one of the however many copies it must have taken to overfit that hard. The actual style that is starting to emerge is not really slop (https://images.flrty.li/), just smooth / detail-less due to not enough training.
>>
>>101384349
Shut the absolute fuck up it is slop
>>
>>101384349
Come on, just give it up already. He should make a pretrained model without any AI slop, and then it's up to the users (us) to add AI slop if we feel like it; by doing it this way he's forcing everyone to eat his shit AI sloppa, fuck off
>>
if there's anything that aura diffusion shows us it's that a well-captioned dataset really does make or break prompt adherence. i didn't think the gap would be this big bros.. wish we had an army of nigerians like openai.
>>
>>101384372
The style is not like ideogram's though. It's clearly diverging a lot due to whatever else the dataset contains.
>>
>>101384396
>goalpost moved
just admit you lost anon
>>
>>101384396
I don't give a fuck; no AI sloppa in the pretraining should be a fucking golden rule. Whose retarded idea was it to train an AI model on AI pictures that fuck up limbs, anatomy, perspective, and lighting in the first place WHEN BILLIONS OF REAL LIFE PICTURES EXIST AND DEPICT REAL LIFE WITH 100% ACCURACY
>>
>>101384424
Likely because of the possible legal issues.
>>
>>101384441
License your model properly and no one will care.
>>
>>101384441
lol, lmao even. He doesn't share his dataset, so no one will know what pictures he used in the first place. Like OpenAI: they also train their models on copyrighted shit but no one can prove anything, so they're in the clear. They have no obligation to reveal that
https://youtu.be/mAUpxN-EIgU?t=264
>>
>>101384471
Whoever is funding the thousands of GPUs might care, though.
>>
>concern trolling
>>
>>101384478
Doesn't he do everything by himself though?
>>
>>101384490
maybe the actual training, but I doubt he is bankrolling it all.
>>
>>101384512
I mean, OpenAI was able to pretrain giant models like GPT-4 and DALL-E 3 on copyrighted data without much trouble; dunno why it would be impossible for him to do it as well, with a much smaller model too. And like I said, I think he does everything by himself, even the GPUs and pretraining, so... he's just a lazy fuck, he didn't even bother to remove the cat from the ideogram scrape, that's crazy
>>
>>101384441
What legal issues? Midjourney, for example, openly shows the artist tags and the celebrities; are they dead? nope
>>
>>101384550
they are both already established and have armies of lawyers / Microsoft backing them with infinite money.
>>
>>101384565
>What legal issues?
State v. The Visions and Anon v. The Voices
>>
>>101384570
At the same time they are heavily scrutinized; that chink, no one knows him, he could've even pretrained his model and released it to the hood on 4chan (like llama1 and the NovelAI leak), what are they gonna do?
>>
>>101384550
why are you comparing some guy in his basement to openai?
>>
>>101384602
OpenAI actually has it harder: the whole world has eyes on them, which means a way higher chance of anti-AI fags willing to destroy them. It's way better to work in the shadows, anon, way way better
>>
This thread is fun
>There's no ideogram cat in the pretraining you're retarded if you think otherwise
>Ok... there's the ideogram cat in the pretraining, but the idea of pretraining with AI sloppa is good
>Ok it's not that good... but... but da legal issues!!!
Holy moving the goal post!
>>
>>101384578
As if training on AI pictures is a better way to avoid legal issues; don't forget that the models producing those AI pictures were themselves trained on copyrighted pictures, therefore those AI pictures are also in the gray area
>>
>>101384617
yeah, honestly i think you're right. there's no other explanation for him using so many unfiltered ideogram gens that the model learns to do a pixel perfect safety cat besides pure laziness.
>>
>>101384672
I would even say that it's kinda retarded to reveal to everyone that you used ideogram to pretrain your model; what if ideogram wants to send a cease and desist over the use of its outputs?
>>
>>101384658
Why do they even do this? Is it really an elaborate ploy to sabotage local models by convincing gullible chinks that training on midjourneyslop is the path forward?
>>
>>101385343
It's probably a good way of preventing the local ecosystem from catching up with the APIs, pushing them to shoot themselves in the foot with "ethical" training or with AI sloppa poisoning. If you want my genuine opinion, it's just sad. We could achieve so much better without those retards.


