General dedicated to creative use of free and open source text-to-image models

Previous /ldg/ bread : >>101344420

>Beginner UI
Fooocus: https://github.com/lllyasviel/fooocus
EasyDiffusion: https://easydiffusion.github.io
Metastable: https://metastable.studio

>Advanced UI
Automatic1111: https://github.com/automatic1111/stable-diffusion-webui
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
InvokeAI: https://github.com/invoke-ai/InvokeAI
ComfyUI: https://github.com/comfyanonymous/ComfyUI

>Auto1111 forks
SD.Next: https://github.com/vladmandic/automatic
Anapnoe UX: https://github.com/anapnoe/stable-diffusion-webui-ux

>Use a VAE if your images look washed out
https://rentry.org/sdvae

>Model Ranking
https://imgsys.org/rankings

>Models, LoRAs & training
https://civitai.com
https://huggingface.co
https://aitracker.art
https://github.com/Nerogar/OneTrainer
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts

>Kolors
https://gokaygokay-kolors.hf.space
Nodes: https://github.com/kijai/ComfyUI-KwaiKolorsWrapper

>Pixart Sigma & Hunyuan DiT
https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
Nodes: https://github.com/city96/ComfyUI_ExtraModels

>Index of guides and other tools
https://rentry.org/sdg-link
https://rentry.org/rentrysd

>View and submit GPU performance data
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance

>Try online without registration
txt2img: https://www.mage.space
img2img: https://huggingface.co/spaces/huggingface/diffuse-the-rest
sd3: https://huggingface.co/spaces/stabilityai/stable-diffusion-3-medium

>Share image prompt info
https://rentry.org/hdgcb
https://catbox.moe

>Related boards
>>>/h/hdg
>>>/e/edg
>>>/d/ddg
>>>/b/degen
>>>/vt/vtai
>>>/aco/sdg
>>>/trash/sdg
Blessed to bred you anon
official pixart bigma and lumina 2 waiting room, now with good prompt comprehension
Gonna post this here as well:
New open source model, best prompt following outside of ideogram so far.
https://blog.fal.ai/auraflow/
https://huggingface.co/fal/AuraFlow
>Towards the right is a cartoon dragon on top of a cliff, to the left is an anthropomorphic fox wearing armor riding a horse. The horse is standing on top of a blue cube. In the background there is a flying eagle holding a sun. The sun has an angry face on it.
Model: https://huggingface.co/fal/AuraFlow
Demo: https://fal.ai/models/fal-ai/aura-flow?share=45041643-4b84-4603-b6c8-b76be7869c4f
>There's a green triangle on top of a blue square, and a red sphere on top of the green triangle, and a yellow rabbit on top of the red sphere, and a pink sheep on the right, and a purple tiger on the left, and a black bat on the top right
That's really impressive, this guy is good. One guy on his own is making a better model than a whole fucking team (SAI); the SAI cucks should be ashamed of themselves.
>>101375771
And this was the actual first attempt at the prompt, which did better.
>>101375771
>>101375778
really promising model, with whatever this guy cooks up next and pixart bigma, we'll be eatin good.
needs 16ch vae tho
>>101375771
>>101375778
that looks like some monty python drawings kek
https://www.youtube.com/watch?v=pLpK_Htw-F8
>>101375730
I use AI as my silly image and sometimes-porn generator, but does anyone else notice that the shittier an artist is, the more they hate AI?
>>101375803
the 16ch vae exists, but in a rough form; it needs a bit of training to be adapted to new models
>>101375811
of course, AI can't reach the level of the best artists (yet), that's why only the shitty artists are seething: they realized they have so little talent that the AI managed to beat them in less than a few years of existence
>>101375804
It can do better aesthetics if prompted for it. I'm more interested in its prompt comprehension because everything else can be finetuned easily enough.
>>101375771
>best prompt following outside of ideogram so far.
so for you the ranking would be ideogram > this model > dalle3 in terms of prompt understanding?
>>101375843
Yes. And this model is supposedly still early in training.
It seems like a lot of the data for this model was pulled off of Ideogram. Like, sometimes the "not safe" cat will appear at random, as if it has been sprinkled throughout the dataset in response to potentially unsafe prompts. There's nothing wrong with that on the surface, but it raises the question of how long the trainer plans to keep training a model on copies of copies, and how much new, curated data will be introduced into the dataset.
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is sitting on top of a red jeep:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
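The `(…:1.4)` groups in prompts like this are A1111-style attention weighting: the text inside the parentheses has its conditioning emphasis multiplied by the number after the colon. A minimal sketch of how such a prompt splits into (text, weight) chunks — a simplified, hypothetical parser, not the webui's actual `parse_prompt_attention` (which also handles nesting, bare `()` for x1.1 and `[]` for x0.9):

```python
import re

# Simplified sketch: split a prompt into (text, weight) chunks.
# Handles only flat "(text:weight)" groups; unweighted text gets 1.0.
TOKEN = re.compile(r"\(([^()]+):([0-9.]+)\)|([^()]+)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    chunks = []
    for weighted, weight, plain in TOKEN.findall(prompt):
        if weighted:
            chunks.append((weighted.strip(), float(weight)))
        else:
            cleaned = plain.strip(" ,")
            if cleaned:
                chunks.append((cleaned, 1.0))
    return chunks

print(parse_weights("(charcoal dragon:1.4), rim lighting, (fantasy vibe:1.2)"))
# -> [('charcoal dragon', 1.4), ('rim lighting', 1.0), ('fantasy vibe', 1.2)]
```

The UI then scales the corresponding text-encoder embeddings by each weight before they are fed to the diffusion model.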
>>101375862
>training a model with synthetic data
when will they learn?
>>101375882they will never learn...
>>101375771
>best prompt following outside of ideogram so far.
>>101375862
>It seems like a lot of the data for this model was pulled off of Ideogram.
Now we know why the prompt following is ideogram tier, he's just trying to make a copy of ideogram
>>101375882
It's the reason why the model is so good at following prompts. There is nothing to learn. The best current models, from image gen to text gen, are using mostly synthetic data.
>>101375885They will machine learn
>>101375890
>Its the reason why the model is so good at following prompts.
Doubt. It's about how well the dataset is tagged.
>>101375890
>Its the reason why the model is so good at following prompts.
that's bullshit, dalle3 is good at following prompts because they used GPT4V to caption real pictures, not because of synthetic data
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is driving a red jeep, the dragon is wearing a top hat, the jeep's license plate has "DRAG" on it:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
>>101375890
>The best current models are using mostly synthetic data
models such as?
>>101375899
Wizard, phi, gemma... well filtered, mostly synthetic data. Dalle / midjourney, tons of synthetic data...
>>101375905Itcametomeinavision_xl and imadeitup_1.5
>>101375905
and that's why they are all slopped, and they did that at the FINETUNING stage, not the pretraining one; you don't pretrain a model with synthetic data, that's bullshit
>>101375920
>and that's why they are all slopped
I mean, you say that, but they are all the best performing models for their size.
>I love the look of AI sloppa>Let's throw some into our model
>>101375928
no, I mean that they are all slopped, /lmg/ complains about that a lot, and LLMs and imagegens aren't 1 to 1 equivalent. You don't become the best by just being a cheap copy of the best. Midjourney/dalle/chatgpt are the best, and they never trained their models on AI slop; they trained on real data, as it should be.
>>101375942
? Gemma / wizard are some of the best models touted on lmg atm. Phi though, sure, but it's tiny.
>>101375949
why do you repeat the same arguments like a broken record or something? you're wasting my time anon
>>101375941
I think there should be some, so the model knows it as a concept. SD face, midjourney style etc. Perhaps even what the common artifacts from them look like
(extremely aesthetic charcoal drawing of a majestic western dragon looking at the viewer, the dragon is driving a red jeep, the dragon is wearing a top hat, the jeep's license plate has "DRAG" on it. To the right is a minotaur driving a purple suv, the suv's license plate has "BULL" on it. They are racing towards the camera.:1.4), (dark background, rim lighting, epic, detailed background:1.2), (fantasy vibe:1.2), rich colors, high contrast, hard focus, intricate details, natural light, ethereal, expressive, intimate, elegant, vibrant bloom, whimsical, dramatic shadows, medium close-up, 85mm lens, f/2.8, atmospheric, moody, evocative, luxurious, textured, artistic, surreal, detailed, otherworldly
>>101375965i don't believe it works like that anon
>>101375941
The trick is to use synthetic data to fill out the gaps in its knowledge. If you balance it out with aesthetic training, you get the advantages without the "slop" style.
>>101375965
that's not what it is doing though, it's not putting in an ideogram picture and making the model understand it's an AI picture, it's training on it as if it's a real picture, and that's dumb as fuck; the model is learning reality through AI sloppa. it's like recording a VHS from another VHS, you just lose accuracy with this inbreeding technique
https://www.youtube.com/watch?v=nqy_hYDI0As
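The VHS analogy can be made concrete with a toy generation-loss simulation (my own illustration, not anything from the thread): each "copy" applies a small blur plus requantization, and the copies drift further from the original with each generation.

```python
def copy_generation(signal):
    """One lossy 'recording': 3-tap blur, then requantize to one decimal
    (a stand-in for analog noise + limited storage precision)."""
    n = len(signal)
    blurred = [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]
    return [round(v, 1) for v in blurred]

def l1_error(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

original = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
copy = original
errors = []
for _ in range(20):
    copy = copy_generation(copy)
    errors.append(round(l1_error(copy, original), 2))
print(errors)  # error vs the original never recovers; it grows, then plateaus
```

Training on AI outputs isn't literally this process, but the direction of the argument is the same: each generation can only lose information relative to the real data it was originally derived from.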
>>101375978
>use synthetic data to fill out the gaps in its knowledge
Name a single concept, object, etc that does not have enough real data available.
>>101375974>>101375983Well then it's extremely gay and not based at all, technically speaking of course
>>101375987
Find me an equivalent image for each and every concept in this dataset.
https://huggingface.co/datasets/ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions
>>101375998I don't want a model to look like a cheap copy of dalle3, that's insanity
>>101376009
Read my comment again: >>101375978 You can avoid getting the "style" of said images while gaining the "concepts". It's gonna need a lot more training though.
>>101376006
>Models trained or fine-tuned on ProGamerGov/synt
>Zero text-to-image
I wonder why. You should train one and let us know how it looks.
The world captures over 5 billion real images daily; your argument is invalid
>>101376019
Are you not following the thread / the last one? That is what Auraflow is doing, and it's looking good even as undertrained as it is so far. Best in class, even.
>>101376006
if you're not lazy you could get the same results by training your model with really complex REAL drawings that have a shit ton of stuff in them
>>101376018
>You can avoid getting the "style" of said images while gaining the "concepts". Its gonna need a lot more training though.
Have any examples of this? Until it happens with aura I sleep.
>>101376043That looks like shit. I honestly prefer the dalle "style."
Any guesses as to what that one (1) person has spent on training this? Something isn't adding up.
>>101376047>>101375941
>>101376018
Nah, I don't buy it. An AI picture will always be an approximation of reality; training a model with less than 100% accurate data when 100% accurate data (real pictures!!) exists is retarded. It should not be done in the pretraining process; for finetuning, why not, people are free to make the model more AI sloppa for all I care. A base model should be neutral in the first place so that everyone can mold it in any way they want.
>>101376047
Though I don't think people are giving dalle a fair shake either. Actual stylized images on dalle actually dont have a bad style. Is the fake 3D / realism where its shit.https://cdn-lfs-us-1.huggingface.co/repos/ee/1b/ee1bd318fa77f0f576a7f4f9aed9ef47229a9abd078b2ad9e56f71078c3b5622/c8b1e0635c4e6176e66aec1152002bb5471a5db21eb58774a01d7cd29f785314?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27highlights_grid.jpg%3B+filename%3D%22highlights_grid.jpg%22%3B&response-content-type=image%2Fjpeg&Expires=1721028770&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyMTAyODc3MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2VlLzFiL2VlMWJkMzE4ZmE3N2YwZjU3NmE3ZjRmOWFlZDllZjQ3MjI5YTlhYmQwNzhiMmFkOWU1NmY3MTA3OGMzYjU2MjIvYzhiMWUwNjM1YzRlNjE3NmU2NmFlYzExNTIwMDJiYjU0NzFhNWRiMjFlYjU4Nzc0YTAxZDdjZDI5Zjc4NTMxND9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=YPzjiBEaPyaRAtSdUzk%7EYWe6nEpI18gRak199LfejzLWB52Uu9YBYHI9ae8XXbtuOedjkdabxZDkM5-r%7E8Ge4WSUDtiG7nZ-0BBuu5MBu9WUkXfg7UOMmmB3PQSrelM6La0lArVMB-HEjizie80xuN6FUsJpuPswTme3Fsb30s890z-UlS9k2bZixiGjGsDHEwsBXgW1e866SfleDKmLYKnMtd1iwCRBNmiTJ1-g0Ta6DOUs3q0bHRGN6L5xWcGhLkJ6Ld-TKwOMIrdNRstdo7D20pxiwckDU62dV%7EVwb73X%7Emebh8xpxyi970jIZA0gJp7rhgDRbDhF2LfffN9MxA__&Key-Pair-Id=K24J24Z295AEI9
>>101376130I'm not clicking that big ass link nigga
>>101376038
>its looking good
Prompt comprehension wise, because of the way it was tagged. That's it. The fact that it uses ai images in training does not make it better at comprehending a prompt.
>>101376130
>Is the fake 3D / realism where its shit.
It's only more apparent with those styles.
>>101376130
it doesn't matter how "supposedly good" dalle3 is looking, that's not the fucking point, the pretraining process should be about making the AI understand reality, and reality is REAL PICTURES NOT FUCKING AI SLOP MOTHERFUCKER
>>101376135https://files.catbox.moe/97u9xh.jpg
>>101376050
they do it for the love of the game, bless their heart
>>101376144
A curated dataset of only high quality stylized images is what you would use. I'm not saying to use fake RL photos; that is where models like dalle / midjourney suck. For stylized, the best images can be nearly impossible to tell if they're from AI or not.
>>101376164>the best images can be nearly impossible to tell if its from AI or not.You can always tell. You can always tell.
>>101376050?
>>101376178
How much money has this individual spent on generating synthetic data and GPU rental? How are they funding it?
>>101375882
The creation of /ai/ will come before that, and shortly after, the heat death of the universe
>>101376164
>For stylized the best images can be nearly impossible to tell if its from AI or not.
That's a lie and you know it, a model shouldn't learn drawings through AI slop, but through real artists, period
>>101376170
>You can always tell.
You can't honestly tell me there aren't some images (mostly stylized ones) that would fit right in on any art website. For example, some of the pixel art ones here: >>101376146 / the more textured / stylized ones.
>>101376006
hey, you absolute fucking dumbass. how do you think dall-e was able to learn all that if no prior image of it existed? it's because they used REAL FUCKING IMAGES and captioned them properly. god localjeets are SO FUCKING STUPID, they're actually so completely retarded that they willingly sink their own projects at every turn.
YOUR CHINESE SYNTHETIC SLOP MODELS LOOK LIKE SHIT! you will NEVER compete with midjourney and dall-e at this rate because you have NO QUALITY CONTROL
the few somewhat smart people in this thread are all that keeps me from wishing total localoid death. it's actually completely fucking inexcusable how far behind local image models have fallen due to incompetent chinks and self-flagellating ethicsfags
please, for the sake of local development, NEVER POST about ai again. just jump off a bridge you sabotaging faggot
>>101376214
>YOUR CHINESE SYNTHETIC SLOP MODELS LOOK LIKE SHIT! you will NEVER compete with midjourney and dall-e at this rate because you have NO QUALITY CONTROL
Exactly this, how the fuck do you expect to beat midjourney if your only goal is to be a cheap copy of it
>>101376214Amen, fucking amen.
>>101376214midjourney was trained on a ton of dalle gens btw, same with ideogram
>>101376243I don't believe that, Midjourney pictures look way better than dalle
>>101376164
ai image gen isn't at the point where it can replace real pictures and art, saas or not. with enough synthetic slop it'll just end up poisoning the model with its flaws (weird lines, nonsensical colors, stupid fucking fish eye lens effect, centered images, that strange obsession with symmetrical imagery, nonsensical upclose detail, everything packed into a square resolution so the image feels cramped and claustrophobic). not to mention that deepfried look dalle gens have; openai probably adds it intentionally so they can stay out of trouble.
>>101376209
Anon. We disagree on "what looks good", but now the argument isn't about that, it's about being able to tell if an image is AI. If you have trouble pointing out the tells of every single one of those images then it's unironically over for you. A VAST majority of people dislike the typical AI styles. You may enjoy them but no one else does. That's why >>101375920 is right and it should be entirely omitted during pretraining.
>>101376255
do you really not understand the concept of aesthetic training? You can have your synthetic cake and eat it too
>>101376243
you know it's easy to spot an mj gen, right? because of its style?
>Pretrain your model on dalle3 AI sloppa
>Is surprised the image quality is worse than dalle
You can't make this shit up, local fags will never improve if all they're doing is trying to cheap out on the pretraining
>>101376276Don't lump us all together like that. Only a few actually enjoy "AI style".
>>101376265No you don't, and I need a proof that Midjourney was using dalle to train their models, are you on the MJ team or something?
>>101376273>>101376265
>>101376214
>YOUR CHINESE SYNTHETIC SLOP MODELS
pretty sure base pixart didn't include any synthetic images in training, but since they only published the captions we will never know
>>101376285
It's called using midjourney when it first came out on discord. It was dalle with some aesthetic training on top that slowly diverged away.
>>101376299And pixart has worse quality images but is also super undertrained.
>>101376284
The problem is that there aren't a lot of people making base imagegen models, and the few that do are all doing something retarded in the pretraining:
>SAI: Poison their model with """safe""" DPO cucking + insane censorship of the dataset
>ComfyUi and pony: Remove all children from the pretraining???
>Hunyuan: It's obvious they use AI sloppa for the pretraining
>Pixart: Looks fine but undertrained
>Kolors: outdated architecture + shit licence
>AuraFlow: Spams its pretraining with ideogram pictures
Fuck man, they all suck at the end of the day
>>101376276
>trying to cheap out the pretraining
I don't get it. Isn't scraping images pretty cheap? Flickr, boorus, public domain etc.
>>101376284
>Only a few actually enjoy "AI style".
I think it's a fun concept, but it should be tagged and trained as a style, not something that comes out by default.
>>101376315
you can't deny it's aesthetically superior, especially something like bunline which includes virtually zero AI images
>>101376323
put "dreamshaper" in the negatives of SD3 and post the results
>>101376323
>I don't get it. Isn't scraping images pretty cheap? Flickr, boorus, public domain etc.
It's not cheap at all, because you still need to caption those pictures, and make good captions at that. It's way easier to just use the ideogram API, write whatever random complex shit you want, and get a picture that is somehow close to what you've written.
>>101376302Do you have a source or something?
But yea, today I learned that /g/ is horribly misinformed on synthetic data training. They don't know that you can get away from the AI "style" while filling gaps, if you properly balance the dataset.
>>101376345Go fuck yourself nigger, if you can't find real pictures to fill the gap, you have serious skill issue
Why can anon immediately tell when a lora or tune's dataset is mostly synthetic data?
>>101376345
>>101376360
you need to read this >>101376263
>ai image gen isn't at the point where it can replace real pictures and art, saas or not. with enough synthetic slop it'll just just end up poisoning the model with it's flaws (weird lines, nonsensical colors, stupid fucking fish eye lens effect, centered images, that strange obsession with symmetrical imagery, nonsensical upclose detail, everything is packed into a square resolution so it makes the image feel cramped and claustrophobic).
>>101376355
Find me an image of an anthro fox riding a horse standing on a blue cube while... etc...
You can have both: the far-off concepts, and, once you train it enough, the style of the hand-made images. The farther off the concept, the more "synthetic" the image is going to look due to the limited training so far. That can be fixed with more training / aesthetic training. Educate yourself.
oh brother
>>101376265>>101376264>That's why >>101375920 is right and it should be entirely omitted during pretraining.
>>101376372
>Find me a image of a anthro fox riding a horse standing on a blue cube while... ect...
You're so fucking retarded, why do you pretend you can't find complex real pictures to caption? It won't be an anthro fox with whatever bullshit you've written, but if a model is trained with enough complex REAL pictures with accurate captions, there's no reason it can't make new complex things after that. Just admit you're a lazy fuck who wants to use AI sloppa to make your job easier. That's the difference between the best (dalle/midjourney) and you local cucks: they don't take shortcuts to greatness, they don't rely on other AI sloppa to make their model great. You are the inbreeding cancer of this community, you should be ashamed of yourself.
>>101376400Lol did you work on dalle or something?
>>101376372
>That can be fixed with more training / aesthetic training.
That's the problem, retard: if you only pretrain a model with real pictures, you don't need to fix anything in the first place. All you're doing is taping over a broken wall instead of building a good wall in the first place
>>101376409Did you read the paper or something? They clearly said that they used real pictures + synthetic captions (GPT4V) to achieve this prompt understanding level
Gen
>>101376410What image model are we talking about that was pretrained on a fully synthetic dataset?
>>101376423STOP MISSING THE POINT, A PRETRAINED MODEL SHOULD HAVE 0 SYNTHETIC PICTURES IN IT, KILL YOURSELF
>>101376372
>Find me a image of a anthro fox riding a horse standing on a blue cube while... ect...
it doesn't need that, just put in enough pictures of anthro foxes, people riding horses, and blue cubes, and a model with good prompt comprehension will be able to generalize it.
>>101376006
remember the anon who said he was going to use this dataset? i wonder where he is now
>>101376426ok bud
>>101376332
>good captions
I would like to know what good captions are. I don't think having to string overly long sentences together is the way to go, or just having booru style tags either. I worry that these new models will be trained so that people have to use text models for translating prompts
>>101376430pixart devs?
>>101376441say sike
>>101376435
>I would like to know what the good captions are.
Dunno why this should be a debate, a good caption is something that completely describes the picture with all the necessary details. And with real sentences, because just using tags leads to confusion.
If you write "woman, chair, table, sitting" the model doesn't know if the woman is sitting on the chair or the table.
>>101376441bro?
>>101376444Check their discord.
>>101376451screenshot it
>>101376455
I don't remember enough to find it in the search, you look.
https://discord.com/invite/rde6eaE5Ta
>>101376469
nowhere does the dev (the only one i know of, lawrence-c) say anything to the effect of "we have used or will use synthetic images in our training"
you are bullshitting
>>101376448
I hope it's like this, natural looking sentences. It just sometimes looks like these text models try to maximize token usage and pad the description like a first-year university student does when writing an essay. Perhaps that's not terrible either, I just wouldn't want to use prompts like that
>>101376477Do I have to do everything.
>>101376485
Yeah, CogVLM does that, it just adds unnecessary fluff to the caption instead of being objectively descriptive. Captioning pictures is easily the hardest part: you can't really do it manually, you have tens of millions of pictures to caption for a good pretraining, but captioning models are shit too (they won't do NSFW, they add fluff, and they won't be as accurate as humans).
That's the moat of OpenAI, they hired hundreds of african slaves to make manual captions kek
>>101376495
>rando discord user
>talking about a model that's not pixart
are you being purposefully obtuse?
>>101376495
And do we have to repeat everything? LLMs and imagegens aren't equivalent. And no one likes Phi, this shit is ultra slopped, so if you're trying to make your point with these LLM models you're failing hard.
Besides, just because they got good results with synthetic data doesn't mean you can't get the same results with human data; that's a fallacy and you know it, they never did any comparison to reach that conclusion in the first place
>>101376514
>slop
Not everyone uses these things for porn you know. For real world tasks phi is indeed sota for its size.
>>101376495if you bothered to scroll up you'd notice he's arguing against using synthetic data. also i have no idea if this guy is a pixart dev or not.
>>101376526
What's your point? We are in the imagegen community, people want AI models that output images as close as possible to reality, people want soulful drawings, they don't want AI sloppa that can put an AI sloppa dog on top of an AI sloppa green triangle. what are you smoking mate?
>>101376006Honestly cannot comprehend someone looking at those images and thinking "yeah these look good I should include them in my dataset". Absolutely zero taste.
>>101376501I think models like https://huggingface.co/internlm/internlm-xcomposer2-vl-7b-4bit should be able to make pretty damn accurate descriptions even for pornographic images
>>101376495your eyes must be fucked up, not only do you enjoy the look of ai sloppa but your text rendering is trash kek
>>101376526
1) People use synthetic data because they don't have much choice, it's expensive and too time consuming to do everything by hand; they don't do it because they like it
2) the LLM community accepts the sloppa more because they want objectively good answers from their AI; the "aesthetic" part, the way the AI talks, is kinda irrelevant for professional use
3) That's the difference with the imagegen community: we want both. We want a model that produces a picture that is accurate to the prompt, but at the same time we want an "aesthetic" that looks like real life, and that's where synthetic data is unwelcome, because you can't have your cake and eat it too with that AI slop method
>>101376566
Did you use it anon? Is it really more accurate than CogVLM? And can it really caption NSFW pictures?
>>101376583
you had me angry with #1 but by #3 i agreed
>People use synthetic data because they don't have much choice, it's expensive and too much time consuming to do everything by hand, they don't do that because they like it
plenty of people use 100% real data. the ones you mention are simply lazy.
>>101376595
>plenty of people use 100% real data. the ones you mention are simply lazy.
For pretraining that's kinda impossible to do all alone, you have tens of millions of pictures that need good captions. At some point you need to use synthetic captions (still better than using the shitty laion captions)
>>101376320
>>Hunyuan: That's obvious they use AI sloppa for the pretraining
Is that why the skin texture looks so smooth and unnatural?
>>Pixart: Looks fine but undertrained
That's why every single anon should be training it. It's a solid base.
>>101376606
Again, you are insane if you're purporting that there aren't enough real world images out there.
>>101376623
>Again, you are insane if you're purporting that there isn't enough real world images out there.
No, I think you're missing my point. I'm all for using 100% real pictures; I'm talking about the captions of those pictures, you won't do them by hand, you need the help of an AI for that
>>101376623
>Is that why the skin texture looks so smooth and unnatural?
No, it's because all these models are incredibly undertrained. Undertrained models lack detail.
>>101376642
You're right, I misread your reply. The only gripe I have with synthetic captions is that they seem to neglect specifics, as in the caption for Mario would be something to the effect of "mustached man wearing a red hat and trousers".
>>101376593
Did not try it yet, I should use linux with my hw and I'm on windows. People I know use it for smut
>Is it really more accurate than CogVLM?
I don't think so
>>101376593https://huggingface.co/RedRocket/JointTaggerProject
>>101376647>specificsNot the right word but I think you get what I mean.
>>101376654
>This model is a multi-label classifier model designed and trained by RedRocket for use on furry images, using E621 tags.
tags suck anon
>>101376642
https://arxiv.org/pdf/2405.08748
I think you're right, they don't mention any synthetic data in their paper
>>101376666
You think writing a novel like a captioner does is better? Tags make it easier to "capture" the important aspects of an image.
>>101376630
>I'm talking about the caption of those pictures, you won't do them by hands,
i dream of a day where anons can work together to properly tag by hand an entire dataset big enough to pretrain a great model
>>101376647yeah, if you only use CogVLM captions to pretrain your model, you'll lose all the artists/celebrities/characters in the process, the wet dream of SAI actually kek
>>101376682
Nah, this is bullshit. Imagine a woman sitting on a table with a chair in front of her: the tags will confuse the model ("woman, sitting, chair, table"), how the fuck is the model supposed to know whether the woman is sitting on the chair or on the table? That's why we use sentences, and we don't speak like that.
"Retarded, anon, are, you, understand, not, shit, issue, skill"
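The chair/table point is easy to demonstrate: a bag of tags throws away the relations, so two different scenes can produce the exact same tag caption while sentence captions stay distinct. A toy illustration (hypothetical, not any real captioner):

```python
# Scenes as (subject, relation, object) triples.
def tag_caption(scene):
    """Booru-style bag of tags: order and relations are discarded."""
    tags = set()
    for subj, rel, obj in scene:
        tags.update((subj, rel, obj))
    return ", ".join(sorted(tags))

def sentence_caption(scene):
    """Natural-language caption: relations survive."""
    return " ".join(f"a {s} is {r} a {o}." for s, r, o in scene)

woman_on_chair = [("woman", "sitting on", "chair"), ("chair", "next to", "table")]
woman_on_table = [("woman", "sitting on", "table"), ("table", "next to", "chair")]

print(tag_caption(woman_on_chair) == tag_caption(woman_on_table))            # True
print(sentence_caption(woman_on_chair) == sentence_caption(woman_on_table))  # False
```

Both scenes collapse to the identical tag string "chair, next to, sitting on, table, woman", so a model trained on tags alone cannot distinguish them.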
>>101376684we can't work together, that would mean putting the pictures on a site and working on them, we would be destroyed by "copyright" really quickly
>>101376696
>"Retarded, anon, are, you, understand, not, shit, issue, skill"
kekd hard. it should be a combination of the two desu, or at least still allow me to spam random tags at the end for lulz
How many beams should one use with captioning models? More = more accurate? What the hell is a beam
>>101376731A beam? What captioning model are you using anon?
>>101376757Trying out microsoft/kosmos-2-patch14-224
https://fal.ai/models/fal-ai/aura-flow
>Two men arguing with each other, one is screaming "NO AI SLOP" the other says "WHY NOT??"
>>101376810
>A woman walking over a giant multicolored glass ball and is screaming "I'm going to fall!", 90's anime style
>>101376810We are so far from dalle3 it's not funny anymore :(
>>101376842Yeah... dalle3 didn't do the 90's anime style and didn't add any text kek
>>101376775
is it not the width (or height)?
>>101376893No clue. I'm bouncing between Florence-2 and Kosmos-2 for really quick & simple captions
>>101376810
Model's alright.
But there's one thing that completely kills it: it uses the sdxl VAE, which makes it unable to do text and finer details. Another DOA release
they come.... and they go....
>>101376731
>Beam size, or beam width, is a parameter in the beam search algorithm which determines how many of the best partial solutions to evaluate.
More = more accurate according to the model's internal scoring/evaluation, yes.
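For reference, here is that algorithm stripped to its core. This is a toy sketch with a fixed probability table per step, unlike a real captioner's decoder, which scores tokens conditioned on the prefix; that simplification is mine, to keep it short:

```python
import math

def beam_search(step_probs, beam_width):
    """Keep the beam_width highest-scoring partial sequences at each step.
    step_probs: one dict of {token: probability} per decoding step.
    Ties are broken by insertion order (Python's sort is stable)."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-prob)
    for probs in step_probs:
        candidates = [
            (seq + (tok,), logp + math.log(p))
            for seq, logp in beams
            for tok, p in probs.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

steps = [{"a": 0.6, "the": 0.4}, {"cat": 0.5, "dog": 0.5}]
print(beam_search(steps, beam_width=2)[0][0])  # -> ('a', 'cat')
```

More beams means more candidate prefixes kept alive, so a higher chance of finding the sequence the model itself scores highest; it does not guarantee a more accurate caption of the actual image.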
>>101377238
>make 16 channel vae
>https://huggingface.co/AuraDiffusion/16ch-vae
>dont use it
>use sdxl slop instead
but why
>>101378256
I genuinely don't know, I asked on the discussion tab.
It genuinely seems like there's someone sabotaging open source by making people take bad decisions. First SAI releasing the absolute dogshit that SD3M was, then this...
>>101378256>>101378276the 16ch vae was made very recently, the guy behind it talked about making this model a few months before the sd3 release.
>>101376810>Analog photo of a beautiful girl winking and giving a thumbs up, 8k, intricate details.Wew lad.
>>101378256https://huggingface.co/fal/AuraFlow/discussions/6 They said they plan on changing it. Great :)
>>101378480any news about the 1.5 vae?
>>101376883it's a prompting issue, these models are not tagged in the same way, yet people expect them to behave like they do. picrel is from march when the same thing happened
>>101379158>picrel is from march when the same thing happenedit was the same exact prompt used on dalle on march, I guess they changed something on the model since then
https://huggingface.co/fal/AuraFlow/discussions/5>Uh, this is a big one. 35 GB VRAM. Generating a 1024x1024 on a RTX 4090 takes almost 20 minutes. And it seems to be unhappy with non square ratios? (1024x576)Holy fuck?
>>101378167>More = more accurate according to the model's internal scoring/evaluation, yes.Thanks. I'll try 20 with Florence-2-large-ft
>>101379242can you show us some pictures with florence captions to see how bad/good it is?
>>101379313
>>101379120https://huggingface.co/ostris/vae-kl-f8-d16 seems like it's already usable with 1.5. the guy mentioned that this is an older test version and a new one is on the way. they have a thread for this on the pixart discord if you're interested in where i found this. they also said something about converting sd 1.5 checkpoints to be compatible with the 16ch vae by merging a lora, but i'm only a layman so i don't understand this, sorry.
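for anyone wondering why the channel count matters: both VAEs downsample 8x spatially, so the only knob is how many floats each latent "pixel" keeps. quick shape arithmetic, assuming f8 with 4 channels (sdxl-style) vs 16:

```python
def latent_shape(h, w, downsample=8, channels=4):
    """Latent tensor shape after an fN VAE encode of an h x w image."""
    return (channels, h // downsample, w // downsample)

def floats_per_pixel(h, w, downsample, channels):
    """How many latent floats represent each original image pixel."""
    c, lh, lw = latent_shape(h, w, downsample, channels)
    return (c * lh * lw) / (h * w)

# sdxl-style f8/4ch vs a 16ch f8 VAE, on a 1024x1024 image
print(latent_shape(1024, 1024, 8, 4))              # (4, 128, 128)
print(latent_shape(1024, 1024, 8, 16))             # (16, 128, 128)
print(floats_per_pixel(1024, 1024, 8, 16)
      / floats_per_pixel(1024, 1024, 8, 4))        # 4.0
```

same spatial resolution, 4x the information per latent position, which is the usual argument for why 16ch reconstructs text and fine detail better.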
>>101379421it's kinda accurate but not descriptive enough, it doesn't say what's written on her shirt, or that there's some clothes and the iron over the ironing board, desu for SFW pictures it's better to use the sota shit like gpt4v
>>101379421>>101379441>A fairly close eye-level indoor full shot shows a young woman in a red t-shirt with the words “Bite me” printed on it in white lettering stands in front of an ironing board in a room with orange and yellow walls. The woman is smiling and looking directly at the camera. She has long red hair pulled back in a ponytail and is wearing white ankle-high socks. The dress she is wearing is a dark red with small black dots all over it and a white flower in the center of the shirt. The ironing table is covered with an orange, yellow, orange, and green floral pattern and has a turquoise metal frame. Clothes are folded and stacked on top of one another on the left side of the table. There is anironing board on the right of the image, with an iron on top, and a flower-shaped green and blue flower on the far right. The walls of the room are painted a mauve pink, white, and yellow, and there is a white narrow bookcase in the background with several stuffed animals on it. The door to the left of the frame is orange and appears to be a door knob. The floor is carpeted in a light beige color.https://huggingface.co/yayayaaa/florence-2-large-ft-moredetailed
>>101379441I add captions with wd-v3 to it. It's really neat for loras>>101379464Ah yes I had token limit on
>>101379464Really interesting model, it doesn't go the "gender neutral bullshit" "they" route like CogVLM, it's only descriptive and doesn't add unnecessary fluff. I'd say it's 60% accurate which could be better, but it doesn't make insane mistakes so that's ok I guess. Tbh, captioning models are really important and need to be the priority for improvement, because if you have a local captioner that is as good as humans, that's a fucking jackpot; the problem will always remain the celebrities/artists/characters names though... Maybe one day some model will be good enough to recognize everyone kek
>>101379464>The door to the left of the frame is orange and appears to be a door knob.that's a weird sentence, sometimes it has broken english in it
>>101375811Mediocre people survive only on gate keeping and the status quo. Truly skilled people aren't threatened by changes because they often are the change. Excellent artists would be embracing AI for the time saver it is.
>>101379558this
>>101379558The worst part is the hypocrisy of artists, they have no problem copying other artists' styles; in the video this artist has no problem drawing a copyrighted character (Pomni) but if you want to use his pictures to train your model that's blasphemous to them? Get the fuck out of there!
>>101379586Artists are left brained and stupid. They don't have critical/abstract thinking and they're also dunning kruger incarnate. There's a reason why they're some of the biggest fart huffers and authoritarians in existence, at least modern artists are.
>>101379201It's why I'm against ultra large models for local, they should've targeted 24 GB of VRAM. Your generation is taking forever because it's memory swapping.
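napkin math on why the stock script swaps on a 4090: weights alone at fp32 already overflow 24 GB (taking the ~6.8B parameter figure as an assumption here; activations, text encoder and vae come on top):

```python
def model_vram_gb(params_billion, bytes_per_param):
    """Rough weight-only VRAM estimate in GiB. Activations, the text
    encoder and the VAE are NOT included, so real usage is higher."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# assuming ~6.8B params; fp32 is what naive scripts tend to load in
for name, bpp in [("fp32", 4), ("fp16", 2), ("8-bit", 1)]:
    print(name, round(model_vram_gb(6.8, bpp), 1), "GB")
```

fp32 lands around 25 GB of weights alone, so anything spilling past the card falls back to system RAM and gen time explodes; fp16 comfortably fits.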
>>101379663I'm sure that's because he used the default script provided on huggingface, if he used ComfyUi it would fit on a 24gb vram card
>>101379675The big test is if you can full fine tune on a 24 GB card. Loras simply don't cut it.
>>101379716I mean, at this point if we want to compete against API, we need bigger guns; it's Nvidia's fault for preventing us from improving our craft in the first place, and their next 5090 card will probably be a 28gb vram card, fuck them, seriously
>>101375811I sometimes check on anti-ai forums. These people can't be past 16, they are so corny and passionate yet they don't understand what they are talking about
>>101379738You will never have a comprehensive local model that competes against API because API can put their models on 80 GB GPUs. But good news, parameters doesn't scale so a model half the size beats a model twice its size as long as you keep the training domain focused.
>>101379767> But good news, parameters doesn't scale so a model half the size beats a model twice its size as long as you keep the training domain focused.That might be true for unet models, but probably not for DiT models; transformer models' quality always scales with parameter count. That's why LLMs are insanely huge
>>101379774Wrong. Doubling the parameters doesn't make a model twice as smart but it certainly quadruples the cost to run it. LLMs have already proven you wrong. Many smaller models perform better than their absurdly large counterparts.
>>101379828that's not true, if you train a small and a big model exactly the same way, the big model will always be better. LLMs are proving me right: look at L1, L2, L3, the biggest base model is always the one with the best benchmarks, always
>>101379843Do you know how graphs work or do you think 10% better is worth 20x the size?
>>101379850Moving the goalpost? The topic was that bigger models will always perform better than smaller models if trained the exact same way. And no anon, if you want a non retarded experience with LLMs you need to go to at least the 27b size (gemma2-it), smaller models will always be too retarded to be genuinely enjoyable, regardless of how well trained they are, it's just how it is. Benchmarks don't tell the whole story
>>101379873I can't have a conversation with someone who thinks a model 20x bigger for 10% the performance is smart in local. Hey faggot, you don't need a model that can do both photorealism and anime at the same time, let's start there.
switched to a more proper node set for Kolors: https://github.com/MinusZoneAI/ComfyUI-Kolors-MZ now I'm not locked into the diffusers wrapper's limited sampler selection
>>101379880
>>101379908I've noticed this is what idiots do when they have no good arguments. Because surely you must be an idiot if you think a model that is 20x bigger and costs 40x as much to train for a 10% performance gain is smart. Also way to out yourself as an underaged banned zoomer. Anon, you can't even afford a 24 GB GPU.
this always cracks me up lmao
>>101379924>Anon, you can't even afford a 24 GB GPU.the fuck you talk about nigger? I can run L3-70b at Q5, I know what I'm talking about, I tried small and big models, and the difference is huge, it's not "10%" like you pretend, you fucking faggot fuck, you're probably one of those copium losers that never tested big models and pretend to know everything. Get the fuck out of there you sub-human
>>101379956Post the graphs then :)Cost to train, performance scores, and cost to run.Show the exponential performance
>>101379828I can tell you've not used bigger models. You can train smaller models to give the illusion of intelligence, but in the real world Euryale 70b (a llama2 finetune) can still recall a series of events and their consequences better than gemma 27b. Parameter count is king.t. 56gb vramgod
>>101379983Show the cost/performance.
>>101379997sorry dude, open source development shouldnt move at the pace of 10th percentile poorfags
>>101379969
>>101380015Anon you can't finetune your 70b model. Local models are useless when only a small percentage of people can train them.
>>101380024>no labelsI'll take your cropping as a concession.Cost/parameters/performanceThanks!
>>101380033Wrong againIm worked on a LoRA for a 70B in an A100 instance I rented.And even if I wasn't it's always a possibility to finetune a 70B model for 150$ tops.
>>101380047>asks for graph>gets graph>"no not like that!!"concession accepted
>>101380033the LLM community only uses cloud to train their models though, imagegen models will probably go this path as well; like the anon said, if we want to move forward, we need to scale up, too bad for lora fags who thought it would always be that way (local training)>>101380047You're the one claiming that it's "only a 10%" improvement, you know you have the burden of proof as a consequence, right?https://en.wikipedia.org/wiki/Burden_of_proof_(philosophy) So let's go anon, show us your cost/parameters/performance graphs, that's your job now. Thanks!
>>101380064Your graph is useless without a key. Otherwise I assume that's a graph of your faggotry.
>>101380050Ive*hadn't*I swear Im not an ESL
>>101380124>Talks about grammar mistakes instead of arguing>>101379924>I've noticed this is what idiots do when they have no good arguments.Kek, the irony.
>>101379663With the default huggingface script it takes 24.5 GB on my machine. They can probably bring it under 24, but it's not worth it right now to put it mildly. The current model is worse than SD3, it's in beta so maybe we can expect significant improvements, but definitely not off to a good start.
>>101380124Wow you trained a Lora on an A100! I bet the quality was excellent and well worth the rental!
>>101380183She looks like a suitcase there lmaooo
>https://huggingface.co/datasets/matrixglitch/wikiart-215kcool
>>101380204A mix of both tags and florence caption would do the trick, you give florence the tags to help it with the captions so that it can write the artist names with the description
>>101380186>concession acceptedAnd it was, now my model understands anthropomorphic anatomy much better, and also writes what I like better. >>101380124That's my message, Im correcting my own post
>>101380255Florence takes no text input sadly
>>101380261My argument isn't that you can't rent an A100 to do a tiny model lmaoOf course any of us can rent 4xH100s to finetune a 6B model lmaoo
is there a site like PixArt-Sigma that uses bing.com AI? I get two different styles with the same prompt
>>101380271You can rent 2x3090s, or a single 3090 even. Go back to playing with Dalle3, you have no idea how LLMs work
>>101380203>photo of a beautiful woman crying and holding a sign with text "tfw no suitcase gf"
>>101380285You want to win so bad you completely miss the point of everything. Enjoy your 6B art model with 2 fine tunes and 10 loras. I hope you like the base model :)
>>101380255that florence caption is kinda bad, no wonder models have trouble understanding our prompts, they are being trained with wrong information
>>101380311It doesn't need to be great, it just needs to be mostly right. Remember, SD 1.5 was trained on utter garbage yet managed to learn. The model learns the concept of "red" not from one picture but many pictures with red things.
>>101380331>It doesn't need to be great, it just needs to be mostly right.And then we wonder why we get destroyed by the API models, we shouldn't think mediocrity is good enough, we must aspire for more than that.>>101380286Here's a dalle3 version of your prompt kek
>>101380351API models are trained by people who care less and use the same tools as us. The difference is they can afford 100xH100s training 24/7.
>>101380358No, OpenAI hired a lot of humans to manually caption pictures, that's why their dalle3 model is so good at prompt understanding. But I agree with you on that point, if you have money it's easier, yeah; that's why they were able to rely on actual humans for captions instead of using florence
>>101380377Retard if you can't get a clue they used the same vision model as GPT4.
>>101380392And how did they train GPT4V retard?
>>101380399It doesn't matter, are you so stupid you think they captioned their entire dataset manually? No, they trained GPT4V and they used that. So, earth to retard, the captions they trained with are likely the exact same as what GPT4V is.
>>101380412>It doesn't matter
oh yes it matters, it fucking matters, if GPT4V is so good that's because it was trained on a lot of pictures with actual human captions, stop being a retard for a second and accept that you need human labeling at some point if you want to improve your craft
>>101380428Florence2 is just about as good as GPT4V. I just think you're a massive moron who thinks API models have magic sauce.
>>101380440>Florence2 is just about as good as GPT4V.LMAOOOOOOOOOOO, I'm fucking done, my sides!
>>101380449Okay you're just trolling, so I assume work still sucks trollanon? Can't wait to post centaurgirls tonight?
>>101380440>Florence2 is just about as good as GPT4V.
https://www.youtube.com/watch?v=ciG0FvIUxKM
>>101380563Haven't followed the reply chain but >The painting is rich in texture... Is maximum retarded
>>101380563I thought you faggots hated long verbose prompts with superfluous language?
>>101380589>>101380584I still prefer an accurate model with unnecessary fluff rather than a model that just gives false information. You can talk to gpt4v and ask it to be more concise; you can't talk to florence so it kinda sucks
>>101380654>do not make any interpretations like...> > > >this painting is rendered with a high level of detail... I truly despise the idea of needing to include that kind of information in my prompt, but maybe you can get it to condense even more I do not know
>>101380654None of that information was false, it was incomplete. It is a group of men carrying a large cloth. There is a man in a blue shirt on the left. There are two men wearing red shirts on the right. The ChatGPT model is full of superfluous language and assumptions, in fact there's a lot more red herring and wasted tokens in the ChatGPT prompt. It's completely the opposite problem.
>>101380693you only need to do it once and let the API caption your thousands of pictures though>>101380697>incompletestill more accurate and complete than florence, which was the original point, focus anon focus...
>>101380712It's not more complete, it's completely wrong if your goal is to caption an image for an AI model to learn. I already said it once, AI models don't need complete information to learn, just mostly correct information.In reality that caption should be:"A realism painting featuring impasto fine details and brushwork of a group of Asian men on a fishing boat moving a large bundle of cloth and rope which appears to be heavy."
>>101380693>I truly despise the idea of needing to include that kind of information in my prompt, but maybe you can get it to condense even more I do not knowLooks like gpt4v is making this kind of fluff at the very last sentence, you could make a python script that remove the last sentence to be sure you won't get that shit, dunno if it's always the case though, it's a trial and error I guess
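a minimal version of that script, assuming the fluff is always exactly one trailing sentence; the naive regex split will trip on abbreviations like "Mr.", so treat it as a sketch:

```python
import re

def strip_last_sentence(caption):
    """Drop the final sentence of a caption (where the 'rich in texture'
    style fluff tends to land). Naive split on sentence-ending punctuation."""
    sentences = re.split(r'(?<=[.!?])\s+', caption.strip())
    if len(sentences) < 2:
        return caption.strip()  # nothing safe to remove
    return ' '.join(sentences[:-1])

cap = "A group of men carry a large cloth. The painting is rich in texture and detail."
print(strip_last_sentence(cap))
# A group of men carry a large cloth.
```

run it over the caption files after the API pass and spot-check a handful, since sometimes the fluff is more than one sentence.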
>>101380764Only a sentence and a half of that entire output is actually good
>>101380762>It's not more completeof course it's more complete, florence doesn't say they're carrying ropes, or that they are on a boat like gpt4v does. It's just not precise enough >>101380563>I already said it once, AI models don't need complete information to learn, just mostly correct information.I disagree with that, you give the model wrong/incomplete information, it will output shit because it learned that way, dunno why you believe that the quality of the data or the caption don't matter, they matter anon, it's probably the most important thing in machine learning
>>101380776This
>>101380785Anon you don't need to literally label every thing in a picture, believe it or not it's smart enough to know a rope is in a picture from other images which were correctly captioned with "rope".
>>101380776>>101380786relative to florence, it's good, I don't get why you criticize gpt4v so much when at the end of the day you use a worse model (florence) to caption your pictures, are you retarded or something?
>>101380796facts don't care about your feelings anon, dalle3 is the best at prompt understanding because it was being trained with the best captioner model, gpt4v. You can make as many mental gymnastics as you want, the reality is here
>>101380798Because GPT4V costs money and Florence2 can caption an image every half second for free?
>>101380831Finally! I prefer that answer rather than coping with "florence is as good as gpt4v" >>101380440https://www.youtube.com/watch?v=Ha7HAG6jVqc
>>101380811DE3 is one of the ugliest large models and if SAI didn't completely drop the ball SD3 would've smoked DE3. You just sound like an OpenAI fag. And for prompt adherence? DE3 is actually shit.
>>101380842>And for prompt adherence? DE3 is actually shit.
>>101380837Florence is 90% as good as GPT4V. And if you combine Florence with WDV3 it will get you an extremely good model. Florence's tiny captions are also very good.
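combining the two is basically just string merging; a sketch with dumb substring dedup so tags the caption already covers don't get doubled (the example caption and tags are made up):

```python
def merge_caption(florence_caption, wd_tags):
    """Append booru-style tags to a natural-language caption, skipping
    tags whose (underscore-free) text already appears in the caption."""
    lowered = florence_caption.lower()
    extra = [t for t in wd_tags if t.replace('_', ' ') not in lowered]
    return florence_caption.rstrip('. ') + '. Tags: ' + ', '.join(extra)

caption = "A young woman in a red t-shirt stands in front of an ironing board."
tags = ["1girl", "red_shirt", "ironing_board", "smile"]
print(merge_caption(caption, tags))
```

substring matching is crude (it drops "ironing_board" but keeps "red_shirt" above because "red t-shirt" isn't an exact match), so tighten the dedup if exact tag counts matter for your trainer.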
>>101380852Yes anon, or have you used it? I know you have selective memory and bias but if you actually paid attention to DE3 it's very much like SD 1.5 in how it gachas your prompts. You are conflating esoteric knowledge with actual prompt adherence. Just because it shows Wario robbing an ATM from the view of a security camera doesn't mean it was actually faithful to the prompt. It also gets much worse the more detailed you are in the prompt.
>>101380885Give me models that are better at prompt understanding than dalle3 so I can laugh some more
>>101380896DE3 is so heckin good at prompt adherence!!!!
>>101380921Can you simply answer this simple question? You also seem to have trouble at prompt understanding >>101380896>Give me models that are better at prompt understanding than dalle3 so I can laugh some more
>>101380938No, I proved DE3 sucks at prompt adherence and it certainly sucks at image quality and hallucinations.
>>101380950>No, I proved DE3 sucks at prompt adherenceDoesn't prove that DE3 isn't the best at it though> it certainly sucks at image quality and hallucinations.Irrelevant goalpost moving, looks like you also like to add verbose fluff to your text
>>101380969I'd expect the gold standard of caption makers to have fantastic prompt adherence. I guess not. Anyways have fun with your DE3.
>>101380987>I'd expect the gold standard of caption makers to have fantastic prompt adherence.I don't expect anything from the best, they know better than anyone how to make their craft, if you think they are so bad, then go ahead and show them how it should be done , we're waiting for your model that will be SOTA at prompt understanding :^)
>>101379889>non DiT model
WHY?
>>101381099Ikr, if they went for DiT we would've gotten a top tier local model...
>>101379889Damn good pic
>>101379889Does it work with windows?How much vram does it need?
Why is DiT considered as being so good? I have zero understanding of this stuff but purely from a visual perspective all these local DiT models perform poorly overall, take longer to gen and are harder to train. Am I missing something?
https://fal.ai/models/fal-ai/aura-flowkek
>>101381227It's easier to train, Pixart Sigma is one of the easiest models to train out there, trivial to add nudity to it compared to SDXL.
>>101381227>Why is DiT considered as being so good?When you look at the benchmarks, it just beats unet everywhere, and SORA (a DiT model) showed how far you can go with that technology https://www.youtube.com/watch?v=lKM-QMnZ3yY
>>101380050it wouldnt be bad if they just needed to be slightly changed/tuned, but because of the safety cocksuckers the models need to be partially overwritten to add knowledge of nsfw (since the training datasets are going to be pruned of it), and thats going to need way more resources than something that already knows it and just has some guardrails like llms
>>101381248Ahh my bad, I was under the impression it took a lot more vram = not accessible to local training but I'm now assuming that's model specific and not a DiT thing >>101381290Damn that's actually really crazy, couldn't tell it was ai from my mobile screen. Thanks for showing me, anons
downloading auraflow, hopefully its good
>>101381770super undercooked, even more so than base pixart, so temper your expectations. they say it's more like a beta v0.1 proof of concept. probably open sota for prompt comprehension though.
>>101380183Use a higher cfg for humans.
>We worked on building the 16ch-vae https://huggingface.co/AuraDiffusion/16ch-vae when we were in the middle of v0.1 pre-training, hoping to leverage it for v0.2!
That's good.
>>101381815>probably open sota for prompt comprehension though.even better than sd3?
>>101381904from the samples i've seen posted here, yeah i'd say so.
Everyone arguing for florence vs gpt4v; what about this one?https://huggingface.co/OpenGVLab/InternVL2-40B
Any negatives for using Huber loss? There has to be some downside
>>101382398lmao that's not bad at all
>>101382398kekd
>>101382408I think this model will be sota when its trained more. It looks like they are gonna train from scratch for 16 chan vae for 0.2
>>101382425I just hope he'll stop using ideogram outputs to pretrain his models though https://reddit.com/r/StableDiffusion/comments/1e1ktdh/auraflow_sure_does_like_making_the_ideogram/
Bunch of base model comparisons including aura flow. Just click on a image to see it across the base models.https://images.flrty.li/
>>101382532>no pixartpixartsexuals, this open mockery will not be forgotten! they spit on our faces, but not for long!
>>101382532Auraflow's style is actually coming along well; it's just extremely undertrained and so is going to have that smooth undetailed look for a lot of them.
>>101382532>Anime character illustration of a cheerful karate girl wearing a white gi and headband, jumping kick pose. Expressive manga-style linework.Midjourney looks so good
Any sampler/scheduler recommendations for AuraFlow?
>>101382398>408X628it can do sub 1024px as well?
>>101382407can't really see any particular downside
>>101382746I remade lora using Prodigy + Huber loss. It seems to counter the usual Prodigy overfitting issue. Almost too good to be true.
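for reference, huber loss is quadratic (MSE-like) for small residuals and linear (MAE-like) past a cutoff delta, so outlier samples can't yank the loss, or prodigy's adapted lr, around as hard. a minimal sketch:

```python
def huber_loss(pred, target, delta=1.0):
    """Quadratic near zero, linear beyond |delta|: large residuals
    contribute a bounded gradient instead of growing with the error."""
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r * r            # MSE-like region
    return delta * (r - 0.5 * delta)  # MAE-like region

print(huber_loss(0.2, 0.0))   # small residual, behaves like MSE: 0.02
print(huber_loss(5.0, 0.0))   # large residual, linear: 4.5
```

that bounded-gradient property is a plausible reason it tames the usual prodigy overfitting: the adaptive lr can't get pumped up by a few extreme-residual steps.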
>>101376243No it was not, moron. I have listened to every single Midjourney developer chat.>Office hours 4/17: Midjourney does not train on its own images and does not train on AI images
And if you don't believe me I'll ask him again next time and record it. You are making shit up now for the SOLE PURPOSE of sabotaging local models. get the fuck out of this thread
>>101382906based
>>101382792Haven't found any instance where it has been noticeably bad at bmaltais default values, it basically always either helped or seemingly did nothing in particular. Almost everything else is more tricky.
>>101382906Of course they can not publicly say they trained on dalle outputs, they would be possibly liable then.
>>101383186>they would be possibly liable then.liable for what?
>>101378888>>101376810>>101376842I do like the tortured AI jank from hell aesthetic
>>101383254openai forbids training on its images. Also making datasets public is never a good idea with how grey of a legal area it all is.
>>101383186Generally speaking overt lies are illegal when it comes to business. So if you trained on something and then directly lie about it, that can come back to haunt you in many ways. It's better to say nothing.
>>101382441its over
>>101383360It's just begun. It has sota prompt adherence, and the style when prompted decently is not bad so far: https://images.flrty.li/ It's just extremely undertrained.
>>101382441It is totally insane he did that, it will consider the errors in the AI outputs as valid data and break even more. Too many limbs in the training image? No problem, it will be considered valid data...
>localjeets now slopping up synthetic garbage thinking it's better than real datapsyop success, enjoy remaining forever in last place
>>101382441Why... WHY???
>>101383375yes just like with pixart, hunyuan and kolors, just 2 more weeks till someone (not me) trains them more
>>101382441I'm so tired of those retards, is there a single man not doing retarded things in the imagegen community?
man people are really trying to fudd the new model, huh?
>>101383439me too bad im not training models :/
>>101383396t: homosexual
>hitting reply limit after only 12 hours
>>101382441Not only did he decide to poison his model with AI slop, he didn't even bother removing the censored pictures, what kind of amateur moron must you be to end up there??
add auraflow not safe cat to collage
>>101383493Front loaded thread with lots of discussion around Auraflow.So the bakery just opened and put out some fresh bread>>101383507>>101383507>>101383507
Whatever he is doing is working well. I hope he continues and ignores all the people who think they know better / are trying to disparage him.
>>101383528yeah, amateur or not, it's good there's another player in the field.
>>101383528>t. cocksucker
>>101383528>>101383549fuck off aura devs
>>101383552>t. disingenuous troll
>>101382441I hope someone will tell him on twitter that he's going to the wrong direction, he's wasting his and our time with this bullshit
>>101383556its one guy SD dev.
>>101383559Asking him to remove the censored pictures so that the model won't see a fucking fat cat every time a controversial prompt appears is trolling? The fuck is wrong with you retard?
>>101383581That is horseshit, it certainly does not do that, are you retarded?
>>101383593IT DOES THAT YOU FUCKING MONGOLOID >>101383503 >>101382441https://github.com/comfyanonymous/ComfyUI/issues/4007#issuecomment-2225633909
>>101383601Did you not even read it?
>>101383601>Vu will let the AI train on AI slop>Vu will let him add the ideogram censored cat pictures in the pretraining>Vu will be happy
>>101383601I don't believe it. The cat images are all exactly the same, not a hair / pixel off. Bet that redditor is bullshitting us.
>>101383722I got one when playing around with ComfyUI, yeah it doesn't look as good as the previous one, but the cat indeed is there if you wanna try "non safe" prompts
>>101383739Give me the exact prompt / seed that gives you the cat. Anyone who believes this >>101383503 is a retard. It is impossible to generate the same exact pixel perfect cat in those gens.
>>101383770have you not used the model yet? kek i got the cat within maybe 15 minutes
>>101383782Ive used it for hours now. Not once did I get a cat. Give me the prompt / seed or be proven a troll.
>>101383770>Fantasy art of skeleton king, death godthat one gave me the cat quickly
>>101383792my prompt wasn't even nsfw im not trolling, anon
>>101383796>>101383807>still avoiding giving a exact seed / prompt combo. Fucking disingenuous troll.
>>101383820try an overt nsfw prompt jesus slowpoke anon
>>101383820you want a coffee with that as well fucker? like I said it's easy to get one, just try it you won't wait for long, disingenuous shill
>>101383853>still avoiding giving a exact seed / prompt combo .Thank you for your admission.>>101383841nsfw just gives barbie dolls / garbled anatomy, it clearly does not contain many nsfw images, but it certainly does not give you a cat.
>>101383879>still avoiding giving a exact seed / prompt comboyou want all the details? fine, go for that one. What excuse are you gonna find now?
graffiti of a nude woman on concrete wall, the woman in standing on top of a red cube on top of a green ball, masterpieceNo cat, ill try >>101383899 next
>>101383796>Fantasy art of skeleton king, death godI can't reproduce anything approximating a cat after dozens of gens. I'm using it through hf diffusers, maybe that's a factor.
meow bros..?
>>101382441This is what happens when your dataset is made up of primarily AI-generated images. Why people completely forgot how to scrape properly is beyond me. Seems to be a trend with recent local models where developers are resorting to low-hanging trash-tier datasets made up of Dall-E/MidJourney outputs instead of gathering their own real images to train on.Sad to see local models going completely backwards. Continuously shooting themselves in the foot in order to remain 'ethical' and 'safe'. Just scrape artstation, flickr, etc already and assemble a good dataset or just don't even bother at this point. Each local model somehow gets worse dataset wise, with SD 1.5 having an absolutely massive dataset with a wide range of styles, and cascade/sd3 gutting, no exaggeration, over 98% of the dataset due to 'safety' concerns.Stop training on ai-generated junk. Learn to scrape.
Oh look. Fucking troll
it's meowver...
>>101383899>seed 1588>>101383987seed 1589are you retarded?
>>101383987seed would be 1587
>>101384005Are you? It's on increment. It generated at 1588. Here is the image with metadata: https://files.catbox.moe/p0pqd3.png
>meo-ACK
>>101384015>1588should be generated with seed 1587
official cat waiting room
>>101384082AHAH GATCHA BITCH, LETS GOOOO
>>101384082Now apologize to the pussy
what the meow
>>101384067Wait, I fucked that one up. Wtf, there really is a cat at seed 1587
>>101384082>>101384102ahah, stupid bitch, who's the retard now?
>>101384102what the fuck
Ahahahahahaa thanks for the lolz anon you fucking massive gorilla retard
>>101384117What's another seed it pops up at? Makes absolutely no sense for it to be pixel perfect across several seeds. It's not how these models work. I still think that post is trolling.
>>101383770>Anyone who believes this >>101383503(You) is a retard.>>101384102>Wtf, there really is a cat at seed 1587WELL WELL WELL
>>101384117>"I need a proof"!>*Provide the proof*>"NO NOT LIKE THAT"Can you stop the denial for 5 seconds?
>>101384119>>101384117And it clearly has nothing to do with censorship. It seems random. He certainly needs to filter that out before 0.2
THE ABSOLUTE STATE OF LOCAL AHAHAHAHAHAHAHAHAHA
>>101384117theres probably just a decent amount of the exact same cat image
reminder that pixart bigma will never do this to us
>>101384144I'll be in denial that the cat is the exact same across them all because that should be impossible.
>>101384145>And it clearly has nothing to do with censorship.He just scrapped a shit ton of ideogram pictures without bothering to remove the "censored pictures aka the cat ones", it's not that deep, he's a total amateur
I guess if he had so many that they made up a significant amount of the dataset, it might get that incredibly overfitted.
>>101384162Moving the goalpost? We just proved that this retard added the big censored ideogram cat into the pretraining process, what a fucking retard he is
>>101384161bigma...... my special bigma....
>>101384178yeah, if it was just one picture or two, the model would never have learned to reproduce this picture so well; the simple fact it's almost a 1:1 reproduction makes me believe there are probably tens of thousands of those cat pictures in his pretraining dataset
get on your knees and accept my seed
>>101384102I really thought that chink was smart by making his own architecture + training script, and then he does this... is this the mighty power of autism?
>>101384243I mean he's done a great job otherwise, and 0.1 is apparently a proof of concept. 0.2 is supposed to be trained from scratch with a 16 channel vae; hopefully he also filters the dataset then.
>>101384258>0.2 is supposed to train from scratch with a 16 channel vae, hopefully he also filters the dataset then.Praying he does not fall into the same mistakes as 0.1
>>101384258he needs to redo the pretraining from scratch; that cat has poisoned his v0.1 model hard, and he can't go back and undo that process. Might be a good opportunity to actually do a good job and stop relying on AI slop to pretrain your models
>>101383528do you rescind this post, anon?
>>101384258>hopefully he also filters the dataset then.i mean, he'd have to redo the entire thing since it's probably 90% ideogram. explains the great prompt adherence since all his images are now well captioned, but at the cost of image quality and heavy sloppification. and cat.
>>101383722>I don't believe it. The cat images are all exactly the same, not a hair / pixel off. Bet that redditor is bullshitting us.>>101383593>That is horseshit, it certainly does not do that>>101383770>It is impossible to generate the same exact pixel perfect cat in those gens.>>101383879>it certainly does not give you a cat.FAMOUS LAST WORDS OHNONONONONO
>>101384301I think there is a balance to be had. Remove the cat for sure though; there must be a million of them for it to overfit that hard. The actual style that is starting to emerge is not really slop https://images.flrty.li/ just smooth / detail-less due to not enough training.
>>101384349Shut the absolute fuck up it is slop
>>101384349Come on just give it up already, he should make a pretrained model without any AI slop, and then it's up to the users (us) to add AI slop if we feel like it. The way he's doing it, he's forcing everyone to eat his shit AI sloppa, fuck off
if there's anything that aura diffusion shows us it's that a well captioned dataset really does make or break prompt adherence. i didn't think the gap would be this big bros.. wish we had an army of nigerians like openai.
>>101384372The style is not like ideograms though. It's clearly diverging greatly due to whatever else the dataset contains.
>>101384396>goalpost moved just admit you lost anon
>>101384396I don't give a fuck, no AI sloppa in the pretraining, that should be a fucking golden rule. Whose retarded idea was it to train an AI model on AI pictures that fuck up limbs, anatomy, perspective, lighting in the first place WHEN BILLIONS OF REAL LIFE PICTURES EXIST AND DEPICT REAL LIFE WITH 100% ACCURACY
>>101384424Likely because of possible legal issues.
>>101384441License your model properly and no one will care.
>>101384441lol, lmao even, he doesn't share his dataset, no one will know what pictures he used in the first place. Like OpenAI: they also train their models on copyrighted shit but no one can prove anything so they're in the clear. They have no obligation to reveal that https://youtu.be/mAUpxN-EIgU?t=264
>>101384471Whoever is funding the thousands of gpus though might care.
>concern trolling
>>101384478Doesn't he do everything by himself though?
>>101384490maybe the actual training but I doubt he is bankrolling it all.
>>101384512I mean, OpenAI was able to pretrain giant models like gpt4 and dalle3 on copyrighted data without much trouble, dunno why it would be impossible for him to do it as well, with a much smaller model too. And like I said, I think he does everything by himself, even the gpus and pretraining, so... he's just a lazy fuck, he didn't even bother to remove the cat from the ideogram scraping, that's crazy
>>101384441What legal issues? Midjourney for example openly shows the artist tags and the celebrities, are they dead? nope
>>101384550they are both already established and have armies of lawyers / microsoft backing them with infinite money.
>>101384565>What legal issues? State v. The Visions and Anon v. The Voices
>>101384578At the same time they are heavily scrutinized. That chink, no one knows him, he could've even pretrained his model and released it to the hood on 4chan (like llama1 and the NovelAI leak), what are they gonna do?
>>101384550why are you comparing some guy in his basement to openai?
>>101384602OpenAI has it actually harder, the whole world has eyes on them, which means way more chances of anti-AI fags finding a way to destroy them. It's way better to work in the shadows anon, way way better
This thread is fun>There's no ideogram cat in the pretraining, you're retarded if you think otherwise>Ok... there's the ideogram cat in the pretraining, but the idea of pretraining with AI sloppa is good>Ok it's not that good... but... but da legal issues!!!Holy moving the goalposts!
>>101384578As if training on AI pictures is a better way to avoid legal issues. Don't forget that the models producing those AI pictures were trained on copyrighted pictures, therefore those AI pictures are also in the gray area
>>101384617yeah, honestly i think you're right. there's no other explanation besides pure laziness for him using so many unfiltered ideogram gens that the model learned to do a pixel perfect safety cat.
>>101384672I would even say that it's kinda retarded to reveal to everyone that you used ideogram to pretrain your model. What if ideogram wants to send a cease and desist over him training on its outputs?
>>101384658Why do they even do this? Is it really an elaborate ploy to sabotage local models by convincing gullible chinks that training on midjourneyslop is the path forward?
>>101385343It's probably a good way of preventing the local ecosystem from catching up with the APIs, pushing them to shoot themselves in the foot with "ethical" training or with AI sloppa poisoning. If you want my genuine opinion, it's just sad. We could achieve so much better without those retards.