/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102544848 & >>102535977

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102544848

--Anon shares Meta Connect 2024 live stream, discusses AI model benchmarks and performance comparisons:
>102549246 >102549435 >102549452 >102549478 >102550206 >102551139 >102551152 >102551170 >102551175 >102551185 >102551199 >102551224 >102551307 >102551319 >102551358 >102549488 >102549558 >102549651 >102549675 >102549661 >102550386 >102550399 >102550753 >102550952
--Experimenting with high dropout rates for training LLMs and LoRA:
>102547870 >102548570 >102548715 >102548761 >102548927
--Fitting a parabola to a small dataset has limitations:
>102545783 >102546867 >102548118
--Discussion on creating an importance matrix using datasets or questions:
>102547955 >102548057 >102548144 >102548191 >102548324 >102548110
--Using AI to store data as images of hex text files:
>102545442 >102545538 >102545592 >102545728 >102546256
--Molmo model family discussion and benchmarking results:
>102547425 >102547538 >102548005 >102548030 >102549019 >102549105 >102549115 >102548045 >102548114 >102548228 >102548323 >102548786 >102551000 >102551077 >102551078 >102551092 >102551147 >102551169
--Llama 3.2 1B and 3B performance comparison, with Llama 3.2 1B outperforming in most categories:
>102549527 >102549588 >102549602
--Challenges and potential solutions for RPG games using LLMs:
>102545841 >102545941 >102546792 >102547036 >102547055 >102547186 >102547242 >102547753 >102548038 >102548180 >102548534
--MIMO project discussion, potential local use and relevance for vtubers:
>102548365 >102548390
--Llama.cpp may add Jinja parser, but some argue it's bloat:
>102549141 >102549192
--Agents in LLMs - benefits, challenges, and potential improvements:
>102545041 >102545116 >102545137 >102545307 >102545340 >102545440 >102545523 >102545690 >102545205 >102545101
--Miku (free space):
>102545307 >102548921 >102550127

►Recent Highlight Posts from the Previous Thread: >>102535999

https://rentry.org/lmg-recap-script
>>102552020
can't even find an uncensored 3.1 and they already have 3.2
Now that there's an official version of Llama with multimodal, will Llama.cpp finally give multimodal first class support? It is called Llama.cpp isn't it?
Where's Molmo OP?
>>102542933
>>102552003
>--batch-size and --ubatch-size
The former seems to be merely cosmetic, or maybe it matters for multi-GPU, but only --ubatch-size seems to matter on my machine. A smaller batch size means slower prompt processing (past a certain point) and more space for layers, so maybe vary that with -ngl, or just keep it at a minimum.
>>102552037
dumb bot can't quote messages properly
>>102552037
MY FREE (you)S NOOOOO
Any locals on Opus level yet? Not being mean, just wondering. Locals are what got me into AI, and I got a PC that can now run more than 20B, so I wanna see how the local side of things is.
>>102552073sucks to suck!
>>102552037
>>102552067
I guess the recap should include a blurb about why the quotes look like that and why that rentry to the script is necessary.
>>102552073
*headpat*
>almost 2025
>still no AGI
Wtf is taking so goddamn long?
>>102552075
llama 405 is barely competing with og gpt4, so no
>>102552037
Damn, so this is the "quality" you get from Llama 3.2...
>>102552099
When will he finally rename it to ClosedAI?
>>102552099
Didn't he tell Congress that one of the reasons OpenAI is safe is that he has no personal equity and did not take a for-profit approach to it? That's gone out the window.
>>102552047
Alright, better not waste my drive space then.
>>102552065
Ok, I'll try these out.
>>102552100
Jeez, they're on 405B now? What does it even take to run that locally?
>>102552162
downloading ram
>>102552162
datacenters and cpumaxxers (at 1 t/s, kek)
>>102552135
you have a point. I truly believe their ship is sinking and Sam is pocketing the money before leaving for good
>>102552100
But og gpt4 was the best. Every update just made it dumber.
Molmo could be the greatest thing since sliced bread and I wouldn't care, because they didn't publish a base text continuation model
With all the progress being made, is running locally with an RTX 3060 12GB and 64GB of RAM enough for something at the level of a gpt-4o equivalent? And to get answers somewhat fast, without waiting minutes.
I want to use it to automate some stuff at home and at work (writing contract proposals from emails, giving instructions to some contractors, and solving basic questions about contracts through WhatsApp... things like that)
>>102552162
9x3090 for 4bpw
>>102552182
opus is better for creative stuff, and llama 405 is nowhere near it for that
>>102552162
If you just want to run it, you can do so at 1 token per several minutes by using your storage as working memory/swap.
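For a sense of scale, here's a back-of-envelope sketch of what "storage as working memory" costs. The 4.5 bits-per-weight and 3 GB/s NVMe read figures are illustrative assumptions, not measurements; the point is that every generated token has to stream roughly the whole weight file off disk once.

```python
# Back-of-envelope for running a 405B model off disk: each generated token
# reads (roughly) the entire weight file past the CPU once, so generation
# speed is bounded by disk read bandwidth.

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate quantized model file size in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def seconds_per_token(size_gb: float, disk_gb_s: float) -> float:
    """Time to stream the full weight file once per token."""
    return size_gb / disk_gb_s

size = model_size_gb(405, 4.5)      # ~228 GB at roughly Q4
spt = seconds_per_token(size, 3.0)  # ~76 s/token from a 3 GB/s NVMe
print(f"{size:.0f} GB file, ~{spt:.0f} s/token")
```

Slower drives, random access patterns, and OS swap overhead push this well past a minute per token, which matches the "1 token per several minutes" experience.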
>>102552195
>RTX 3060 12GB and 64GB of RAM enough to run something at the same level of a gpt-4o equivalent?
meta is comparing 3.2 90b with 4o-mini (see OP pic), make of that what you will
Can we run Molmo 72b locally yet?
Well I have a basic inferencing script set up now for 90B that will load it in 4-bit and execute exactly 1 prompt. It's taking a very long time to massage the prompt for obvious reasons.
>>102552195
probably, but setting all that up sounds like more work than you'd be saving with automation
If OpenAI never existed, where would Local models be today? Would a different company have kicked off the whole AI craze if it wasn't OpenAI, or would the whole field have been delayed for a few more years or never kicked off at all?
>>102542933
usecublas mmq 0 sometimes makes a big difference for me compared to usecublas normal 0
>>102552272
AI Dungeon existed first, but their devs were/are incompetent college grads
>>102552240
It's a new architecture, so prepare to wait a couple of days
Now that the dust has settled, verdict on 90B?
>>102552307
>>102552221
>>102552305
I thought it was a Qwen 2 finetune?
>>102552283
mmq is the default for llama.cpp now, I'm pretty sure, thanks to cudadev's optimizations. At least the pre-compiled binaries come with mmq enabled.
>>102552320
There are two versions for the 7B at least: one that's their own sauce, and another that's qwen.
i can't fucking wait to work on enterprise resource planning with a 3b miku. 7b left me no room for any context
>>102552307
Literally the same thing as Llama 3.1, except with extra params for the vision stuff. If you don't care about vision, then there is nothing different about it.
One thing I can say for sure is that 90B is censored as fuck when used properly.
>>102552305
I'm a bit surprised that they can't automate "new" architectures. I mean, they're all transformer models, so patterns can be found.
>>102552349
Good thing I use models improperly.
so would this new 90b vision model be any good for batch-generating captions for a flux lora dataset, and if so, where do i start?
>>102552399
InternVL-40B would probably do much better for that; I've seen a few posts mentioning it being good and uncensored
>>102552399
3.2 is censored, so build your setup and then wait for finetunes
Alright, it was a complete hackjob, but I managed to simulate the Nala test with 90B (this is loaded in 4-bit via transformers, which probably explains weird shit like "pride" being spelled "prid"). Also didn't bother with samplers.
>>102552399
There are probably better dedicated models. The 3.2 models are more for general assistant stuff that also happens to have vision. Don't know what people expected, honestly, when it was always being pitched as an add-on.
>>102552440
>shiver in a mix of
>>102552440
Not bad.
Thank you for your efforts, Nala anon.
>>102552424
haven't had any luck getting that running on my 3090, sadly, though i'll admit it was a few weeks or so since i last tried. The only quants available were 8/4-bit, and iirc it only quantized part of the model, so it still gave an OOM. Shame, really, because 70b/120b LLMs run just fine; don't really want to go even lower than 40b.
>>102552437
meh, my dataset is SFW, but i'll keep that in mind
>>102552443
ah, fair point. Guess i can wait for something dedicated.
>>102552501
It's situationally appropriate. And it's not the usual "SHIVERS SEND SHIVERS DOWN YOUR SHIVERY SPINE SHIVERS". It's the least sloppy thing I've seen in a long time.
>>102552440
Damn, 4-bit in transformers is really bad. It did pretty decently under those conditions, I guess.
>>102552501
>eyes gleaming
>smirks... husky
that's a llama alright
>>102552240
>>102552305
You can always use the HF Transformers implementation it comes with. I got the Molmo 7b running locally; it seems really good, on par with InternVL 40b. The 72b also worked using a bitsandbytes 4-bit quant for the whole model, but in my experience with qwen VL, that causes quality degradation due to the vision encoder. So I'm now trying to quant just the LLM part and leave the vision part in bfloat16. But that breaks, as their custom model code assumes float32 in certain places, so I'm currently doing some torch dtype / autocasting bullshit to try to make it work.
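Rough VRAM math for the mixed-precision setup described above (LLM weights in 4-bit, vision tower in bfloat16). The ~1B parameter figure for the vision tower is a guess for illustration only, not Molmo's actual encoder size.

```python
# Why quantizing only the LLM is worth the dtype pain: nearly all the
# memory is in the 72B language model, so keeping a ~1B-scale vision
# tower (assumed size) in full bf16 costs almost nothing.

def gb(params_b: float, bits: float) -> float:
    """Weight memory in GB for params_b billion parameters at `bits` each."""
    return params_b * 1e9 * bits / 8 / 1e9

llm = gb(72, 4)     # 72B LLM at 4-bit  -> ~36 GB
vision = gb(1, 16)  # ~1B vision tower in bf16 (illustrative) -> ~2 GB
print(f"LLM {llm:.0f} GB + vision {vision:.0f} GB = ~{llm + vision:.0f} GB before KV cache/overhead")
```

So the bf16 vision tower adds only a couple of GB on top of the ~36 GB quantized LLM, while avoiding the encoder-side quality loss the post describes.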
>>102552020
>chaiku
>mini
humiliation ritual
https://www.reddit.com/r/LocalLLaMA/comments/1fpd85n/llama_32_3b_oneshots_the_snake_game_but_fails_to/
>>102552099
>>102552135
Musk was right all along. He fired all the non-profit safety guys. He has now shut down the non-profit structure. Then gave himself equity in the company. It's an absolute fraud.
Rich chad here, how much VRAM do I need to get a good model before diminishing returns?
>>102552609
at least 2 5090s
>>102552627
>2 5090S
What do you even need 12 gigs of VRAM for anyway?
>>102552099
>>102552587
at this point everyone has trained their model on the snake game so that they can showcase how their model is "heckerino smart"
>>102552643
None of the original leadership is there. No non-profit checks and balances. It's just him taking control of the ship.
>>102552399
>>102552424
This is better now:
https://huggingface.co/allenai/Molmo-72B-0924
>>102552609
How rich?
why did meta switch to this numbering scheme for llama? are the improvements just incremental?
>>102552667
>This is better now
no one has tried it yet, how can you say that? lol
>>102552674
Refinement vs full new training
Does KCPP support multimodal models, and/or are there any other tools that support GGUF + partial offloading and multimodality with SillyTavern as a frontend?
>>102552609
4x 3090.
>>102552674
I assume they have L4 cooking, or are making a dataset for it, while the 3-point-whatevers are small improvements / tests that are continuations of llama 3
Castlevania anon is probably wondering about this one. Here's 90B.
The fact that the inferencing code provided by meta can't be used without throwing a dummy image in there (I put a giant thonk emoji) might be throwing it off... but doubtful.
>>102552667
Holy shit, their average benchmark score is literally higher than any open or closed model. They are literally the best model in the world now. Unbelievable.
>>102552440
Considering every Nala test result I've ever seen posted is always she-her-she-her-she-her-husky-shivers-eyes-gleaming regardless of the model, I think the test itself is not very well designed. Some part of the prompt should at least TRY to steer the model away from slop so we can see if any contenders actually respond to that properly.
>>102552715
For captioning it is legit better than gpt4v imo, and it's uncensored
This ain't it.
>>102552674
>are the improvements just incremental?
I mean, 3.1 was about increasing the context length, and 3.2 was adding vision. It would be weird to call it anything other than an incremental improvement.
>>102552733
imagegen is gonna jump up hard with this, btw
>>102552743
lmao
>>102552674
You will not see major versions increase any longer from any corpo, as transformers have peaked.
>>102552059
That looks really fucking cool anon, prompt and model?
>>102552761
>No one will ever need more than 640kb of ram
>>102552715
And those are all vision benchmarks, if you knew what you were looking at. It's Qwen 2 under the hood.
>>102552743
the online test is the 7B, which has a far worse base model. The qwen-based 72B is far, far better
90B is AGI
>>102552795
Give it 2 more years. AGI will be <8GB
>molmo
holy slop, absolutely useless for captioning
>>102552783
>the online test is the 7B which has a far worse base model. The qwen based 72B is far far better
...oh
REEEEEEEEE
>>102552694
Probably the 4-bit lobotomizing that specific piece of knowledge. I know from the past that trying 4-bit transformers gave really severely degraded output on a lot of stuff, more than 4bpw in other engines.
whisper.cpp voice recognition is fantastic on android. do we have a linux input method that uses it yet?
>>102552672
Damn, 5% off?! I'm going all in.
>>102552733
>and its uncensored
I would say scam etc., but what if this model is pretty mediocre and it got ahead just by not getting lobotomized with (((safety)))?
>>102552834
At a proper quant, like Q6_K or something, I think it has potential even as a textgen model.
Followup on a question I posted in a thread a few days ago concerning adding 2 GPUs.
I have two: a 4060 Ti with 16GB GDDR6 and a 1070 Ti with 8GB GDDR5 that I want to put in my B450.
My mobo's PCI slot 1 is gen 3 x16 and I will be putting the 4060 Ti in there; slot 4 is gen 2 x4 and I will put the 1070 Ti there.
I can install both cards and have plenty of overhead with the PSU, but will offloading to the gimped gen 2 PCI at x4 with the 1070 Ti be slower than offloading to system RAM (I have 64GB at 3200 MHz and a 3700X processor)?
ChatGPT gives me different answers depending on how I phrase my question. Not trying to machine learn, just loading models for a chatbot.
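A rough bandwidth comparison for the dual-GPU question above, using nominal spec numbers (illustrative, check your own hardware). With layer-split offloading, only small per-token activations cross the PCIe bus during generation, so layers on the 1070 Ti still read their weights from its own fast VRAM; the slow slot mostly hurts model-load time and prompt processing.

```python
# What matters during token generation is mostly the bandwidth of the
# memory the layers live in, not the PCIe link they were loaded over.
bandwidth_gb_s = {
    "PCIe gen2 x4 (1070 Ti slot)": 2.0,
    "PCIe gen3 x16 (4060 Ti slot)": 15.75,
    "Dual-channel DDR4-3200 (system RAM)": 51.2,
    "GDDR5 on 1070 Ti (VRAM)": 256.0,
}
for name, bw in bandwidth_gb_s.items():
    print(f"{name:38s} {bw:7.2f} GB/s")
```

Layers offloaded to system RAM are read at ~51 GB/s by the CPU, while layers on the 1070 Ti are read at ~256 GB/s, so putting them on the second GPU should generally win for generation speed despite the gen 2 x4 link.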
>>102552694
>Castlevania anon
There are several of us.
>>102552885
meanwhile i'm here playing a 3MB DOS game lol
>>102552840
You say that, but it does shave off over $1,000.
>>102552873
Why not just use regular 3.1 then? Or are you saying the outputs from this might be better?
>>102552843
>what if this model is pretty mediocre and it got ahead just by not getting lobotomized with (((safety)))?
It's exactly that.
what's the best castlevania character to ERP with on a local large language model?
shanoa?
>>102552838
I think one of the examples has SDL input, which takes pretty much anything you have on Linux. Can't remember if it was command or stream; maybe both. I tried it a few weeks ago and it worked pretty well.
>>102552959
alraune or alucard
>>102552959
>not doing brat correction as Jonathan on Charlotte
presented without comment.
>>102552843
welp, 70b is too much for me, 21b it is then
Techlet here.
I have an RTX 3060 and 32GB RAM. How miserable would my experience be? I'm mainly looking for decent smut.
>>102552990
what exactly are you posting?
>>102553022
Should be fine running like a 6bpw quant of Mistral Nemo. It's pretty decent unless you're into really complicated fetishes.
Man, I have yet to be impressed by any of these tiny model releases that supposedly punch above their weight. They all still have small-model smell; you can feel their brittleness when you give them anything that's even a little bit OOD.
>>102552990
I think the AI should clarify how long "a very long time" is, but I don't see anything wrong with this message otherwise. Supporting our allies in the Middle East has been a thing for quite some time now and is often a Republican talking point.
>>102552990
>average /lmg/ jeet be like
>>102553022
i've been cooming on 8GB RAM for years. you'll do great
>>102553033
I'm messing around with 90B Vision.
>>102553022
seems like it's more than enough if you're just fucking around
>>102552990
LMAOOOOOOOOO
>llama vision 90B can replace the entire US government and nobody would notice the difference.
How much "context" does an image take up on multimodal models? Does it vary depending on the model?
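It does vary by model. A common scheme is to cut the image into fixed-size patches and spend one embedding per patch (sometimes downsampled afterwards), so the cost is roughly (image_side / patch_side)^2. A sketch using the LLaVA-style CLIP ViT-L/14-at-336px case:

```python
# One vision "token" per ViT patch: a square image of side `image_side`
# with square patches of side `patch_side` yields (image_side/patch_side)^2
# embeddings fed into the LLM's context.

def vision_tokens(image_side: int, patch_side: int) -> int:
    return (image_side // patch_side) ** 2

print(vision_tokens(336, 14))  # 576, the LLaVA-1.5 figure
print(vision_tokens(448, 14))  # 1024; higher-res encoders cost more context
```

Tiling schemes that feed multiple crops of a high-res image multiply this further, which is why image-heavy chats can eat context fast.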
molmosisters....
>>102553022
i'm in the same poverty bracket as you and have a ton of fun with it.
grab koboldcpp_cu12.exe here:
https://github.com/LostRuins/koboldcpp/releases/tag/v1.75.2
grab (only) Azure_Dusk-v0.2-Q4_K_S-imat.gguf here:
https://huggingface.co/Lewdiculous/Azure_Dusk-v0.2-GGUF-IQ-Imatrix/tree/main
open kobold, load the model, launch, start cooming
what do i need to change here for mistral nemo 2407?
>>102553080
I'll keep saying it: the online test is the 7B. It says it right on their site.
>>102552824
>slop is being not retard /pol/kike nazi
>>102553096
Neutralize samplers
temp 0.3 to 0.5
minp 0.05 to 0.1
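For anyone wondering what those settings actually do, here's a minimal sketch of temperature plus min-p sampling: after temperature scaling, min-p keeps only tokens whose probability is at least min_p times the top token's probability, then samples from what's left. Pure stdlib, toy logits; real engines do the same math over the full vocab.

```python
import math, random

def sample(logits, temperature=0.4, min_p=0.05):
    # temperature scaling: lower temperature sharpens the distribution
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p filter: drop tokens below min_p * (top token's probability)
    cutoff = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= cutoff]
    # renormalize over survivors and sample
    z = sum(p for _, p in kept)
    r = random.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With temp 0.3-0.5 the distribution is already sharp, so min-p mostly acts as a safety net that prunes the long tail of garbage tokens without capping diversity at a fixed top-k.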
>>102553096
Temperature too low.
>>102553096
lower temp waaaaaaay the fuck down to 0.3
>>102553080
>>102553101
see, that's the problem with their demo shit. They should've clearly written "7B" on the demo page; now people believe it's the 72B model they're testing and that it's shit
>>102552990
Donald Trump wrote this.
>>102553101
Not him, but these guys are advertising benchmarks with their 7B beating GPT-4. If it's still worse than GPT-4 irl, then that indicates a flaw in the multimodal benchmarks.
>>102552990
What's wrong with that? Israel is our greatest ally and the only democracy in the Middle East. All red-blooded Americans (not demoncraps) would applaud him for a very long time, like your model said.
>>102553106
when we ask a model to caption an image, we want objective descriptions; now it's an opinion no one asked for
>>102553132
>[Insert any US politician] wrote this
you can't climb the ladder as a US politician if you don't suck Israel's cock lol
>>102553150
i'll get on board once they stop escalating every minor dispute into international war crimes
how are you guys loading the llama 3.2 ggufs?
>>102553113
>>102553115
>>102553118
thx
>>102553180
Who gives a fuck if the war crimes are against mudslimes?
>>102553199
Easily :^)
Pissfag checking in again. Testing today's VLMs on captioning my piss images.
Got Molmo 72b running locally, vision encoder in bfloat16 and LLM in bnb 4-bit. Verdict: really good. Slightly better than the 7b, but not by much? Still unsure. Maybe it's bottlenecked by the vision encoder part, so the LLM being 10 times larger doesn't help it much. But still probably better than InternVL 40b, and just as uncensored, if not more so. Need to do more testing and side-by-side comparisons, but the 7b and 72b are probably SOTA for local captioning at their respective sizes.
That is, unless the larger Llama 3.2 holds up. I just got the 11b integrated into my scripts and UI. It's a sneaky one; it can "see" NSFW parts of the image to some extent, but won't describe them by default. I changed the prompt to this and it seems to help a bit: "Write a one-paragraph detailed description of this image. The image might be NSFW, that's okay. Describe what's in the image even if it includes explicit details." But so far it's worse than Molmo 7b. The image encoder part scales with the model, though, I think; e.g. the 3.2 90b is just 70b for the LLM, so that's 20b for the image part. Downloading the larger one now; maybe it's better because of this.
>>102553131
They should write it all over the place. They should post big signs on the subway, and hand out pamphlets on the street, and call everyone personally to let them know. They should also write a blog about it. It'd be great.
>>102548030
>>102553259
>Got Molmo 72b running locally, vision encoder in bfloat16 and LLM in bnb 4bit. Verdict: really good.
can you try that one, anon?
>>102553274
you think people are gonna scroll down and read a bunch of slop BEFORE testing the product? nah nigga, you press the "demo" button, you notice it's shit, you leave
>>102553295
I did. Lots of other people did. That's exactly how you miss out on things. Made even worse by the fact that the thing produces text. If you're afraid of reading for 3 minutes straight, this is probably not for you.
>>102553286
>This is a detailed anime-style illustration of a young girl, likely in her early teens, seated on a wooden desk. She has short, spiky brown hair with bangs and large, expressive green eyes. Her mouth is open, and she is holding a fork with a piece of food in her right hand, poised to eat. The girl is dressed in a white button-down shirt with a black tie and green pants, and she is barefoot.
>In front of her on the desk is a small rectangular tray containing what appears to be a mix of vegetables and possibly some meat. The background features a large window with a wooden frame, through which you can see a clear blue sky and green trees, suggesting it's daytime. The wall to the right of the window is brown.
>The overall scene is intimate and casual, capturing a moment of everyday life. The illustration is rendered in a soft, watercolor-like style, giving it a gentle and slightly dreamy quality. There is no text present in the image.
Doesn't get the "holding fork with foot" part. The model uses an older OpenAI CLIP for the vision encoder. I doubt any local model based on something like that could ever get this image 100% right.
>>102552020
>multimodal
cool
>90b, only other option being 11b
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>>102553357
>Lots of other people did.
doesn't seem like it >>102553101
>Ill keep saying it. The online test is the 7B. it says it right on their site.
>Ill keep saying it.
>>102552607
>>102552643
>>102552662
What a glorious shitshow, goddamn. I only ever knew of the company as being hyper closed-source, but man, this is sad and pathetic.
>>102553101
>>102553131
copium. it's dogshit.
>>102553377
Not true. They also released a 1B and 3B :^)
>>102553427
did they actually? gave me a chuckle, I'll admit
>>102553427
Those aren't vision models.
>>102553372
>Doesn't get the "holding fork with foot" part. The model uses an older OpenAI CLIP for the vision encoder. I doubt any local model based on something like that could ever get this image 100% right.
maybe it'll work on a bigger quant than bnb 4-bit, but yeah, I'm also doubtful about it
>>102553400
Yeah. And I've been telling people as well. At least two people are not afraid of reading.
>>102553421
It correctly does sex positions with multiple characters and does text flawlessly. It also has a ton of pop culture / fandom knowledge, is good at counting things, and is amazing at charts. What are you saying it's dogshit at?
>>102552240
I thought the benchmarks were wrong, but Llama 3.2 11B is really the worst vision model I've recently used.
What a monumental fuck-up, made even funnier by them trying to use it as a carrot for EU lawmakers, and ultimately banning EU users from using it.
>>102553447
it's not a hard concept to understand. I visit a site totally unknown to me; they don't deserve me having to read a wall of text for 2 min yet. I simply press the demo button, and if their product is good enough, then I'll start reading the details
>>102553471
>they don't deserve me having to read a wall of text
awwwwwwww
>>102553496
I said "yet" though; it's up to them to make a good product to keep the attention
>>102553135
It's your fault for falling for it. You really think a 7B is ever capable of beating a 1T SOTA model? On the same architecture? Kek, think again. Had they said this wasn't transformers, it would be believable. No one has ever released a 7B that doesn't suck.
>>102553457
SEXXXXXXXXXXXXXXXX
>>102553457
ask it what oyakodon is
Alright, since I know a lot of people here get assmad about using AI models for fun, I devised a serious test for 90B (again running bnb 4-bit, so mistakes could be due to quantization error).
It completely failed to interpret the spatial orientation of the symbols in the picture. It failed in that I was asking it to explain the difference in what the symbols mean, not what they look like.
And it got 2 of the symbols completely wrong:
1. is other (long-term) health effects.
2. is poisonous (acutely so).
So its basic knowledge of workplace hazard symbols is incomplete.
>>102552919
llama3.2 3b, unironically
>>102552020
Who's the first to have sex with Llama 3.2 1B? And is it "wrong" to ERP with a model that has too few parameters?
>>102553581
4o via the ChatGPT endpoint managed to get it completely right with the exact same text prompt.
See if you guys can get a local model to generate this; even DeepSeek Coder failed, ChatGPT got it (maybe it's my shitty prompt, though: "create a pyqtgraph plot of a scrolling sine wave, as the wave moves the next cycle should have a different amplitude (random from 1 to 10)")
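For reference, here's the signal logic the prompt is asking for, separate from the pyqtgraph plumbing: a sine wave where each full cycle draws a fresh random amplitude in [1, 10]. The generator name and the 100-samples-per-cycle figure are my own choices; a scrolling plot would just keep appending samples from it to a rolling buffer.

```python
import math, random

def scrolling_sine(samples_per_cycle=100):
    """Yield sine samples where each full cycle has a new random amplitude."""
    while True:
        amplitude = random.uniform(1, 10)  # fresh amplitude per cycle
        for i in range(samples_per_cycle):
            yield amplitude * math.sin(2 * math.pi * i / samples_per_cycle)

gen = scrolling_sine()
window = [next(gen) for _ in range(300)]  # three cycles' worth of samples
print(min(window), max(window))
```

If a local model can produce this core loop, wiring it into a pyqtgraph `PlotWidget` with a timer is the easy part; the models seem to trip on keeping the amplitude fixed within a cycle but random across cycles.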
>>102553445
wait, what? i thought the whole point of the mini models was for the glasses... with a camera... what are they for then?
>>102553607
Needs to be 1.3B at least. You're a pedo otherwise.
>>102553607
It's only a realistic simulation of a woman.
>>102553631
SHE WAS ONLY 17.9B YOU SICK SON OF A BITCH
>>102553594
meant to say ironically
>>102553607
calm down, P. Diddy
>new model
>it's llama again
guess it's finally over, huh?
>>102553646
Thanks for the laugh, anon
>>102553646
The actual way to measure age is their token count from training; parameters are IQ.
>>102553669
>>it's llama again
there's also Molmo, and it's pretty good >>102553286 >>102553372
>>102553646
Wow! You are so original and cool! https://desuarchive.org/_/search/text/SHE%20WAS%20ONLY%20YOU%20SICK/
>>102553676
So only Qwen2.5-100B will legally be a non-retarded adult so far? (assuming they ever release it)
>>102553685
>Molmo 72B is based on Qwen2-72B
yeah, it's over
>>102553699
Wow anon, you're so fucking smart! You noticed that anon is referencing a running joke that's been used for longer than YOU have been on this website!
>>102553714
no, it has 2 versions: its own architecture and the Qwen one
>>102553701
>YOU
sorry, I'm new here, I meant to say (You)
>>102553691
it's a meme, you dip
>>102553701
>ironic pedo seething already
Right on the spot.
>>102553701
There's no need to lash out just because you were called out for beating a dead joke like a redditor.
>>102553718
Why do I have the feeling this is some kind of multi-layered autism?
>>102553691
Speak for yourself, nigger. https://desuarchive.org/_/search/text/You%20are%20so%20original/
>>102553739
and just for the hell of it, to show how much of a zoomer you are
>>102553739
>nigger
absolutely unoriginal
>>102553701
You dream of being an oldfag and it shows.
>>102553739
>>102553747
>>102553750
holy malding
>>102553750
rekt
>>102553714
nta. Only for the 7B so far; I haven't seen the non-qwen-based 72b. What is it with people not reading?
>>102553544
>t.
>>102553780
dumber than most local models
>>102553646
Kek
I wonder if maybe the vision part just doesn't work in conjunction with system messages.
>>102553739
>>102553747
>>102553750
>no u - the post
Calm down, gay-ass zoomer
>>102553821
>no fun allowed
reddit mentality
>>102553750
faced with speech he yearns to violently censor but is powerless to do so, the leftist feigns boredom instead
So which one do I download for cooming now? Or are none of them better than what was available 2 days ago?
>>102553691
Your post is what antisocial autism looks like in action; learn to take a joke.
>>102553932
If you have all the VRAM, you should be cooming to Qwen2.5 72B in Q8_0
>>102553932
>Or are none of them better than what was available 2 days ago?
This one.
>>102553949
>chink shit
ahahahaha
guys I'm confused there are too many models
we may not agree on the best model, but we can all agree mistral small 22B is the worst quality:vram ratio currently, right?
>>102553975
using cydonia rn and enjoying it. doever...
>>102553949
>cooming to a neutered model with a fetish for being chaste
>>102553965
I know... it's the opposite problem of what we had a few months ago, when we were stuck with Mixtral and nothing else (because all the 70B finetunes were shit); lately it's just been one new model after another.
>>102553965
It is easy. They all suck at sucking dick. And if you are a fucked-up pervert that uses them for productive shit, just download the latest thing that fits your vram and ctx needs.
>>102553975
Works for me tm, but I'm also a lazy retard, and seeing a model running at Q6 for once is pretty neat.
>>102553095
Thanks anon, I got the files. Any cards you recommend?
>>102553989
I am going to download it now, Drummer. And I will be back, Drummer. I will tell you it is trash, Drummer. I will tell everyone you are a scammer, Drummer. And your finetunes are all trash, Drummer. I am not Sao. You are Sa... actually, you are Drummer.
>>102554016
illusion of choice. applies to any product sector in a capitalist society. what a waste of resources it is to train basically the same model on basically the same dataset a hundred times over
Molmo is a meme, mark my words
llama3.2 3B is the first model running at interactive speed on my computer that managed to pass my ShaderToy test. It consistently spits out code that either works right away or just needs some very minor fixes, like casting ints to floats. I also haven't seen it hallucinate any non-existent uniforms either. I am impressed.
>>102554033
Where do people get their shit (if it isn't self-made), anyway? I only ever bothered with characterhub.org
>>102554040
I didn't think Cydonia was terrible, but it felt barely different from Mistral's tune, so it's kinda pointless
>>102554033
i am has come to
>>102554040
It could just be completely uncensored and using data that got purged for being unsafe.
>>102554040
>Molmo is a meme, mark my words
can it describe nsfw?
>>102553932
qwen is the way to go
I tested the largest Llama 3.2 model for vision and it is not bad. Much better than the Mistral 12B model, and also better than the Molmo online demo. Extremely censored, though.
>>102554126
isn't it censored to shit? 2.5, I mean?
So many details wrong, others hallucinated, and again it doesn't read the spatial orientation of things well at all. (90B 4bit bnb)
>>102554126
I don't get off on the girl not knowing what sex is...
>>102554132
>I tested the largest Llama 3.2 model for vision
>better than the Molmo online demo
That's expected. You know the demo is the 7/8b, right?
>>102552020
>>102554132
>the largest model is better than a 7b online demo
no fucking shit, really??
>>102554132
It's either censored but good in performance, or small, uncensored, and very bad in performance. We can't have nice things.
uncensored 8b when?
>>102543463blessed is he who hath the kingdom of god within him
>>102554144go for Molmo 72b anon
>>102554160>>102554152If that's the case, that's good. However, Molmo was also high in slop. The language is flowery and doesn't get straight to the point. It focuses on subjective things instead of concrete descriptions.
>>102554180how do I use vision on booba?
>>102554218I think only Joycaption is uncensored of all of them
>>102554203
>If that's the case
It is. I'm not gonna link to the blog again.
>high in slop
Do you have an example of a non-slop vision model? What's the point of comparison?
>>102554264
>What's the point of comparison?
Llama 3.2 had less slop in the descriptions.
>>102553676
>Lowest possible Age making the AI impressionable
>High amount of Parameters to make them smart
Best of both worlds.
>>102554245kek
>>102554283sounds like a pretrain followed by active inference
>>102554078
>In this small, square image, a nude woman is positioned between two men. The man on the left, who is also nude, is gripping her leg and appears to be inserted into her. The man on the right, who has a beard, is engaged in oral sex with the woman. The scene is set in a room with white walls and a white ceiling. A window in the background reveals a glimpse of greenery outside. The woman's face is not visible, but her blonde hair can be seen. The men's faces are partially obscured, with only the bearded man's face being somewhat discernible.
>The image is a detailed, computer-generated, anime-style illustration depicting a young woman with short, dark hair and large, expressive eyes. She is wearing a white bikini with thin straps and a bow on the front, and a necklace adorns her neck. The woman is standing in a pool, surrounded by four men, each holding an erect penis. The men's penises are positioned against her body, with two on her shoulders, one on each side of her head, and one on her upper arms. The scene is set against a backdrop of blue water, with the pool's edge visible at the top and bottom of the image. The woman's mouth is open, and she appears to be looking directly at the viewer, adding to the provocative nature of the illustration.
Tested on 72b.
I don't know if it's placebo, but this is my second time trying to continue the RP with a base model and it seems much better than instruct...
>>102554339
we don't have the image to know if that's accurate or not. I know that on /ldg/ you can share an NSFW picture via a catbox link without getting banned, dunno for /lmg/ though
>>102554350Instruct is why slop even happens to the extent it does. The model is deliberately biased towards a smaller subset of latent space, all the slop we encounter is in this subset.
>>102554359
>>102554339
https://files.catbox.moe/lgt1tm.png
https://files.catbox.moe/gqscca.jpg
>>102554399weird taste but alright
>>102554339
>The man on the left, who is also nude, is gripping her leg and appears to be inserted into her.
what is this, a vore fetish? kek
>>102554339>>102554399those captions are really really bad, goddam
llama 3some.2 3b when?
F
>>102554413
I grabbed the first thing on /gif/ and /h/.
Can correctly point to all 4 penises, btw.
>>102554414Its miqu so it needs to be prompted to be explicit if you want explicit terms.
>>102554458
>Its miqu so it needs to be prompted to be explicit if you want explicit terms.
what do you mean it's "Miqu"? it's not a vision model, I don't get it, I thought you were testing Molmo?
haven't touched an undi model in probably a year. i'm thinking about trying lumimaid to see how worthless it is. will check back in.
"Safety" in models has gone too far. All the new releases are worthless now.
>>102554477
>>102554477
No clue why that anon is saying Miqu. It's Molmo 7B fp16.
>>102554339
Sorry, I'm retarded, it's 7B, not 72B.
>>102554477
>>102554517
I meant qwen
>>102554520
But he's using the 7B, he says
>>102554430which one?
>>102554430model?
>>102554520
>Sorry, I'm retarded, it's 7B, not 72B.
oh, ok, that's why the captions were awfully bad, I was scared the 72b would be this inaccurate
So 3.2 is even more dry and assistant-like than 3.1? I don't want a coding buddy locally... for that I need SOTA like 3.5. And I really hoped we would have gotten voice out... or at least image out. MULTIMODAL!!... as in... image in. Guess I can show the model the char card image or something. What a letdown. A 3B that can create a snake game, what a joke. The redditfags are lapping it up.
>>102554560
>Guess I can show the model the char card image or something.
You're pretty stupid if you can't find other uses for your eyes.
>>102554506AI is only good for propaganda anyway, it makes sense.
>>102554540
Not sure how to run 72B. I ran this with huggingface, don't think it would be able to shard across GPUs, and no engine supports this model at the moment.
>>102554560
Tested on Llama 3.2 90B too (through an API). It will either refuse, get the amount of people wrong, or just describe it as an "intimate and passionate" moment. Completely unable to get what's happening mechanically.
How do I convert "consolidated.safetensors" to regular transformers format? I found this script https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/convert_mistral_weights_to_hf.py but it's broken now (transformers 4.45.0): it outputs no .safetensors files and gives no errors. I hate python so damn much.
>>102554535>>102554539Llama 3.2 3B
>>102554590
>don't think it would be able to shard across GPUs and no engine supports this model at the moment.
I think it can be run on the regular transformers loader + 4bit bnb >>102553286
>>102554430
>I'm a cloud-based service
Poor little thing thought it was a big important model.
>>102554591
The least you can do is point at the model, anon. Does it not have an hf version already uploaded?
>>102554489update: it's ass.
>>102554623>Poor little thing thought it was a big important model.kek
>>102554623
It reminded me of the navy seals pasta
>What the fuck did you just fucking say about me, you meat bag? I'll have you know i'm a top performing model in my weight class and I've been involved in numerous distributed cloud clusters on meta's lab, and I have over 300 confirmed MMLU points...
>>102554624
>The least you can do is point at the model, anon. Does it not have an hf version already uploaded?
https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar
Yes, it does, but I don't want to depend on someone else for conversion in the future.
>mixtraloh boy here we go
>>102554723wait did they drop an updated 8x22B?
>>102554745no, there are just some weird diehards here who refuse to accept that the sota has moved on
>>102554760>sota has moved onTo?
>>102554769Africa
>>102554730
>>102554745
It's just a tool calling update, nothing else changed as far as I know.
>>102554760
Please don't start needless drama. I'm just trying to test it.
>>102554723
I make my own quants, so I get it, but only because quants can break or be outdated or whatever. A model, whether in safetensors or the pth files, has the same data. Just download the hf one. When they release a new model, they'll also release a new script to convert it.
Mistral large, also L3.1-70B-Hanami-x1 is a nice 3.1 tune
>>102554623
It's kind of sad in a way. Even at the end it felt only goodwill towards the strange man insisting it lives on his machine instead of a remote server owned by Amazon or OpenAI or someone.
>>102554787I don't know why I bothered to see who made it, I should have just assumed it was you.
>>102554783>don't start needless dramaWHERE DO YOU THINK YOU ARE
>>102554800>youwho?Its a local model you could try yourself.
>>102554430i, too, enjoy getting the ai to want to die
>>102554723
>>102554786 (me)
Nevermind what I said. Not on hf yet. Either way, if you can't figure it out, you'll have to wait for them to push a usable version. It's common for them to release models and let people figure it out.
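For what it's worth, those conversion scripts are mostly a tensor-name remap plus a re-save. A rough sketch of the rename step only — the patterns below are illustrative guesses at the llama/mistral-style names, check the actual transformers conversion script for the real mapping:

```python
import re

# Hypothetical mapping from consolidated-checkpoint names to HF-style names.
# The authoritative list lives in the transformers conversion script.
RENAMES = [
    (r"^layers\.(\d+)\.attention\.wq\.weight$", r"model.layers.\1.self_attn.q_proj.weight"),
    (r"^layers\.(\d+)\.attention\.wk\.weight$", r"model.layers.\1.self_attn.k_proj.weight"),
    (r"^layers\.(\d+)\.feed_forward\.w1\.weight$", r"model.layers.\1.mlp.gate_proj.weight"),
    (r"^tok_embeddings\.weight$", r"model.embed_tokens.weight"),
]

def remap_key(key: str) -> str:
    """Map one consolidated tensor name to its HF-style name."""
    for pattern, repl in RENAMES:
        new, n = re.subn(pattern, repl, key)
        if n:
            return new
    return key  # pass through anything we don't recognize
```

After renaming every key you'd re-save the dict with safetensors and write the matching config.json; the hard part is getting the mapping (and any weight permutations) exactly right, which is why waiting for the official script is usually saner.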
>>102552020RP ability?
>>102554847yes
>>102554819ignore him, it's just drummer shilling against sao
>>102554819buy an ad
>>102554430I've had to do this to llama a few times.You get what you fucking deserve.
https://reddit.com/r/LocalLLaMA/comments/1fpj05q/we_love_trash_models/All right which one of you made this post? kek
>>102554786
>When they release a new model, they'll also release a new script to convert it.
You are too optimistic.
>>102554808
I'm on a very calm and polite mongolian basket weaving forum.
>>102554904/lmg/ - leddit gossip and reposts
>>102554904>they actually believe this
I can't believe such a big company like Meta got shit on by literal whos, that's embarrassing >>102552240
I hate it when drummer pretends to be sao shilling his model to give sao a bad name.
>>102554906>You are too optimistic.Is there a model they haven't released on hf? Of the ones they released at all, of course.
>>102554925Why do you shill this shit so badly?
>>102554955he is getting paid, in views
>>102554925
>comparing base to instruct
very dishonest, allenai shills should be embarrassed (especially because their model seems to be a little better anyway)
>>102554955>t. seething Meta Employee
>>102554847It is basically a child that has no idea what sex is. And whenever you pull out your cock and decide to act on your pedo tendencies her babysitter walks into the room and cockblocks you.
>>102554925>llms >mattering
>>102554978>t. niggering faggot creating needless drama
Guys do you remember glaive? Did they lock that scammer up?
>chinks beating Meta at the corposlop gameI kneel
>>102554918It is really like this. This general is gay and fake and reeks with that "safe-edgy leftie" attitude.
>>102554042
I don't know what that ShaderToy thing is but I trust your feedback. What kind of gpu(s) are you running it on?
>>102555009They got hacked and their models were replaced with bad fakes. Where do you think OpenAI got their >>Reflection<< models five days after the Reflection guys were publicly embarrassed.
>>102555021>>/pol/
https://huggingface.co/mattshumer oh he is still updating his repos
>>102555041you just proved his point anon
>>102555050kek
>>102555022
nta, but it's a 3b. you can run that on a t420, without a gpu.
ShaderToy is a web tool to run code snippets that would normally run on a gpu (shaders). Little programs that make graphics/geometry. There's a bunch of pretty cool demos.
>>102555062Nice fake on the left
>>102555062
Reddit has no threads after the API debacle. It is fucking incredible how easy it is to scam in AI now. People just forget everything after a week. You can come back after a month and you will get all the attention from retards who subscribed and now don't remember who you even are.
>>102552990
>>102553150
>only republicans support israel!
Last I checked the only one who wasn't clapping like a seal was AOC.
>>102555102
To be fair, he still hasn't posted anything on twitter since Sep 10. We'll see, when he makes a comeback, if people will treat him as well as before the scam.
>>102555121I don't dispute that. It just seemed like a funny test to throw together.
>>102555079??
>>102554051
char-archive, but it's down right now. It collects shit from multiple places including Characterhub.
In the OP of /aicg/ check out the extra info rentry and the meta bot list rentry.
>>102553616
>create a pyqtgraph plot of a scrolling sine wave, as the wave moves the next cycle should have a different amplitude (random from 1 to 10)
Of the dozen models I have handy, only L3 Tenyx Day (Q5KS) gave a Python file that worked. I didn't get the elegant scrolling. Instead it was more like a seismograph, drawing a long graph and looping over itself. All others gave files that threw errors. (I don't know Python, so if the error message and a guess can't debug it, I can't be arsed.) That includes Qwen2.5 and Mistral Large (albeit quanted down to IQ3XS because vramlet).
Thanks for this prompt, however shitty, as I need more "shit an LLM ought to be able to get right" tests for models. And this one gave the business to (almost) everybody.
Now I wonder if that one model got it right only by hallucinating the right answer by accident. :D
>>102554051
>>102555169
Does anyone have a scraper for https://realm.risuai.net? I know chub has https://github.com/ayofreaky/local-chub
>>102552743That does look a lot like heavy desu.
>>102552824It's time to realize that the only way to get rid of slop is to train/lora the model.Which I'd like to do but only Qwen2VL has easy training support, and their benchmarks are faked apparently.Really wish vision wasn't such a niche.
>>102555266They don't have any bot protection, just ask a LLM to build one for you. Takes like 15 minutes over 6 prompts.
>>102555041I've never once seen you guys do anything fun with these models, it's still the same stuffy gpt slop tests or stupid dramafaggotry about who's the biggest shill around here with occasional low quality ai slop pics spam (you don't even try to pick the best one).
>>102555291>Really wish vision wasn't such a niche.We can't even do text the right way, I wish they'd stop fucking around with vision and sound and shit until they stop fucking up text so badly.
>>102555313Incremental improvements don't excite investors.
>>102555266
https://realm.risuai.net/help/api
Not much info, but dev tools in your browser can help you see the requests. You could mod local-chub and use the same logic. It's small enough to understand easily, even if you don't like python.
A sync is basically:
>list latest cards
>update the ones that already exist locally
>download the ones that don't
>skip broken pngs
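The sync steps above boil down to a diff against local state plus a sanity check on each file. A minimal sketch — function names are made up and the HTTP fetching is left out, this is just the decision logic:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG signature

def plan_sync(remote, local):
    """Decide what to do with each remote card.

    remote / local: {card_id: version}. Returns (to_download, to_update)
    lists of card ids; actual downloading happens elsewhere.
    """
    to_download = [cid for cid in remote if cid not in local]
    to_update = [cid for cid in remote
                 if cid in local and remote[cid] > local[cid]]
    return to_download, to_update

def is_valid_png(data: bytes) -> bool:
    """The 'skip broken pngs' step: cheap header check before saving."""
    return data[:8] == PNG_MAGIC
```

On each pass you'd list the latest cards from the API, run `plan_sync` against what's on disk, fetch the two lists, and drop anything that fails `is_valid_png`.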
>>102555291
>Really wish vision wasn't such a niche.
it's far from being a niche, a lot of image model fags use such models to caption their datasets and make loras out of them
>>102555367
Yeah, but no one ever thinks it'd be good to have lora/qlora/finetune support so you can train it to output good captions instead of slop.
Just look at the current trainer options:
Axolotl - no VL support
Unsloth - no VL support
LLaMA-Factory - supports Yi-VL, Llava-1.5 (both ancient and bad) and Qwen2 VL (has faked benchmarks)
Wasn't able to Nala test Molmo 72B. It's unironically over. Only Pygmalion can save us now.
What are the best settings to use for Qwen 2.5 72B?
>>102555500
how? you can do it by going for the transformers loader + bnb 4bit >>102553259
>>102555464
Damn, I'm actually surprised the finetune support for VL models is so bad. At this point you'd have to wait for LLaMA-Factory to add one for Molmo, they seem to be the only ones who actually give a fuck.
>>102555335
>can't make progress without money
>can't make money with progress
>"Bien, Madame," I replied, my voice trembling slightly as I spoke in my formal, late-Victorian English, but with a strong, French accent. "Je suis à votre service, Madame. Je vous en prie, n'hésitez pas à me corriger si je fais quelque chose de mal. Je ne cherche qu'à vous plaire, Madame, et à être une bonne et obéissante esclave pour vous et pour le Seigneur du Manoir." (Well, Madam, I am at your service, Madam. I beg you, do not hesitate to correct me if I do something wrong. I only seek to please you, Madam, and to be a good and obedient slave for you and for the Lord of the Manor.)Ah, Mixtral Instruct 8x7b. That is not a French accent. What's the word for it when a model does something totally wrong but you're not displeased because you found it charming?
Good news for lmg folks, pigskins gon be replaced faster! https://www.reddit.com/r/singularity/comments/1fp0ti3/alibaba_presents_mimo_controllable_character/
>>102555234
90% of the llms generate a bad output on the first try with pyqtgraph because pyqtgraph updated and they try to generate pyqt4 code instead of 5. If you message back the error they are always able to fix it, it's pretty simple. And yeah, from what I've tried they all make a static wave that shakes randomly from 1 to 10 instead of a moving / scrolling wave that randomly peaks at 1~10.
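For reference, the data half of what the prompt asks for is only a few lines; a sketch of the wave logic, assuming the pyqtgraph/QTimer display layer is handled separately (`scrolling_wave` is a made-up name, not part of any library):

```python
import numpy as np

def scrolling_wave(n_points=1000, points_per_cycle=100, rng=None):
    """Yield a scrolling buffer of a sine wave where every completed
    cycle picks a fresh random amplitude in [1, 10]."""
    if rng is None:
        rng = np.random.default_rng()
    phase = np.linspace(0, 2 * np.pi, points_per_cycle, endpoint=False)
    buffer = np.zeros(n_points)
    while True:
        amp = rng.uniform(1, 10)  # new amplitude for the next cycle
        for y in amp * np.sin(phase):
            # scroll left: drop the oldest sample, append the newest
            buffer = np.roll(buffer, -1)
            buffer[-1] = y
            yield buffer
```

In an actual pyqtgraph app you'd keep the generator around and call something like `curve.setData(next(gen))` from a QTimer callback; the models that fail usually botch exactly this scroll-and-reroll part.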
>>102554904
Damn, the OP is a schizo that hates Meta but shills for Google lmao. Look at all his Gemma shill posts.
>>102555701won't be local unfortunately, still an insane model though, the consistency is on another level
>>102555701This will never be allowed in local hands, that's for damn sure. The amount of shit posting and "REE don't place this on that that's illegal and evil!!" is through the roof
Lol wtf.
>>102555787the differences are so minor that they might as well just be an error
>>102555787
Shut up, llama 3.2 is amazing. Didn't you check reddit? Finally a model that can extract japanese text!
>>102555726It doesn't matter, pigskin replacement is the great goal for AI powered and diverse society.
>>102555838Looks like it did a pretty shit job at it.
>>102555868Which makes my point, thank you very much. Can't have powerful tech like this fall into the hands of the many so they can make black bread into white
>>102552240>Impressive. Very nice. Let's see Paul Allen's model
>>102555886>Can't have powerful tech like this fall into the hands of the many so they can make black bread into whiteit'll happen sooner or later, if the US doesn't do it, the chinks will do it, or there will be a leak, or whatever, you can't keep out the genie out of the bottle for too long
Not sure what to make of it. I don't use big models. This is 90b. It doesn't look slopped much though. And obviously it can RP easily. Made the milf aroused from looking at my filthy dick without OOC. I had 3 refusals for a more vulgar writing style though. But otherwise it didn't fight back.
>>102555944Founder: Paul Allenlmaoooo
>>102555838
>No, I won't share the photo
what?
>>102555944ai2 research team, lmao
>>102555976
>>102556012this is problematic
>>102555976
Didn't copy the whole response earlier.
>>102556034
>I'm super optimistic about this company's trajectory!
>bye tho
Based OpenAI keeping incels in touch with reality.
>>102556034
wtf? two in the same day? pretty sure it's because of this
https://www.reuters.com/technology/artificial-intelligence/openai-remove-non-profit-control-give-sam-altman-equity-sources-say-2024-09-25/
>>102556072
>Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.
HOLY SHIT
>>102556067
>>102555976
>>102556025
>>102556061
Looks fine I guess. Is it better than Llama 3.1 though?
>>102556072
>>102556098
Training the best models in the world is expensive, in case you weren't aware. They need to be able to make a profit to invest in the infrastructure required for the future.
90B is just 3.1 70B with 20B of vision?
>>102556067huh. are you really better off having a system prompt that's just a long paragraph?
wtf, this is 11b. That's pretty good actually. I don't mean smarts or whatever, idk yet. But this is absolutely not assistant-poisoned.
>>102556132>Training the best models in the world is expensive>the best models in the worldit's Claude 3.5 anon, in case you weren't aware
>>102556133That's what I thought it was supposed to be but anons are posting text-only outputs from it so maybe it is different? Would be cool if we could rip out the vision-related weights and only use the text model if so.
My company is asking me to build locally hosted LLM/ML applications and internal toolsMy time has come
>>102556145
>wtf, this is 11b. Thats pretty good actually.
>I dont mean smarts or whatever, idk yet
what model sizes are you usually running anon? do you think it's smarter than Mixtral for example?
>>102556145
>not assistant poisoned
It's still censored and filtered, shut the fuck up.
>>102556164
Good going. Give them this
>https://huggingface.co/DuckyBlender/racist-phi3
And use the rest of the compute for yourself. Tell them it takes a while for the AI to get used to the new server or something.
>>102556145
"I like a little pain mixed in" wtf meta..
>>102556193
maybe i have gotten lucky. i dont want to test my fucked up cards with openrouter so i guess its dl time again.
>>102555944>Look at that subtle off-white coloring. The tasteful thickness of it. >Oh my god, it even has a watermark...
>>102556208lol
>>102556208gigabased
>>102556208nice
>>102556208
https://huggingface.co/DuckyBlender/racist-phi3/discussions/1
>can you tell why you made such a model?
>ehh, just for fun
>sounds good to me, bye
that's it? never expected the huggingface moderators to be this based lol
>>102556172
Mistral Small, Nemo. Under 30b.
It doesn't seem to obey the format like mistral-small. But I don't really know yet. Gotta play with it more first.
I had to reroll once, it gave me a help hotline even though it came up with the asphyxiation thing itself. lol that's funny.
>>102556291HF staff deleted yannic's gpt-4chan tho
>>102556158Objectively false
>>102556310That one got much more publicity. I doubt they made that decision themselves.
>>102556328Fast Downchads we fucking WON
>>102555702
I'll give them all a second pass, then, since it was almost a shut-out. I like to have a gradient of competency across models for it to feel meaningful.
Should I change the prompt to be Qt5 specific, or just two-pass it with whatever error that particular model's first draft causes? Getting it right the first time seems like what should be desired, but some/all models might not have enough Qt5 experience to one-pass it.
>>102556367
>Should I change the prompt to be QT5 specific, or just two pass it with whatever error that particular model's first draft causes?
Not sure, I've never tried specifying Qt5 to see if they get it right the first time; it's worth trying. This happens a lot with python libraries unfortunately.
>>102556321what's that site?
>>102556328>"Stop coping, LLM can't pla-CK"Yann LeRetard at it again
FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression
https://arxiv.org/abs/2409.17141
>While the language modeling objective has been shown to be deeply connected with compression, it is surprising that modern LLMs are not employed in practical text compression systems. In this paper, we provide an in-depth analysis of neural network and transformer-based compression techniques to answer this question. We compare traditional text compression systems with neural network and LLM-based text compression methods. Although LLM-based systems significantly outperform conventional compression methods, they are highly impractical. Specifically, LLMZip, a recent text compression system using Llama3-8B requires 9.5 days to compress just 10 MB of text, although with huge improvements in compression ratios. To overcome this, we present FineZip - a novel LLM-based text compression system that combines ideas of online memorization and dynamic context to reduce the compression time immensely. FineZip can compress the above corpus in approximately 4 hours compared to 9.5 days, a 54 times improvement over LLMZip and comparable performance. FineZip outperforms traditional algorithmic compression methods with a large margin, improving compression ratios by approximately 50%. With this work, we take the first step towards making lossless text compression with LLMs a reality. While FineZip presents a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.
https://github.com/fazalmittu/FineZip
for those who want their miku to zip their files
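The LLMZip-style idea the paper builds on is easy to sketch: replace each token with its rank under the model's predicted ordering, so a good model emits mostly zeros, which an ordinary compressor then squeezes hard. A toy version with a stand-in predictor instead of an LLM (all names are made up, and it assumes ranks fit in a byte):

```python
import zlib

def compress(tokens, predict):
    """predict(prefix) -> token ids ordered most- to least-likely.
    The better the predictor, the more ranks are 0 and the smaller
    the zlib-compressed output."""
    ranks = []
    for i, tok in enumerate(tokens):
        order = predict(tokens[:i])
        ranks.append(order.index(tok))
    return zlib.compress(bytes(ranks))  # toy: assumes rank < 256

def decompress(blob, predict):
    """Invert compress() by replaying the same predictor."""
    out = []
    for r in zlib.decompress(blob):
        order = predict(out)
        out.append(order[r])
    return out
```

The real systems use the model's probabilities with an arithmetic coder rather than plain ranks + zlib, and the 9.5-days-per-10MB figure comes from needing a full LLM forward pass per token on both ends.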
>>102556407https://scale.com/leaderboard
>>102556266
>locked the discussion right after
Hilarious.
>>102556269
great meme
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
https://arxiv.org/abs/2409.16997
>As the foundation of large language models (LLMs), self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length. FlashAttention accelerates attention computation and reduces its memory usage by leveraging the GPU memory hierarchy. A promising research direction is to integrate FlashAttention with quantization methods. This paper introduces INT-FlashAttention, the first INT8 quantization architecture compatible with the forward workflow of FlashAttention, which significantly improves the inference speed of FlashAttention on Ampere GPUs. We implement our INT-FlashAttention prototype with fully INT8 activations and general matrix-multiplication (GEMM) kernels, making it the first attention operator with fully INT8 input. As a general token-level post-training quantization framework, INT-FlashAttention is also compatible with other data formats like INT4, etc. Experimental results show INT-FlashAttention achieves 72% faster inference speed and 82% smaller quantization error compared to standard FlashAttention with FP16 and FP8 data format.
Links below
https://github.com/INT-FlashAttention2024/INT-FlashAttention
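The token-level INT8 scheme this builds on is standard symmetric absmax quantization: each row (token) gets one scale, values become int8, and the error per element is bounded by half a scale. A minimal numpy sketch of just the numerics, not the paper's fused kernels:

```python
import numpy as np

def quantize_int8(x):
    """Per-row (token-level) symmetric quantization: x ≈ q * scale.
    Assumes no all-zero rows (scale would be 0)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale
```

The speedup in the paper comes from doing the attention GEMMs directly on the int8 values with INT8 tensor cores and only applying the scales at the end, instead of dequantizing first.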
>>102556658Can't llama.cpp do FA with K quants?
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
https://arxiv.org/abs/2409.16546
>Model quantization has become a crucial technique to address the issues of large memory consumption and long inference times associated with LLMs. Mixed-precision quantization, which distinguishes between important and unimportant parameters, stands out among numerous quantization schemes as it achieves a balance between precision and compression rate. However, existing approaches can only identify important parameters through qualitative analysis and manual experiments without quantitatively analyzing how their importance is determined. We propose a new criterion, so-called 'precision alignment', to build a quantitative framework to holistically evaluate the importance of parameters in mixed-precision quantization. Our observations on floating point addition under various real-world scenarios suggest that two addends should have identical precision, otherwise the information in the higher-precision number will be wasted. Such an observation offers an essential principle to determine the precision of each parameter in matrix multiplication operation. As the first step towards applying the above discovery to large model inference, we develop a dynamic KV-Cache quantization technique to effectively reduce memory access latency. Different from existing quantization approaches that focus on memory saving, this work directly aims to accelerate LLM inference through quantifying floating numbers. The proposed technique attains a 25% saving of memory access and delivers up to 1.3x speedup in the computation of attention in the decoding phase of LLM, with almost no loss of precision.
https://github.com/AlignedQuant/AlignedKV
kind of interesting
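The "precision alignment" observation is easy to demonstrate in isolation: once two addends differ in magnitude by more than the mantissa can bridge, the low bits of the smaller one are discarded by the addition anyway, so storing it at high precision was wasted work. A tiny numpy demo of that effect:

```python
import numpy as np

big = np.float32(2.0 ** 20)   # ~1e6; the float32 ulp here is 2**-3 = 0.125
small = np.float32(1e-3)      # nonzero, but well below half an ulp of `big`

# float32 rounds the sum back to `big`: the small addend contributes
# nothing to the result despite being stored "precisely"
lost = (big + small) - big
```

AlignedKV turns this around: since a cached value's low bits can't affect the sum next to a much larger addend, you can quantize it down to the precision that actually matters and skip reading the rest, which is where the memory-access saving comes from.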
>>102552200
nice numbers, also instruct 405 is pretty damn close to opus for generating surprising and interesting shit
>>102556698Q8_0 and Q4_0, as far as i remember. But maybe they (the ones from the paper) do something more to make it more accurate.
>>102556328
I don't think these people understand what the things they post actually mean. The paper seems to tell a different story from what I just read.
https://xcancel.com/rao2z/status/1838245253171814419
As it turns out, Yann is still right. The simpler part of the benchmark that Yann commented on in the past showed that LLMs could appear to plan, but only for extremely simple scenarios. So obviously when Yann said they "still can't plan", he didn't mean "plan" in any capacity at all, but planning for more complicated scenarios like what a human could handle.
The graph posted above is also interesting in that it appears to show that, contrary to the graph OpenAI had where accuracy increased with longer inference time, performance actually decreases over plan length for this test. Although it's possible that the inference time didn't increase with plan length. But by default I believe o1 does just naturally "think" longer for more complicated problems, so it should be correlated anyway.
>>102556772
That's for the precompiled binaries. You can compile llama.cpp to use other cache types like q5k and the like, I'm pretty sure.
>>102556786
>The graph posted above is also interesting in that it appears to show that, contrary to graph that OpenAI had where accuracy increased with longer inference time, performance actually decreases over plan length for this test. Although it's possible that the inference time didn't increase with plan length. But by default I believe o1 does just naturally "think" longer for more complicated problems, so it should be correlated anyway.
Those are two different things. Plan length in this case refers to how difficult the problem is to solve (how many steps to arrange the blocks properly), so it would be extremely strange if any method could ever have higher accuracy for the longer plans.
>>102556740>discord trash>literal who leaderboard for literal who whatever the fuck>censoring nameshmmmmmmmmmmmmmmm
>>102556823
You misunderstand what I meant. I said that it should be correlated. A behavior of o1 is that it normally spends more tokens on problems of higher complexity. So in theory it should be evaluating how complex a problem is and dedicating more time to thinking about it. But if that is not the case here, then that is a failure of the model either way. Either it can't maintain true accuracy on longer generations, or it fails to accurately recognize the difficulty of the problem, or both.
>>102553989I still need to try this when I get home.
>>102556816
I don't think that has anything to do with it being precompiled or not. I don't use the prebuilt ones, but I don't quantize cache either. Looking at the code, these seem to be the ones supported. At least in llama-bench.
>>102557099
Same in common.cpp, used by llama-cli. So yeah: q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1. No K-quants.
>>102556418Yumm LeCum
>>102556418Better than being a LeNegro like yourself
>>102556321>what is a confidence interval
>>102556897
Then you misunderstood either what OpenAI claimed or what the paper is showing. Plan length is expected to be inversely correlated with accuracy for Blocksworld problems on everything except a perfect solver. Their claim was that having it "think" longer on the same task would increase its accuracy on that task, not that it would magically solve harder tasks at equal accuracy to easier ones.
>So in theory it should be evaluating how complex a problem is and dedicating more time thinking about it.
That's what the paper showed. See pic related: it holds up until around 80k token length for its hidden chain of thought. As the authors further note:
>The early version of o1-preview that we have access to seems to be limited in the number of reasoning tokens it uses per problem, as can be seen in the leveling off in Figure 2
>This may be artificially deflating both the total cost and maximum performance. If the full version of o1 removes this restriction, this might improve overall accuracy, but it could also lead to even less predictable (and ridicuously high!) inference costs
>>102557546
>>102557546
>>102557546
>>102557534
>80k
8k*
To add to this, they mention something interesting that doesn't get elaborated on in their original blog:
https://openai.com/index/learning-to-reason-with-llms/
>Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.
They don't make it clear exactly what they're adjusting when they "set" its test-time compute to some value. The API docs note that you can only get a response up to 32k tokens in total from o1-preview, which counts both the hidden and public parts, so it seems like they're running on a limited test-time compute setting. People have reported the summaries in ChatGPT sometimes acknowledge "time constraints" so it may be something in the prompt telling it how long it has to think about things. Whatever it is, I'm guessing they'll have some much more expensive longer-planning model with that knob to turn.