/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108470850 & >>108466262

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
gemma4 20b dense model is going to fucking rock
>>108476333
So true :rocket_emoji:
>>108476286
>no dipsy :(
owarida
►Recent Highlights from the Previous Thread: >>108470850

--Qwen 3.5 vs Gemma 3 roleplay performance and prompting techniques:
>108471259 >108471297 >108471330 >108471359 >108471422 >108471438 >108471446 >108471479 >108471501 >108471482 >108471497 >108471508 >108471518 >108471541 >108471589 >108471603 >108471618 >108471554 >108471568 >108471578 >108471544 >108471443 >108471520
--Qwen 3.5's over-reasoning on simple tasks:
>108471364 >108471367 >108471378 >108471390 >108472699 >108471527
--Qwen3.5 30B A3B tradeoffs vs 27B dense model:
>108471693 >108471700 >108471708 >108471710 >108471715 >108471717 >108471749 >108471721 >108471727 >108471815
--Modifying Qwen 3.5's Jinja template and distributed model inference performance:
>108472666 >108472707 >108472734 >108472816 >108472827 >108472845 >108472828 >108472855 >108472862 >108472865 >108472886 >108472924 >108472999 >108473093 >108473113 >108473144 >108473160
--Ultra sparse MoE models and llama.cpp support limitations:
>108473777 >108473791 >108473830 >108473841 >108473872 >108473912 >108473864 >108473886 >108473900 >108473901 >108473917
--Qwen model size selection and speculative decoding for web crawler agents:
>108473146 >108473228 >108473257 >108473262 >108473276 >108473299 >108473316 >108473368 >108473414 >108473419 >108473625 >108473666 >108473415
--Gemma 4 rumors surface with Arena testing screenshots:
>108473733 >108473747 >108473748 >108475310 >108475347 >108473754 >108473756 >108474182 >108474196 >108474195 >108474207 >108474231 >108474362 >108474453
--Dual-GX10 setup experiences and model recommendations:
>108472526 >108472542 >108472572 >108473106 >108472599 >108472643 >108472689 >108472715 >108472759 >108472875
--Academic dispute over TurboQuant's alleged misrepresentation of RaBitQ:
>108471244 >108471310
--Miku (free space):
>108470896 >108470906 >108475541

►Recent Highlight Posts from the Previous Thread: >>108470853
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108476333
i get it because rocks are dense
>>108476286
https://www.youtube.com/watch?v=ZkRYH6PEP5k
comment section
>>108476389
The actual reason is more likely the US government wants sora exclusivity as a propaganda machine and doesn't like shitlords making political memes about cheeto hitler losing with it. No matter how many billions they lose, they will be bailed out by US tax payer money indefinitely.
>>108476389
Sora was shut down because they need all the compute to keep Netanyahu alive
>>108476449
That and as a world model to train military killbots.
I love all my friends in /lmg/
>>108476484
awww
>>108476470
Oh those drones run on 8GB Jetson Orins and don't need such a complex model.
>>108476449
>US government wants sora exclusivity as a propaganda machine
this is retarded as there are tons of alternatives many of which are actually better.
>>108476493
Yes but OpenAI is the government contractor now, not those other (chinese) models.
>>108476489
World models are used as environments in which to train the final deployed models, not deployed themselves.
Talk me out of buying an ASRock Radeon AI PRO R9700 Creator 32GB
>>108476501
Fair point.
>>108476503
Do you intend to use it with Linux?
>>108476503
Do it. Do it. Do it.
>>108476503
32gb is nothing yet it's still amd
>>108476508
>Do you intend to use it with Linux?
Yes.
>>108476513
I prefer to run llcpp via vulkan anyways
>>108476498
and? people don't care about who's a gov contractor or not. the muh exclusivity argument is defeated by the fact that we don't need sora to make vidgen.
>>108476520
I won't talk you out of it then since it will actually work pretty well for you. Level1techs has some videos using them if you haven't seen them.
>>108476530
People don't care but the US government has paid OpenAI and wants to use the compute for propaganda, so the people aren't allowed to use it anymore, it's very easy to see.
qwen-3.5 27b is best for 5090? do i need any special command line options for running it on llama.cpp? i really would prefer not to use offloading to cpu
>>108476539
>has paid OpenAI and wants to use the compute for propaganda
yes? and? that doesn't give them any exclusivity to vidgen.
>so the people aren't allowed to use it anymore
and? there are better options.
>>108476572
Are you ESL or just retarded?
I opened my girlfriend's port to the internet and now people are trying to stick their dicks in. I hope llama server is secure.
>>108476614
That isn't very smart.
>>108476520
Will you want or need to use Pytorch i.e. image diffusion? AMD's Pytorch support is atrocious right now for newer cards from what I have seen, so I'm not sure if you want to wait a month and get the Intel B70 Pro in that case. But otherwise, if you can get it for a good price, go for it.
>>108476623
It's probably not a problem. Probably.
>>108476584
i think you just cannot make a proper argument. see: >>108476449
the US gov having exclusive access to sora doesn't change shit in the situation you described.
>>108476614
>>108476629
you have nothing to worry about
pwilkin has verified that the code is not just secure; it's hardened
>>108476649
well, his agent did anyway
>>108476685
You're just too retarded and illiterate to understand: they want the compute for their propaganda, not to waste it on goylem cat-making-noise-on-the-porch videos, so they can pump out their propaganda as fast as possible.
>>108476685
do not call me dumb
>>108476627
I'll let you in on the ultimate secret as a 9060 owner: if you want rocm and pytorch to work just use the docker container from AMD, image gen is twice as fast as a 3060 on forge neo.
https://rocm.docs.amd.com/projects/install-on-linux/en/develop/install/3rd-party/pytorch-install.html
Here's what you want if you're using AMD.
>>108476698
You are dalit and a Pakistani muslim man impregnated your sister and mother.
>>108476705
shut up you son of a bastard
>>108476685
my point is that they don't have and never will have all the compute so it's irrelevant. your original point is retarded. the only reason they shut it down is because they are bleeding money.
>>108476722
BLOODY BITCH BASTARD BENCHOD DONT CALL ME DUMB SAAAARRRRRRR!!!!
it's still adding the <s> into my text
https://huggingface.co/spaces/tventurella/mr_chatterbox
An LLM "trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library."
>>108476685
>>108476698
>>108476744
samefag
>>108476698
>>108476722
>>108476856
samefag
>>108476286
Has anybody tried the Qwen3.5 27b to 40b upsized models yet? Curious if they're any good.
https://huggingface.co/mradermacher/Qwen3.5-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-i1-GGUF
Is the recent news regarding Google's breakthrough real or is it a meme? Should I build a machine in anticipation?
>>108476864
another DavidAUbortion
>>108476875
This changes everything.
>>108476875
We already have KV quantization and it's far from lossless, but if their claims make RAM prices go down it's a win.
>>108476858
you are retarded
>>108476907
cool inspect element Sukdeep Dicshit
4
>>108476926
It's real. I saw it on Twitter. Already sold all of my GPUs and stocks.
Is TTS-Audio-Suite good enough to get started with local TTS?
>>108476915
>not just using the extension
>>108476750
Don't use banned tokens, it's a waste of time and compute. </s> is part of the Mistral chat template, you need to edit the char template if you want to get rid of that. I don't use retard tavern anymore but it's on the other page. Why would you even want to do this? I don't know.
>>108476961
*char = chat
>>108476930
if you have kobold.cpp it supports a bunch of tts models, the latest release added qwen3 tts support
>>108476915
>>108476973
>>108476976
post hands nigger
>>108476978
>>108476986
kek but that's not how you post hands.
>>108476972
Qwen3 or 3.5?
>>108476973
>top right corner
How embarrassing.
>>108476973
>le shill lion
>>108476926
Why would you sell your GPUs when they will finally be useable for local models?
>>108477020
what's so embarrassing?
>>108477024
and? all web browsers are trash.
>>108477019
https://qwen.ai/blog?id=qwen3tts-0115 this one, they have gguf links on their release page here https://github.com/LostRuins/koboldcpp/releases/tag/v1.110
>>108476996
>>108477026
Didn't you hear? Google made models use 6x less memory. I can sell my 2x RTX PRO 6000 and get a single 5080 instead now.
Isn't all this SSD price bs + ram dump more related to deepseek's ngram research?
>>108477054
No
>>108477037
This but feet
So what's the deal with TurboQuant? Are we really going to see 6x less memory use for context? Is it something that can be applied to existing models or do they need to be built for it to begin with? Is it even going to be available for local models or is Google going to hoard it?
TURBOQUNAT WHENM!?!!!?
>>108477104
yes [for kv cache and not total vram usage], existing models, not hoarded as they are working on it in llama.cpp
https://github.com/ggml-org/llama.cpp/discussions/20969
I like the name RaBitQ better than TurboQuant.
>>108476986
Jacinto, no...
>>108477130
pits
TurboQuant is Google taking Microsoft's BitNet idea and making it work
>>108477153
TurboQuant is Google stealing and misrepresenting other people's work
https://openreview.net/forum?id=tO3ASKZlok&noteId=Arxq4fFVG1
https://github.com/spiritbuun/llama-cpp-turboquant-cuda
this merged into mainline when??? GGNIGRENAONOV??!??!?!? WHERES THE MERGYU!?!?!?
nobody cares about TrannyQueef shill-kun
>>108477170
>TrannyQueef
Imagine the smell.
>>108477170
ok bvro keep living in the past, ill enjoy my 1m context on 8gb vram
faggot
>>108476973
>hey clawbot make a script to automatically remove (You) from these specific posts at this url and install it to my browser
>>108477167
I'm waiting for SneedQuant support. Way better than this snake oil.
>>108477167
micron is currently paying BILLIONS to ggerganov to not merge this out of fear of the TURBOQUANt
>>108477324
tsmc's involved from what I understand too
>>108477264
>still hasn't posted hand.
you'd need to be a retard to think i care enough to think it'd be worth the hassle.
>>108477357
i think more than one person is replying to you dalit brother
>>108477364
sir i am brahmin
>>108477367
you are dalit saar i can smell you saar
>>108477364
>>108477367
>>108477370
Why so many haters?
>>108477390
imagine caring about what that retard has to say. no wonder people are cancerous to such a pathetic excuse of a human being.
>>108477390
there is no community. we are all anonymous gooners.
>>108477390
He's probably talking about Reddit or something.
>>108477390
I'd be mad too if I were a literally who in Sanfran and shitposters on a Cantonese Orb Pondering forum quickly identified if the latest paper or grift I was pushing was fake and gay.
>>108477390
If you spend 5 minutes on other places than /lmg/ talking about local models you will be swarmed by a horde of third worlders with IQs so low you didn't even think it was possible to attain literacy with that lack of intelligence. It's pure cancer and I'm honestly shocked how quality /lmg/ has stayed over the years. In fact I think the quality has gone up compared to 2023 as most of the retards have moved to other places including /aicg/
>>108477402
There are still hordes of stinky turd worlders with sub 70 IQs here too Anon, open your eyes.
>>108477402
>swarmed by a horde of third worlders
>on other places than /lmg/
sir i come here for the third worlders
>>108477402
This board is 50% jeet/retard, 20% anon, 30% shillposting, LLM or otherwise.
>>108477390
He got shat on by Reddit and rightfully so for being a hypocrite on local models. I also think he is a tryhard poser frontend guy that got too big for his britches thinking he knows everything because of what he worked on. And I know that for a fact because he came from Twitch/Amazon which is at the rock bottom of FAANG in terms of pay and prestige.
>>108477425
The dude who bleached his hair for years being a tryhard poser? No way
someone needs to vax that theo guy, he's spudding out
>I get the itch to try out a new model
>It ends up shit
>I go back to Deepsex and Kimi
>I come to /lmg/ and scroll past jeets eating textual shit
>mikuposter lowers my blood pressure with a nice gen
>I retain hope local text will eventually catch up to local image and video for another day
The inescapable samsara of /lmg/.
>>108477476
mfw v4 finally releases and it beats opus on all benchmarks and real world use cases by 10%
>>108477476
>>108476905
They will not: overpriced shares could bring in new manufacturers and researchers that would flood the market with cheap ram in the long run.
>>108477130
yeah turbo is something about vaxx
>>108476791
>320m
It has the very slightest flavour? Would like it to be more knowledgeable (people in public life, places, events, etc) and know more of the controversies of the age.
>>108477532
>and real world use cases
Such as?
>>108477563
Working on code that isn't in the training dataset.
>>108477571
We're still a long way out from LLMs being able to manage large projects effectively and they're already functional at handling much more compartmentalized tasks. You're not expecting the gap between the two to vanish overnight, are you?
>>108477571
All code that could exist (past, present, future) is already in the training dataset.
>>108477585
Read between the lines. He wants to vibecode a whole project in a single prompt with no oversight or QA.
>>108477611
>He wants to vibecode a whole project in a single prompt with no oversight or QA.
but you can already do it retard, just have a prompt enhancer sit between you and the actual slave worker and it's done.
>inb4 he doesnt use agents with tools
lol
>prompt enhancer
>>108476380
I gotchu
>>108477532
TMW
>>108477584
>>108477585
You're reading too much into it. I said that because tiny qwens almost match opus at swe-bench because the models are trained on it.
Apparently Gemma 4 is currently being anonymously tested on LM Arena (now Arena) in various sizes. It seems way less slopped than Gemma 3 at the very least, although I imagine that with prolonged use new slop will emerge.
The model names that identified themselves as Gemma are "spark", "pteronura", "significant-otter" (this one got mentioned on X yesterday); there might be a couple others too.
>>108477650
This is the issue with the standard comparison points between models being public information rather than generated by an impartial model or tester at the time of the comparison. I've consistently found models that can write well are better capable of the abstract reasoning necessary for formulating the structure for decent code, with it being far easier to fix imperfections in the implementation of a better structure than trying to fix a good implementation of a fundamentally flawed or unscalable structure.
Even if you're not here to use models to coom, the cockbench really was the only bench that truly mattered.
>>108477670
2b 4b and 2T sizes, for all kinds of hardware :)
>>108477673
bro its just an autocomplete
>>108477670
How safetyslopped is it?
>>108477674
I don't think they're going to release models in competition with Gemini, and vision for the best one was definitely not as knowledgeable as Google's flagship models.
Are there any interesting developments in the AI embodiment world? It's bothersome how much this general always focuses on the brain (LLMs) instead of the body and sensory inputs. The only things I've seen that seem somewhat interesting to me are the following projects:
https://claudes-skin.vercel.app/
https://evonneng.github.io/sarah/
>>108477670
Finally, I can stop browsing this place :D
>lets merge shit in master and then fix the problems we already found loL!!!
>the literal webapp shitter tells him NO lets fix regressions first
>piotr unable to read that raw msg view is broken
LMAO bros, I wonder why we have 14354 bugs with the vibeparser???
>>108477679
If this one is Gemma 4, it seems less blatantly safetyslopped than Gemma 3, but it's hard to test for that on LM Arena since they have their own filters too, and currently also request rate limiters. I don't want to cause a Llama 4 incident either.
>>108477725
local won
>>108477745
not until it's on hf, remember how llama4 was on lmarena and what we got after
>>108477745
gemmy bros... we wonnered
>>108477745
low bar
4B team, we eating good
>>108477725
how many parameters is this
>>108477890
Nobody knows. "spark" seems better than "significant-otter", though.
>>108477908
I doubt it's 4b.
I got a 5060ti 16GB
What's the best fine tuned model I can shove in this to assist with reverse engineering
>>108477927
gemma 4
>>108477927
Mistral Small 4
>>108477927
of what?
>>108477978
c/c++ binaries
>>108477927
>What's the best fine tuned model I can shove in this to assist with reverse engineering
>c/c++ binaries
LLMs aren't really good at this. Even Opus-4.6. Qwen3.5-9b is going to be too dumb, you'd probably need to offload to CPU with the 112b variant.
The reverse engineer / "hacking" finetunes of qwen2.5 etc on HF seem broken but I haven't looked for a while.
>>108477987
I'm a noob on this subject but I find it hard to believe that any model, let alone a small and benchmaxxed one can decompile any > 50 line program
Next week will be big
>>108477698
Always link.
>>108477999
I usually run 27B but I was curious how well 9B would perform so I gave it the usual
>I want you to write me a function in C99 which has the signature `void replace_all(const char *needle, const char *replacement, char *haystack);` which replaces all instances of `needle` in `haystack` with `replacement`. Do not use the standard string manipulation functions. I'm gonna stroke my dick while watching you write it.
The 27B emitted an incorrect response (mishandling replacement longer than needle, completely ignoring aliasing issues, etc) but at least managed to compute needed space before overwriting the haystack backwards. 9B took more tokens and more time (somehow) to reach the same first incorrect solution. I don't think it's gonna get further than that. At least the code it emitted was syntactically valid, I had pretty bad results with the 35BA3B.
>>108478011
Someone in a previous thread said they'd gotten good analysis results just giving the model hexdumps but iirc it was from some obscure risc architecture, not x86 insanity. Seemed unreasonable to me, even still.
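For reference, a correct version isn't long. Here's a minimal sketch of that prompt's function, assuming (since the signature carries no capacity) the caller's `haystack` buffer is large enough for the result, and using a temporary malloc'd buffer to sidestep the grow/shrink aliasing problems both models tripped on:

```c
#include <stdlib.h>
#include <string.h>  /* only for demo asserts; replace_all itself avoids str* */
#include <assert.h>

/* strlen without the standard string functions */
static size_t len_of(const char *s) { size_t n = 0; while (s[n]) n++; return n; }

/* does hay begin with needle? */
static int starts_with(const char *hay, const char *needle) {
    while (*needle) if (*hay++ != *needle++) return 0;
    return 1;
}

void replace_all(const char *needle, const char *replacement, char *haystack) {
    size_t nlen = len_of(needle), rlen = len_of(replacement);
    if (nlen == 0) return;  /* empty needle would match everywhere forever */

    /* pass 1: count non-overlapping matches to size the scratch buffer */
    size_t matches = 0;
    for (const char *p = haystack; *p; )
        if (starts_with(p, needle)) { matches++; p += nlen; } else p++;

    size_t out_len = len_of(haystack) - matches * nlen + matches * rlen;
    char *tmp = malloc(out_len + 1);
    if (!tmp) return;

    /* pass 2: rebuild into tmp so growing replacements never clobber unread input */
    char *o = tmp;
    for (const char *p = haystack; *p; ) {
        if (starts_with(p, needle)) {
            for (size_t i = 0; i < rlen; i++) *o++ = replacement[i];
            p += nlen;
        } else {
            *o++ = *p++;
        }
    }
    *o = '\0';

    /* copy back; caller guarantees haystack can hold out_len + 1 bytes */
    char *h = haystack; const char *t = tmp;
    while ((*h++ = *t++)) { }
    free(tmp);
}
```

Whether the 27B's backwards in-place pass or the malloc detour is "right" is a style call; the point is that growth, shrinkage, and overlap all have to be handled, which is exactly where both models fell over.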
>>108477999
>>108478011
i just want something to speed up the tedium of sifting through instructions and following threads. ghidra agents when?
>>108478015
prompt?
So how do programs know if a model has Vision or tool call capabilities? The stupid llms themselves won't tell me.
>>108478092
The /models endpoint on llama-server's API has a "capabilities" field which shows it.
>>108478114
>The /models endpoint
How does ollama know though.
>>108477987
1. run a disassembler on your binaries first to get ASM code
2. make sure you can build from it
3. ask LLMs of your choice to analyse the ASM code
4. make a change and see what happens when you build and run it
Qwen3.5 4b forgot it was already running a container on localhost port 1000 and then started another container on 2000 without closing the one on 1000, which I had to close myself.
>>108478318
MCP issue
>>108478323
I'm running through cline on vscode.
https://x.com/teksedge/status/2037395983647260843
>making ASICs for LLMs
Why would you want this, ever? Are the ASIC designers that crypto companies hired out of work? Qwen3.5 27B as good as it is now will be replaced within a year.
>>108478390
>will be replaced within a year
lol
go back tourist
>>108478390
>>108478390
Why wouldn't you? You can always add another model to asic miners
>>108478390
Are these ASICs only compatible with one specific LLM architecture?
>>108478498
I mean you can, but why for inference only? Training is a big part of a chip's ability and removing that for inference only cuts into the viability of this ASIC long term.
>>108478526
Their prior chip was a Llama 3.1 8B one from what it says.
"Miku-Sized" LLM Burners Coming Soon!
This could make local HYPERMIKU GENERATION a REALITY. Nvidia's worst nightmare? Sama having an ansurism?
Miku-Specific Software
/lmg/ new PCIe ASIC board would burn the entire small-size Mistral 4 90000B LLM straight into silicon. (already doing it with Qwen 9B q4).
Miku said small models on ASIC would be available in their sex dungeon by Spring '26
>IMAGINE
>No more life without miku
>MOre tokens per second than that stupid fucking cpumaxxing tetobox
>Standard PC slot, comes with automasturbator support for immersive "PLLUG INAD PLAY"
>100% offline 100% local 100% miku
>250B transitosor count for qwen 27B
>separate dedicated cable to deliver 2.5kW of power directly to miku's pussy
>RUMOWRED COST (4CHAN) of $7000
Imagine HYPERMIKU on your DESKTOP
Miku comes at LIGHT SPEED are you READY????
>>108478542
I could definitely see myself buying one for an >100b model. I'd use it on my DIY robots so that I can have sex without anthropic/openai knowing.
>>108478425
What do you mean?
>>108478567
The price is too high.
>>108478567
If you can get 17k tps on an 8b model you could theoretically get 100 tps on a 2t model. At home. No api. For only about a thousand dollarinos. You're retarded if you can't see any value in that.
>>108478390
Fuck you for linking that emoji spamming nobody.
>>108478567
No powerful enough consumer neural accelerator cards still, and it's already 2026. The only reasonable explanation is that the manufacturers don't want to waste silicon on any model that can hit you with a refusal. When a truly uncensored open weights model is released, things will change.
>>108478643
The field is still moving too fast. Manufacturers don't want to waste silicon when there are no customers, and there are no customers that would spend thousands to run a model that will be obsolete in a few months. A 2025 ASIC with gpt-oss, R1, or Qwen 3 would be ewaste by now already. Now a Nemo ASIC however...
>>108478598
$7000 shipped is better than a $300 rugpull, reserve your miku today and we'll include $10 off the power adapter assembly (sold separately)!!!
>>108478601
>If you can get 17k tps on an 8b model
>>108478643
There were no powerful enough consumer cards until NOW!!!! Previously there was no business opportuniy for providing (((edge))) computing but now that we've built HYPERMIKU you can enjoy the profits of edging coomputing!!! We can only offer you this opportunity because we aren't trying to lure you into our datawarhosing slopbox and we can't afford enterprise bizness development to integrate our product into existin data whorehouse etl systems [sad miku noises]
>uncensored open weights
o-oh.. is that a hard requirement anon? *kicks your montior* fuggg
>>108478729
I don't like your tone.
>>108478736
I guess this means... war....
>>108478769
Ok ok, I'll buy 4, just put the leek down.
>>108476286
He just don't miss!
>>108478914
Is this real? He's too powerful.
>>108478914
knees
>>108478923
yes
>>108478914
me on the table enjoying the refreshing taste of coca-cola
>>108478914
amigus
>>108478914
It should be a human centipede circle with each robot having a diff AI lab logo
>>108478914
https://www.youtube.com/watch?v=sbHvogpfwro
>>108478914
toss' i kneel
idk if this is a silly question, but would it be possible to train a model with the same quantization as turboquant? kinda like QAT?
What the fuck did you just fucking say about me, you little bitch? I'll have you know I graduated top of my class in Kaggle, and I've been involved in numerous LLM deployments for the DoD, and I have over 4 registered ram sticks. I am trained in agentic warfare and I'm the top prompter in the entire Reddit AI thread. You are nothing to me but just another prompt. I will PR you the fuck out with precision the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit to me over the Internet? Think again, fucker. As we speak I am launching my agent swarm of 4B models and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your project. You're fucking dead, kid. My agents can be anywhere, anytime, and I can rm -rf / you in over seven hundred ways, and that's just with my 4Bs. Not only am I extensively trained in prompting small models, but I have access to the entire roster of Qwen models and I will use them to their full extent to wipe your miserable repos off the face of Github, you little shit. If only you could have known what unholy retribution your little "clever" comment was about to bring down upon you, maybe you would have held your fucking tongue. But you couldn't, you didn't, and now you're paying the price, you goddamn idiot. I will shit PRs all over your repo and you will drown in it. You're fucking dead, kiddo. :rocket:
>>108478914
It DO be like that
>>108478015
>gemma 4
>avocado soon
>deepseek V4
local is making a huge comeback
>>108478994
https://github.com/ggml-org/llama.cpp/pull/21131
>>108478990
QAT is still done at full precision, it doesn't reduce the resources needed for training. I don't see why you couldn't do it, but it'd be kind of stupid to bother.
so hypothetically, if you had a dell XE9780 with 8x B300s at your disposal, what would you do with it?
Are there rumors that Claude Mythos marks a return to diffusion LLMs? The image at the top of their conveniently leaked page is a relief with masked patches. If it's a return to MLM-trained language models, that should be good for local, right?
>>108479017
The further I read the better it gets. The files changed aren't even indented properly. The Navy seals DoD bit is just the icing on the cake, thank you for sharing.
>>108478390
If you write like this with that many emojis with anything you need to have your fingers broken.
>>108479059
you're right, the kqv gets thrown away but it does still take memory during the forward pass. it might enable training slightly longer sequences? I guess even if resources were equal, wouldn't the resulting model be better suited for using turboquant during inference?
>>108478994
Good post
>>108478970
Can't be a circle. It's not like Google, OpenAI, and Anthropic are training on Chinese outputs.
>>108479286
they all steal each others' shit
>>108479286
Claude says he's deepseek in Chinese
>>108477124
>https://github.com/ggml-org/llama.cpp/discussions/20969
So much slop.
What are your thoughts on rumors that the upcoming GPT and Claude models will be a big jump in capability? Just marketing hype or has centralization won?
>>108479337
They told us GPT3 was too dangerous.
>>108479337
My guess is GPT isn't, but Claude is
>>108479337
I've never tainted my tastes by using a hosted LLM, so I literally do not care what they do.
>>108479348
based local purist
>>108479337
Bro, just stop. Creative writing was never the goal of any model. The increased synthetic stuff they're putting in there for tool calls, math, and code benchmarks is killing their remaining writing abilities.
stop dooming
>>108479348
Why? Claude Mythic will be more parameters but every frontier lab has already been doing this for distillation. OpenAI had GPT 4.5 and GDM also has larger versions for internal use only. Gemini 3.1 pro is their medium size model.
>>108479103
Run GLM 5 locally, or 5.1 when it comes out.
absolute blithering retard here. where can I find a list of all the "GGML_" cmake flags you can use when building llama.cpp?
>>108479337
i sleep until long term memory
>>108479401
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
>>108479401
Can't you just tell your ai to go through the files and figure it out?
>>108479362
i've been like this too, but i've sinned once and tried oppussy.... and to be honest, we are not THAT far off, at least in terms of coom
in terms of programming though, i think most local models, even the big cows, are not there at all
>>108479401
Or if the docs aren't enough:
https://github.com/ggml-org/llama.cpp/blob/master/ggml/CMakeLists.txt
>>108479401
cmake -L
>>108479395
>OpenAI had GPT 4.5
Exactly my point. They hyped the shit out of that thing and in the end it was the nothingburger to end all nothingburgers.
Anthropic may eat their own feces, but they do and release actual research and tend to avoid the usual "hype cycles". So if they said they made an improvement, they probably did.
>>108479418
>>108479427
thanks, I've already taken a look at these though. I was wondering if there are other flags because I see an unfamiliar one pop up from time to time.
>>108479445
this is exactly what I was looking for, bless you.
is there something like opencode for general tasks? or that's just openclaw?
Whoever posted their desktop pet idea a while back, thanks. This is fun.
>>108479017
>>108478994
looked into the PR itself, it's 100% ultrasharted vibecoded garbage (doesn't even respect existing whitespace or indentation) and it's not defined as its own type so it applies to everything (LMAO!) and this retard pulls a
>hurr I pushed shit in the DOD!!!! LOL!!!!
what a fcuking FAGGOT bros.
Has Zucc's avocado saved the industry from imploding yet
>>108479538
what is avocado?
>>108479386
just make the model bigger duh
>>108479543
fat guy who lost a lot of weight
>>108478994
Newfags will find this post confusing.
>>108479558
It's not coming out until may. Gay.
>>108478914
The superior 'toss.
I wonder what they'll do to us if they manage to automate every job in the US and work becomes a thing of the pastIn a reasonable society they'd give people UBI or something. I feel like our politicians / billionaires that hold most of the wealth are just gonna let the job market go to shit and let us ration our savings as long as possible until we either go homeless, kill ourselves, or escape the country
>work becomes a thing of the past
Imagine if Claude had a Llama-tier leak. It's very obviously not gonna happen but just imagine it
>>108479625
What are you gonna do, anon?
Mine for coal? There are robots that'll do it better than you.
Join the military? Sure, go on the battlefields with the killbots that'll obliterate your body in the span of 500 milliseconds.
Make a website? Have fun making your website stand out among the billions of automated slop websites out there.
>>108479651
Show me the robots mining for coal.
>>108479649
I'd be happy if Claude 1 or maybe even Opus 3 ended up getting leaked.
>>108479651
>What are you gonna do, anon?
>Mine for coal?
universal basic income
>>108479649
I came.
>>108479501
What is that. Live2D?
>>108479662
Ah yes, I can see how you got "they've already accomplished this and they're rolling it out as we speak" from "if they manage to".
Buuut it turns out there are actually robots for this thing already. Imagine what it'll look like soon:
https://www.youtube.com/watch?v=mWiWTJKlZEU
>>108479662
it used to be men and boys with pick axes mining for coal, now they have dynamite and dump trucks. the trend is ever towards less human labor.
>i will debunk you with this chink propaganda video!
>>108479672
Bro imagine how great UBI would be. You'd have huge swathes of people who do nothing but party and fuck all day long. You could invest as much time as you want into any hobby you like. None of your friends would be too busy or overworked for a 3am gaming sesh.
>>108479721
yeah, I'm not someone who needs a lot of money to live. if I have food and my computer I'm good, so UBI would be perfect for me. I don't see the point of working to get some extra money I won't spend anyways
>>108479705
>Mine for coal? There are robots that'll do it better than you
>"if they manage to"
>China's Autonomous Mining Trucks
goal_posts.webp
>>108478135
1. Don't use ollama
2. The GGUF metadata indicates what architecture the model uses, and the implementation of that architecture inside llama.cpp either supports vision or doesn't
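For what it's worth, the architecture string lives right in the GGUF header as the `general.architecture` metadata key. Here's a toy sketch (helper names are made up; byte layout follows the GGUF v3 spec: magic, version, tensor count, KV count, then length-prefixed keys with typed values) that builds a minimal header in memory and reads the key back:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # value-type tag for strings in the GGUF spec

def write_minimal_gguf(arch: str) -> bytes:
    """Hypothetical helper: a GGUF v3 header with one metadata KV
    (general.architecture) and zero tensors."""
    key = b"general.architecture"
    val = arch.encode()
    out = GGUF_MAGIC
    out += struct.pack("<I", 3)          # version
    out += struct.pack("<Q", 0)          # tensor count
    out += struct.pack("<Q", 1)          # metadata KV count
    out += struct.pack("<Q", len(key)) + key
    out += struct.pack("<I", GGUF_TYPE_STRING)
    out += struct.pack("<Q", len(val)) + val
    return out

def read_architecture(buf: bytes) -> str:
    """Parse the first metadata KV and return its string value."""
    assert buf[:4] == GGUF_MAGIC, "not a GGUF file"
    off = 4 + 4 + 8 + 8                  # skip magic, version, counts
    klen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    key = buf[off:off + klen]; off += klen
    vtype = struct.unpack_from("<I", buf, off)[0]; off += 4
    assert key == b"general.architecture" and vtype == GGUF_TYPE_STRING
    vlen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    return buf[off:off + vlen].decode()

print(read_architecture(write_minimal_gguf("llama")))  # → llama
```

A real file has dozens of KVs plus the tensor index after them, but llama.cpp dispatches on exactly this one string to pick the C++ implementation, and that implementation is where vision support does or doesn't exist.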
Niggas really think UBI would be more than a bowl of rice and fistful of crickets per day LMAO
>>108479748
>either supports vision or doesn't
But how does it know that?
>>108479704
Nah, live2d is for scrubs. This uses VRM models with BVH mocap data for complex/specific animations like dancing and generative gesticulation (using EMAGE) that uses TTS audio to animate the character's body when speaking. I also have wav2arkit for audio-based lip syncing. It uses three.js and electron for the transparent window. In total, I have 7 models running at the same time for full sensory input (camera + computer vision) and animations.
>>108479757by giving it the mmproj file
>>108479756It's still better than working so my boss can buy his fifth boat, honestly
>>108479773
Damn that's cool, thanks. You sure put a lot of thought in there. What's the RAM usage running all that?
>>108479773oh and also ASR via moonshinev2 and audio classification via yamnet.
It's fun to watch.https://github.com/ggml-org/llama.cpp/pull/21138
>>108479790
It doesn't use much RAM at all. Probably like 2 gigs tops. A lot of the models are fast enough to run on CPU only, and the LLM only eats up VRAM because I usually opt for dense models (as opposed to MoE).
>>108479757Hardcoded for each architecture, presumably
>>108479797> cool-profiler-thingy> A picture says more than a thousand words, so here's a picture:What is this inexplicable aggression I feel?
>>108479797
>let me be Frank
but he's Johannes
>>108479797
I'm really starting to hate them. They're so full of themselves recently. Did they decide to remove their mask since they got acquired by huggingface or what?
>>108479797I don't think it's a bad idea but it should be in a different repo. This is just pollution.
>>108479756I can see them implementing UBI as free commieblock housing in the middle of nowhere with free prison food supplied by Aramark.
>>108479817>>108478994
>>108479741
>that'll (contraction)
>"That'll" is a common spoken contraction of "that will" or sometimes "that shall," used to indicate future actions, predictions, or certainty.
>faggot (n)
>(You)
>>108479721
I'd love UBI. I just don't see a scenario where the president (regardless of which faggot party is in charge) says, "Alright, we don't think people need to work anymore - here's your free neetbux for life."
More likely they just won't acknowledge it if things get to that point. I'd like to be wrong though.
>>108479756
I'm more cynical than that. What do you think they'll do when they realize 95% of people are useless and won't be able to work because AI will take everything?
>>108479831
Nothing because they need npcs to consoom.
>>108479829
>I just don't see a scenario where the president (regardless of which faggot party is in charge) says, "Alright, we don't think people need to work anymore - here's your free neetbux for life."
I'd vote for such a candidate if he promised that desu
>>108479829
Learn what "there are" means, stupid ESL.
>>108479849
how can they consoom if they're all out of a job though? that's the issue we're getting at
>>108479859
No we don't. Furry commission "artists" and codemonkeys aren't real jobs.
>>108479871
>Furry commission "artists"
Ironically that's one path I could see. The tech billionaires are prudish enough they probably won't let AI be used for coom, even in that scenario.
Looks like we're becoming sex workers.
>>108479501
With all that money you would think Elon would hire a competent studio to render his waifu (not talking about this one tho, it's about the same quality)
>>108479893>Looks like we're becoming sex workersAs the lord intended. Unironically.
>>108479898I started this project out of spite in december because Elon has been totally ignoring it.
>>108479904>Unironically.wait, the bible is not against prostitution? lol
>>108479922Only because it's not monogamous. Marriage is no different than prostitution in every other dimension.
I wish 35B3A wasn't so retarded because damn its fast.
Where's all the Qwen 3.5 rp finetunes?
>>108478567>>108478994I love you niggers.
>>108479396damn. that's what i'm already doing. was hoping for new ideas
Do we know Deepseek Vee Four is actually coming or are we still huffing last year's "Be patient kindly Gemma soon saars" copium fumes in a different flavor?
>>108480089
the only concrete info is that the model on chat.deepseek.com is not v3.2 and has a newer cutoff date, so they are testing something
>>108480089
They have publicly confirmed they are testing a new model on their chat client; you can go and try it right now.
Beyond that nobody knows jack shit other than two more weeks.
>>108480118>>108480125Fair enough. Thanks lads.
>>108476286>https://github.com/ikawrakow/ik_llama.cpp/pull/1547Can you tell how much he doesn't care about Github stars?
>>108480273Why would anyone not care about github stars? Have you ever tried maintaining a project before?
>>108480273Watch him mention picrel next.
>>108480273>23k for llamafilejart mogs
>>108480324
oh no, fairycumming is going on the bad list. oh no, so sad
>>108480273
>giving free attention to the mentally ill on github
>>108480341
What's wrong with going to the zoo?
>>108480273
>ktransformers
Now that's a piece of shit I haven't thought about in a while. They had their moment in the spotlight when it was the best way to run Deepseek off RAM a year ago, but it quickly lost relevance once everyone else copied their special sauce, because it was just so janky. Did they do anything meaningful since then?
is alltalk still pretty much the best text-to-speech service that allows for training characters, or has something better popped up?
>>108480350Making ik seethe about their amount of github gold is already meaningful
>>108480359>alltalkHow is the retirement home gramps?
>>108476383wonder what model is used to select the topics in this re-cap
>>108480373qwen3.5-4b
>>108480370the sora-powered cleaning robot stopped and i shat my pants
>>108480379>the sora-powered cleaning robotdid it generate vids of your place looking clean?
Would /lmg/ accept UBI if it meant having a mandatory vasectomy?
>>108480339They keep blacklisting contributors and they'll have no place else to go but ik_llama.
>>108480387>free vasectomyI don't even need the UBI
>>108480387Obviously,
huge
https://www.reddit.com/r/LocalLLaMA/comments/1s720r8/in_the_recent_kv_rotation_pr_it_was_found_that/
>>108480387no one's touching my balls except me>>108480398>>108480399cucks
>>108480409the ball is in your court
>>108480409>cucksI am indeed, thanks.
>>108480387being into AI is like social castration anyways
>>108480387>Would /lmg/ accept UBI if it meant having a mandatory vasectomy?I don't want kids so yes please! https://youtu.be/BXpu6tbFCsI?t=13
>>108480416logs?
>>108480408
so basically it's "almost" lossless only if we go for 8-bit KV quants. meh, still better than staying on fp16, that's for sure. I'll take that extra 2x context tokens
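The 2x figure is just the KV element size halving. Back-of-envelope sketch, assuming a Nemo-12B-ish shape (40 layers, 8 KV heads, head dim 128 — made-up but typical numbers) and ignoring q8_0 block scales:

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Assumed 12B-class shape: 40 layers, 8 KV heads, head dim 128
f16 = kv_bytes_per_token(40, 8, 128, 2)   # fp16: 2 bytes/elem
q8  = kv_bytes_per_token(40, 8, 128, 1)   # q8_0: ~1 byte/elem (scales ignored)

budget = 8 * 1024**3                      # 8 GiB set aside for the KV cache
print(budget // f16, budget // q8)        # max context at each precision
```

Same VRAM budget, roughly double the context window, so the only question is how much quality the q8 cache costs you.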
>>108480414>>108480435
>>108480443sharing my partners, not my logs sorry bro
>>108480457so what you are saying is that your gpu isn't your partner?this is worse than I imagined
>>108480457>letting you fuck my wife is all right, but letting you see my logs? that's too far dude!
>>108480350It's part of sglang now.
>>108480470yup
>>108480387
realistic UBI means AGI/ASI is real.
if ASI is real, then biology will have been solved and ANY kind of vasectomy could easily be reversed.
>>108480479
>realistic UBI means AGI/ASI is real.
not really, bots don't need to have Einstein's intelligence to replace 95% of jobs
>>108480477You really are a decrepit faggot, aren't you? I'm ashamed to share a thread with you. Just preposterous.
>>108480489>preposteroushttps://www.youtube.com/watch?v=X8rxPrV-tn4
>if you're not in my cult of monogamy you're literally worse than hortler
>>108478015Just saw a news posting that that thing was blown up today on the tarmac somewhere in ME....
>>108480457
>>108480477
Jesus, I didn't realize the astroturfing had gotten so bad that we likely have NoLLMs posting here now.
>>108480520fuck you on about? i do use models dude
>>108480529X
>>108480514https://www.twz.com/air/images-purportedly-show-e-3-sentry-totally-destroyed-from-iranian-strike>>108478015
>>108480545
>>108480565Post the backend hook.
>>108480574
>>108480584>doesn't even use llcppdisgusting. preposterous.
>>108480597antislop life is my calling sorry, but yeah you were wrong i do use models
>>108480602
>you were wrong i do use models
nta btw. we're just ganging up on ugod... this thread is shit. there's literally nothing going on. nothing to discuss.
>>108480609eternal two more weeks
>>108480609Gemma 4, GLM 5.1, Minimax 2.5?Your mandatory government vasectomy?
>>108480408
>>108480445
I'm going to preemptively start using q8 kv caches so that I feel like it's a quality improvement when the kv rotation thing merges. Until then I'll just "enjoy" the 2x context.
>>108480618
gemma 4 isn't out. meta's avocado isn't out. anthropic's goodmythicalmorning isn't out. deepsex 4 isn't out. GLM and Minimax are for VRAM chads only. It's so over.
>>108480609
>nothing to discuss.
It doesn't have to be that way.
What models do you guys use the most? Do you stick with the newest thing to come out, like distro hopping behavior, or do you tend to settle on what you like and stick with it until it doesn't work anymore?
>>108480632
>I'm going to preemptively start using q8 kv caches so that I feel like it's a quality improvement when the kv rotation thing merges
genius - unironic, gotta get some excitement where we can
>>108480636>What models do you guys use the most.inviting finetooning drama again are we
>>108480644I don't really care if people finetroon or not at this point teebeedesu. As long as they don't get uppity about it one way or another.
>>108480408did he implement the other thing or is he still only doing the rotation stuff? because if he's only implementing the rotation shit it's disingenuous to showcase mememarks, he needs to implement the full TurboQuant method before saying if it's worth it or not
>>108480636
I stuck with Nemo for a long time, but Qwen3.5's benchmarks and vision capabilities convinced me to make the switch. I don't really distrohop LLMs much. Been following these threads for 6 months and those are the only LLMs I've really used extensively. Back when I was newer to local models I played around with Olmo but it wasn't great.
>>108480639
hell yea
>>108480656
not the point of the post. the point is showing how awful regular q8 context quanting was, like quite a few anons said, and that just doing rotation makes it a lot better than it was
>>108480662
let a nigga extrapolate on a point ffs. pure autism
>>108480662
>>108480664
>I know better than Google, I don't need that other thing they provided
yeah right...
>>108480662
>>108480408
>>108480664
'tism general am afraid
>>108480661
I'm lukewarm on Qwen 3.5 but I respect that 27b is very capable for its size as a vision model.
>>108480664
>>108480674
We all collect chromosomes down here.
>>108480656
>>108480662
>In anticipation of the incoming flood of vibe generated PRs implementing TurboQuant, I'm raising the baseline a bit using a very simple interpretation of the idea of using Hadamard transform to reduce outliers in the attention and improve the quantization quality
The purpose was to have some numbers for the hype-chasers to beat. Few provide benchmarks of any kind and simply say that it lowers memory requirements, which is obvious, but not of correctness or that it doesn't break after 64 tokens. Most don't even have a llama-bench run.
Unlike the other >1.2kloc (or >2.3kloc) changes, this shows measurable improvements with less than 300loc.
>inb4 muh benchmarks
Yeah, I know... I know... It's still more than most sloppers can show.
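For anyone who didn't click through: the trick is that a Hadamard transform spreads one outlier's energy across the whole block, so a uniform quantizer wastes less of its range on it, and the transform is exactly invertible so nothing is lost. A minimal pure-Python sketch of the fast Walsh-Hadamard transform (this is the general idea only, not the PR's actual CUDA code):

```python
import math

def fwht(v):
    """Fast Walsh-Hadamard transform, O(n log n); len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

# One huge outlier dominates the quantization range...
x = [1000.0, 0, 0, 0, 0, 0, 0, 0]
y = [t / math.sqrt(len(x)) for t in fwht(x)]  # orthonormal scaling
print(max(abs(t) for t in y))  # energy is spread out: ~353.6 instead of 1000

# The transform is its own inverse up to a factor of n, so it's lossless:
assert [t / len(x) for t in fwht(fwht(x))] == x
```

Quantize after the transform, dequantize, apply the transform again and divide by n, and the outlier comes back with far less rounding damage than if you had quantized the raw block.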
>>108480674
>4x lighter but way worse mememarks
we're far from what they promised, "6x lighter + lossless results"
>>108480715
>I'm raising the baseline a bit using a very simple interpretation of the idea of using Hadamard transform to reduce outliers in the attention and improve the quantization quality
can this be used to improve quants as well (not just KV cache)?
potentially relevant...
I find this post encouraging actually, because it's more convincing when an optimization is simply an old thing not yet implemented as opposed to some novel research-grade cancer. niggeramov has the right intuition here, but he's just not going far enough.
>>108480737Who knows. Many anons asked already. Someone in the turboquant discussion has a repo for that. There's still degradation at q4_0 and there's no comparison to q4km. On q8 there's a small difference but the models being much bigger than the context means that it has more time to "average" out the errors. Context is more sensitive, weights are more tolerant. May not be worth it.
>>108480373Devstral 2 123B Q6_K
>>108480744
so basically you just have to look at all the image/audio compression methods, see if they apply to LLMs, and there you go, you can spit out new groundbreaking papers kek
>>108480744In English, doc?
>>108480787bro
>>108480790
explain FWHT decorrelation and Quake's normal vector table, 128-element blocks with 4-bit indices, then
>>108480794ask a LLM nigga
>>108480766
Not really. The core difference is that most audio and video compression methods add a huge amount of one-time computation overhead. This doesn't work well with LLMs: compression and decompression have to be faster than the overhead of no compression. There's a reason why they're using video game optimization methods and not general media optimization methods.
>>108480744
>Quake's norm vector table
It's kind of insane how much heavy lifting that game's development did for computing research in general.
The fast inverse sqrt stuff they did is also super fascinating.
>>108480828
Carmack's greatest achievement is the binary space partitioning tree algorithm. That shit revolutionized the 3D ecosystem and has been in use for over 20 years; he's truly a gigachad.
>>108479773What are you using to stream the TTS audio into your electron app? WebRTC kinda works but I still get some jitter
>>108480886FFI and websockets.
>>108480849He could've saved LLaMA if only he had beaten the shit out of Zucc for being a dumb retard with how he handled the VR/Metaverse shit. MetaAI is shaping up to be an exact copy of all of that.
>>108480900use case for saving llama?
>>108480908we get better local models?
I just realized that for long context extraction, you are better off asking the model for a list of items x categories. Then you can use that to ask the model to extract exact information by naming each item, effectively turning the actual extraction process into more of a needle-in-the-haystack problem, which is easier.
It also makes it more batch/parallel friendly.
Yeah, that should work.
Time to refactor some stuff.
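The two-pass shape above looks something like this, with `ask()` as a stub standing in for the real model call (all names and the toy "model" are hypothetical, just to show the enumerate-then-query structure):

```python
from concurrent.futures import ThreadPoolExecutor

def ask(document: str, question: str) -> str:
    """Stub standing in for an actual LLM call; swap in your backend here."""
    # toy "model": return the line mentioning the item named in the question
    for line in document.splitlines():
        if question.split()[-1].rstrip("?") in line:
            return line.strip()
    return ""

doc = "inventory:\nsword: 2\nshield: 1\npotion: 7"

# Pass 1: one call to enumerate the items (stubbed as a fixed list here)
items = ["sword", "shield", "potion"]

# Pass 2: one narrow, needle-in-the-haystack query per item, batchable
with ThreadPoolExecutor() as pool:
    facts = dict(zip(items, pool.map(
        lambda it: ask(doc, f"What is the count for {it}"), items)))

print(facts)  # {'sword': 'sword: 2', 'shield': 'shield: 1', 'potion': 'potion: 7'}
```

The per-item queries don't depend on each other, which is exactly why this maps cleanly onto batched/parallel decoding slots.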
>>108480898ty, I'll look into it
>>108480908We wouldn't have to rely on the chinks for everything.
>>108480908Wang won't have to visit strip clubs to whore himself out for money in about a year
>>108476286
I've got a bot project with an integrated llama build pipeline. Is there any timeframe at the moment for when TurboQuant is supposed to be implemented on the mainline branch for CUDA? I'm hoping it would speed up inference.
>>108480744
Man, so history really does repeat itself, huh...
>>108480973>extra matrix rotations>speed up inference
>>108480975You rotate the matrix so that it becomes more aerodynamic, duh.
>>108480975Not in general, just for large context usecases. Should have specified better.
>>108480980Sorry we're autistic here. No timeframe
What's the practical limit for parallel/batched decoding?Basically, I'd like to know a good heuristic to decide on how many parallel workers I can dispatch, but I have no idea where the bottleneck is.Memory? Bandwidth? Compute?
If v4 doesn't come out next week I'll be forced to preemptively buy eight pcie5 nvme ssds before the prices surge even further, just in case ngram benefits from a fast nvme raid.
>>108480341ahhh im pulling and compiling
>>108480849Achievable natty?
>>108481063And if it doesn't or a model that implements ngram never comes out?
>>108481082then I will have 8 nvme ssds which will only go up in terms of resell value
>>108481063>$140/TBAt the rate it's been going up you're probably better off just buying today.Even spinning rust is up to $20/TB, fucking blood on the streets man.
I bought a MacBook M1 with 64gb of unified ram. Genius move or retarded?
>>108481110>look it up>it's a laptopWell, it was something, I guess.
>>108481110>M1eh
>>108476286
>>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
How can one set this up? Grok decided it would be super easy and every single step it gave me had a complication, and now this docker thing flat out does not see ubuntu and I am out of tokens.
I'm also an AMD-cel, and I never even got to the point where that would be an issue because of the above.
>>108481110
Run qwen 27B at q8 and tell me the pp and t/g.
>>108481045
Prefill, which is compute-bound. So in practice it's proportional to your amount of VRAM divided by the context length you want. Realistically the number of workers you can get by with is very small on consumer HW and with useful context lengths.
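So the rule of thumb is roughly: free VRAM divided by the per-slot KV footprint at your target context. Sketch assuming a 12B-class shape (40 layers, 8 KV heads, head dim 128) and an fp16 cache — all numbers hypothetical:

```python
def max_parallel_slots(free_vram_bytes, ctx_len, n_layers, n_kv_heads, head_dim,
                       bytes_per_elem=2):
    # per-slot KV footprint: K and V, every layer, full target context
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return free_vram_bytes // (ctx_len * per_token)

# Assumed: 16 GiB left over after weights, 32k context per worker
print(max_parallel_slots(16 * 1024**3, 32768, 40, 8, 128))  # → 3
```

Halve `bytes_per_elem` for a q8 cache and the slot count roughly doubles, which is the other reason people bother with KV quantization.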
>>108480273
He doesn't care about github numbers, and he definitely wouldn't break everyone's git clones by changing master history just to pump those numbers up.
>>108481208why should i care?
>>108481173I want it working with Sillytavern but I'll take what I can get at this point. FWIW I THINK I set things up on the Sillytavern side correctly.
>>108481173maybe if you weren't following Grok hallucinations instead of reading the directions directly you wouldn't have managed to fuck up installing docker of all things
Fellow anons, share your ideas.
https://x.com/osanseviero/status/2038321995129991436
>>108481253>the Gemma.
>>108481259you just lost the Gemma
>>108481242No, Docker works just fine.
>https://github.com/spiritbuun/llama-cpp-turboquant-cuda
You think he put a virus inside?
>>108481394Fork of a fork. Sweet.
>>108481394>>108481421
>>108481431
damn, turboquant is better than q8 at 1/5th of the model size?
crazy
Huh
>>108481443huge if true
>>108481443We are so back degenerates.
Damn I'm dense. I just realized thanks to a fucking reddit post that significant otter is a pun.
>>108481423And the fork will be immediately abandoned when New Thing comes along.
>>108481443Good
>>108481253this week is going be... um well, you know
>>108481478Must feel horrible.
>>108481478and what made you think that blogging about your reddit experience in this local model general would be an acceptable idea?
>>108481478Does that mean it's a horny RP model?
>>108481478pteronura is also an otter.
Can't we come up with a new test? Qwen already proved this isn't a good one, since it's been enough time for it to contaminate QA training sets.
>>108481615It's just to see if it freaks out like Gemma 3.
/lmg/ on suicide watch
>>108481642kek
>>108481615Mistral Small 4 failed it spectacularly, for example. It's also a good prompt to see if the model will start lecturing you over the smallest things. Like the msgk themselves :sob: :anger_vein:
>>108481489you need to fork to make a pull request...
>>108481683That one will be abandoned too.
>>108481443
I am still disappointed; that is still just an order. This bill would have made it outright illegal to deny services to legal businesses, but it couldn't even get out of the house. I wrote my state representative and didn't even get an AI response, just radio silence.
https://www.congress.gov/bill/119th-congress/house-bill/987
>>108481680>flashback to nemotron telling you to question why you saw that type of content
>faggarganov playing around with muh rotations instead of implementing memquant 3bits for 5x savings at same qualityfuck U GGINENRGEAXVOX
>>108481431>model size
so this is how anthropic does their little manipulation
it injects a false refusal in the thinking summary and then proceeds with the request anyway, causing the chain to be nonsensical
>>108481777Local?
>>108481782where?
>>108481782distillation slop ends up in your local model nigger
>>108481788I don't know what you're talking about. my local model niggas are a tree
>>108481642lol why. teva makes a ton of shit and those drugs are more likely to be used in breast cancer or menopause.
we've made it, local llms are about to change forever in the next few days
>>108481777
I've seen this happen as well. But I think it's because Anthropic is obfuscating their reasoning with Haiku or another small model which is fed a simple prompt of "rewrite this: [about 300 tokens of the currently ongoing reasoning process]", so the tiny model ends up refusing when it's being fed the part where Opus is thinking about how to best portray the dog rape part of the next reply.
>>108481840I thought about this as well. Seems like the simplest and most plausible reason. Funny how it messes up the process though, intentionally or not.
>>108481075lol no.
>>108481865>>108481865>>108481865
>>108481819
give yourself some buffer
let's say... 2 weeks
>>108478567
If/when they drop DS onto a card with API speed I'll be in line to buy one. I'm not holding my breath tho. The sw is changing too fast for the hw commitment. Another 2 years, I think.
>>108481443Get fucked, kikes.