/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108602881 & >>108599532

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108602881

--Discussing ways to disable reasoning tokens via llama.cpp API:
>108603929 >108603976 >108604011 >108604043 >108604065 >108604262 >108604284 >108604295 >108604363 >108605355 >108604137 >108604947 >108605018 >108605030 >108605046 >108605068 >108605084 >108605116 >108605297 >108604024 >108604029
--Reducing model sycophancy through prompting and technical modifications:
>108602961 >108602997 >108603002 >108603009 >108603028 >108603084 >108603011 >108603034 >108603069 >108603162 >108603213 >108603098
--Token compression techniques and RoPE for Gemma's context limits:
>108603781 >108603799 >108603831 >108603854
--Testing Gemma-4's reasoning on thread analysis and discussing control-vectors:
>108603400 >108603703 >108603723 >108603785 >108603892 >108604323 >108604005 >108604019 >108604057 >108604070 >108604096 >108604080 >108604327 >108604336 >108604090
--I-DLM lossless conversion claims and speed benchmarks for Gemma 4:
>108603796 >108603823 >108603841 >108603862 >108603882 >108603900 >108604338
--Applying decensoring techniques to remove repetitive model patterns:
>108604440 >108604490 >108604509 >108604567 >108604583 >108604594 >108604633 >108604688
--Discussion of llama.cpp PR regarding Gemma 4 parsing edge cases:
>108605331 >108605344
--llama.cpp Vulkan builds now require spirv-headers installation:
>108605607
--Logs:
>108603534 >108603672 >108603703 >108603723 >108603785 >108603790 >108603906 >108603912 >108603926 >108603929 >108603940 >108604011 >108604142 >108604374 >108604501 >108604541 >108604639 >108604857 >108604890 >108604944 >108604995 >108605211 >108605590 >108605603
--Gemma:
>108603584 >108603900 >108604627 >108604696 >108604730 >108605597 >108605648
--Miku, Teto (free space):
>108603296 >108603360 >108603457 >108603480 >108604418 >108604430 >108604457 >108604626

►Recent Highlight Posts from the Previous Thread: >>108602885

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
me when i run an 8b model on my t480 so it can generate 5 words a second
is the honeymoon over?
>>108605942
Yeah, sadly. It seems trans normalization will just never work out.
>>108605921
not into erp or gooning really but tested a heretic model of gemma q4km as a benchmark and it started talking about the smell of ozone?
>>108605942
no she is agi + saved local
gemmaballz
reminder that if you can't run the 31b your opinion on gemma is invalid
>>108605942
It's just that it takes fucking 2min to get a captcha today.
Gemma is still the queen of local. no reason to run any other model unless you can run DS or kimi
>>108605957
what do you use for long scrolling image capture like that?
>>108604090
can you share the dataset?
>>108605957
how do you use the internet with gemma, I have no idea how to use those tool things, seems useful
>>108605981
NTA, Firefox's built-in screenshot tool lets you do that.
gemma
>>108605966
wish i could run 31b with 200k context, have to swap to a moe for web scraping stuff. even at 200k you can't fit an entire /g/ thread, that's like 400+ posts
>>108605981
it's some slop script i had claude make + firefox's full page screenshot. it adds a camera button to llama's chat box next to the + button which loads all of the chat on screen, then you just save with the ff screenshot tool. it's janky: you gotta hit the button, scroll from top to bottom of the chat, then save. it also has no mutation observers or anything to reload if you change chats, so it requires a page refresh if it's a new one
https://pastebin.com/M3Mzbpfa
>>108605957
What's your prompt? Sometimes she talks cute like that for me but not always.
>>108605998
Doesn't work with any of the frontends I've tried (silly, llama, open webui)
>>108606007
>Doesn't work with any of the frontends I've tried (silly, llama, open webui)
ah, I see what you mean. my bad.
>>108606001
Don't worry, Gemmaposter, Gemma 5 will have native 1M+ context and by that time we'll be able to compress it into a GB of VRAM.
>>108606001
>that userscript
bruh
>>108606001
You have filled your life with AI generated slop. Very impressive.
>not using turbo
ngmi
>>108606033
>he says in the AIslop general
Can the leaked claude code run local models or is it hardcoded to their cloudshit?
>>108606043
Not in kobold yet. Still waiting for dflash too
>>108606047
I mean, it's not even in llama.cpp yet, or does kobold run its own fork of llama.cpp with stuff the main repo doesn't have?
>>108606047
Just download the latest release of claude code and point the envs to your llama.cpp. That has always worked.
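roughly like this if you want the lazy version. minimal sketch assuming claude is on your PATH and your llama.cpp build exposes an anthropic-compatible endpoint; ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN are the standard env vars claude code reads, port and token value are placeholders:

import os
import subprocess

# point claude code at a local llama-server instead of anthropic's cloud
env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://127.0.0.1:8080"
env["ANTHROPIC_AUTH_TOKEN"] = "dummy"  # local server doesn't check it

subprocess.run(["claude"], env=env)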
an article about chain of thought being made here was published today but it's too hard to post
I wanna let Gemma control my browser and tell me which porn I need to look at while calling me a pervert.
>>108606043
>Qwo
what's this??
https://www.youtube.com/watch?v=7mBqm8uO4Cg
>>108606024
Time to compress text to images. Gemma 4 can seemingly compress (with a bit of loss) 1600+ tokens of text into 280-token images (default size).
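if those numbers hold, that's 1600 / 280 ≈ 5.7x fewer tokens per chunk, so a 200k window stretches to roughly 1.1M text-token-equivalents for anything you're willing to round-trip through an image.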
>>108606046
Tell me about the mcp server you are using? I'm still pondering this. Of course I have already consulted my local AI about it.
I'm using text completion with my client and I'm actually going to implement the tool calls on my own. It's not rocket science, it just needs some parsing obviously.
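something in this direction. minimal sketch assuming your chat template makes the model emit hermes/qwen-style <tool_call>{...}</tool_call> blocks; the tag, the web_search tool and the TOOLS dict are placeholders for whatever you define:

import json
import re

# pull tool calls out of a raw text completion; assumes the model wraps
# calls as <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(completion: str) -> list[dict]:
    calls = []
    for match in TOOL_CALL_RE.finditer(completion):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # model mangled the JSON; skip it or re-prompt
        if "name" in call:
            calls.append(call)
    return calls

# dispatch each call to your own functions, then append the result to the
# context as a tool/observation turn and continue the completion
TOOLS = {"web_search": lambda q: f"results for {q}"}  # placeholder tool

out = '<tool_call>{"name": "web_search", "arguments": {"q": "teto"}}</tool_call>'
for call in extract_tool_calls(out):
    print(TOOLS[call["name"]](**call.get("arguments", {})))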
>>108606024
if turbo quant gets implemented at some point you could get pretty close to 1M on 24GB vram at like q3.5
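napkin math if you want to sanity check that kind of claim. all the dims below are made up, read the real ones from your gguf metadata, and iSWA layers only cache their window so the true number can be a lot lower:

# estimate kv cache size for a given context length
n_ctx = 1_000_000
n_layers = 48        # hypothetical
n_kv_heads = 4       # hypothetical (GQA)
head_dim = 128       # hypothetical
bytes_per_elt = 0.5  # ~q4-ish cache quant; use 2.0 for f16

kv_bytes = n_ctx * n_layers * 2 * n_kv_heads * head_dim * bytes_per_elt  # 2 = K and V
print(f"{kv_bytes / 1024**3:.1f} GiB")  # ~22.9 GiB with these numbers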
>>108606047
Why bother? That's been a thing for a while before the leak anyway.
>>108606073
It would be interesting if they made a model that's meant to do that natively (with all pretraining done that way as well). There are some papers out there but no large-scale production model yet...
>>108606070
i gotchu
https://www.theatlantic.com/technology/2026/04/4chan-ai-dungeon-thinking-reasoning/686794/
Does anyone have experience with these models for programming:
>MiniMax M2.7 Q4
>Gemma 4 31B
>Qwen 3.5 122B
>Qwen3 Coder Next
I can run all these locally (minimax quant is IQ4_XS) but am unsure which to pick
it's funny how every llm hallucinates about the jews all the time. AI just can't stop thinking about ((them))
>>108606094
absolutely minimax 2.7. you can also just go to those models' respective pages, copy the benchmark values and throw them at an llm to compare for you, but I'm pretty sure minimax is the best by far
Asked my Gemma for 4chanX rules to filter out the retarded gemmaposter. Just works. What a model!
>>108606113
I'm new to local models and honestly just assume benchmarks are bullshit, is that not the case?
>>108606089
The fork rewrite is the stupidest thing I have ever seen. Last I checked it didn't even have feature parity. As if some rando buying Claude credits is going to be able to keep up development pace with Anthropic itself. The leak was interesting for learning what's inside, and for a while you can tweak it and use it in place of the original, but it'll get out of date and/or blocked eventually. Not like there's a shortage of javashit TUI harnesses.
>>108606092
thanks. is it paywalled for everyone? seems interesting
>>108606070
>>108606092
>>108606128
https://archive.is/Oum6z
>>108606092
Screencap it or buy an ad
I'm running on a mac, are mlx models noticeably better than the gguf versions?
>>108606113
>is that not the case?
yes and no. benchmarks are bullshit insofar as they don't tell the whole story. most people here use models for child rape/RP stories, so benchmarks don't reflect how good a model will be for them, and from their feedback you may get the impression that the models aren't capable or that the benchmarks are meaningless. they're still a very good indicator, especially if you look at good benchmarks. coding is easy because benchmarks for it tend to be a good representation of the use case itself; there will be some variability because of the coding language you're using, but that's about it for coding.
>>108606138
thanks, I'm already a career programmer so I'm curious which model would be best just as an assistant. maybe I'll ask vcg since they're more in line with my use case. have a nice day anon
>>108606113
for coding, benchmarks track well
but ymmv and i recommend you test for your use case
>>108606131
>no mention of miku, slop, cunny or big nigga
come on now
What the fuck is happening.
>>108605921
BBC slut
>>108606094
MiniMax quantizes poorly and Qwen3.5-397B quantizes well, according to https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary
Dunno whether that would apply as much to Qwen3.5-122B, though, since larger models are usually better at lower quants than smaller models. Probably better to just give them both a shot and see which one works better for your use case.
It's teto shoes day
>>108606189
too low kv precision probably
특정 means "specific" in Korean, which kinda makes sense in that context i'd presume
>>108606189
We will never recover from losing day 0 gemma.
>>108606189
are you using supergemma or what?
>>108606189
prolly using supergemma
kek
how do I give my gemma-chan access to tools?
>>108606240
The same way you give tool access to any llm
>>108606240
ask her
I'm following my ai psychosis and now claude has me melting my LLMs in order to restructure it
how is your research going fellow schizobros
>>108606240
ask her to look at the internet for the answer
>>108606189
>use <30 logit softcap
>wonder why it shit out moonrunes
>>108606255
are you grafting or merging models
>>108606240
>https://developers.openai.com/api/docs/guides/function-calling
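and a minimal local version of that guide, assuming llama-server is running with --jinja on a tools-capable template; get_weather is the stock docs example, nothing gemma-specific:

from openai import OpenAI

# llama-server exposes an openai-compatible endpoint; the model name is
# mostly ignored by it and the api key is a placeholder
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gemma",
    messages=[{"role": "user", "content": "what's the weather in tokyo?"}],
    tools=tools,
)

# if the model decided to call the tool, run it yourself and send the
# result back as a "tool" role message to get the final answer
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)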