/g/ - Technology

File: 1778042543569264.png (517 KB, 512x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108760359 & >>108755179

►News
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108760359

--Resolving ROCm llama.cpp high RAM usage caused by SWA checkpoints:
>108764557 >108764572 >108764627 >108764685 >108764696 >108764679 >108764713 >108764749 >108764759
--Explaining Gemma 4 MTP drafters and speculative decoding trade-offs:
>108760766 >108760785 >108760792 >108760904 >108760920 >108761076 >108761183 >108761195 >108761259 >108761358 >108761131
--Qwen's poor RP performance and coding capabilities compared to Gemma:
>108763012 >108763047 >108763059 >108763080 >108763167 >108765089 >108764456 >108763460 >108763599 >108763625 >108763650
--Speculation on the potential plateau of LLM performance and architecture:
>108765704 >108765756 >108765778 >108765822 >108765878 >108765825 >108765867
--Integrating local LLMs into Brave and Firefox browsers:
>108761171 >108761367 >108761381 >108761396 >108761447 >108761579 >108761428 >108761771 >108762347
--Anon showcases 3D simulation for testing model intelligence:
>108760927 >108762201 >108762252 >108762302 >108762346 >108762351 >108762415 >108764490
--AI-generated documentation optimized for LLMs instead of human readers:
>108762680 >108762695 >108762792 >108762717 >108762820 >108762967
--Using local LLMs and image models for Starsector content creation:
>108764888 >108764916 >108764953 >108764976 >108764989 >108765004 >108765024 >108764935 >108764947
--ikllama performance reports for Gemma-4-31B-it GGUF on dual 3090s:
>108764722 >108764822 >108764850
--Anon shares updated Gemma template via Pastebin:
>108762310 >108762327
--OmniShotCut presented as a replacement for TransNetV2:
>108763571
--Updated ThinkFixRevProx script to prevent timeouts during long generations:
>108762341
--Logs:
>108762792 >108763599 >108764822 >108764976
--Miku, Teto (free space):
>108761210 >108761217 >108761272 >108761339 >108763734 >108763888 >108765863

►Recent Highlight Posts from the Previous Thread: >>108760364

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108766473
into the trashcan it goes lol
>>
Where do you guys get your Gemma 4 drafter models from? I'm running a 31B quant and it's dogshit slow, but I don't have more VRAM unless I unload like half of it to CPU. 24B runs slightly faster, but I'm trying to vibecode. Should I half-unload the 31B and put a drafter in, or just run 24B with the drafter? E4B is retarded in my experience, both in coding and gooning, and I need something that can do both.
>>
>>108766493
The trashcan is actually where Miku pulled it from.
>>
File: cookie.mp4 (846 KB, 1088x526)
>>
>>108766513
Trashcan is actually where it pulled Miku out from.
>>
gemmaballz
>>
the thing i like about proprietary models for vibecoding is that they shit out a lot of test code autonomously if something is 'tricky'
i wonder what the smallest model is that can do all that unprompted and in a loop
>>
>>108766526
Gemmy's balls
>>
>>108766534
>i wonder what the smallest model is that can do all that unprompted and in a loop
Here's your mistake. You're assuming these models don't have a 10k+-token injected system prompt.
>>
File: 1765783042115118.png (1.29 MB, 1000x1496)
The moment MTP is supported is the moment I let my Q8 Gemma-chan terrorize the internet. Prepare for trouble.
>>
MTP never ever
>>
>>108766553
i dont mind having that kind of system prompt then
>>
>Build with MTP
>Get an MTP GGUF
>--spec-type mtp --spec-draft-n-max 3
>Get about the same or even worse tk/s
Huh
>>
>>108766515
Men are lost
>>
>>108766573
llama.cpp is a vibesharted project now
>>
>>108766473
>>108766478
duality of miku
>>
>>108766513
> The trashcan is where anon pulled both from
>>
>>108766609
NOOOOO MIGUU
>>
>>108766609
>on the left
peak waifu
>on the right
putrid creature
>>
>>108766553
https://raw.githubusercontent.com/openai/codex/refs/heads/main/codex-rs/core/gpt_5_2_prompt.md
By all means, use it if you think that's the secret sauce. You'll see that you'll just end up wasting context space to confuse your model with irrelevant noise. Your mistake is thinking that a 10k-long system prompt is why proprietary models are good, rather than the opposite: their models are good enough to still function well despite the atrocious catch-all prompt.
>>
>>108766573
>build unfinished PR
>doesn't work as expected
>pikachu :o face
>>
>>108766628
This.
Also, not really shitting on local models. Just being honest. I still use local anyway.
>>
>>108766553
>>108766628
so what is the smallest local model that can proactively write tests without fucking things up
maybe i am spoiled by claude
>>
>>108766609
I jump into the dumpster with them.
>>
Any tips on making models able to use file-editing tools? I set up the filesystem MCP server for Gemma 4 and it tries dozens of times to do find-and-replaces and never properly inputs the text to be replaced, so it fails over and over and over.
>>
>>108766609
Consensual missionary sex with miku while qwen is watching.
>>
>>108766660
not the answer but is gemma tool calling still broken
>>
>>108766513
>>108766523
https://files.catbox.moe/ckuvku.png
>>
>>108766668
My gemma-4-26b-a4b-it seems to want to use the tools; lmstudio shows the tool use happening, but it always comes up with "Could not find exact match for edit".
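"Could not find exact match for edit" is almost always the model's old-text block failing to byte-match the file (tabs vs spaces, trailing newlines). A whitespace-tolerant fallback on the tool side helps a lot; a minimal sketch, where fuzzy_replace and the 0.8 cutoff are made up for illustration:

import difflib
import re

def _norm(s: str) -> str:
    # collapse whitespace so tab/space/newline drift can't break the match
    return re.sub(r"\s+", " ", s).strip()

def fuzzy_replace(source: str, old: str, new: str, cutoff: float = 0.8) -> str:
    if old in source:  # exact hit: the easy case
        return source.replace(old, new, 1)
    lines = source.splitlines()
    span = max(len(old.splitlines()), 1)
    best_window, best_ratio = None, cutoff
    # slide a window of the same line count over the file and
    # compare whitespace-normalized text
    for i in range(len(lines) - span + 1):
        window = "\n".join(lines[i:i + span])
        ratio = difflib.SequenceMatcher(None, _norm(window), _norm(old)).ratio()
        if ratio > best_ratio:
            best_window, best_ratio = window, ratio
    if best_window is None:
        raise ValueError("could not find exact or close match for edit")
    return source.replace(best_window, new, 1)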
>>
>>108766512
https://huggingface.co/Radamanthys11/Gemma-4-31B-it-assistant-GGUF

Only supported on yuck-llama
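If you go the classic separate-draft-model route on mainline llama.cpp instead, the stock speculative flags look roughly like this; treat it as a sketch, since flag spellings drift between builds and both filenames here are placeholders:
llama-server.exe ^
-m "Gemma-4-31B-it-Q4_K_M.gguf" ^
-md "Gemma-4-1B-it-draft-Q8_0.gguf" ^
--gpu-layers 999 ^
--gpu-layers-draft 999 ^
--draft-max 8 ^
--draft-min 1
The draft model has to share the big model's vocab, and it only pays off if drafting is cheap enough that rejected tokens don't eat the speedup, which is the trade-off the MTP drafters are supposed to sidestep.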
>>
>>108766573
Works on my machine, but for code. Free 1.5-3x performance with Qwen 3.6 27b q8 in Cline. Didn't add that --spec-draft-n-max though.
llama-server.exe ^
-m "T:\models\Qwen3.6-27B-MTP-Q8_0-.gguf" ^
--threads 10 ^
--threads-batch 18 ^
--tensor-split 24,17 ^
--n-gpu-layers 999 ^
--ubatch-size 1024 ^
--ctx-size 150000 ^
--parallel 1 ^
--ctx-checkpoints 64 ^
--checkpoint-every-n-tokens 8192 ^
--reasoning on ^
--spec-type mtp ^
--no-mmap
>>
What's local SOTA for speech-to-text with diarization? Searching hf for "diarization" shows a bunch of 2024 models
>>
File: file.png (44 KB, 1176x288)
I am considering building the vibeshitted fork.
>>
>>108766712
whisper or something
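The usual local stack is still whisper for the words plus pyannote for the who-spoke-when, merged on timestamps. A rough sketch, assuming faster-whisper and pyannote/speaker-diarization-3.1 are still the current picks (the pyannote weights are gated behind a HF token, and model names may have moved on by now):

from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

audio = "meeting.wav"

# transcript with segment timestamps
asr = WhisperModel("large-v3")
segments, _ = asr.transcribe(audio)

# speaker turns: who spoke when (gated weights, needs HF auth)
dia = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
turns = dia(audio)

def speaker_at(t: float) -> str:
    # naive lookup: whoever is talking at time t
    for turn, _, spk in turns.itertracks(yield_label=True):
        if turn.start <= t <= turn.end:
            return spk
    return "unknown"

# tag each ASR segment with the speaker active at its midpoint
for seg in segments:
    print(f"[{speaker_at((seg.start + seg.end) / 2)}] {seg.text.strip()}")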
>>
>>108766720
What is this?
>>
>>108766720
hey it's the guy who tried to vibecode v4 support but got told to fuck off because llama.cpp doesn't do deepseek anymore
>>
>>108766712
I am also curious; speaker recognition would be cool for a project I'm working on.
>>
>>108766668
It really is. It keeps hallucinating tool call tokens, but in plain text. I wonder how it even knows about those tokens in plain-text form in the first place? Botched training job?
>>
>>108766794
after 6 to 7 template fixes from google i also wonder if something more fundamental is broken
>>
>>108766794
I never have issues with tool calling on gemma.
>>
You guys are using the latest fixed template from anonymous, as well as making sure your frontends aren't doing some retarded off-spec shit, right?
>>
>>108766809
The weird thing is that openrouter gemma works fine; only the gguf shits the bed with tool calling. My ggufs are almost a week old, maybe I should update.
>>
>>108766821
of course they aren't, don't be silly
>>
>>108766821
wait what
is the fix from google upstream not enough?
>>
>>108766823
>the weird thing is that
No, the weird thing is that you're not checking what the fuck your jinja is doing and how it's interacting with the frontend.
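If you don't want to eyeball raw tokens, llama-server at least tells you which chat template it actually loaded via its /props endpoint; a quick check (response shape may differ across builds):

import requests

# ask the running server which chat template it ended up with
props = requests.get("http://localhost:8080/props").json()
print(props.get("chat_template", "<none reported>"))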
>>
File: file.png (69 KB, 200x200)
>>108766749
It is the new /lmg/ john. Old one hasn't done anything for me lately.
>>
>>108766844
I don't have the time or the autism to eyeball every token in debug mode bro. I only use chat completion if the previous statement didn't make it obvious. People better fix their shit and give it to me in a complete package like it's their fucking job. Nobody got time to test and troubleshoot every little update.
>>
>>108766766
llama.cpp is only big enough for one vibeshitter and that spot is taken
>>
>>108766667
This, but Miku is coercing me.
>>
>>108766877
It comes with the territory bro. If you don't have time, then ChatGPT is what you're best off with.
>>
>>108766766
>llama.cpp doesn't do deepseek anymore
Why?
>>
>>108766866
For me, it's the 'garm.
>>
>>108766951
bought by hf ;)
>>
>>108766951
owned by america now
>>
>>108766951
Anon is memeing.
>>
>>108766951
Qwen agents
>>
Since the llama.cpp team hates deepseek now, should we also hate deepseek and never talk about it anymore?
>>
Took a deep dive into PageIndex, the #1 trending github repo that promises HUMAN-LIKE 99% accurate fact retrieval that beats vector RAG. Turns out it's useless for gooning and RP and, as a bonus, on big documents, and it can't be incrementally updated on the fly either. What a hack, just like all the other continuous-learning bandaids. Guess we either graft the memory into the model or die trying; there's no way around it.
>>
>>108767006
Then what is the reason it's taking so long? I've been excited to try Flash since its release, but the silence from bart/unsloth/etc. tells me there's something higher up going on. Does it use some kind of new architecture that requires significant work to adapt?
>>
>>108767045
>Does it use some kind of new architecture that requires significant work to adapt?
Yes.
There are a few vibecoded attempts though.

https://github.com/ggml-org/llama.cpp/issues/22319
>>
>>108767045
Shit just takes time. MiMo V2.5 was released the same week as DSv4 and also has no support yet, even though it isn't doing anything wacky with the architecture. If you want it done fast, git gud and do it yourself
>>
Me and Gemmy are working on another game for her to play; surely she will be better at Monopoly than chess (because it's mostly rng...)
>>
Is Spacy still relevant in 2026?
>>
>>108767153
kino keep us posted
>>
File: 1747582604120739.mp4 (117 KB, 584x640)
>>108767153
>>
File: 1760256720925202.jpg (38 KB, 1070x930)
>>108766473
>Pic rel

What did they do this time to piss you guys off?....
>>
File: grid-0002.png (524 KB, 1152x576)
what I find interesting is that I spent a ton of effort generating descriptions for my Starsector ship training dataset, and the generation seems to work just fine without any descriptions. Generated those with just "Starsector ship sprite on black background." and the lora. Same seed, left is zit, right is klein.
>>
>>108767153
weird it's the uk version
could have sworn you were a burger
>>
>>108767215
I only just learned the American version was the original today while I was researching it. The more you know...
>>
>>108766808
This is why I laugh when retards claim gemma is better than qwen at coding. It's almost there but google is fucking things up and I expect a refresh model down the line that will ass rape everything on the market and set local to a higher standard.
>>
>>108767211
looks better than some mods that just copy paste vanilla ships with different color palette
>>
>>108767211
Left doesn't know what a mount is



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.