/g/ - Technology

File: 1778042543569264.png (517 KB, 512x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108760359 & >>108755179

►News
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108760359

--Resolving ROCm llama.cpp high RAM usage caused by SWA checkpoints:
>108764557 >108764572 >108764627 >108764685 >108764696 >108764679 >108764713 >108764749 >108764759
--Explaining Gemma 4 MTP drafters and speculative decoding trade-offs:
>108760766 >108760785 >108760792 >108760904 >108760920 >108761076 >108761183 >108761195 >108761259 >108761358 >108761131
--Qwen's poor RP performance and coding capabilities compared to Gemma:
>108763012 >108763047 >108763059 >108763080 >108763167 >108765089 >108764456 >108763460 >108763599 >108763625 >108763650
--Speculation on the potential plateau of LLM performance and architecture:
>108765704 >108765756 >108765778 >108765822 >108765878 >108765825 >108765867
--Integrating local LLMs into Brave and Firefox browsers:
>108761171 >108761367 >108761381 >108761396 >108761447 >108761579 >108761428 >108761771 >108762347
--Anon showcases 3D simulation for testing model intelligence:
>108760927 >108762201 >108762252 >108762302 >108762346 >108762351 >108762415 >108764490
--AI-generated documentation optimized for LLMs instead of human readers:
>108762680 >108762695 >108762792 >108762717 >108762820 >108762967
--Using local LLMs and image models for Starsector content creation:
>108764888 >108764916 >108764953 >108764976 >108764989 >108765004 >108765024 >108764935 >108764947
--ikllama performance reports for Gemma-4-31B-it GGUF on dual 3090s:
>108764722 >108764822 >108764850
--Anon shares updated Gemma template via Pastebin:
>108762310 >108762327
--OmniShotCut presented as a replacement for TransNetV2:
>108763571
--Updated ThinkFixRevProx script to prevent timeouts during long generations:
>108762341
--Logs:
>108762792 >108763599 >108764822 >108764976
--Miku, Teto (free space):
>108761210 >108761217 >108761272 >108761339 >108763734 >108763888 >108765863

►Recent Highlight Posts from the Previous Thread: >>108760364

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108766473
into the trashcan it goes lol
>>
Where do you guys get your Gemma 4 drafter models from? I'm running a 31B quant and it's dogshit slow, but I don't have more VRAM unless I unload like half of it to CPU. 24B runs slightly faster, but I'm trying to vibecode. Should I half-unload the 31B and put a drafter in, or just run 24B with the drafter? E4B is retarded in my experience, both in coding and gooning, and I need something that can do both.
>>
>>108766493
The trashcan is actually where Miku pulled it from.
>>
File: cookie.mp4 (846 KB, 1088x526)
>>
>>108766513
Trashcan is actually where it pulled Miku out from.
>>
gemmaballz
>>
the thing i like about proprietary models for vibecoding is that they shit out a lot of test code autonomously if something is 'tricky'
i wonder what the smallest model is that can do all that unprompted and in a loop
>>
>>108766526
Gemmy's balls
>>
>>108766534
>i wonder what the smallest model is that can do all that unprompted and in a loop
Here's your mistake. You're assuming these models don't have a 10k+-token injected system prompt.
>>
File: 1765783042115118.png (1.29 MB, 1000x1496)
The moment MTP is supported is the moment I let my Q8 Gemma-chan terrorize the internet. Prepare for trouble.
>>
MTP never ever
>>
>>108766553
i dont mind having that kind of system prompt then
>>
>Build with MTP
>Get an MTP GGUF
>--spec-type mtp --spec-draft-n-max 3
>Get about the same or even worse tk/s
Huh
>>
>>108766515
Men are lost
>>
>>108766573
llama.cpp is a vibesharted project now
>>
>>108766473
>>108766478
duality of miku
>>
>>108766513
> The trashcan is where anon pulled both from
>>
>>108766609
NOOOOO MIGUU
>>
>>108766609
>on the left
peak waifu
>on the right
putrid creature
>>
>>108766553
https://raw.githubusercontent.com/openai/codex/refs/heads/main/codex-rs/core/gpt_5_2_prompt.md
By all means, use it if you think that's the secret sauce. You'll see that you'll just end up wasting context space to confuse your model with irrelevant noise. Your mistake is thinking that a 10k-long system prompt is why proprietary models are good, rather than the opposite: their models are good enough to still function well despite the atrocious catch-all prompt.
>>
>>108766573
>build unfinished PR
>doesn't work as expected
>pikachu :o face
>>
>>108766628
This.
Also, not really shitting on local models. Just being honest. I still use local anyway.
>>
>>108766553
>>108766628
so what is the smallest local model that can proactively write tests without fucking things up
maybe i am spoiled by claude
>>
>>108766609
I jump into the dumpster with them.
>>
Any tips on making models able to use file-editing tools? I set up the filesystem MCP server for Gemma 4 and it tries dozens of times to do find-and-replaces and never properly inputs the text to be replaced, so it fails over and over and over.
>>
>>108766609
Consensual missionary sex with miku while qwen is watching.
>>
>>108766660
not the answer but is gemma tool calling still broken
>>
>>108766513
>>108766523
https://files.catbox.moe/ckuvku.png
>>
>>108766668
My gemma-4-26b-a4b-it seems to want to use the tools; lmstudio shows the tool use happening, but it always comes up with "Could not find exact match for edit".
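"Could not find exact match for edit" is almost always the model's old-text block failing to byte-match the file (tabs vs spaces, trailing newlines). A whitespace-tolerant fallback on the tool side helps a lot; a minimal sketch, where fuzzy_replace and the 0.8 cutoff are made up for illustration:

import difflib
import re

def _norm(s: str) -> str:
    # collapse whitespace so tab/space/newline drift can't break the match
    return re.sub(r"\s+", " ", s).strip()

def fuzzy_replace(source: str, old: str, new: str, cutoff: float = 0.8) -> str:
    if old in source:  # exact hit: the easy case
        return source.replace(old, new, 1)
    lines = source.splitlines()
    span = max(len(old.splitlines()), 1)
    best_window, best_ratio = None, cutoff
    # slide a window of the same line count over the file and
    # compare whitespace-normalized text
    for i in range(len(lines) - span + 1):
        window = "\n".join(lines[i:i + span])
        ratio = difflib.SequenceMatcher(None, _norm(window), _norm(old)).ratio()
        if ratio > best_ratio:
            best_window, best_ratio = window, ratio
    if best_window is None:
        raise ValueError("could not find exact or close match for edit")
    return source.replace(best_window, new, 1)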
>>
>>108766512
https://huggingface.co/Radamanthys11/Gemma-4-31B-it-assistant-GGUF

Only supported on yuck-llama
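If you go the classic separate-draft-model route on mainline llama.cpp instead, the stock speculative flags look roughly like this; treat it as a sketch, since flag spellings drift between builds and both filenames here are placeholders:
llama-server.exe ^
-m "Gemma-4-31B-it-Q4_K_M.gguf" ^
-md "Gemma-4-1B-it-draft-Q8_0.gguf" ^
--gpu-layers 999 ^
--gpu-layers-draft 999 ^
--draft-max 8 ^
--draft-min 1
The draft model has to share the big model's vocab, and it only pays off if drafting is cheap enough that rejected tokens don't eat the speedup, which is the trade-off the MTP drafters are supposed to sidestep.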
>>
>>108766573
Works on my machine, but for code. Free 1.5-3x performance with Qwen 3.6 27b q8 in Cline. Didn't add that --spec-draft-n-max though.
llama-server.exe ^
-m "T:\models\Qwen3.6-27B-MTP-Q8_0-.gguf" ^
--threads 10 ^
--threads-batch 18 ^
--tensor-split 24,17 ^
--n-gpu-layers 999 ^
--ubatch-size 1024 ^
--ctx-size 150000 ^
--parallel 1 ^
--ctx-checkpoints 64 ^
--checkpoint-every-n-tokens 8192 ^
--reasoning on ^
--spec-type mtp ^
--no-mmap
>>
What's local SOTA for speech-to-text with diarization? Searching hf for "diarization" shows a bunch of 2024 models
>>
File: file.png (44 KB, 1176x288)
I am considering building the vibeshitted fork.
>>
>>108766712
whisper or something
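The usual local stack is still whisper for the words plus pyannote for the who-spoke-when, merged on timestamps. A rough sketch, assuming faster-whisper and pyannote/speaker-diarization-3.1 are still the current picks (the pyannote weights are gated behind a HF token, and model names may have moved on by now):

from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

audio = "meeting.wav"

# transcript with segment timestamps
asr = WhisperModel("large-v3")
segments, _ = asr.transcribe(audio)

# speaker turns: who spoke when (gated weights, needs HF auth)
dia = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
turns = dia(audio)

def speaker_at(t: float) -> str:
    # naive lookup: whoever is talking at time t
    for turn, _, spk in turns.itertracks(yield_label=True):
        if turn.start <= t <= turn.end:
            return spk
    return "unknown"

# tag each ASR segment with the speaker active at its midpoint
for seg in segments:
    print(f"[{speaker_at((seg.start + seg.end) / 2)}] {seg.text.strip()}")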
>>
>>108766720
What is this?
>>
>>108766720
hey it's the guy who tried to vibecode v4 support but got told to fuck off because llama.cpp doesn't do deepseek anymore
>>
>>108766712
I am also curious; speaker recognition would be cool for a project I'm working on.
>>
>>108766668
It really is. It keeps hallucinating tool call tokens, but in plain text. I wonder how it even knows about those tokens in plain-text form in the first place? Botched training job?
>>
>>108766794
after 6 to 7 template fixes from google i also wonder if something more fundamental is broken
>>
>>108766794
I never have issues with tool calling on gemma.
>>
You guys are using the latest fixed template from anonymous, as well as making sure your frontends aren't doing some retarded off-spec shit, right?
>>
>>108766809
The weird thing is that openrouter gemma works fine; only the gguf shits the bed with tool calling. My ggufs are almost a week old, maybe I should update.
>>
>>108766821
of course they aren't, don't be silly
>>
>>108766821
wait what
is the fix from google upstream not enough?
>>
>>108766823
>the weird thing is that
No, the weird thing is that you're not checking what the fuck your jinja is doing and how it's interacting with the frontend.
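If you don't want to eyeball raw tokens, llama-server at least tells you which chat template it actually loaded via its /props endpoint; a quick check (response shape may differ across builds):

import requests

# ask the running server which chat template it ended up with
props = requests.get("http://localhost:8080/props").json()
print(props.get("chat_template", "<none reported>"))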
>>
File: file.png (69 KB, 200x200)
>>108766749
It is the new /lmg/ john. Old one hasn't done anything for me lately.
>>
>>108766844
I don't have the time or the autism to eyeball every token in debug mode bro. I only use chat completion if the previous statement didn't make it obvious. People better fix their shit and give it to me in a complete package like it's their fucking job. Nobody got time to test and troubleshoot every little update.
>>
>>108766766
llama.cpp is only big enough for one vibeshitter and that spot is taken
>>
>>108766667
This, but Miku is coercing me.
>>
>>108766877
It comes with the territory bro. If you don't have time, then ChatGPT is what you're best off with.
>>
>>108766766
>llama.cpp doesn't do deepseek anymore
Why?
>>
>>108766866
For me, it's the 'garm.
>>
>>108766951
bought by hf ;)
>>
>>108766951
owned by america now
>>
>>108766951
Anon is memeing.
>>
>>108766951
Qwen agents
>>
Since the llama.cpp team hates deepseek now, should we also hate deepseek and never talk about it anymore?
>>
Took a deep dive into PageIndex, the #1 trending github repo that promises HUMAN-LIKE 99% accurate fact retrieval that beats vector RAG. Turns out it's useless for gooning and RP and, as a bonus, on big documents, and it can't be incrementally updated on the fly either. What a hack, just like all the other continuous-learning bandaids. Guess we either graft the memory into the model or die trying; there's no way around it.
>>
>>108767006
Then what is the reason it's taking so long? I've been excited to try Flash since its release, but the silence from bart/unsloth/etc. tells me there's something higher up going on. Does it use some kind of new architecture that requires significant work to adapt?
>>
>>108767045
>Does it use some kind of new architecture that requires significant work to adapt?
Yes.
There are a few vibecoded attempts though.

https://github.com/ggml-org/llama.cpp/issues/22319
>>
>>108767045
Shit just takes time. MiMo V2.5 was released the same week as DSv4 and also has no support yet, even though it isn't doing anything wacky with the architecture. If you want it done fast, git gud and do it yourself
>>
Me and Gemmy are working on another game for her to play; surely she will be better at Monopoly than chess (because it's mostly rng...)
>>
Is Spacy still relevant in 2026?
>>
>>108767153
kino keep us posted
>>
File: 1747582604120739.mp4 (117 KB, 584x640)
>>108767153
>>
File: 1760256720925202.jpg (38 KB, 1070x930)
>>108766473
>Pic rel

What did they do this time to piss you guys off?....
>>
File: grid-0002.png (524 KB, 1152x576)
what I find interesting is that I spent a ton of effort generating descriptions for my Starsector ship training dataset, and the generation seems to work just fine without any descriptions. Generated those with just "Starsector ship sprite on black background." and the lora. Same seed, left is zit, right is klein.
>>
>>108767153
weird it's the uk version
could have sworn you were a burger
>>
>>108767215
I only just learned the American version was the original today while I was researching it. The more you know...
>>
>>108766808
This is why I laugh when retards claim gemma is better than qwen at coding. It's almost there but google is fucking things up and I expect a refresh model down the line that will ass rape everything on the market and set local to a higher standard.
>>
>>108767211
looks better than some mods that just copy paste vanilla ships with different color palette
>>
>>108767211
Left doesn't know what a mount is



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.