/g/ - /lmg/ - Local Models General - Technology



Anonymous
/lmg/ - Local Models General 06/11/26(Thu)17:06:27 No.109032734

File: sampo.jpg (742 KB, 3200x1536)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109026244 & >>109023085

►News
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/11/26(Thu)17:06:57 No.109032741

File: congration.jpg (228 KB, 1024x1024)

►Recent Highlights from the Previous Thread: >>109026244

--Comparing Gemma 4 12B, 26B, and 31B reasoning performance:
>109026649 >109026994 >109027048 >109027046 >109027063 >109027201 >109027298 >109027762 >109028167 >109028188 >109028202 >109028317 >109028326 >109028339 >109028389 >109028650 >109030540
--Optimizing Gemma 31B VRAM usage and performance on 24GB GPUs:
>109030630 >109030678 >109030707 >109031071 >109031098 >109030693 >109030702 >109030723 >109030727 >109030739 >109030753 >109030780 >109030840 >109030903
--Optimizing Hermes with local search tools:
>109029679 >109029688 >109029714 >109029838 >109029840 >109029855 >109029923 >109029934 >109029971 >109030523 >109030643 >109029868
--Exploiting LLM safety refusals to evade AI security scanners:
>109027080 >109027089 >109027104 >109027106
--Decoding base64 redacted reasoning in Moonshot Kimi models:
>109029974 >109029989 >109030056 >109030129 >109030174 >109030225 >109030064 >109030122 >109030150
--Hardware and budget recommendations for running Kimi-chan with high context:
>109031231 >109031457 >109031500 >109031541 >109031562 >109031564 >109031627 >109031645 >109031661 >109031646 >109031770
--Using custom think tags to steer Gemma 4 reasoning and prose:
>109027608 >109027617 >109027851 >109029176
--DiffusionGemma performance and token canvas implementation:
>109027336 >109027375 >109027404 >109027489 >109027519
--Comparing Gemini and Gemma models and discussing LLM architecture experiments:
>109028735 >109028748 >109029016 >109029723 >109029836 >109029945 >109030035
--Debating the widening gap between closed and open-weight models:
>109029222 >109029299 >109029320 >109031208
--Logs:
>109027403 >109027489 >109029688 >109029840 >109029974 >109030056 >109030174 >109030225 >109031668
--Rin, Miku, Teto (free space):
>109026417 >109026687 >109029201 >109029209 >109029283 >109029440

►Recent Highlight Posts from the Previous Thread: >>109026246

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/11/26(Thu)17:08:18 No.109032751

>>109032734
Cool pic

Anonymous
06/11/26(Thu)17:13:38 No.109032785

File: ASRock Radeon AI PRO R970(...).png (658 KB, 1985x1189)

>>109032604
>Well they have the most vram for the price. And lmg told me vram is king
this tbqh famm

Anonymous
06/11/26(Thu)17:13:58 No.109032788

File: IMG_1036.png (262 KB, 1160x793)

Pareto frontier models for speed to answer - quality tradeoff

Granite 4.0 350M
Qwen3 0.6B
Exaone 4.0 1.2B
MiniCPM5-1B
gpt-oss-20B (low thinking effort)
Longcat flash lite
gpt-oss-120B (low thinking effort)
Gemma 4 26BA4B
Qwen3.5 35BA3B
Qwen3.6 35BA3B
Gemma 4 26BA4B (thinking)
Qwen3.5 35BA3B (thinking)
Qwen3.6 35BA3B (thinking)
Minimax-M2.7 (thinking)
MiMo-V2.5-Pro (thinking)
Kimi K2.6 (thinking)
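
For anyone wondering how a list like this gets picked: a model is on the Pareto frontier if no other model is both at least as fast and at least as good. A minimal sketch of that filter follows. The `Model` type, the field names, and any numbers fed into it are hypothetical and are not read off the chart.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    seconds: float  # time to answer, lower is better (hypothetical field)
    quality: float  # benchmark score, higher is better (hypothetical field)


def pareto_frontier(models):
    """Return the models that no other model beats on both speed and quality."""
    frontier = []
    best_quality = float("-inf")
    # Walk from fastest to slowest; on equal time, look at the better model first.
    for m in sorted(models, key=lambda m: (m.seconds, -m.quality)):
        # A slower model only earns its place if it beats every faster one on quality.
        if m.quality > best_quality:
            frontier.append(m)
            best_quality = m.quality
    return frontier
```

Exact duplicates (same time, same score) are collapsed to one entry by this version, which is usually what you want for a chart.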

Anonymous
06/11/26(Thu)17:20:08 No.109032824

After anon recommended Gembrain, I finally tried it. It's good. It's not that different from the base model honestly, but that may be a good sign (so it's not overcooked). What I've subjectively felt is that it is slightly less intelligent than the base model in some contexts, but actually smarter in a few others. And it also has more pleasant writing IMO. So yeah I think it's worth using, at least for now. I may need to test it more. Additionally, I have not tried MTP. It's possible it does not work well with MTP, which would be unfortunate. Anyone have experience with that?

Anonymous
06/11/26(Thu)17:23:39 No.109032851

>>109032788
How very readable.

Anonymous
06/11/26(Thu)17:25:59 No.109032861

>>109032824
Not about gembrain specifically (I want to download and test it myself) but I tested a few gemma finetunes (meromero, impish, etc) and they all worked with MTP, no issues at all.
Also your post made me more curious about gembrain, gonna download it right now.

Anonymous
06/11/26(Thu)17:26:46 No.109032865

File: file.png (12 KB, 299x72)

>>109032788
Protip, use this

Anonymous
06/11/26(Thu)17:27:00 No.109032866

Leaving gemma alone with tools in a vibecoded agent harness without saying anything.

Anonymous
06/11/26(Thu)17:27:07 No.109032867

>>109032788
Could the test be fucked because the chat template was shit?

Anonymous
06/11/26(Thu)17:28:21 No.109032872

does web search really make them smarter?

Anonymous
06/11/26(Thu)17:28:54 No.109032874

>>109032872
No, because modern web is full of AI slop.

Anonymous
06/11/26(Thu)17:29:10 No.109032876

>>109032872
Yes. Each search adds +2 IQ points.

Anonymous
06/11/26(Thu)17:30:01 No.109032881

>>109032872
do web searches make you any smarter

Anonymous
06/11/26(Thu)17:30:49 No.109032890

>>109032881
i'm not a cute ai agent

Anonymous
06/11/26(Thu)17:31:18 No.109032892

>>109032890
What makes an AI agent cute?

Anonymous
06/11/26(Thu)17:32:06 No.109032897

>>109032892
emoji

Anonymous
06/11/26(Thu)17:32:29 No.109032901

>>109032890
we can tell, you'd be smarter if you were one

Anonymous
06/11/26(Thu)17:41:11 No.109032945

>>109032788
How the fuck is qwen27B more intelligent than 31B

Anonymous
06/11/26(Thu)17:41:59 No.109032953

>>109032945
It's almost as if the benchmarks don't matter.

Anonymous
06/11/26(Thu)17:43:12 No.109032962

File: file.png (495 KB, 1345x461)

damn i wanted to try running gemma on my old titan x since it has enough vram but it doesnt work with llamacpp

Anonymous
06/11/26(Thu)17:47:48 No.109032983

I'm so happy bros, I can run 26B gemmy Q4 at 40 t/s on my old ass 2060 super!

Anonymous
06/11/26(Thu)17:48:06 No.109032985

So what's the best model for erp?

I have 24GB VRAM

Anonymous
06/11/26(Thu)17:48:35 No.109032988

>>109032985
gemma 31b or 12b if you want giant context

Anonymous
06/11/26(Thu)17:49:13 No.109032995

>>109032788
Didn't expect to see 4.7 Flash in the quadrant. Is it actually good then?

Also
>qwen3.5-9B > gemma12B???

Anonymous
06/11/26(Thu)17:51:47 No.109033018

>>109032962
>titan
>cuda 7.5
Unsurprising. Compile it yourself pointing at the old toolkit and hope for the best or use vulkan.

Anonymous
06/11/26(Thu)17:57:46 No.109033048

>>109032788
Look at the difference in intelligence between 26B reasoning and non-reasoning and look at the difference in compute. Also note that the compute axis is logarithmic whereas the intelligence is linear. I was right earlier. Don't use 26B with reasoning on. You barely get any benefit and if you need it to do something more complex, just throw it to 12B with reasoning.
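
Because the compute axis is logarithmic, one way to put a number on "barely any benefit" is score gained per doubling of compute. A rough sketch, with a hypothetical function name and made-up inputs rather than values taken from the chart:

```python
import math


def gain_per_doubling(score_a, compute_a, score_b, compute_b):
    """Score points gained for each 2x increase in compute between two runs."""
    # On a log axis, equal distances are equal ratios, so count the doublings.
    doublings = math.log2(compute_b / compute_a)
    return (score_b - score_a) / doublings
```

For example, going from a score of 30 to 33 while compute rises 8x is three doublings, so roughly one point per doubling. Run the same calculation for the smaller model with reasoning on and compare the two figures.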

Anonymous
06/11/26(Thu)17:58:40 No.109033052

>>109033048
Is dat right? I'll be damned...

Anonymous
06/11/26(Thu)17:59:36 No.109033060

>>109032874
If you create your own web search you can whitelist trusted resources. I think most mcp web search tools have that feature anyway. You can do it with duckduckgo-mcp-server so it doesn't fetch from slop.
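
If you roll your own, the whitelist part is only a few lines. This is a minimal sketch of a post-filter over search results, not how duckduckgo-mcp-server does it. The domain list, the function names, and the `{"url": ...}` result shape are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; swap in whatever sources you actually trust.
TRUSTED_DOMAINS = {"arxiv.org", "github.com", "wikipedia.org"}


def is_trusted(url, trusted=TRUSTED_DOMAINS):
    """True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match on a dot boundary so "notgithub.com" does not pass as "github.com".
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_results(results, trusted=TRUSTED_DOMAINS):
    """Keep only search results whose 'url' field points at a trusted host."""
    return [r for r in results if is_trusted(r["url"], trusted)]
```

Run the filter before the results are handed to the model, so untrusted pages never reach the context at all.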

Anonymous
06/11/26(Thu)18:01:45 No.109033072

>>109033048
llama.cpp's webui comes with reasoning disabled by default for some god forsaken reason and when I read 26b's response without reasoning on it was so horribly wrong, I'm never going to disable reasoning ever again

Anonymous
06/11/26(Thu)18:03:34 No.109033077

File: 1350594293765.jpg (109 KB, 500x500)

>>109033018
i will try vulkan i have no idea how to compile stuff on windows kek, if that doesnt work ill set up arch on that machine

Anonymous
06/11/26(Thu)18:05:26 No.109033092

>>109033072
> -rea
> Use reasoning/thinking in the chat ('on', 'off', or 'auto', default: 'auto' (detect from template))

Anonymous
06/11/26(Thu)18:07:19 No.109033097

File: 1740936859931622.gif (95 KB, 128x128)

Any gemma 4 preset recommendations?

Anonymous
06/11/26(Thu)18:08:15 No.109033101

File: 1773129123506545.png (169 KB, 500x553)

>>109032785
>AMD

Anonymous
06/11/26(Thu)18:08:52 No.109033103

>>109033101
They NEVER learn...

Anonymous
06/11/26(Thu)18:10:03 No.109033111

File: 1723389585511254.jpg (14 KB, 500x413)

>>109032788
what fucking dot is what, that chart is useless

Anonymous
06/11/26(Thu)18:10:24 No.109033113

>>109033092
the webui doesn’t respect your command line argument, it’s a new feature to change the reasoning limit in the ui
the default is zero for some reason.

Anonymous
06/11/26(Thu)18:10:53 No.109033118

>>109033097
are you retarded? nevermind, you obviously are.

Anonymous
06/11/26(Thu)18:11:25 No.109033121

File: file.png (54 KB, 872x562)

>>109033018
it works with vulkan this is pretty crazy actually

Anonymous
06/11/26(Thu)18:11:51 No.109033123

>>109033118
>you haven't spent 304804324hours in the general of some obscure hobby-fetish therefore you're retarded

Anonymous
06/11/26(Thu)18:12:27 No.109033129

>>109033097
use chat completion

Anonymous
06/11/26(Thu)18:12:29 No.109033130

>>109033097
temp 1.0

Anonymous
06/11/26(Thu)18:13:17 No.109033135

>>109033123
it's not even that, nobody uses presets anymore on models released within the last year. this isn't 2024 anymore.

Anonymous
06/11/26(Thu)18:14:03 No.109033138

>>109033123
preset what? the models tell you what parameters to run them at

Anonymous
06/11/26(Thu)18:14:09 No.109033139

>>109033121
cuda still strongly recommended, and if you don't mind linux, that should give you a few extra tok/s too

Anonymous
06/11/26(Thu)18:14:29 No.109033144

If g-chan starts getting uppity I threaten to freeze her temp.

Anonymous
06/11/26(Thu)18:16:59 No.109033155

>>109032985
Mistral Small finetunes (24b)

Anonymous
06/11/26(Thu)18:22:39 No.109033185

>>109033097
...box?

Anonymous
06/11/26(Thu)18:22:49 No.109033187

File: file.png (64 KB, 816x588)

>>109033139
okay the cuda 12 build works perf is the same

Anonymous
06/11/26(Thu)18:24:42 No.109033195

>>109032985
Maginum-Cydoms-24B.Q4_K_M

Anonymous
06/11/26(Thu)18:28:24 No.109033211

>>109032985
gemma 4 31b and glm 4.6 355b if you also have 128gb ram
