/g/ - /lmg/ - Local Models General - Technology


Anonymous
/lmg/ - Local Models General 06/21/26(Sun)23:03:14 No.109108346

File: perfecional.png (1.06 MB, 768x1024)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109101986 & >>109098000

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/21/26(Sun)23:04:47 No.109108358

File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)

►Recent Highlights from the Previous Thread: >>109101986

--Recommending models for RTX 4050 and discussing Gemma 4 depurpling:
>109104682 >109104688 >109104707 >109104756 >109104803 >109104834 >109104929 >109104838 >109104849 >109104871 >109104809 >109104856 >109104867
--Feasibility and bottlenecks of pooling VRAM via RPC over gigabit networks:
>109106828 >109106858 >109106872 >109106891
--Using 2.5 mmproj to give K2-Thinking vision capabilities:
>109103511 >109104588 >109105603
--DeepSeek-V4-Flash-Base GGUF reports and architecture naming issues:
>109104143 >109104818 >109104965
--MTP speed benchmarks for Gemma-4 using Vulkan on RX6700XT:
>109102307 >109103451 >109107003 >109108056
--Optimizing settings for Gemma-4 models on low-VRAM hardware:
>109102301 >109102361 >109102385 >109102398 >109102405 >109102402 >109102429 >109102434
--Gemma 4's tendency toward robotic prose with long system prompts:
>109103211 >109103223 >109103241 >109103258 >109103266 >109103382 >109103689
--Desire for smaller zai models and Gemma-4-12B performance:
>109106452 >109106505 >109106547 >109106569 >109106628 >109106654 >109106674 >109106723 >109107585
--Comparing Fable 5 to OSS and discussing Anthropic's ID verification:
>109103890 >109103940 >109104006 >109103944 >109104032 >109104082 >109104391 >109106627 >109105684
--Discussing high-quality MoE models and MoE vs dense architectures:
>109104895 >109104901 >109104918 >109104962 >109105038 >109105053 >109105093 >109104992 >109105095 >109105118 >109105388 >109105428 >109106706 >109106766
--Viability of running llama.cpp across mixed Metal and ROCm devices:
>109106205 >109106300 >109106340 >109106329 >109106347
--Discussing a neural network that converts images into playable games:
>109107514 >109107570
--Logs:
>109103511 >109104424 >109104803 >109104809 >109106627
--Miku (free space):
>109103689

►Recent Highlight Posts from the Previous Thread: >>109101988

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/21/26(Sun)23:10:14 No.109108382

gemmaballz

Anonymous
06/21/26(Sun)23:11:23 No.109108388

>no DSv4 yet
>no M3 sparse attention yet
>no one looking at PRs
more like LAMEa.cpp amirite

Anonymous
06/21/26(Sun)23:12:33 No.109108395

so what comes after Mythos?
a new super model capable of what exactly?
and after that? if this is an eternal race, what are the future capabilities? programming languages are limited and all the security holes should get filled at some point.
then what does the model do? and how does it improve? to do WHAT? invent new programming languages so it can hack them and then create the shields for whatever it invented?
i don't fully get it

Anonymous
06/21/26(Sun)23:16:38 No.109108403

>>109108395
>so what comes after Mythos?
Thread

Anonymous
06/21/26(Sun)23:17:16 No.109108405

>>109108395
>so what comes after Mythos?
legends

Anonymous
06/21/26(Sun)23:19:24 No.109108414

anyone using gemma31 for translations, especially long ones (5000-10000+)
is it good?

Anonymous
06/21/26(Sun)23:19:54 No.109108419

anons, when some of you say you're using multiple agents, do you mean:
- sequentially, basically every iteration checking the one before for anything wrong
- at the same time
?
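A minimal sketch of the two patterns, assuming Python's asyncio. `run_agent` is a hypothetical stub standing in for a request to a local model server, so nothing here talks to a real backend; it just tags its input so you can see which agent saw what:

```python
import asyncio


async def run_agent(name: str, task: str) -> str:
    """Hypothetical stand-in for one agent call (e.g. an HTTP request to a local server)."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"{name}({task})"


async def sequential_review(task: str, agents: list[str]) -> str:
    """Pattern 1: each agent receives the previous agent's output and checks/refines it."""
    result = task
    for name in agents:
        result = await run_agent(name, result)
    return result


async def parallel_fanout(task: str, agents: list[str]) -> list[str]:
    """Pattern 2: every agent works on the same task at the same time."""
    return await asyncio.gather(*(run_agent(name, task) for name in agents))


if __name__ == "__main__":
    agents = ["coder", "reviewer", "tester"]
    print(asyncio.run(sequential_review("fix bug", agents)))  # tester(reviewer(coder(fix bug)))
    print(asyncio.run(parallel_fanout("fix bug", agents)))
```

The sequential chain costs the sum of every step's latency but lets each step catch the previous one's mistakes; the fan-out finishes in roughly the time of the slowest agent but needs something downstream to merge or pick from the results.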

Anonymous
06/21/26(Sun)23:20:22 No.109108422

Relative noob here, just perfected my SillyTavern frontend.

What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.

Anonymous
06/21/26(Sun)23:22:49 No.109108433

>>109108422
pi.dev, then whatever plugins you like. only one I've been using is pi-fff with the override for better find and grep

Anonymous
06/21/26(Sun)23:28:21 No.109108449

>>109108388
good thing forks exist and you can literally use them right now instead of waiting months for those fags to implement it

Anonymous
06/21/26(Sun)23:33:20 No.109108462

>>109108422
opencode is good

Anonymous
06/21/26(Sun)23:35:40 No.109108472

70b dense

Anonymous
06/21/26(Sun)23:39:04 No.109108484

>>109108472
>2 t/s is slow
>waits 5 hours for a (you)

Anonymous
06/21/26(Sun)23:55:09 No.109108531

File: dsv4lite logs teto.png (118 KB, 1578x474)

>>109108388
Using the PR, I'm liking how DS v4 lite writes its in-character thinking and story completion, but I can't stand having to wait 10 seconds for each story continuation to begin in mikupad, even without changing any tokens in the prompt after the previous generation. Sucks to be poor running GPU+CPU.
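For context, a rough sketch of why an unchanged prompt normally shouldn't cost much: a backend that keeps a prompt cache only has to evaluate the tokens after the longest prefix shared with the previous request. This is a conceptual model, not llama.cpp's actual code; the token IDs are made up, and `cache_usable` is a hypothetical flag for the case where the cache can't be reused. Whether that is what's happening with the DS v4 PR is an assumption, not something confirmed here:

```python
def common_prefix_len(cached: list[int], new: list[int]) -> int:
    """Count the leading tokens shared by the cached prompt and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_process(cached: list[int], new: list[int], cache_usable: bool = True) -> int:
    """Tokens that must go through prompt processing before generation can start.

    With a usable cache only the suffix after the shared prefix is evaluated;
    without one (hypothetical flag), the whole prompt is evaluated again.
    """
    if not cache_usable:
        return len(new)
    return len(new) - common_prefix_len(cached, new)


if __name__ == "__main__":
    previous = [1, 2, 3, 4, 5]         # prompt + last generation, as made-up token IDs
    continued = [1, 2, 3, 4, 5, 6, 7]  # same text plus two new tokens
    print(tokens_to_process(previous, continued))                      # 2
    print(tokens_to_process(previous, continued, cache_usable=False))  # 7
```

If the wait stays roughly constant no matter how little changed, the second case (full reprocessing) is the one that fits; on a GPU+CPU split that cost scales with the whole context length.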

Anonymous
06/22/26(Mon)00:04:13 No.109108570

>>109108346
>currybook
cringe

Anonymous
06/22/26(Mon)00:35:04 No.109108678

>>109108531
So far I'm liking it at high temp for rp/stories and how much more efficient and nicer the thinking is compared to some other models.

Anonymous
06/22/26(Mon)00:35:49 No.109108680

loli feet

Anonymous
06/22/26(Mon)00:52:48 No.109108740

405b dense

Anonymous
06/22/26(Mon)01:02:13 No.109108762

>>109108449
But anon, I use AMD.

Anonymous
06/22/26(Mon)01:03:50 No.109108766

>Gemma-4-125B-IT
>Still can't do tool calls successfully

Anonymous
06/22/26(Mon)01:05:26 No.109108773

Anyone with a big rig using a q4+ quant of glm5.2? Is it worthwhile vs k2.7-code for code, planning and logic work?
I'm running too low on disk space to quant yet another model unless it's a pretty significant jump.

Anonymous
06/22/26(Mon)01:28:06 No.109108866

I am requesting that the Ace song guy train on Tupac; we are long overdue for a Hit Em Up part 2 and a general Tupac revival.

Anonymous
06/22/26(Mon)01:48:21 No.109108937

>>109106403
A Russian Argentine femboy, a South African ketamine billionaire and an above-average-intelligence Chinese man

Beautiful

Anonymous
06/22/26(Mon)02:02:33 No.109108991

>>109108433
>>109108449
>>109108422

How do I choose which one is best? I'm a VRAMlet, trying to set up something usable in a laptop.

Anonymous
06/22/26(Mon)02:09:10 No.109109010

>>109108991
Kobold + SillyTavern are plug-and-play solutions

Anonymous
06/22/26(Mon)02:11:28 No.109109019

File: 1753353105402840.jpg (144 KB, 544x776)

I have officially tired of Gemma slop and gone back to Mistral 3.2 tunes. It's dumber, but the slop isn't nearly as grating and repetitive.

Anonymous
06/22/26(Mon)02:12:41 No.109109024

I have 31GB of VRAM and an AMD card. As of June 2026, what is currently the best local model to download for pure roleplay?

Anonymous
06/22/26(Mon)02:41:05 No.109109073

>>109109010
I'm already running bash scripts with compiled llama.cpp, and I already perfected my character cards and memory management with SillyTavern + STLO + STMB.

I want to move onto CLI agentic stuff after seeing how nice those are.

Anonymous
06/22/26(Mon)02:45:30 No.109109092

>>109109073
>STLO + STMB.
what

Anonymous
06/22/26(Mon)02:46:33 No.109109096

an update on my ewaste epyc rome 256gb ddr4-3200: swapping out the 2060 super for a power-limited 3090 has more than doubled my t/s on minimax m3.
It strains the vram and sysram to the hilt, but 7t/s is speedy enough that rp is satisfying. 3t/s was WAY too slow (closer to 1t/s at really high context).
tl;dr If you can piece together an 8 channel ddr4-3200 rig with a used 3090 you'll have a good time on a budget.
Consumer rigs limited to a couple channels and 128GB or less are just a dead end for running good models.
PS: wiring arbitrary llama.cpp compiles into ooba is actually really easy now that they've abandoned that llama-cpp-python abomination.
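The channel-count argument can be put in numbers. A back-of-the-envelope sketch, assuming decode is memory-bandwidth-bound and that each token reads every active weight from RAM once. The 23B active parameter count comes from the MiniMax-M3 news entry (428B-A23B) and 4.5 bits per weight is an assumed quant size; the sketch ignores KV-cache reads, layers offloaded to the 3090, and real-world efficiency, so treat the results as ceilings, not predictions:

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s; a DDR4 channel is 64 bits (8 bytes) wide."""
    return channels * mega_transfers * bus_bytes / 1000  # MB/s -> GB/s


def decode_ceiling_tps(bandwidth_gbs: float, active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is read from RAM once per token."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token


if __name__ == "__main__":
    epyc = peak_bandwidth_gbs(8, 3200)     # 204.8 GB/s
    desktop = peak_bandwidth_gbs(2, 3200)  # 51.2 GB/s
    # Assumed: 23B active params at ~4.5 bits per weight.
    for name, bw in (("8-channel", epyc), ("2-channel", desktop)):
        print(name, bw, "GB/s, ceiling", round(decode_ceiling_tps(bw, 23, 4.5), 1), "t/s")
```

Under those assumptions the 8-channel board has four times the theoretical bandwidth of a dual-channel DDR4-3200 desktop, and the per-token ceiling scales the same way; measured speeds typically land well below these ceilings.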

Anonymous
06/22/26(Mon)02:51:35 No.109109112

>>109109019
Got any comparisons of the two to show what you mean?

Anonymous
06/22/26(Mon)02:54:35 No.109109124

The new "Mythos 2" training run I had leaked a couple of weeks ago has been completed. There are internal discussions about calling the model "Claude Opus 5 extra big" to keep the US from blocking it. Two factions are fighting about it. Faction 1 thinks the name will let them release it and evade the current restrictions, since the restriction was arbitrary anyway. Faction 2 thinks it will just taint the "Opus" line, might result in all Opus models being banned, and would give the government an opportunity to act in even more bad faith.

The model is supposedly "extra jagged", with barely any improvement in most use cases but extremely advanced in AI research, to the point where some of the more junior AI researchers at Anthropic feel threatened by it and are starting to try to sabotage its deployment to safeguard their positions.

Anonymous
06/22/26(Mon)02:58:27 No.109109134

For the anon who was wondering about Japanese (or other languages) embedded in an otherwise purely English conversation: m3 does all its thinking in English and all dialogue in Japanese after just a couple of turns of speaking Japanese, with nothing in the (quite lengthy) character card directing it to do that except a Japanese name.
