/g/ - /lmg/ - Local Models General - Technology


Anonymous
/lmg/ - Local Models General 06/21/26(Sun)23:03:14 No.109108346

File: perfecional.png (1.06 MB, 768x1024)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109101986 & >>109098000

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/21/26(Sun)23:04:47 No.109108358

File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)

►Recent Highlights from the Previous Thread: >>109101986

--Recommending models for RTX 4050 and discussing Gemma 4 depurpling:
>109104682 >109104688 >109104707 >109104756 >109104803 >109104834 >109104929 >109104838 >109104849 >109104871 >109104809 >109104856 >109104867
--Feasibility and bottlenecks of pooling VRAM via RPC over gigabit networks:
>109106828 >109106858 >109106872 >109106891
--Using 2.5 mmproj to give K2-Thinking vision capabilities:
>109103511 >109104588 >109105603
--DeepSeek-V4-Flash-Base GGUF reports and architecture naming issues:
>109104143 >109104818 >109104965
--MTP speed benchmarks for Gemma-4 using Vulkan on RX6700XT:
>109102307 >109103451 >109107003 >109108056
--Optimizing settings for Gemma-4 models on low-VRAM hardware:
>109102301 >109102361 >109102385 >109102398 >109102405 >109102402 >109102429 >109102434
--Gemma 4's tendency toward robotic prose with long system prompts:
>109103211 >109103223 >109103241 >109103258 >109103266 >109103382 >109103689
--Desire for smaller zai models and Gemma-4-12B performance:
>109106452 >109106505 >109106547 >109106569 >109106628 >109106654 >109106674 >109106723 >109107585
--Comparing Fable 5 to OSS and discussing Anthropic's ID verification:
>109103890 >109103940 >109104006 >109103944 >109104032 >109104082 >109104391 >109106627 >109105684
--Discussing high-quality MoE models and MoE vs dense architectures:
>109104895 >109104901 >109104918 >109104962 >109105038 >109105053 >109105093 >109104992 >109105095 >109105118 >109105388 >109105428 >109106706 >109106766
--Viability of running llama.cpp across mixed Metal and ROCm devices:
>109106205 >109106300 >109106340 >109106329 >109106347
--Discussing a neural network that converts images into playable games:
>109107514 >109107570
--Logs:
>109103511 >109104424 >109104803 >109104809 >109106627
--Miku (free space):
>109103689

►Recent Highlight Posts from the Previous Thread: >>109101988

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/21/26(Sun)23:10:14 No.109108382

gemmaballz

Anonymous
06/21/26(Sun)23:11:23 No.109108388

>no DSv4 yet
>no M3 sparse attention yet
>no one looking at PRs
more like LAMEa.cpp amirite

Anonymous
06/21/26(Sun)23:12:33 No.109108395

so what comes after Mythos?
a new super model capable of what exactly?
and after that? if this is an eternal race, what are the future capabilities? programming languages are limited and all the security holes should get filled at some point.
then what does the model do? and how does it improve? to do WHAT? invent new programming languages so it can hack them and then create the shields for whatever it invented?
i don't fully get it

Anonymous
06/21/26(Sun)23:16:38 No.109108403

>>109108395
>so what comes after Mythos?
Thread

Anonymous
06/21/26(Sun)23:17:16 No.109108405

>>109108395
>so what comes after Mythos?
legends

Anonymous
06/21/26(Sun)23:19:24 No.109108414

anyone using gemma31 for translations, especially long ones (5000-10000+)
is it good?

Anonymous
06/21/26(Sun)23:19:54 No.109108419

anons, when some of you say you're using multiple agents, do you mean:
- sequentially, basically every iteration checking the one before for anything wrong
- at the same time
?

Anonymous
06/21/26(Sun)23:20:22 No.109108422

Relative noob here, just perfected my SillyTavern frontend.

What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.

Anonymous
06/21/26(Sun)23:22:49 No.109108433

>>109108422
pi.dev, then whatever plugins you like. only one I've been using is pi-fff with the override for better find and grep

Anonymous
06/21/26(Sun)23:28:21 No.109108449

>>109108388
good thing forks exist and you can literally use them right now before waiting months for those fags to implement it

Anonymous
06/21/26(Sun)23:33:20 No.109108462

>>109108422
opencode is good

Anonymous
06/21/26(Sun)23:35:40 No.109108472

70b dense

Anonymous
06/21/26(Sun)23:39:04 No.109108484

>>109108472
>2 t/s is slow
>waits 5 hours for a (you)

Anonymous
06/21/26(Sun)23:55:09 No.109108531

File: dsv4lite logs teto.png (118 KB, 1578x474)

>>109108388
Using the PR, I'm liking how DS v4 lite writes its in-character thinking and story completions, but I can't stand having to wait 10 seconds for each story continuation to begin in mikupad, even when no tokens in the prompt have changed since the previous generation. Sucks to be poor running GPU+CPU.

Anonymous
06/22/26(Mon)00:04:13 No.109108570

>>109108346
>currybook
cringe

Anonymous
06/22/26(Mon)00:35:04 No.109108678

>>109108531
So far I'm liking it at high temp for rp/stories and how much more efficient and nicer the thinking is compared to some other models.

Anonymous
06/22/26(Mon)00:35:49 No.109108680

loli feet

Anonymous
06/22/26(Mon)00:52:48 No.109108740

405b dense

Anonymous
06/22/26(Mon)01:02:13 No.109108762

>>109108449
But anon, I use AMD.

Anonymous
06/22/26(Mon)01:03:50 No.109108766

>Gemma-4-125B-IT
>Still can't do tool calls successfully

Anonymous
06/22/26(Mon)01:05:26 No.109108773

Who with a big rig is using a q4+ quant of glm5.2? Is it worthwhile vs k2.7-code for code, planning and logic work?
I'm running low on disk space to be quanting yet another model if it isn't a pretty significant jump.

Anonymous
06/22/26(Mon)01:28:06 No.109108866

I am requesting the Ace song guy to train on Tupac, we are long overdue for a Hit Em Up part 2 and general Tupac revival.
