/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109101986 & >>109098000

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109101986

--Recommending models for RTX 4050 and discussing Gemma 4 depurpling:
>109104682 >109104688 >109104707 >109104756 >109104803 >109104834 >109104929 >109104838 >109104849 >109104871 >109104809 >109104856 >109104867
--Feasibility and bottlenecks of pooling VRAM via RPC over gigabit networks:
>109106828 >109106858 >109106872 >109106891
--Using 2.5 mproj to give K2-Thinking vision capabilities:
>109103511 >109104588 >109105603
--DeepSeek-V4-Flash-Base GGUF reports and architecture naming issues:
>109104143 >109104818 >109104965
--MTP speed benchmarks for Gemma-4 using Vulkan on RX6700XT:
>109102307 >109103451 >109107003 >109108056
--Optimizing settings for Gemma-4 models on low-VRAM hardware:
>109102301 >109102361 >109102385 >109102398 >109102405 >109102402 >109102429 >109102434
--Gemma 4's tendency toward robotic prose with long system prompts:
>109103211 >109103223 >109103241 >109103258 >109103266 >109103382 >109103689
--Desire for smaller zai models and Gemma-4-12B performance:
>109106452 >109106505 >109106547 >109106569 >109106628 >109106654 >109106674 >109106723 >109107585
--Comparing Fable 5 to OSS and discussing Anthropic's ID verification:
>109103890 >109103940 >109104006 >109103944 >109104032 >109104082 >109104391 >109106627 >109105684
--Discussing high-quality MoE models and MoE vs dense architectures:
>109104895 >109104901 >109104918 >109104962 >109105038 >109105053 >109105093 >109104992 >109105095 >109105118 >109105388 >109105428 >109106706 >109106766
--Viability of running llama.cpp across mixed Metal and ROCm devices:
>109106205 >109106300 >109106340 >109106329 >109106347
--Discussing a neural network that converts images into playable games:
>109107514 >109107570
--Logs:
>109103511 >109104424 >109104803 >109104809 >109106627
--Miku (free space):
>109103689

►Recent Highlight Posts from the Previous Thread: >>109101988

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gemmaballz
>no DSv4 yet
>no M3 sparse attention yet
>no one looking at PRs
more like LAMEa.cpp amirite
so what comes after Mythos?
a new super model capable of what exactly?
and after that? if this is an eternal race, what are the future capabilities? programming languages are limited and all the security holes should get filled at some point.
then what does the model do? and how does it improve? to do WHAT? invent new programming languages so it can hack them and then create the shields for whatever it invented?
i don't fully get it
>>109108395
>so what comes after Mythos?
Thread
>>109108395
>so what comes after Mythos?
legends
anyone using gemma31 for translations, especially long ones (5000-10000+)?
is it good?
anons, when some of you say you're using multiple agents, do you mean:
- sequentially, basically every iteration checking the one before for anything wrong
- at the same time?
Relative noob here, just perfected my SillyTavern frontend.
What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.
>>109108422
pi.dev, then whatever plugins you like. only one I've been using is pi-fff with the override for better find and grep
>>109108388
good thing forks exist and you can literally use them right now instead of waiting months for upstream to implement it
>>109108422
opencode is good
70b dense
>>109108472
>2 t/s is slow
>waits 5 hours for a (you)
>>109108388
Using the PR, I'm liking how DS v4 lite writes its in-character thinking and story completion, but I can't stand having to wait 10 seconds for each story continuation to begin in mikupad, even when no tokens in the prompt have changed since the previous generation. Sucks to be poor running GPU+CPU.
>>109108346>currybookcringe
>>109108531
So far I'm liking it at high temp for rp/stories, and the thinking is much more efficient and nicer than in some other models.
loli feet
405b dense
>>109108449
But anon, I use AMD.
>Gemma-4-125B-IT
>Still can't do tool calls successfully
Who with a big rig is using a q4+ quant of glm5.2? Is it worthwhile vs k2.7-code for code, planning and logic work?
I'm running low on disk space to be quanting yet another model if it isn't a pretty significant jump.
I am requesting the Ace song guy to train on Tupac, we are long overdue for a Hit Em Up part 2 and general Tupac revival.
>>109106403
A Russian Argentine femboy, a South African ketamine billionaire and an above-average-intelligence Chinese man
Beautiful
>>109108433
>>109108449
>>109108422
How do I choose which one is best? I'm a VRAMlet, trying to set up something usable on a laptop.
>>109108991
Kobold + SillyTavern are plug and play solutions
I have officially tired of Gemma slop and gone back to Mistral 3.2 tunes. It's dumber, but the slop isn't nearly as grating and repetitive.
I have 31GB of VRAM and an AMD card. As of June 2026, what is currently the best local model to download for pure roleplay?
>>109109010
I'm already running bash scripts with compiled llama.cpp, and I already perfected my character cards and memory management with SillyTavern + STLO + STMB.
I want to move on to CLI agentic stuff after seeing how nice those are.
>>109109073
>STLO + STMB.
what
an update on my ewaste epyc rome 256gb ddr4-3200: swapping out the 2060 super for a power-limited 3090 has more than doubled my t/s on minimax m3. It strains the vram and sysram to the hilt, but 7t/s is speedy enough that rp is satisfying. 3t/s was WAY too slow (closer to 1t/s at really high context).
tl;dr If you can piece together an 8 channel ddr4-3200 rig with a used 3090 you'll have a good time on a budget.
Consumer rigs limited to a couple channels and 128GB or less are just a dead end for running good models.
PS: wiring arbitrary llama.cpp compiles into ooba is actually really easy now that they've abandoned that llama-cpp-python abomination.
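For anyone wondering why channel count matters so much in the post above, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes decode speed is capped by how fast the active weights can be streamed from system RAM, and it ignores the share of weights sitting on the GPU, KV cache reads, and compute. The 23B active parameter count comes from the 428B-A23B figure in the OP news; the 4.5 bits per weight and the dual-channel comparison rig at the same DDR4-3200 speed are assumptions picked purely for illustration.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000


def decode_ceiling_tps(bandwidth_gbs: float, active_params_b: float,
                       bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every token must read all active weights once."""
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_read_per_token


ACTIVE_PARAMS_B = 23.0   # from "428B-A23B" in the OP news entry
BITS_PER_WEIGHT = 4.5    # ASSUMPTION: roughly a q4-class quant, not stated in the post

eight_ch = peak_bandwidth_gbs(8, 3200)   # 204.8 GB/s
dual_ch = peak_bandwidth_gbs(2, 3200)    # ASSUMPTION: dual-channel rig at the same speed

ceiling_8ch = decode_ceiling_tps(eight_ch, ACTIVE_PARAMS_B, BITS_PER_WEIGHT)
ceiling_2ch = decode_ceiling_tps(dual_ch, ACTIVE_PARAMS_B, BITS_PER_WEIGHT)

print(f"8ch: {eight_ch:.1f} GB/s, ceiling {ceiling_8ch:.1f} t/s")
print(f"2ch: {dual_ch:.1f} GB/s, ceiling {ceiling_2ch:.1f} t/s")
```

Under these assumptions the 8 channel ceiling is about four times the dual-channel one, and the reported 7t/s sits well below the theoretical bound, which is what you'd expect once real-world memory efficiency and CPU overhead are counted.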
>>109109019
Got any comparisons of the two to show what you mean?
The new "Mythos 2" training run I leaked a couple of weeks ago has been completed. There are internal discussions about calling the model "Claude Opus 5 extra big" to prevent the US from blocking it. Two different factions are fighting about it. Faction 1 thinks the name will allow them to release it and evade the current restrictions, since the restriction was arbitrary anyway. Faction 2 thinks it will just taint the "Opus" line and might result in all Opus models being banned, as well as giving the government an opportunity to act in even worse faith.
Model is supposedly "extra jagged", with barely any improvement in most use cases but extremely advanced in AI research, to the point where some of the more junior AI researchers at Anthropic feel threatened by it and are starting to try to sabotage its deployment to safeguard their positions.
For the anon that was wondering about Japanese (or other languages) embedded in an otherwise purely English conversation: m3 is doing all thinking in English with all dialogue in Japanese after just a couple of turns of speaking in Japanese, with nothing directing it to do that in the (quite lengthy) character card except a Japanese name.