/lmg/ - a general dedicated to the discussion and development of local language models.

Miku Day Edition

Previous threads: >>108321632 & >>108316141

►News
>(03/04) Yuan3.0 Ultra 1010B-A68.8B released: https://hf.co/YuanLabAI/Yuan3.0-Ultra
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/03) Junyang Lin leaves Qwen: https://xcancel.com/JustinLin610/status/2028865835373359513
>(03/02) Step 3.5 Flash Base, Midtrain, and SteptronOSS released: https://xcancel.com/StepFun_ai/status/2028551435290554450

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108321632

--A16 vs 3090 performance benchmarks with llama.cpp tensor parallelism:
>108322578 >108322596 >108322610 >108322640 >108322679 >108322861 >108323005 >108323564 >108323613 >108323688 >108323721 >108323754 >108323790 >108323811
--The Synthetic Data Playbook: Generating Trillions of the Finest Tokens:
>108323497 >108323508 >108323519 >108323530 >108323831 >108323962 >108323971 >108323557 >108323551 >108323565 >108323599 >108323872 >108323884
--Qwen3.5 27B dense model matches 122B MoE performance in benchmarks:
>108326810 >108326837 >108326876 >108326878 >108326893
--Debating MoE's speed-memory tradeoffs vs dense models:
>108326854 >108326888 >108326931 >108326959 >108327002 >108327041 >108327054
--Optimizing 256GB RAM setup for large model inference:
>108321871 >108321876 >108321927 >108321884 >108321984
--Qwen 3.5 quantization performance differences between dense and MoE models:
>108323521 >108325440
--TTS options for SillyTavern voice output:
>108321822 >108322058
--Claude Opus 4.6 benchmark contamination claims spark skepticism:
>108322721 >108322782 >108322804 >108322819 >108322920 >108322865
--Debate over AI emotional support ethics after poetic suicide response:
>108321732 >108321749 >108321837 >108321948 >108323070 >108322594 >108322624 >108322916 >108322948 >108322967 >108323139 >108323151 >108323199 >108323274 >108323406 >108324518
--Miku (free space):
>108322482 >108323847 >108323976 >108326678 >108326684 >108326942 >108327209 >108327668

►Recent Highlight Posts from the Previous Thread: >>108321820

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
based thread
Miku is BBC coded
>>108328174
>>108328170
Happy Miku Day (3/9 = SanKyuu = MiKu)
https://sonicwire.com/product/virtualsinger/special/miku-v6
Mikulove
>>108328174
V4 on March 90th
>>108328183
we're so back
Haven't been on here in a while. Are the new qwen3.5-30b and glm4.7 flash models worth a shit or just a tiny 5% improvement
>>108328185
Why do you both love it so much?
dead hobby
>>108328376
the new qwen is ok for boring things, and worse for sexo
This one is going to be fun.
https://github.com/ggml-org/llama.cpp/pull/20266
>>108328376
qwen 35b is a big jump
>>108328403
>Files changed: 102
jesus christ
>>108328423
Somehow he missed a few. I'm sure he'll rectify it immediately.
would it be possible to run a local model on my phone with 12gb of ram?
>>108328403
>PKU
Gonna give them benefit of the doubt
>>108328452
yes in theory, the new qwen 3.5 small models should
https://x.com/Alibaba_Qwen/status/2028460046510965160
i'm not sure if there's actually an app on the app store that does it yet though
>>108328452
I think you can compile llama.cpp on termux.
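For anyone wanting to try, the usual route is roughly the following. This is an untested sketch: the package names assume current Termux repos, the build steps are the standard llama.cpp CMake flow, and the model path/filename at the end is just a placeholder for whatever GGUF you actually download (with 12GB of RAM you'd want a small quant of a small model).

```shell
# Inside Termux on Android (sketch, not verified on-device):
pkg install -y git cmake clang

# Standard llama.cpp CMake build
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Run against a GGUF you've downloaded (placeholder path/model name)
./build/bin/llama-cli -m /sdcard/Download/some-small-model-q4_k_m.gguf -p "hello"
```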
which local model is best for divine revelation and interpreting the word of god?
>>108328500
https://huggingface.co/PleIAs/Monad
https://huggingface.co/PleIAs/Baguettotron
>>108328402
they are literally all bad for gooning
text based cooming is for losers ngl tbqhwyfam
>>108328170
It reads Sankyu retard
DeepSeek V2->V3 felt like a really special moment because I could/do trust V3 to move my projects along reliably, and I personally don't have that same feeling with V4, so I guess I'm kinda disappointed.

It's definitely a stronger coder / problem solver. It is more likely to fix a problem, and fix it quickly, compared to V3 (i have no doubt about this), but it just feels like it's more likely to break my code without realizing. There's a laziness about it - even when i have explicit end criteria for a turn, it will end prematurely. When it claims success, i don't believe it. Often i challenge it and it's like, 'i should have led with...' or 'what i said was partially true...' (no, mate, it was totally false). I don't trust it.

I certainly am surprised to see the model being glazed as much as it has been. I, for one, was hoping for a bit more.
is it safe to pull now? did they fix all piotr's autoparser bugs?
>>108328606
V4 isn't out bwo
>>108328618
>fix all piotr's autoparser bugs
kek
agent people deserve him
>>108328622
I just want to use MCP to do really relevant web searches like 'are 4chan memes relevant in 2026'
>>108328619
dumbass
>>108328635
mcp is an outdated concept unc
>>108328638
agent swarms are cringe
What's the best gooning local llm now
Qwen 3.5 heretic is so disappointing
>>108328653
general move towards curated pretraining means there is nothing much to reveal with uncensoring
>>108328665
And that's a good thing!
>>108328606
Are you talking about V4 full or V4 lite? I thought lite was the only one they were demoing?
>>108328678
nta but I heard they are demoing it in some select circles, mostly people who work in ml
>>108328665
Damn so it's better to go back to 2025 models like stheno or tiger gemma?