/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108159576 & >>108149287

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/15) dots.ocr-1.5 temporarily released: https://hf.co/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108159576

--Papers:
>108159925
--Qwen3.5-397B-A17B multimodal model and safety policy frustrations:
>108160387 >108160449 >108160474 >108160789 >108161039 >108161110 >108160576 >108160589 >108160615 >108160627 >108160834
--Qwen3.5-397B-A17B release and benchmark performance comparisons:
>108160792 >108160809 >108160813 >108161009 >108160819 >108160826
--Mythic AI's analog compute scalability and efficiency claims under scrutiny:
>108164002 >108164265 >108164280 >108164333
--Comparing LLM responses to explicit prompts and ethical safeguards:
>108161568 >108161583 >108161586 >108161590 >108161598 >108161600 >108161647 >108161712 >108161740 >108162921 >108165334 >108165381 >108165405 >108165431 >108165411 >108165434 >108165529 >108165578 >108165587 >108165589
--Qwen 3.5 Plus accurately interprets /pol/ meme symbolism:
>108163097 >108163108 >108163147 >108163160 >108163178 >108163187
--GPU ordering and prefill strategies for Qwen3.5-397B-A17B:
>108162676 >108162812 >108162870 >108162855 >108163161
--Qwen 3.5 output similarities to Gemini and adversarial AI threats:
>108165309 >108165342 >108165372 >108165394 >108165408
--MoE memory efficiency and multi-GPU power constraints:
>108163343 >108163369 >108163414 >108163504 >108163528 >108163593 >108163740 >108163750 >108163776
--DOTS OCR 1.5 model released on Hugging Face:
>108160205 >108161767 >108161800 >108161832 >108161849
--GLM-5-GGUF quantization efficiency analysis using perplexity metrics:
>108162324 >108162378
--Add support for Tiny Aya Models:
>108164202 >108164293 >108164256
--AI roleplay trends and coding-focused model dominance:
>108160831 >108160909 >108161031 >108161194 >108161247 >108161454 >108161483 >108161496 >108161527
--Miku (free space):
>108159644 >108159657 >108160615 >108160627 >108162391 >108162525 >108162628 >108165778

►Recent Highlight Posts from the Previous Thread: >>108159577

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anyone tried Kimi Linear 48B A3B? Any sampler settings?
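For reference, here's the neutral-ish baseline I'd start from until someone posts tuned values. Standard llama.cpp sampler names; the numbers are generic assumptions, nothing Kimi-specific:

[code]
# Conservative starting samplers (llama.cpp naming). Assumed defaults,
# NOT Kimi Linear recommendations -- tighten or loosen from here.
sampler_settings = {
    "temperature": 0.7,      # drop toward 0.6 if it rambles
    "min_p": 0.05,           # the usual modern cutoff; makes top_k/top_p mostly redundant
    "top_p": 0.95,
    "top_k": 40,
    "repeat_penalty": 1.0,   # leave off unless you actually see loops
}
[/code]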
Just release the new DeepSeek, come on...
>>108166690
It's over. Chinese New Year has already started.
>>108166690
Tomorrow. Pray we get the 200B.
>>108166690
Operating as an actual research lab rather than a product company means they feel no obligation to rush releases.
Rumor is that Opus 4.6 is actually Sonnet, and that there's no reason for Anthropic to release and run the real 4.6 since it's likely worse and costs more to run. What are your takes on this claim?
>>108166810
my take, which speaks for the rest of /lmg/, is that dario is a jew and that "claude" and "sonnet" are not names of local models
>>108166827
Agreed. The only reason I bring it up is that I'm interested in how Sonnet supposedly outperforms Opus, and whether the leading labs are implementing techniques like Engram to do so.
>>108166810
4.5 was the one that was the rebranded Sonnet. 4.6 is just an evolution of that.
The shift from 4.1 to 4.5 (which incidentally was also a bit cheaper than 4.1) made it pretty obvious that it's fundamentally a different model. Opus 4.5 and onward feels much more MoE-y, like the Sonnet series, unlike pre-4.5 Opus, which was either dense or had a stupid amount of activated parameters.
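Back-of-the-envelope for why activated parameters are what you "feel": decode compute scales with active params, not total. A sketch using Qwen3.5-397B-A17B's shape from the OP as stand-in numbers; purely illustrative, not Anthropic's actual sizes:

[code]
# Decode FLOPs/token is roughly 2 * active_params (weight multiply-adds only;
# attention and overhead ignored). Illustrative numbers, not Anthropic's.
def decode_flops_per_token(active_params: float) -> float:
    return 2.0 * active_params

dense = decode_flops_per_token(397e9)  # hypothetical dense model of the same total size
moe = decode_flops_per_token(17e9)     # 397B-A17B: only 17B params active per token
print(f"dense vs MoE decode cost: {dense / moe:.0f}x")  # ~23x
[/code]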
>>108166864
>Opus 4.5 and onward feels much more MoE-y, like the Sonnet series
I have a suspicion that Anthropic and co. are using some of the same techniques coming out of Chinese open-source labs, like Engram, mHC, etc. I wonder exactly what they're doing and just how far ahead they are. For instance, the Engram conditional-memory paper from DeepSeek was released only last month.
>2 tokens per second
>>108166922
>mfw
https://introl.com/blog/deepseek-v4-trillion-parameter-coding-model-february-2026
>v4 full weights
>96GB VRAM + 256GB RAM
Unless they mean something like Q3 or Q2 quants, that seems impossible; their new architecture changes would have to work miracles for reducing the amount of memory you need.
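Quick sanity check on their numbers, using rough llama.cpp bits-per-weight figures (approximate; the exact bpw depends on the quant mix):

[code]
# Weight memory alone for a 1T-parameter model at common GGUF quants.
# bpw values are approximate llama.cpp figures; KV cache and overhead ignored.
PARAMS = 1e12
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8),
                  ("Q3_K_M", 3.9), ("Q2_K", 2.6)]:
    print(f"{name:7s} ~{PARAMS * bpw / 8 / 1e9:5.0f} GB")
# FP16 ~2000 GB, Q4_K_M ~600 GB, Q2_K ~325 GB.
# 96 GB VRAM + 256 GB RAM = 352 GB only fits somewhere around Q2.
[/code]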
>>108166986
Something tells me this entire article is complete fucking bullshit. The 5090 has been out for over a year.
>>108166986
No way in hell is either of these models dense either. This shit is AI-slop fanfiction.
>>108166998
>>108167008
Yeah, pretty suspicious. The entire site is probably an AI grift.
>>108167018
And what the fuck is this? Seems like some sort of trojan.
https://deepseek4.org/
if this is real, v4 is releasing tomorrow
>>108167026
>>108167037
Please sirs do the needful and click on my links. v4 very performance sirs.
Anyone know if there was ever a good successor to Qwen3-Coder 30B A3B? I'm looking for something that's still fast & plays nicely with agentic coding extensions like Cline, while still fitting in 32GB.
So far, I've tried Devstral Small 2 & GLM 4.7 Flash. These work okay, but in terms of speed, size & quality of code, Qwen3-Coder is still the gold standard for 32GB workstations IMO.
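Rough fit math for why this class of model works at 32GB. The layer/head numbers below are what I believe Qwen3-30B-A3B uses; treat them as assumptions:

[code]
# Does a 30B model + context fit in 32 GB? Q4_K_M weights (~4.8 bpw)
# plus FP16 KV cache. Dims assumed for Qwen3-30B-A3B (GQA).
params, bpw = 30e9, 4.8
layers, kv_heads, head_dim, ctx = 48, 4, 128, 32768

weights_gb = params * bpw / 8 / 1e9                        # ~18 GB
kv_gb = layers * 2 * kv_heads * head_dim * ctx * 2 / 1e9   # K+V, 2 bytes each: ~3.2 GB
print(f"weights ~{weights_gb:.0f} GB + KV ~{kv_gb:.1f} GB = ~{weights_gb + kv_gb:.0f} GB of 32 GB")
[/code]

So ~21 GB used, which leaves headroom for activations and a bigger context; a dense 30B at the same quant would occupy the same memory but decode far slower with only partial offload.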
What's a good model for 64GB of VRAM? I've gotten bored of GLM.
The whale
suck teto toes
>>108167071
nemo
>>108167102
kys
i am going to run qwen 3.5 at q6
am i retarded?
>>108167071
Qwen 3.5 35B.
I tried to get Minimax M2.5 Q4 to implement an ik_llama.cpp loader into ooba last night. It didn't work. Going to try again later methinks.
>>108167116
still nemo
>>108166690
next week
>>108167135
>local model for anything except cooming
>>108167201
you guys pay subscription fees?