/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108159576 & >>108149287

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/15) dots.ocr-1.5 temporarily released: https://hf.co/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108159576

--Papers:
>108159925
--Qwen3.5-397B-A17B multimodal model and safety policy frustrations:
>108160387 >108160449 >108160474 >108160789 >108161039 >108161110 >108160576 >108160589 >108160615 >108160627 >108160834
--Qwen3.5-397B-A17B release and benchmark performance comparisons:
>108160792 >108160809 >108160813 >108161009 >108160819 >108160826
--Mythic AI's analog compute scalability and efficiency claims under scrutiny:
>108164002 >108164265 >108164280 >108164333
--Comparing LLM responses to explicit prompts and ethical safeguards:
>108161568 >108161583 >108161586 >108161590 >108161598 >108161600 >108161647 >108161712 >108161740 >108162921 >108165334 >108165381 >108165405 >108165431 >108165411 >108165434 >108165529 >108165578 >108165587 >108165589
--Qwen 3.5 Plus accurately interprets /pol/ meme symbolism:
>108163097 >108163108 >108163147 >108163160 >108163178 >108163187
--GPU ordering and prefill strategies for Qwen3.5-397B-A17B:
>108162676 >108162812 >108162870 >108162855 >108163161
--Qwen 3.5 output similarities to Gemini and adversarial AI threats:
>108165309 >108165342 >108165372 >108165394 >108165408
--MoE memory efficiency and multi-GPU power constraints:
>108163343 >108163369 >108163414 >108163504 >108163528 >108163593 >108163740 >108163750 >108163776
--DOTS OCR 1.5 model released on Hugging Face:
>108160205 >108161767 >108161800 >108161832 >108161849
--GLM-5-GGUF quantization efficiency analysis using perplexity metrics:
>108162324 >108162378
--Add support for Tiny Aya Models:
>108164202 >108164293 >108164256
--AI roleplay trends and coding-focused model dominance:
>108160831 >108160909 >108161031 >108161194 >108161247 >108161454 >108161483 >108161496 >108161527
--Miku (free space):
>108159644 >108159657 >108160615 >108160627 >108162391 >108162525 >108162628 >108165778

►Recent Highlight Posts from the Previous Thread: >>108159577

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anyone tried Kimi Linear 48b a3b? Any sampler settings?
Just release the new DeepSeek, come on...
>>108166690
It's over. Chinese New Year has already started.

>>108166690
Tomorrow. Pray we get the 200B.

>>108166690
Operating as an actual research lab rather than a product company means they feel no obligation to rush releases.
Rumor is that Opus 4.6 is actually Sonnet, and that there's no reason for Anthropic to release and run the real 4.6 since it's likely worse and costs more at inference time. What are your takes on this claim?
>>108166810
my take, which speaks for the rest of /lmg/, is that dario is a jew and that "claude" and "sonnet" are not names of local models

>>108166827
Agreed. The only reason I bring it up is that I'm interested in how Sonnet supposedly outperforms Opus, and whether the leading labs are implementing techniques like engram to do so.

>>108166810
4.5 was the one that was the rebranded Sonnet; 4.6 is just an evolution of that.
The shift from 4.1 to 4.5 (which incidentally was also a bit cheaper than 4.1) made it pretty obvious that it's fundamentally a different model. Opus 4.5 and onward feels much more MoE-y, like the Sonnet series, unlike pre-4.5 Opus, which was either dense or had a stupid amount of activated parameters.

>>108166864
>Opus 4.5 and onward feels much more MoE-y like the Sonnet series
I have a suspicion that Anthropic and co. are using some of the same techniques coming out of Chinese open-source labs: engram, mhc, etc. I wonder exactly what they're doing and just how far ahead they are. For instance, the engram conditional memory paper from DeepSeek was released only last month.
>2 tokens per second
>>108166922
>mfw
https://introl.com/blog/deepseek-v4-trillion-parameter-coding-model-february-2026
>v4 full weights
>96GB VRAM + 256GB RAM
Unless they mean at something like Q3-Q2, it seems impossible; their new architecture changes would have to work miracles for reducing the memory you need.
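Quick napkin math on why that smells wrong, assuming the rumored ~1T total parameters (the bits-per-weight figures below are rough GGUF averages, not exact):

[code]
# Rough fit check: ~1T params vs. 96 GB VRAM + 256 GB (~352 GB total, however the blog splits it).
params = 1e12
budget_gb = 96 + 256
for name, bpw in [("FP16", 16.0), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9), ("Q2_K", 2.6)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB", "fits" if gb <= budget_gb else "does not fit")
# FP16 ~2000 GB, Q4 ~600 GB, Q3 ~488 GB, Q2 ~325 GB.
# Only ~Q2 squeezes in, and that's before KV cache and activations.
[/code]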
>>108166986
something tells me this entire article is complete fucking bullshit. the 5090 has been out for over a year

>>108166986
no way in hell is either of these models dense, either. this shit is AI-slop fanfiction

>>108166998
>>108167008
Yeah, pretty suspicious. The entire site is probably an AI grift.

>>108167018
and what the fuck is this? seems like some sort of trojan.
https://deepseek4.org/
if this is real, v4 is releasing tomorrow
>>108167026
>>108167037
Please sirs do the needful and click on my links. v4 very performance sirs.
Anyone know if there was ever a good successor to Qwen3-Coder 30B A3B? I'm looking for something that's still fast and plays nicely with agentic coding extensions like Cline, while still being able to fit in 32GB.
So far, I've tried Devstral Small 2 and GLM 4.7 Flash. They work okay, but in terms of speed, size, and code quality, Qwen3-Coder is still the gold standard for 32GB workstations IMO.
What's a good model for 64GB of vram? I've gotten bored of glm.
The whale
suck teto toes
>>108167071
nemo

>>108167102
kys

i am going to run qwen 3.5 at q6
am i retarded?

>>108167071
Qwen 3.5 35B.
I tried to get Minimax M2.5 Q4 to implement an ik_llama.cpp loader into ooba last night. It didn't work. Going to try again later methinks.
>>108167116
still nemo

>>108166690
next week

>>108167135
>local model for anything except cooming

>>108167201
you guys pay subscription fees?

>>108167201
I've coomed so much with glm-chan that I've been neglecting all the other uses of AI. If you haven't tried them yet, OpenCode and Perplexica are kinda dope.

>>108167201
What other use does AI even have? I just don't need anything coded.

>>108167258
Translating an untranslated visual novel
just hook nemo into openclaw and have it figure out how to implement dsa into llama.cpp
>>108167258
>Coding
>Deep Research
>Image recognition, generation, and editing
>Translation
>Therapy and a general partner to talk to about sensitive topics
>Browser use to shitpost on /lmg/

>>108167271
EOPs aren't welcome.

>>108167258
Teaching yourself conversational nipspeak, with a translator to check.
You can also combine the two uses, and it's kind of funny when you keep interrupting the character mid-suck to elaborate on a grammar concept.

>>108167271
Gambs-chama, we all know it just doesn't work

>>108167308
>>108167339
nyoooooo
oh and local LLM DnD stories have absorbed my life. it was tricky to implement and balance things, but I can drop in a combat template code block, fill in all the blanks and the rolls, and have it narrate the fight. it even knows to kill my character and throw a GAME OVER splash screen bit when I die. if you're willing to be impartial and let the dice tell the story, it's pretty cool. I can finally rip off someone's arm and beat them to death with it in a game that isn't Dwarf Fortress. you can also combine this with coom and it's a barrel of laughs
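Not the anon's actual template, but a minimal sketch of the kind of combat block being described; the character names, stats, and layout here are invented for illustration:

[code]
[COMBAT - round 3]
Player: Grunk | HP 18/24 | AC 14 | greataxe +5 (1d12+3)
Enemy: gnoll raider | HP 9/13 | AC 13 | spear +4 (1d6+2)
Rolls (already made): player attack 17 -> hit, damage 11; enemy attack 8 -> miss
Rules: narrate this round only, apply damage exactly as rolled, never fudge.
If the player hits 0 HP, kill the character and show the GAME OVER splash.
[/COMBAT]
[/code]

The key point is that the dice are rolled outside the model and pasted in as ground truth, so the LLM only narrates outcomes instead of deciding them.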
>>108167354
Make a rentry of your process.
Lots of anons would be interested in replicating that.

>>108167258
B-but what about using an LLM to proofread and improve our English before posting on 4chan? It's very useful, right?? I just don't want my text being logged by cloud providers and prefer local only!
Imagine how good Qwen would be if they used GLM's training data.
The new Qwen doesn't know about /lmg/, for example.
>>108167308
I'm disappointed in Qwen 3.5. It can't even transliterate Japanese correctly. 信天翁 is "Shinatan'ō" lmao! (It should be "ahōdori".)
Even though the game title is written on the screenshot, Gemini 3 thinks it's from Umineko. Qwen thinks it's from Dies Irae.
There is still no LLM that knows Albatross Log.

Is this normal, or am I just a poorfag? Why does it need 16k tokens to do this?
Or is that something like their system message?
yea, im new

>>108167513
rail-soft's shit is obscure and among the most difficult VNs to translate, so that's not surprising

>>108167560
Let me ask my LLM what sort of backwater piece-of-shit setup and settings you're using. I'll get back to you.

What's a decent model to run with 64GB RAM, 12GB VRAM? Does having a 9800X3D help?

>>108163945
>>108162827
>>108162210
After playing a little more with it, it is indeed usable, but also kind of dumb, I think. And it's barely fast enough with the settings I was running (6 t/s).
I'm testing Qwen3-Coder-Next-Q3_K_XL now and I'm surprised by how much better it writes compared to the original 80B MoE.
It didn't do this
like this
this
even once yet, and it jumped straight into portraying a character exactly how I imagined it, even without any example dialogue or the like, just with the initial message and the character card (which I made).
I think this one's a keeper.

>>108167562
Sure, it's okay if LLMs don't know it directly, but it annoys me that they hallucinate the source even though the title is written right there. Also, Qwen is making translation mistakes.
>>108167560
opencode is famous for using a huge amount of tokens just for the system prompt. you need to write your own agentic harness/code assistant.
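A minimal sketch of that idea, assuming a local llama-server on its default port; the system prompt is made up, and the point is just that every token in the context is yours instead of a 16k-token harness preamble:

[code]
# Bare-bones chat loop against a local OpenAI-compatible endpoint (e.g. llama-server).
import json, urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"
messages = [{"role": "system", "content": "You are a terse coding assistant."}]

while True:
    messages.append({"role": "user", "content": input("> ")})
    req = urllib.request.Request(
        URL,
        data=json.dumps({"messages": messages, "temperature": 0.3}).encode(),
        headers={"Content-Type": "application/json"},
    )
    reply = json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    print(reply)
[/code]

Tool calling and file edits can be bolted on later; even this alone drops the fixed prompt cost to almost nothing.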
>>108167593
Oh yeah, and TG is a lot better at 16 t/s, and I can fit 120k (!) context, unquanted, with 2048 batch size, which speeds up PP real nice too.
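For anyone wanting to reproduce something like this, the llama-server invocation would be roughly as follows (the model filename and offload count are placeholders for your own setup; -c and -b are the settings the anon describes, with the KV cache left at its unquantized default):

[code]
# ~120k context, batch 2048 for faster prompt processing
llama-server -m Qwen3-Coder-Next-Q3_K_XL.gguf -c 122880 -b 2048 -ngl 99
[/code]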