/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108159576 & >>108149287

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/15) dots.ocr-1.5 temporarily released: https://hf.co/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator (rough math sketched below)
Sampler Visualizer: https://artefact2.github.io/llm-sampling
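
Roughly the arithmetic the VRAM calculator above is doing, as a minimal Python sketch: it assumes weights dominate, an fp16 KV cache, and ballpark bits-per-weight averages; the example model dimensions are made up. For MoE models, apply it to total (not active) parameters, since every expert has to sit in memory.

[code]
# Back-of-envelope GGUF VRAM estimate (a sketch; the HF calculator is more thorough).
# BPW values are approximate averages for common llama.cpp quants.
BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

def vram_gb(params_b, quant, layers, d_model, n_heads, n_kv_heads, ctx):
    weights = params_b * 1e9 * BPW[quant] / 8   # bytes for the weights
    kv_dim = d_model * n_kv_heads // n_heads    # GQA shrinks the KV cache
    kv = 2 * layers * ctx * kv_dim * 2          # K+V at fp16 (2 bytes/element)
    return (weights + kv) * 1.1 / 1e9           # ~10% extra for compute buffers

# Hypothetical 30B-class model with GQA at 32k context:
print(f"{vram_gb(30, 'Q4_K_M', 48, 5120, 40, 8, 32768):.1f} GB")  # ~27 GB
[/code]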

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: catto.jpg (160 KB, 1120x1120)
►Recent Highlights from the Previous Thread: >>108159576

--Papers:
>108159925
--Qwen3.5-397B-A17B multimodal model and safety policy frustrations:
>108160387 >108160449 >108160474 >108160789 >108161039 >108161110 >108160576 >108160589 >108160615 >108160627 >108160834
--Qwen3.5-397B-A17B release and benchmark performance comparisons:
>108160792 >108160809 >108160813 >108161009 >108160819 >108160826
--Mythic AI's analog compute scalability and efficiency claims under scrutiny:
>108164002 >108164265 >108164280 >108164333
--Comparing LLM responses to explicit prompts and ethical safeguards:
>108161568 >108161583 >108161586 >108161590 >108161598 >108161600 >108161647 >108161712 >108161740 >108162921 >108165334 >108165381 >108165405 >108165431 >108165411 >108165434 >108165529 >108165578 >108165587 >108165589
--Qwen 3.5 Plus accurately interprets /pol/ meme symbolism:
>108163097 >108163108 >108163147 >108163160 >108163178 >108163187
--GPU ordering and prefill strategies for Qwen3.5-397B-A17B:
>108162676 >108162812 >108162870 >108162855 >108163161
--Qwen 3.5 output similarities to Gemini and adversarial AI threats:
>108165309 >108165342 >108165372 >108165394 >108165408
--MoE memory efficiency and multi-GPU power constraints:
>108163343 >108163369 >108163414 >108163504 >108163528 >108163593 >108163740 >108163750 >108163776
--DOTS OCR 1.5 model released on Hugging Face:
>108160205 >108161767 >108161800 >108161832 >108161849
--GLM-5-GGUF quantization efficiency analysis using perplexity metrics:
>108162324 >108162378
--Add support for Tiny Aya Models:
>108164202 >108164293 >108164256
--AI roleplay trends and coding-focused model dominance:
>108160831 >108160909 >108161031 >108161194 >108161247 >108161454 >108161483 >108161496 >108161527
--Miku (free space):
>108159644 >108159657 >108160615 >108160627 >108162391 >108162525 >108162628 >108165778

►Recent Highlight Posts from the Previous Thread: >>108159577

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Anyone tried Kimi Linear 48b a3b? Any sampler settings?
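For reference, the neutral baseline I'd start from absent model-specific advice (assumed generic llama.cpp-style settings, not Kimi tuning), sent to a llama.cpp server's /completion endpoint:

[code]
# Neutral sampler starting point (assumed defaults, not tuned for Kimi Linear).
import json, urllib.request

payload = {
    "prompt": "Once upon a time",
    "n_predict": 256,
    "temperature": 0.7,     # conservative default
    "min_p": 0.05,          # min-p-only is a common starting point
    "top_p": 1.0,           # 1.0 disables top-p
    "top_k": 0,             # 0 disables top-k
    "repeat_penalty": 1.0,  # off; raise only if it loops
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["content"])
[/code]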
>>
Just release the new Deepseek, come on...
>>
>>108166690
It's over. Chinese New Year has already started.
>>
>>108166690
Tomorrow. Pray we get the 200b.
>>
>>108166690
Operating as an actual research lab rather than a product company means they feel no obligation to rush releases
>>
Rumor is that Opus 4.6 is actually Sonnet, and that there's no reason for anthropic to release and run the real 4.6 since it's likely worse and costs more to run inference on. What are your takes on this claim?
>>
>>108166810
my take which speaks for the rest of /lmg/ is that dario is a jew and that "claude" and "sonnet" are not names of local models
>>
>>108166827
Agreed. The only reason I bring it up is that I'm interested in how Sonnet supposedly outperforms Opus, and whether the leading labs are implementing techniques like engram to do so.
>>
>>108166810
4.5 was the one that was the rebranded Sonnet. 4.6 is just an evolution of that.
The shift from 4.1 to 4.5 (which incidentally was also a bit cheaper than 4.1) made it pretty obvious that it's fundamentally a different model. Opus 4.5 and onward feels much more MoE-y, like the Sonnet series, unlike pre-4.5 Opus, which was either dense or had a stupid amount of activated parameters.
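The "MoE-y feel" is mostly per-token cost arithmetic: decode cost scales with active parameters, not total. A toy sketch with made-up numbers:

[code]
# Toy illustration: why MoE models can be big but cheap per token.
# ~2 FLOPs per active weight per decoded token; all figures hypothetical.
def decode_flops(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense = decode_flops(300)  # hypothetical 300B dense model
moe = decode_flops(30)     # hypothetical 300B-total, 30B-active MoE
print(f"dense costs {dense / moe:.0f}x more per decoded token")  # -> 10x
[/code]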
>>
>>108166864
>Opus 4.5 and onward feels much more MoE-y like the Sonnet series
I have a suspicion that anthropic and co are using some of the same techniques coming from chinese open-source labs, like engram, mhc, etc. I wonder exactly what they're doing and just how far ahead they are. For instance, the engram conditional memory paper from deepseek was released only last month.
>>
File: 1763300371395374.jpg (37 KB, 971x845)
>2 tokens per second
>>
File: yodi.gif (1.79 MB, 480x270)
>>108166922
>mfw
>>
File: deepseekv4.jpg (441 KB, 1097x1167)
https://introl.com/blog/deepseek-v4-trillion-parameter-coding-model-february-2026
>v4 full weights
>96gb VRAM + 256gb VRAM
Unless they mean at like q3-q2 it seems impossible, short of their new architecture changes working miracles for reducing the amount of memory you need.
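For scale, taking the ~1T parameters the article's own title claims, weight-only sizes at rough llama.cpp bits-per-weight averages:

[code]
# Weight-only footprint of a hypothetical 1T-parameter model; 96 + 256 = 352 GB total VRAM.
# KV cache and compute buffers would eat into that budget further.
for name, bpw in [("F16 (full)", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    gb = 1e12 * bpw / 8 / 1e9
    verdict = "fits" if gb <= 352 else "doesn't fit"
    print(f"{name:10s} ~{gb:5.0f} GB -> {verdict}")
# "Full weights" at F16 would need ~2000 GB; only around q2 does it squeeze in.
[/code]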
>>
File: file.png (43 KB, 1191x181)
>>108166986
something tells me this entire article is complete fucking bullshit. the 5090 has been out for over a year
>>
File: file.png (28 KB, 513x356)
>>108166986
no way in hell is either of these models dense either. this shit is aislop fanfiction
>>
>>108166998
>>108167008
Yeah, pretty suspicious. The entire site is probably an AI grift.
>>
File: file.png (59 KB, 1172x308)
>>108167018
and what the fuck is this? seems like some sort of trojan.
https://deepseek4.org/
>>
File: file.png (280 KB, 1878x1116)
if this is real, v4 is releasing tomorrow
>>
File: jeet.jpg (155 KB, 2258x372)
>>108167026
>>108167037
Please sirs do the needful and click on my links. v4 very performance sirs.
>>
Anyone know if there was ever a good successor to Qwen3-Coder 30B A3B?

I'm looking for something that's still fast & plays nicely with agentic coding extensions like Cline, while still being able to fit in 32 GB.

So far, I've tried Devstral Small 2 & GLM 4.7 Flash. They work okay, but in terms of speed, size & quality of code, Qwen3-Coder is still the gold standard for 32 GB workstations IMO.
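Whichever model wins, a quick smoke test before pointing Cline at whatever you serve (assumes an OpenAI-compatible endpoint like llama-server on :8080; llama.cpp ignores the model name):

[code]
# Smoke test for a local OpenAI-compatible endpoint before wiring up Cline.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")  # key unused locally
resp = client.chat.completions.create(
    model="local",  # llama-server serves whatever it loaded; the name is arbitrary
    messages=[{"role": "user", "content": "Reply with a one-line Python hello world."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
[/code]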
>>
What's a good model for 64 GB of VRAM? I've gotten bored of GLM.
>>
The whale
>>
suck teto toes
>>
>>108167071
nemo
>>
>>108167102
kys
>>
i am going to run qwen 3.5 at q6
am i retarded?
>>
>>108167071
Qwen 3.5 35B.
>>
File: deepseekv4.jpg (441 KB, 1097x1167)
I tried to get Minimax M2.5 Q4 to implement an ik_llama.cpp loader into ooba last night.

It didn't work. Going to try again later methinks.
>>
>>108167116
still nemo
>>
>>108166690
next week
>>
File: 1743768237481027.jpg (54 KB, 535x462)
>>108167135
>local model for anything except cooming
>>
>>108167201
you guys pay subscription fees?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.