/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108159576 & >>108149287

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/15) dots.ocr-1.5 temporarily released: https://hf.co/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator (rough math sketched below)
Sampler Visualizer: https://artefact2.github.io/llm-sampling
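
Roughly the arithmetic the VRAM calculator above is doing, as a minimal Python sketch: it assumes weights dominate, an fp16 KV cache, and ballpark bits-per-weight averages; the example model dimensions are made up. For MoE models, apply it to total (not active) parameters, since every expert has to sit in memory.

[code]
# Back-of-envelope GGUF VRAM estimate (a sketch; the HF calculator is more thorough).
# BPW values are approximate averages for common llama.cpp quants.
BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

def vram_gb(params_b, quant, layers, d_model, n_heads, n_kv_heads, ctx):
    weights = params_b * 1e9 * BPW[quant] / 8   # bytes for the weights
    kv_dim = d_model * n_kv_heads // n_heads    # GQA shrinks the KV cache
    kv = 2 * layers * ctx * kv_dim * 2          # K+V at fp16 (2 bytes/element)
    return (weights + kv) * 1.1 / 1e9           # ~10% extra for compute buffers

# Hypothetical 30B-class model with GQA at 32k context:
print(f"{vram_gb(30, 'Q4_K_M', 48, 5120, 40, 8, 32768):.1f} GB")  # ~27 GB
[/code]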

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: catto.jpg (160 KB, 1120x1120)
►Recent Highlights from the Previous Thread: >>108159576

--Papers:
>108159925
--Qwen3.5-397B-A17B multimodal model and safety policy frustrations:
>108160387 >108160449 >108160474 >108160789 >108161039 >108161110 >108160576 >108160589 >108160615 >108160627 >108160834
--Qwen3.5-397B-A17B release and benchmark performance comparisons:
>108160792 >108160809 >108160813 >108161009 >108160819 >108160826
--Mythic AI's analog compute scalability and efficiency claims under scrutiny:
>108164002 >108164265 >108164280 >108164333
--Comparing LLM responses to explicit prompts and ethical safeguards:
>108161568 >108161583 >108161586 >108161590 >108161598 >108161600 >108161647 >108161712 >108161740 >108162921 >108165334 >108165381 >108165405 >108165431 >108165411 >108165434 >108165529 >108165578 >108165587 >108165589
--Qwen 3.5 Plus accurately interprets /pol/ meme symbolism:
>108163097 >108163108 >108163147 >108163160 >108163178 >108163187
--GPU ordering and prefill strategies for Qwen3.5-397B-A17B:
>108162676 >108162812 >108162870 >108162855 >108163161
--Qwen 3.5 output similarities to Gemini and adversarial AI threats:
>108165309 >108165342 >108165372 >108165394 >108165408
--MoE memory efficiency and multi-GPU power constraints:
>108163343 >108163369 >108163414 >108163504 >108163528 >108163593 >108163740 >108163750 >108163776
--DOTS OCR 1.5 model released on Hugging Face:
>108160205 >108161767 >108161800 >108161832 >108161849
--GLM-5-GGUF quantization efficiency analysis using perplexity metrics:
>108162324 >108162378
--Add support for Tiny Aya Models:
>108164202 >108164293 >108164256
--AI roleplay trends and coding-focused model dominance:
>108160831 >108160909 >108161031 >108161194 >108161247 >108161454 >108161483 >108161496 >108161527
--Miku (free space):
>108159644 >108159657 >108160615 >108160627 >108162391 >108162525 >108162628 >108165778

►Recent Highlight Posts from the Previous Thread: >>108159577

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Anyone tried Kimi Linear 48b a3b? Any sampler settings?
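For reference, the neutral baseline I'd start from absent model-specific advice (assumed generic llama.cpp-style settings, not Kimi tuning), sent to a llama.cpp server's /completion endpoint:

[code]
# Neutral sampler starting point (assumed defaults, not tuned for Kimi Linear).
import json, urllib.request

payload = {
    "prompt": "Once upon a time",
    "n_predict": 256,
    "temperature": 0.7,     # conservative default
    "min_p": 0.05,          # min-p-only is a common starting point
    "top_p": 1.0,           # 1.0 disables top-p
    "top_k": 0,             # 0 disables top-k
    "repeat_penalty": 1.0,  # off; raise only if it loops
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["content"])
[/code]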
>>
Just release the new Deepseek, come on...
>>
>>108166690
It's over. Chinese New Year has already started.
>>
>>108166690
Tomorrow. Pray we get the 200b.
>>
>>108166690
Operating as an actual research lab rather than a product company means they feel no obligation to rush releases
>>
Rumor is that Opus 4.6 is actually Sonnet, and that there's no reason for anthropic to release and run the real 4.6 since it's likely worse and costs more to run inference on. What are your takes on this claim?
>>
>>108166810
my take which speaks for the rest of /lmg/ is that dario is a jew and that "claude" and "sonnet" are not names of local models
>>
>>108166827
Agreed. The only reason I bring it up is that I'm interested in how Sonnet supposedly outperforms Opus, and whether the leading labs are implementing techniques like engram to do so.
>>
>>108166810
4.5 was the one that was the rebranded Sonnet. 4.6 is just an evolution of that.
The shift from 4.1 to 4.5 (which incidentally was also a bit cheaper than 4.1) made it pretty obvious that it's fundamentally a different model. Opus 4.5 and onward feels much more MoE-y, like the Sonnet series, unlike pre-4.5 Opus, which was either dense or had a stupid amount of activated parameters.
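The "MoE-y feel" is mostly per-token cost arithmetic: decode cost scales with active parameters, not total. A toy sketch with made-up numbers:

[code]
# Toy illustration: why MoE models can be big but cheap per token.
# ~2 FLOPs per active weight per decoded token; all figures hypothetical.
def decode_flops(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense = decode_flops(300)  # hypothetical 300B dense model
moe = decode_flops(30)     # hypothetical 300B-total, 30B-active MoE
print(f"dense costs {dense / moe:.0f}x more per decoded token")  # -> 10x
[/code]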
>>
>>108166864
>Opus 4.5 and onward feels much more MoE-y like the Sonnet series
I have a suspicion that anthropic and co are using some of the same techniques coming from chinese open-source labs, like engram, mhc, etc. I wonder exactly what they're doing and just how far ahead they are. For instance, the engram conditional memory paper from deepseek was released only last month.
>>
File: 1763300371395374.jpg (37 KB, 971x845)
>2 tokens per second
>>
File: yodi.gif (1.79 MB, 480x270)
>>108166922
>mfw
>>
File: deepseekv4.jpg (441 KB, 1097x1167)
https://introl.com/blog/deepseek-v4-trillion-parameter-coding-model-february-2026
>v4 full weights
>96gb VRAM + 256gb VRAM
Unless they mean at like q3-q2 it seems impossible, short of their new architecture changes working miracles for reducing the amount of memory you need.
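For scale, taking the ~1T parameters the article's own title claims, weight-only sizes at rough llama.cpp bits-per-weight averages:

[code]
# Weight-only footprint of a hypothetical 1T-parameter model; 96 + 256 = 352 GB total VRAM.
# KV cache and compute buffers would eat into that budget further.
for name, bpw in [("F16 (full)", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    gb = 1e12 * bpw / 8 / 1e9
    verdict = "fits" if gb <= 352 else "doesn't fit"
    print(f"{name:10s} ~{gb:5.0f} GB -> {verdict}")
# "Full weights" at F16 would need ~2000 GB; only around q2 does it squeeze in.
[/code]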
>>
File: file.png (43 KB, 1191x181)
>>108166986
something tells me this entire article is complete fucking bullshit. the 5090 has been out for over a year
>>
File: file.png (28 KB, 513x356)
>>108166986
no way in hell is either of these models dense either. this shit is aislop fanfiction
>>
>>108166998
>>108167008
Yeah, pretty suspicious. The entire site is probably an AI grift.
>>
File: file.png (59 KB, 1172x308)
>>108167018
and what the fuck is this? seems like some sort of trojan.
https://deepseek4.org/
>>
File: file.png (280 KB, 1878x1116)
if this is real, v4 is releasing tomorrow
>>
File: jeet.jpg (155 KB, 2258x372)
>>108167026
>>108167037
Please sirs do the needful and click on my links. v4 very performance sirs.
>>
Anyone know if there was ever a good successor to Qwen3-Coder 30B A3B?

I'm looking for something that's still fast & plays nicely with agentic coding extensions like Cline, while still being able to fit in 32 GB.

So far, I've tried Devstral Small 2 & GLM 4.7 Flash. They work okay, but in terms of speed, size & quality of code, Qwen3-Coder is still the gold standard for 32 GB workstations IMO.
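Whichever model wins, a quick smoke test before pointing Cline at whatever you serve (assumes an OpenAI-compatible endpoint like llama-server on :8080; llama.cpp ignores the model name):

[code]
# Smoke test for a local OpenAI-compatible endpoint before wiring up Cline.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")  # key unused locally
resp = client.chat.completions.create(
    model="local",  # llama-server serves whatever it loaded; the name is arbitrary
    messages=[{"role": "user", "content": "Reply with a one-line Python hello world."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
[/code]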
>>
What's a good model for 64 GB of VRAM? I've gotten bored of GLM.
>>
The whale
>>
suck teto toes
>>
>>108167071
nemo
>>
>>108167102
kys
>>
i am going to run qwen 3.5 at q6
am i retarded?
>>
>>108167071
Qwen 3.5 35B.
>>
File: deepseekv4.jpg (441 KB, 1097x1167)
I tried to get Minimax M2.5 Q4 to implement an ik_llama.cpp loader into ooba last night.

It didn't work. Going to try again later methinks.
>>
>>108167116
still nemo
>>
>>108166690
next week
>>
File: 1743768237481027.jpg (54 KB, 535x462)
>>108167135
>local model for anything except cooming
>>
>>108167201
you guys pay subscription fees?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.