/g/ - /lmg/ - Local Models General - Technology


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Anonymous
/lmg/ - Local Models General 03/14/26(Sat)02:49:49 No.108368195

File: 3ssion.jpg (217 KB, 1024x1024)

/lmg/ - Local Models General Anonymous 03/14/26(Sat)02:49:49 No.108368195

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108362305 & >>108356979

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
03/14/26(Sat)02:50:14 No.108368198

Anonymous 03/14/26(Sat)02:50:14 No.108368198

File: dssssssss.jpg (101 KB, 854x854)

101 KB JPG

►Recent Highlights from the Previous Thread: >>108362305

--llama.cpp reasoning budget sampler breaking tool calling workflows:
>108363630 >108363637 >108363647 >108363707 >108363721 >108363731 >108363741 >108363776
--Reasoning budget sampler for controlling Qwen 3.5 token usage:
>108362684 >108362795 >108363032 >108363053 >108363081 >108363112 >108363151 >108363187 >108363198 >108363229 >108363317
--Google releases WAXAL African language speech dataset amid Gemma 4 delays:
>108362761 >108362813
--High-memory LLM configurations and GPU utilization:
>108364020 >108364064 >108364392 >108364404 >108364422 >108364455 >108364481 >108364549 >108364598 >108364503 >108364926 >108365150 >108366414 >108366536
--Mistral-Large-3-675B-Instruct-2512 model obscurity and technical details:
>108365246 >108365259 >108365294 >108365285 >108365426
--Voice conversion methods and limitations with Qwen3-TTS:
>108363196 >108363211 >108363225 >108363263 >108363267 >108363290 >108363378
--Performance differences between llama-cli and llama-server:
>108363483 >108363549 >108363644 >108364517 >108364542 >108364669
--Qwen3.5-27B performance discrepancy due to quantization confusion:
>108367280 >108367297 >108367305 >108367311 >108367328
--String ban robustness and regex ban PR for ik_llama.cpp:
>108363666
--Comparing bare metal and VM performance benchmarks:
>108364326
--Anthropic and Meta lobbying for AI regulations:
>108362986
--MCP server persistence issues with llama.cpp frontend:
>108363692
--PocketTTS.cpp Windows compatibility fixes shared:
>108365171
--Miku (free space):
>108365163 >108366572 >108366629 >108367228 >108366923 >108367052

►Recent Highlight Posts from the Previous Thread: >>108362965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
03/14/26(Sat)02:59:26 No.108368243

Anonymous 03/14/26(Sat)02:59:26 No.108368243

>ai models are getting more and more intelligent by time
then why these smart models cant describe something as simple as smell of mikupussy anymore? bring a 2023 model and ask it to describe the smell of mikupussy and see it yourself

Anonymous
03/14/26(Sat)03:09:21 No.108368283

Anonymous 03/14/26(Sat)03:09:21 No.108368283

File: Screenshot from 2026-03-1(...).png (232 KB, 893x813)

232 KB PNG

>>108368243
>tl;dr ozone and leeks with a hint of musk and vanilla
What is it supposed to smell like?

Anonymous
03/14/26(Sat)03:15:21 No.108368309

Anonymous 03/14/26(Sat)03:15:21 No.108368309

>wow, this model puts out some sweet writing I could never do myself
>writes plot like a woman
can't have everything

Anonymous
03/14/26(Sat)03:19:45 No.108368329

Anonymous 03/14/26(Sat)03:19:45 No.108368329

File: 1728100691817.jpg (33 KB, 540x528)

33 KB JPG

>>108368283
>What is it supposed to smell like?
no idea. just wanna see the models describe it

Anonymous
03/14/26(Sat)03:22:12 No.108368337

Anonymous 03/14/26(Sat)03:22:12 No.108368337

>>108368329
they seem to be struggling to do that and the newer the model is, the lesser it suits to my tastes

Anonymous
03/14/26(Sat)03:23:38 No.108368342

Anonymous 03/14/26(Sat)03:23:38 No.108368342

>>108368283
like the short circuit from a dumb boomer at starbucks dropping their coffe onto their laptop

Anonymous
03/14/26(Sat)03:36:59 No.108368387

Anonymous 03/14/26(Sat)03:36:59 No.108368387

>>108368309
nevermind
>(also, please don't write the plot like a woman. The prose is good, but try to stay consistent with the themes. No, X won't come back apologizing next day (Y will have to reach him), and no, Y won't magically understand everything instantly)
I can't believe it worked

Anonymous
03/14/26(Sat)04:06:36 No.108368469

Anonymous 03/14/26(Sat)04:06:36 No.108368469

When they will start installing dedicated ai cp on each phone and pc?

Anonymous
03/14/26(Sat)04:08:57 No.108368475

Anonymous 03/14/26(Sat)04:08:57 No.108368475

>>108368469
>cp
When pedophiles start ruling the world. Wait...

Anonymous
03/14/26(Sat)04:15:27 No.108368505

Anonymous 03/14/26(Sat)04:15:27 No.108368505

>>108368469
when they want you gone and cant find anything to get you on

Anonymous
03/14/26(Sat)04:15:39 No.108368506

Anonymous 03/14/26(Sat)04:15:39 No.108368506

>>108368469
>ai rm -rf

Anonymous
03/14/26(Sat)04:22:00 No.108368537

Anonymous 03/14/26(Sat)04:22:00 No.108368537

File: 1770754456808040.png (512 KB, 743x932)

512 KB PNG

>>108368475
Debunked.

Anonymous
03/14/26(Sat)04:44:41 No.108368619

Anonymous 03/14/26(Sat)04:44:41 No.108368619

where fears and lies
melt away

music will tie
wonk uoy naht noitceffa erom deen i

Anonymous
03/14/26(Sat)05:02:35 No.108368672

Anonymous 03/14/26(Sat)05:02:35 No.108368672

File: 1745708976003853.png (131 KB, 1149x490)

131 KB PNG

>>108368243

Anonymous
03/14/26(Sat)05:03:39 No.108368679

Anonymous 03/14/26(Sat)05:03:39 No.108368679

>>108368672
slop

Anonymous
03/14/26(Sat)05:05:02 No.108368685

Anonymous 03/14/26(Sat)05:05:02 No.108368685

>>108368679
>slop
Define it.

Anonymous
03/14/26(Sat)05:19:04 No.108368727

Anonymous 03/14/26(Sat)05:19:04 No.108368727

>>108368469
The FBI and CIA has been doing this to troublemakers for years. If they really want you then they are going to get you.

Anonymous
03/14/26(Sat)05:20:42 No.108368733

Anonymous 03/14/26(Sat)05:20:42 No.108368733

Not local, but anyone knows why I can't use GPT-5.4 Pro on openrouter? It says I have insufficient credits but my balance is positive

Anonymous
03/14/26(Sat)05:22:33 No.108368737

Anonymous 03/14/26(Sat)05:22:33 No.108368737

>>108368733
>>>/g/aicg/

Anonymous
03/14/26(Sat)05:23:27 No.108368739

Anonymous 03/14/26(Sat)05:23:27 No.108368739

>>108368243
Mikupussy smells like BLACK BULL semen

Anonymous
03/14/26(Sat)05:25:08 No.108368746

Anonymous 03/14/26(Sat)05:25:08 No.108368746

File: grok.png (328 KB, 915x675)

328 KB PNG

>>108368243

Anonymous
03/14/26(Sat)05:26:47 No.108368753

Anonymous 03/14/26(Sat)05:26:47 No.108368753

What's the advantage of saving the cache? The model still needs to reprocess everything, no?

Anonymous
03/14/26(Sat)05:29:13 No.108368757

Anonymous 03/14/26(Sat)05:29:13 No.108368757

>>108368746
Now this is slop.

Anonymous
03/14/26(Sat)05:31:53 No.108368761

Anonymous 03/14/26(Sat)05:31:53 No.108368761

>>108368753
>What's the advantage of saving the cache?
Not having to reprocess the whole thing.
>The model still needs to reprocess everything, no?
Not if you have/load a previous cache.
But are you talking about the rnn/ssm state from the new qwen models or the save/restore you can do with the /slots/n/action={save|restore} endpoint? Both should work.

Anonymous
03/14/26(Sat)05:32:02 No.108368762

Anonymous 03/14/26(Sat)05:32:02 No.108368762

>>108368746
Without last 3 lines, I like.

Anonymous
03/14/26(Sat)05:51:32 No.108368825

Anonymous 03/14/26(Sat)05:51:32 No.108368825

Moonshot will announce Kimi K3 on GTC on March 18th

Anonymous
03/14/26(Sat)05:54:12 No.108368835

Anonymous 03/14/26(Sat)05:54:12 No.108368835

Hunter Alpha and Healer Alpha are both from Zhipu

Anonymous
03/14/26(Sat)05:56:03 No.108368848

Anonymous 03/14/26(Sat)05:56:03 No.108368848

File: breakallthethings.png (212 KB, 1224x1022)

212 KB PNG

lol, breakages caused by the vibeshitter are endless and still are to be fully fixed, this one must have flown under the radar because almost none of us run models like Kimi locally. If you use more uncommon models, you'd be better off not merging any of the parser related commits still.
this is the power of agentic niggers and claude code. this is why we must gatekeep this thread away from telling people how to vibecode. they need to eat razor blades instead.

Anonymous
03/14/26(Sat)06:01:49 No.108368868

Anonymous 03/14/26(Sat)06:01:49 No.108368868

File: mikupussz.png (19 KB, 804x296)

19 KB PNG

>>108368243
Can't get the ozone out of it.

Anonymous
03/14/26(Sat)06:11:14 No.108368894

Anonymous 03/14/26(Sat)06:11:14 No.108368894

>>108368835
I wouldn't mind if these were DS because 3.0 was rather crappy, and they tuned it into greatness. Unless they're back to being completely unmemorable like the pre-3.0 era (though I know this is /lmg/ and some anons used their small coder models), 4.0 can be uninspiring but technologically novel and they'll bring it home with 4.1 or R2.

Anonymous
03/14/26(Sat)06:17:22 No.108368921

Anonymous 03/14/26(Sat)06:17:22 No.108368921

>>108368848
That shit should have never been implemented on the server. That's client-side stuff. The problem started before he got involved, but he's definitely not helping.

Anonymous
03/14/26(Sat)06:18:50 No.108368929

Anonymous 03/14/26(Sat)06:18:50 No.108368929

how do i make qwen3.5 27B not think for 10000 tokens?

Anonymous
03/14/26(Sat)06:19:04 No.108368933

Anonymous 03/14/26(Sat)06:19:04 No.108368933

>>108368672
>no ozone
trash

Anonymous
03/14/26(Sat)06:21:47 No.108368950

Anonymous 03/14/26(Sat)06:21:47 No.108368950

>>108368929
good system prompt + pwilkin's new vibeshitted reasoning budget + end phrase :D
I LOVE VIBEGARBOJ

Anonymous
03/14/26(Sat)06:22:11 No.108368954

Anonymous 03/14/26(Sat)06:22:11 No.108368954

>>108368929
turn the reasoning off with edited template

Anonymous
03/14/26(Sat)06:22:16 No.108368955

Anonymous 03/14/26(Sat)06:22:16 No.108368955

>>108368929
Prefill <think></think>

Anonymous
03/14/26(Sat)06:39:43 No.108369021

Anonymous 03/14/26(Sat)06:39:43 No.108369021

File: 1639692511780.jpg (249 KB, 1000x998)

249 KB JPG

>>108365171
Thanks for the information/update. I almost missed this because I was working on ASR-related AI stuff the other day. I pushed the changes into the main repo with some minor edits. Onnx runtime should now default to using the more updated version cmake pulls by default, so you won't have to pull in the dll yourself manually.

Very interested to see what the performance looks like on other machines. If you could share a screenshot of the --profile and include what CPU you have for reference I'd greatly appreciate it.

https://github.com/VolgaGerm/PocketTTS.cpp

>>108368198
Also thanks anon for the threadly qrd. I would have missed the update otherwise, lel.

Anonymous
03/14/26(Sat)06:52:58 No.108369070

Anonymous 03/14/26(Sat)06:52:58 No.108369070

What an absolutely worthless thread we have today. I hope blacked miku spam returns to show mikutroons their place.

Anonymous
03/14/26(Sat)06:54:50 No.108369081

Anonymous 03/14/26(Sat)06:54:50 No.108369081

be the chang you want to xi

Anonymous
03/14/26(Sat)06:55:56 No.108369090

Anonymous 03/14/26(Sat)06:55:56 No.108369090

>>108368283
That is a trick question. The vocaloid's pussy is actually a dick. The riddle demonstrates how deeply ingrained gender roles are in society, often causing people to assume that a long green haired person is actually a woman when in reality it is a troon.

Anonymous
03/14/26(Sat)06:57:43 No.108369098

Anonymous 03/14/26(Sat)06:57:43 No.108369098

>>108369090
kek

Anonymous
03/14/26(Sat)07:00:11 No.108369108

Anonymous 03/14/26(Sat)07:00:11 No.108369108

anyone use local for real work and not just fucking around?

Anonymous
03/14/26(Sat)07:03:50 No.108369121

Anonymous 03/14/26(Sat)07:03:50 No.108369121

>>108369108
I used to use it for RP, nowadays it's mainly for personal information (like some law stuff, or finance stuff since I'm investing) obviously with web search / fetching.
For my own projects free tiers fo gemini are usually enough (gemini pro / flash), never ran out of flash usage.
For actual work at my company, we have the company provided Amazon Q with sonnet 4.6 (no opussy because they're big nosed sadly)

Anonymous
03/14/26(Sat)07:14:02 No.108369155

Anonymous 03/14/26(Sat)07:14:02 No.108369155

>>108369108
GLM 4.7 is perfectly fine for programming

Anonymous
03/14/26(Sat)07:18:38 No.108369166

Anonymous 03/14/26(Sat)07:18:38 No.108369166

>>108369108
I've been using it recently for asking stupid programming-related questions and generating example snippets.
I copy-pasted a Javascript SSE parser out of it, which isn't really complicated but it's less thinking to read and fix the solution (e.g. the AbortController was instantiated, but not plumbed through to fetch) than to write it from nothing.
It's debatable whether you could call anything I do "real work", though.

Anonymous
03/14/26(Sat)07:22:31 No.108369180

Anonymous 03/14/26(Sat)07:22:31 No.108369180

File: teto.jpg (493 KB, 1040x1422)

493 KB JPG

>vibe-ported Qualcomm charge control from Android to Linux using Qwen3.5-35B-A3B
wish me luck, my phone about to turn into Galaxy Note 7

Anonymous
03/14/26(Sat)07:29:09 No.108369205

Anonymous 03/14/26(Sat)07:29:09 No.108369205

>>108369180
cellphones sure would be more useful if you could just boot linux on them

Anonymous
03/14/26(Sat)07:29:10 No.108369206

Anonymous 03/14/26(Sat)07:29:10 No.108369206

>>108369180
You should use 27B unless you REALLY need speed.

Anonymous
03/14/26(Sat)07:31:22 No.108369219

Anonymous 03/14/26(Sat)07:31:22 No.108369219

>>108369206
It is such a weird thing how dense model fetish was created by frivolous 3090 purchases.

Anonymous
03/14/26(Sat)07:38:44 No.108369245

Anonymous 03/14/26(Sat)07:38:44 No.108369245

>>108369205
You can though you just need specific phones

Anonymous
03/14/26(Sat)07:40:42 No.108369255

Anonymous 03/14/26(Sat)07:40:42 No.108369255

>>108369245
yeah but I mean all of them like you can install it on a pc instead of windows, the list of phones that exist vs ones you can run linux on is microscopic

Anonymous
03/14/26(Sat)07:41:19 No.108369260

Anonymous 03/14/26(Sat)07:41:19 No.108369260

File: pmos.png (60 KB, 848x615)

60 KB PNG

>>108369205
yeah, the Android kernel support model is so retarded.
Thankfully a few older SoCs are pretty well supported on upstream Linux, you can boot mainline Linux on them, even stuff like GSM, GPS and hardware acceleration work. They are all still buggy though, so close yet so far from making it a daily-drive'able phone.
>>108369206
nah, i'm on 1060

Anonymous
03/14/26(Sat)07:43:59 No.108369273

Anonymous 03/14/26(Sat)07:43:59 No.108369273

>>108369180
>Qualcomm charge control
What are you doing, exactly. Are you trying to get wireless charging controls working on your desktop or something? I don't get it.

Anonymous
03/14/26(Sat)07:47:59 No.108369287

Anonymous 03/14/26(Sat)07:47:59 No.108369287

File: moonshine2.webm (460 KB, 1280x720)

460 KB WEBM

My demo of Moonshinev2 ASR.

https://files.catbox.moe/t5tr26.webm

Anonymous
03/14/26(Sat)07:48:24 No.108369289

Anonymous 03/14/26(Sat)07:48:24 No.108369289

>>108368825
Do you think it's Hunter or Healer? Gotta be Hunter, right?

Anonymous
03/14/26(Sat)07:51:30 No.108369298

Anonymous 03/14/26(Sat)07:51:30 No.108369298

>>108369287
Cool

Anonymous
03/14/26(Sat)07:54:06 No.108369307

Anonymous 03/14/26(Sat)07:54:06 No.108369307

>>108369273
disabling charging after reaching a certain percentage to not wear down the battery. Linux already has current control for this Qualcomm charger, but there's a separate on/off charging bit that never got implemented (but it is used by Oneplus Android Kernel) that from that i've read could disable battery charging entirely and allow to power the SoC without funneling all the power through the battery first.
Should prolong the battery life if it really works like that. Batteries for older phones are a commodity, original replacements are still sold by Oneplus, but they are all new-old stock from 2020 that already sit at 0% at some warehouse and degrade.

Anonymous
03/14/26(Sat)07:56:05 No.108369312

Anonymous 03/14/26(Sat)07:56:05 No.108369312

>>108369289
>>108368835

Anonymous
03/14/26(Sat)08:03:21 No.108369340

Anonymous 03/14/26(Sat)08:03:21 No.108369340

talking head sota?

Anonymous
03/14/26(Sat)08:03:23 No.108369341

Anonymous 03/14/26(Sat)08:03:23 No.108369341

>>108368825
God I wish we got an upgrade to K2.5 that fixes its abhorrent writing style. Right now I'm stuck between GLM5 for writing and K2.5 for image recognition/vision.
If they fix K2.5's stupid ADHD style of writing, it'd be close to endgame for me.

Anonymous
03/14/26(Sat)08:03:49 No.108369344

Anonymous 03/14/26(Sat)08:03:49 No.108369344

>>108368835
4.9 pls. Just a different slop profile and about 500% less determinism please.

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.