/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108362305 & >>108356979

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108362305

--llama.cpp reasoning budget sampler breaking tool calling workflows:
>108363630 >108363637 >108363647 >108363707 >108363721 >108363731 >108363741 >108363776
--Reasoning budget sampler for controlling Qwen 3.5 token usage:
>108362684 >108362795 >108363032 >108363053 >108363081 >108363112 >108363151 >108363187 >108363198 >108363229 >108363317
--Google releases WAXAL African language speech dataset amid Gemma 4 delays:
>108362761 >108362813
--High-memory LLM configurations and GPU utilization:
>108364020 >108364064 >108364392 >108364404 >108364422 >108364455 >108364481 >108364549 >108364598 >108364503 >108364926 >108365150 >108366414 >108366536
--Mistral-Large-3-675B-Instruct-2512 model obscurity and technical details:
>108365246 >108365259 >108365294 >108365285 >108365426
--Voice conversion methods and limitations with Qwen3-TTS:
>108363196 >108363211 >108363225 >108363263 >108363267 >108363290 >108363378
--Performance differences between llama-cli and llama-server:
>108363483 >108363549 >108363644 >108364517 >108364542 >108364669
--Qwen3.5-27B performance discrepancy due to quantization confusion:
>108367280 >108367297 >108367305 >108367311 >108367328
--String ban robustness and regex ban PR for ik_llama.cpp:
>108363666
--Comparing bare metal and VM performance benchmarks:
>108364326
--Anthropic and Meta lobbying for AI regulations:
>108362986
--MCP server persistence issues with llama.cpp frontend:
>108363692
--PocketTTS.cpp Windows compatibility fixes shared:
>108365171
--Miku (free space):
>108365163 >108366572 >108366629 >108367228 >108366923 >108367052

►Recent Highlight Posts from the Previous Thread: >>108362965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>ai models are getting more and more intelligent over time
then why can't these smart models describe something as simple as the smell of mikupussy anymore? bring up a 2023 model, ask it to describe the smell of mikupussy, and see for yourself
>>108368243
>tl;dr ozone and leeks with a hint of musk and vanilla
What is it supposed to smell like?
>wow, this model puts out some sweet writing I could never do myself
>writes plot like a woman
can't have everything
>>108368283
>What is it supposed to smell like?
no idea. just wanna see the models describe it
>>108368329
they seem to be struggling to do that, and the newer the model is, the less it suits my taste
>>108368283
like the short circuit from a dumb boomer at starbucks dropping their coffee onto their laptop
>>108368309
nevermind
>(also, please don't write the plot like a woman. The prose is good, but try to stay consistent with the themes. No, X won't come back apologizing the next day (Y will have to reach him), and no, Y won't magically understand everything instantly)
I can't believe it worked
When will they start installing dedicated ai cp on every phone and pc?
>>108368469
>cp
When pedophiles start ruling the world. Wait...
>>108368469
when they want you gone and can't find anything to get you on
>>108368469
>ai rm -rf
>>108368475
Debunked.
where fears and lies melt away music will tie wonk uoy naht noitceffa erom deen i
>>108368243
>>108368672
slop
>>108368679
>slop
Define it.
>>108368469
The FBI and CIA have been doing this to troublemakers for years. If they really want you, then they are going to get you.
Not local, but anyone knows why I can't use GPT-5.4 Pro on openrouter? It says I have insufficient credits but my balance is positive
>>108368733
>>>/g/aicg/
>>108368243
Mikupussy smells like BLACK BULL semen
What's the advantage of saving the cache? The model still needs to reprocess everything, no?
>>108368746
Now this is slop.
>>108368753
>What's the advantage of saving the cache?
Not having to reprocess the whole thing.
>The model still needs to reprocess everything, no?
Not if you have/load a previous cache.
But are you talking about the rnn/ssm state from the new qwen models or the save/restore you can do with the /slots/n/action={save|restore} endpoint? Both should work.
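In case you haven't touched that endpoint before, here's a rough sketch of the save/restore flow against llama-server, assuming it was started with --slot-save-path pointing at a writable directory. The port, slot id, and filename are placeholders, not anything canonical:
[code]
import requests  # third-party: pip install requests

BASE = "http://127.0.0.1:8080"  # assumed llama-server address
SLOT = 0                        # slot id; single-slot servers only have slot 0

# Save slot 0's KV cache to a file under the server's --slot-save-path dir.
# "mikucache.bin" is just a placeholder name.
r = requests.post(f"{BASE}/slots/{SLOT}?action=save",
                  json={"filename": "mikucache.bin"})
print(r.json())  # server reports how much was saved

# ... restart your chat or come back later ...

# Restore the cache into the slot so the shared prefix
# doesn't get reprocessed on the next completion.
r = requests.post(f"{BASE}/slots/{SLOT}?action=restore",
                  json={"filename": "mikucache.bin"})
print(r.json())
[/code]
Note the next prompt you send still has to match the cached prefix token-for-token; on the first mismatch the server reprocesses from there.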
>>108368746
Without the last 3 lines, I like it.
Moonshot will announce Kimi K3 at GTC on March 18th
Hunter Alpha and Healer Alpha are both from Zhipu
lol, the breakages caused by the vibeshitter are endless and still haven't been fully fixed. this one must have flown under the radar because almost none of us run models like Kimi locally. if you use more uncommon models, you're still better off not merging any of the parser-related commits.
this is the power of agentic coding and claude code. this is why we must gatekeep this thread from telling people how to vibecode.
>>108368243
Can't get the ozone out of it.
>>108368835
I wouldn't mind if these were DS, because 3.0 was rather crappy and they tuned it into greatness. Unless they're back to being completely unmemorable like the pre-3.0 era (though I know this is /lmg/ and some anons used their small coder models), 4.0 can be uninspiring but technologically novel and they'll bring it home with 4.1 or R2.
>>108368848
That shit should never have been implemented on the server. That's client-side stuff. The problem started before he got involved, but he's definitely not helping.
how do i make qwen3.5 27B not think for 10000 tokens?
>>108368672
>no ozone
trash
>>108368929
good system prompt + pwilkin's new vibeshitted reasoning budget + end phrase :D
I LOVE VIBEGARBO
>>108368929
turn the reasoning off with an edited template
>>108368929
Prefill <think></think>
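e.g. with a raw /completion call to llama-server, if your frontend doesn't have a "start reply with" field. this assumes qwen3.5 kept qwen3's ChatML template and <think> tags, which I haven't verified; port and prompt are made up:
[code]
import requests  # third-party: pip install requests

# Hand-built prompt ending in an already-closed think block,
# so the model starts its reply without generating reasoning tokens.
prompt = (
    "<|im_start|>user\n"
    "Describe the smell of rain in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n\n</think>\n\n"  # pre-closed think block = no thinking
)

r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 128})
print(r.json()["content"])  # reply begins after the empty think block
[/code]
editing the chat template to always emit the closed pair, as the other anon says, just bakes the same trick in server-side.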
>>108365171
Thanks for the information/update. I almost missed this because I was working on ASR-related AI stuff the other day. I pushed the changes into the main repo with some minor edits. ONNX Runtime should now default to the newer version CMake pulls, so you won't have to pull in the DLL yourself manually.
Very interested to see what the performance looks like on other machines. If you could share a screenshot of the --profile output and include what CPU you have for reference, I'd greatly appreciate it.
https://github.com/VolgaGerm/PocketTTS.cpp
>>108368198
Also thanks anon for the threadly qrd. I would have missed the update otherwise, lel.