/g/ - Technology

File: hat.jpg (913 KB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Christmas Edition

Previous threads: >>107652767 & >>107643997

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107652767

--Reasoning step control tradeoffs and multi-GPU setup fixes in SillyTavern:
>107654025 >107654033 >107654054 >107654563 >107654882 >107655765 >107655833 >107655903 >107656043 >107656116 >107656253 >107656486 >107656988 >107657096 >107657168 >107657180 >107657297 >107657689 >107657823 >107657906 >107658051 >107658061 >107658104 >107658169 >107657498 >107657307 >107657350 >107657351 >107657477 >107657573 >107657627 >107657639 >107657404 >107657294 >107657176 >107657194
--Performance comparison between ik_llama and exllamav3 in VRAM-bound scenarios:
>107656297 >107656349 >107656555 >107656715 >107656838 >107657115
--Resolving GGUF conversion errors with outdated dependencies:
>107659075 >107659099 >107659110 >107659130 >107659134 >107659157 >107659165 >107659117 >107659129
--Cost and performance considerations for Mac-based AI clusters vs traditional GPU setups:
>107657777 >107657794 >107657813 >107657828 >107657854 >107657870 >107657937 >107657853 >107657876 >107657816
--MoE model parameter vs expert count performance analysis:
>107652819 >107652836 >107652840 >107654372
--ARC-AGI 2 achievement and its implications for future LLM advancements:
>107653556 >107653757 >107653789
--Benchmarking GLM-4.7 models with livebench and GGUF format:
>107656875 >107657040 >107657121 >107658101
--GLM 4.7 model performance and quantization calibration controversies:
>107656256 >107656302 >107656312 >107656327 >107656401 >107656577
--llamafile project update from Mozilla.ai:
>107658257
--Post-training resource demands for advanced AI models:
>107653833
--Critique of dense models and praise for alternatives like qwen3:
>107655084
--Logs: GLM-4.7:
>107658013 >107658080
--Miku (free space):
>107652814 >107652980 >107652999 >107653495 >107654563 >107656486 >107656586 >107657689 >107658850 >107659977

►Recent Highlight Posts from the Previous Thread: >>107652827

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
I still can't turn off thinking for GLM 4.7
>>
File: 1749875201211488.png (572 KB, 1080x1259)
>>107660184
Works for me
>>
>>107660184
<|assistant|>
</think>
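For anyone wanting to do the same prefill outside a frontend, here's a minimal sketch against a llama.cpp-style raw /completion endpoint. The host/port and the GLM template tokens are assumptions copied from the post above; check the chat template baked into your GGUF before trusting any of it.

```python
# Sketch only: make GLM skip its reasoning block by ending the raw prompt
# with the assistant tag plus a closing </think>, as shown above.
# Assumes a llama.cpp-style server on localhost:8080 exposing /completion;
# the template tokens are illustrative, not authoritative.
import requests

prompt = (
    "<|user|>\n"
    "Write one sentence about winter.\n"
    "<|assistant|>\n"
    "</think>\n"  # pre-closed think block: the model goes straight to the answer
)

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": prompt, "n_predict": 128},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])
```

In SillyTavern the equivalent is usually putting that `</think>` line into the field that prefixes the assistant reply (e.g. "Start Reply With"), which appears to be what the anons above are doing.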
>>
>24gb
>700eur
>460gb/s bandwidth
really intel?
>24gb under 500 dollars
>it will be 500$
>where's my 600$
>HE STOLE MY 700EUROS
https://videocardz.com/newz/sparkle-says-its-arc-pro-b60-gpus-are-now-available
https://videocardz.com/newz/intel-arc-pro-b60-24gb-workstation-gpu-to-launch-in-europe-mid-to-late-november-starting-at-e769
who is this card for?
>770euros
>>
>>107660197
Wtf? I didn't post that pic
>>
>>107660198
still thinks
>>
>>107660199
>https://youtu.be/0qS6HmiRNzE
>llama 70b
>5t/s
>that gpu utilization
its OVER
>>
File: file.png (698 KB, 1920x1075)
>>107660199
>23t/s with qwen3 30b a3b
>23t/s
>on empty context
>>
>>107660199
Enterprise™
>>
What do you mean, local? Is everyone here a billionaire? How are you fuckers affording anything?
>>
>>107660238
doesn't sound right. the software must be horrible.
>>
>>107660248
Most of us have gainful employment. Shocking, I know.
>>
>>107660254
Considering most people don't earn more than 100k a year, I still don't see it.
>>
>>107660248
No but I have a job
>>
File: 1000006812.png (377 KB, 593x602)
I downloaded locallm and a gpt oss 20b uncensored model and now I'm drawing a blank on what to try. What can I do with local models besides cooming and coding?
>>
>>107660302
wife agent
>>
>>107660297
most people ITT bought their ram maxxed hardware for deepseek/kimi/glm before the ram price surge
some anons run deepseek on hardware that cost like 1000-1500$
>>
>>107660248
I'm not American, so I can splurge a little.
>>
>>107660302
Vibe code a revolutionary app.
>>
>>107660248
If you don't mind low speeds and sloppy slop you can run quantized models on most PC hardware
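If you want to see what "quantized on most PC hardware" looks like in practice, here's a rough sketch using the llama-cpp-python bindings; the model filename and layer-offload count are placeholders, and CPU-only works with n_gpu_layers=0.

```python
# Hedged sketch: run a quantized GGUF on modest hardware with llama-cpp-python.
# The model path and offload count are hypothetical; tune them to your rig.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-small-24b-q4_k_m.gguf",  # placeholder file
    n_ctx=8192,          # context window; larger costs more RAM
    n_gpu_layers=20,     # offload as many layers as your VRAM allows, 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a cope quant is in one paragraph."}],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```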
>>
File: 1760999571161174.jpg (38 KB, 460x490)
>>107660302
>coding
>toss 20B
You can remove that part
>>
File: file.png (195 KB, 655x984)
so this is the power of llama1 7b...
>>
GLM4.7 feels like the K2-0905 to 4.6's K2-0726 or the 4.5 Opus to 4.6's 4.1 Opus. Everything really is going down the shitter.
>>
File: 1750555438514899.jpg (1.29 MB, 1764x875)
>>107660248
Trillionaire actually
>>
>>107660307
Yup, my Rome DDR4 build from like 2 years ago runs DS reasonably and was like $1500 at the time (not counting the 3090s I already had).
>>
>>107660307
That's the only benefit of being a /lmg/ resident: buying hardware before the price surge.
>>
>>107660248
>How are you fuckers affording anything?
No poors allowed
>>
>>107660462
Is middle class at least tolerated?
>>
>>107660468
As long as you don't get too far in debt
>>
>>107660248
We don't. Most of us are coping with a small model
>>
>>107660248
3.5T/s for 4.7 with just a high end gayming desktop.
>>
>>107660248
it only costs 300k for a full h200 server
>>
>>107660386
That's precisely how I feel about it. Can't they make more "calm" models?
>>
>>107660197
How do they already have a datapoint for 2100?
>>
>>107660491
>only
>>
>>107660491
For that price you could rent that same h200 rack on vast.ai for two years straight lmao
>>
>>107660722
and after 2 years you would have nothing
>>
>>107660729
Your h200 servers would have been obsolete in 2 years anyways
>>
>>107660745
they wouldn't be obsolete even under the most rushed deprecation schedule possible, but /g/tards love pretending things are obsolete
>>
>>107660763
If they won't be obsolete why is Nvidia selling GPUs with buyback clause?
>>
>>107660796
I don't know since I'm not nvidia's sales department
technical obsolescence has a very tenuous relation to sale conditions
>>
>Thump-thump. Thump-thump
>>
>switch to linux
>2x as fast
>processing takes 1/10th the time
wtf is wrong with windows???
>>
>>107661082
jeets
>>
Fuck me, with the power of linux I can actually run a 24B model at Q3 now on my poorfag rig. So far, cydonia is way smarter than anything I've used before but feels really sloppy. Any recs? Magidonia?
>>
>>107660202
well now we know what you were previously planning to post on pol, chuddie
i definitely had this happen when i was using kuroba ex, i think; it remembers if you uploaded a pic previously, which will only ever fuck you over
>>107661082
Same, but i won't pretend i don't wish i could run this and have text appear like i'm on a 30b a3b moe. Especially when this little fucker decides to spend 10000 tokens on thought
>>
>>107661127
don't use drummerslop models.
>>
>>>/v/729277223
>NovelAI's whole thing is being unfiltered.
>Now they offer GLM 4.6 with 32k context, which is pretty good considering that you get unlimited use.
>I think it is a good service and very user friendly.
Yeah, I think I'm sticking with NAI. Z.ai ruined 4.7 with their safety training.
>>
>>107661285
4.7 seems as horny as ever. It's just a worse model because it's one of those modern releases with zero sense of pacing and the ADHD "but wait, self-correction:" style of thinking that's made a horrible return in the past few months.
3.2-Speciale remains the best of the modern bunch because it at least writes well, but 4.6 will have to do until that one guy working to implement it is done learning how to vibecode
>>
>FirePaintedCydonia
It's slop time
>>
>>107661127
>Q3
>24B
you can't be serious
>>
>>107661127
try the base model (mistral small)
>>
File: 1763657832885601.jpg (353 KB, 1024x1440)
>>107660171
>>
>>107660171
is trooncinante still the top-tier 12b model? i got so used to its isms i already know what it's going to generate before it does
>>
>>107661391
IQ3_M is surprisingly usable at 24B. Wouldn't use it on anything less though.
>>
Now that llama.cpp server has model routing support, how do you enable prompt caching across model reloads? I want to unload a huge model with slow prompt processing, then reload it again and not have to process the entire prompt.
Cuda dev?
>>
>>107661432
>trooncinante
looool
>>
>>107661432
Neona is better but all 12B models have brain damage (nemo was dumb to begin with). Be like me and run a 24B at a cope quant.
>>
>>107661573
You can configure options for individual models. Do I need to repeat RTFM?
>>
>>107661427
It contains a jar of urine.
>>
>>107660248
>xhe didn't buy hardware when it was cheap
t.neet
>>
>have to put the 2 random Chinese runes GLM 4.7 sometimes generates into google translate to get what it was trying to say
Umm, I wasn't having that issue in the main output before. Am I supposed to learn Mandarin to coom efficiently with these new models?
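Not a fix for the model itself, but as a band-aid you can at least catch the stray runes automatically instead of pasting them into Google Translate. A small sketch; the CJK ranges are the rough common ones, not an exhaustive list.

```python
# Hedged band-aid, not a model fix: flag or strip stray CJK characters
# in generated text so you at least notice when it happens.
import re

CJK = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf]")  # common CJK codepoint ranges

def scrub(text: str) -> str:
    """Remove CJK characters and report how many were dropped."""
    hits = CJK.findall(text)
    if hits:
        print(f"stripped {len(hits)} CJK chars: {''.join(hits)}")
    return CJK.sub("", text)

print(scrub("The knight drew his 剑 and charged."))
```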
>>
>>107660171
merry christmas you insufferable faggots
>>
>>107661694
Learn to read. My question isn't about individual models, it's about router behavior. The model's cache is lost when the model is unloaded. I want to preserve the cache in system RAM until the model is loaded again.
>>
>>107661968
Save the cache to a file.
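Concretely, llama.cpp's server has slot save/restore endpoints that dump a slot's KV cache to disk. A sketch of the flow, assuming the server was started with --slot-save-path pointing at a writable directory; the endpoint names and JSON fields follow the server README for recent builds, so verify against your version.

```python
# Hedged sketch: persist a slot's KV cache to disk before swapping models,
# then restore it after the model is loaded again. Assumes the server was
# started with --slot-save-path and exposes the /slots endpoints;
# the filename is illustrative.
import requests

BASE = "http://localhost:8080"
SLOT = 0

# Save the slot's prompt cache to a file under --slot-save-path.
requests.post(f"{BASE}/slots/{SLOT}?action=save",
              json={"filename": "big-model-prompt.bin"}).raise_for_status()

# ... router unloads the big model, runs something else, reloads it ...

# Restore the saved cache so the long prompt is not reprocessed.
requests.post(f"{BASE}/slots/{SLOT}?action=restore",
              json={"filename": "big-model-prompt.bin"}).raise_for_status()
```

Whether a restored file is still valid after the router fully unloads and reloads the model is the part to test; the cache is tied to the exact model and context settings it was saved with.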
>>
File: 1757469874450654.jpg (52 KB, 940x1024)
>>107660184
>>107660206
>>107660198
Nta. Had this same problem last night but with gpt-oss-20b



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.