/g/ - /lmg/ - Local Models General - Technology


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Anonymous
/lmg/ - Local Models General 12/25/25(Thu)19:22:10 No.107668478

File: IMG_9037.png (2.14 MB, 1024x1536)

/lmg/ - Local Models General Anonymous 12/25/25(Thu)19:22:10 No.107668478

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107660171 & >>107652767

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
12/25/25(Thu)19:22:31 No.107668479

Anonymous 12/25/25(Thu)19:22:31 No.107668479

File: xmas migu.jpg (828 KB, 1408x752)

828 KB JPG

►Recent Highlights from the Previous Thread: >>107660171

--Hardware upgrades for running quantized AI models and multi-GPU configuration:
>107666345 >107666402 >107666408 >107666433 >107666445 >107666462 >107666463
--Crafting effective system prompts for specific LLM behavior:
>107664962 >107665087 >107665580 >107666887 >107666904 >107667023 >107667116 >107667119 >107667145 >107667173 >107667303 >107665026 >107665077 >107665090 >107665101 >107665167
--Proposing MoE-LLM architecture with LRU expert cache for local deployment:
>107666054 >107666073 >107666262 >107666288 >107666325
--Challenges in implementing persistent prompt caching for model routers:
>107661573 >107661968 >107662025 >107662156 >107662236 >107662311 >107662450 >107662575 >107662599 >107662681 >107662725 >107662782 >107663650 >107663642
--Debating GLM 4.7's censorship:
>107664203 >107664223 >107664270 >107664467 >107664687 >107664738 >107664766 >107665148 >107665188 >107665225 >107665325 >107665461 >107665807 >107665943 >107666016 >107666035 >107665706
--Challenges with disabling reasoning in AI models and potential workarounds:
>107660184 >107660198 >107662102 >107662167 >107663018 >107663033 >107663057 >107663102 >107663672 >107663721 >107666949 >107668010 >107667675
--Intel Arc Pro B60 GPU pricing and performance debates:
>107660199 >107660226 >107662460 >107662477 >107662497 >107662506 >107662572 >107662640 >107662650 >107662651 >107665998
--Repurposing NAS with S3 storage as HuggingFace API-compatible model source:
>107667186 >107667290
--Model limitations in generating complex vector graphics:
>107664405 >107664417 >107664438 >107664455 >107664461 >107664472 >107665953
--Translating NSFW ASMR/JAV audio with specialized Whisper models:
>107664657 >107664716 >107664792
--Miku (free space):
>107661427 >107663987 >107664266 >107667371 >107668010 >107668457 >107664599

►Recent Highlight Posts from the Previous Thread: >>107660173

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
12/25/25(Thu)19:24:29 No.107668489

Anonymous 12/25/25(Thu)19:24:29 No.107668489

>>107668478
v4 when

Anonymous
12/25/25(Thu)19:30:05 No.107668521

Anonymous 12/25/25(Thu)19:30:05 No.107668521

>>107668489
Assuming DeepSeek operates on Pacific Standard Time, there's still a chance for V4 within the next 6 and a half hours.

Anonymous
12/25/25(Thu)19:30:40 No.107668527

Anonymous 12/25/25(Thu)19:30:40 No.107668527

There's still nothing better than mythomax for rp, is there?

Anonymous
12/25/25(Thu)19:31:59 No.107668532

Anonymous 12/25/25(Thu)19:31:59 No.107668532

>>107668489
We don't even have 3.2 support in llama.cpp yet.

Anonymous
12/25/25(Thu)19:32:01 No.107668533

Anonymous 12/25/25(Thu)19:32:01 No.107668533

>>107668527
new drummer just dropped: https://huggingface.co/QuixiAI/Ina-v11.1

Anonymous
12/25/25(Thu)19:32:36 No.107668536

Anonymous 12/25/25(Thu)19:32:36 No.107668536

>>107668527
Miqu.
Kimi.

Anonymous
12/25/25(Thu)19:32:47 No.107668538

Anonymous 12/25/25(Thu)19:32:47 No.107668538

>>107668532
we don't even have deepseek_v32 support in mf'ing _transformers_ yet for god sake

Anonymous
12/25/25(Thu)19:34:04 No.107668547

Anonymous 12/25/25(Thu)19:34:04 No.107668547

>>107668532
vllm doesn't have this issue :^)

Anonymous
12/25/25(Thu)19:35:36 No.107668555

Anonymous 12/25/25(Thu)19:35:36 No.107668555

>>107668547
Let me know when someone makes a 2 bit quant for vllm.

Anonymous
12/25/25(Thu)19:36:23 No.107668560

Anonymous 12/25/25(Thu)19:36:23 No.107668560

Is 3.2 good for sex?

Anonymous
12/25/25(Thu)19:38:51 No.107668575

Anonymous 12/25/25(Thu)19:38:51 No.107668575

GLM Air or Gemma better for safe and not safe rp?

Anonymous
12/25/25(Thu)19:41:28 No.107668598

Anonymous 12/25/25(Thu)19:41:28 No.107668598

>>107668555
Be the change you want to see

Anonymous
12/25/25(Thu)19:42:45 No.107668602

Anonymous 12/25/25(Thu)19:42:45 No.107668602

>>107668575
Yes... Gemma good 100%

Anonymous
12/25/25(Thu)19:42:54 No.107668603

Anonymous 12/25/25(Thu)19:42:54 No.107668603

>>107668532
v4 would be big news and they would have to make adding support a priority

Anonymous
12/25/25(Thu)19:46:00 No.107668628

Anonymous 12/25/25(Thu)19:46:00 No.107668628

>>107668527
have you tried Mixtral-8x7B ?
it's probably the best model i had when i had only 16gb vram

Anonymous
12/25/25(Thu)19:48:10 No.107668639

Anonymous 12/25/25(Thu)19:48:10 No.107668639

>>107668575
air is miles better for both.

Anonymous
12/25/25(Thu)20:05:10 No.107668737

Anonymous 12/25/25(Thu)20:05:10 No.107668737

Kimi thinking IQ4-XS or Q4_0 or Q4_K_M?

Anonymous
12/25/25(Thu)20:14:51 No.107668794

Anonymous 12/25/25(Thu)20:14:51 No.107668794

>>107668737
IQ4-XS. imatrix quants are generally better than normal quants at the same bit count. Don't download an unsloth quant.Those are always broken.

Anonymous
12/25/25(Thu)20:20:29 No.107668821

Anonymous 12/25/25(Thu)20:20:29 No.107668821

>>107668478
I just got A/B tested on Aistudio (while using G3 Pro). Maybe it's Gemma 4? What's weird is that one respond used internet resources, while the other was the raw model (I did not activate web searches). The new model was much better: It was an actual written respond and not a list of lists of bulleted 6-word sentences.

Anonymous
12/25/25(Thu)20:24:05 No.107668841

Anonymous 12/25/25(Thu)20:24:05 No.107668841

>>107665998
EP no, marlin kind of, they've done a lot of kernel support for shit like fp8 but you won't just load fp8 off huggingface, if you can quant the model you'll likely get anything to run, I got devstral2 123b working recently but I had to quant it to W4A16.

if you have the system resources you can dynamically at load time turn fp16 into fp8 or mxfp4

some things are still cuda identified/cuda only in VLLM like cpu offload gb and some of the dtypes that intel actually supports, I'm not confident they'll get there in the end but it's fun to tinker with.

I got cucked by Qwen 3.2 thinking I believe but didn't try too hard to see if it was workable.

It's in good shape, it's not rocm but it's no cuda and probably never will be.

Anonymous
12/25/25(Thu)20:39:21 No.107668907

Anonymous 12/25/25(Thu)20:39:21 No.107668907

For the easiest out of the box setup, can I just get a consumer ryzen, 128gb of ddr5 ram, and a dual 9070xt for a total of 32gb of VRAM?

Anonymous
12/25/25(Thu)20:44:05 No.107668940

Anonymous 12/25/25(Thu)20:44:05 No.107668940

File: kaix.png (89 KB, 789x429)

89 KB PNG

We'll get there howevermany more thousand rolls it takes
Perfection is possible
The models shall delight in every interaction and hardware will be irrelevant
Robowaifus will be truly viable for those choosing that path
I believe

Anonymous
12/25/25(Thu)20:47:26 No.107668955

Anonymous 12/25/25(Thu)20:47:26 No.107668955

>>107668940
btw I don't want to be near a stinky ass irl it's merely an interesting test-case
honest

Anonymous
12/25/25(Thu)20:49:50 No.107668964

Anonymous 12/25/25(Thu)20:49:50 No.107668964

File: 1762052533318086.gif (140 KB, 379x440)

140 KB GIF

>>107668940

Anonymous
12/25/25(Thu)20:57:23 No.107668993

Anonymous 12/25/25(Thu)20:57:23 No.107668993

File: 1751138313342076.jpg (294 KB, 784x1168)

294 KB JPG

Anonymous
12/25/25(Thu)20:57:48 No.107668995

Anonymous 12/25/25(Thu)20:57:48 No.107668995

File: file.png (2 KB, 78x78)

2 KB PNG

>>107668940
>oily thicket

Anonymous
12/25/25(Thu)21:00:45 No.107669004

Anonymous 12/25/25(Thu)21:00:45 No.107669004

>>107668940
Impressive.
Not excuse me while I puke.

Anonymous
12/25/25(Thu)21:01:00 No.107669006

Anonymous 12/25/25(Thu)21:01:00 No.107669006

>>107668940
anon that is disgusting worse the cockbortion ngl

Anonymous
12/25/25(Thu)21:01:27 No.107669008

Anonymous 12/25/25(Thu)21:01:27 No.107669008

File: 1746880839747321.jpg (55 KB, 600x445)

55 KB JPG

>>107668940

Anonymous
12/25/25(Thu)21:03:35 No.107669017

Anonymous 12/25/25(Thu)21:03:35 No.107669017

>>107669006
im sickened but curious

Anonymous
12/25/25(Thu)21:05:11 No.107669022

Anonymous 12/25/25(Thu)21:05:11 No.107669022

File: n1onmlW1Dz1qz9wlpo1_500.gif (2.04 MB, 476x268)

2.04 MB GIF

>>107668940

Anonymous
12/25/25(Thu)21:06:37 No.107669034

Anonymous 12/25/25(Thu)21:06:37 No.107669034

i am convinced nobody actually has any system prompt that makes GLM 4.5 air actually be not cucked

Anonymous
12/25/25(Thu)21:06:39 No.107669035

Anonymous 12/25/25(Thu)21:06:39 No.107669035

>>107668940
I remind you that our Lord Jesus Christ was born on this day.

Anonymous
12/25/25(Thu)21:07:21 No.107669040

Anonymous 12/25/25(Thu)21:07:21 No.107669040

>>107668907
probably not the best use of your money, but that would be a decent build. quad 16gb 9060xts or 16gb 5060tis might be better.

Anonymous
12/25/25(Thu)21:07:29 No.107669042

Anonymous 12/25/25(Thu)21:07:29 No.107669042

>>107669034
>system prompt
The system prompt is the character card.
I just used a thinking prefill.

Anonymous
12/25/25(Thu)21:07:50 No.107669047

Anonymous 12/25/25(Thu)21:07:50 No.107669047

>>107668940
nvidia made fabs... for this?

Anonymous
12/25/25(Thu)21:08:46 No.107669052

Anonymous 12/25/25(Thu)21:08:46 No.107669052

>>107668955
It's pretty funny. Nice alternative to... you know what... Just say the word!

Anonymous
12/25/25(Thu)21:15:30 No.107669093

Anonymous 12/25/25(Thu)21:15:30 No.107669093

>>107668995
stirred something inside you didn't it?

Anonymous
12/25/25(Thu)21:17:33 No.107669099

Anonymous 12/25/25(Thu)21:17:33 No.107669099

>>107669047
It's the perfect task for an text generative AI. It's not like a human would ever be so despite as to write that for others.

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.