/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107834480 & >>107826643

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107834480

--Quantization performance discrepancies in Gemma-3-27B GGUF models:
>107837128 >107837136 >107837144 >107837149 >107837148 >107837164 >107837180 >107837192 >107837306 >107837197
--Local AI models for image feedback and multimodal tasks:
>107835848 >107835882 >107835900 >107835915 >107835941 >107835980 >107835996 >107836032 >107836089 >107836175 >107836347
--Running LLMs on low VRAM hardware with quantization and CPU offloading:
>107837436 >107837473 >107837514 >107837539 >107837558 >107837573 >107837603 >107837609 >107837633 >107837639
--Self-taught AI learner's motivation vs. math complexity challenges:
>107835318 >107835331 >107835431 >107835463 >107835488 >107835627 >107835644 >107835375 >107835403 >107835383
--PowerShell cmd confusion with Gemma model response critique:
>107835679 >107835772 >107835785 >107835832
--Ethical and practical concerns about AI-generated PR descriptions on llama.cpp's GitHub:
>107837074 >107837130
--Struggles with model size limitations and RAM requirements for large vision models:
>107836240 >107836244 >107836537 >107836554 >107836556 >107836593 >107836268 >107836272 >107836286
--Extensive banned token list with model-specific customizations:
>107835736 >107835765
--Meta's controversial nuclear energy investment for AI criticized as misguided:
>107835873
--Critique of complex ST webui implementation:
>107834750
--Optimizing documentation vectorization through token-efficient formatting:
>107835121 >107835923
--File numbering organization debates in documentation directories:
>107834742 >107834901 >107835077
--CPU-optimized GPT-SoVITS via ONNX inference engine:
>107836452
--Teto (free space):
>107835833

►Recent Highlight Posts from the Previous Thread: >>107834483

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107838898
The box has clearly just been opened, why is her hair outside the box?
Is the Pangu age finally upon us?
Sup qts
>>107838951
Yes, gweilo, maximum white man benchmark score best model revolutionary intelligence new LLM AI for men, women, children, pets
Do you think they will ramp up production of additional RAM, or do you think cloud models will be the only way to go in the future?
>>107839186
No, everyone expects the bubble to pop
Why are all templates greyed out in ST when using chat completion? And why are chat and text completion two different things?
why do a lot of you favor GLM 4.6 and not 4.7? I was running 4.6 for a while but I've been liking 4.7, it feels different. just curious
>>107839204
With chat completion the backend applies the model's default chat template itself, so ST's templates don't do anything; that's why they're greyed out. With text completion you send a raw prompt, so the frontend has to render the template.
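If you want to see the difference concretely, here's a minimal sketch against a llama.cpp server's OpenAI-compatible endpoints. The localhost address and the ChatML-style template are assumptions for illustration, not anything ST or the backend mandates:

import requests

BASE = "http://localhost:8080/v1"  # assumed llama.cpp server address

# Text completion: the client renders the chat template itself before
# sending, so the frontend's instruct template settings matter.
# ChatML is used here purely as a placeholder format.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)
r = requests.post(f"{BASE}/completions", json={"prompt": prompt, "max_tokens": 64})
print(r.json()["choices"][0]["text"])

# Chat completion: the client sends structured messages and the backend
# applies the model's own chat template, so frontend templates are
# ignored (hence greyed out in ST).
r = requests.post(
    f"{BASE}/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 64,
    },
)
print(r.json()["choices"][0]["message"]["content"])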
>>107839208
>why do a lot of you favor GLM 4.6 and not 4.7?
Only NAI shills do.
>>107838898
>open box
>I accidentally Miku
>>107839196
How bad will things get if the bubble simply doesn't pop?
>>107839363
When rich people start becoming bothered by it
>>107839186
>>107839363
If you're not poor it's actually extremely funny to watch from the sidelines. I hope it won't pop.
>>107839196
>he thinks RAM prices are going to return to normal after the bubble pops
OP's pic reminded me of when my cat died and I had to put him in a shoebox to take him in for cremation, and now I'm sad
>>107839383
retard
>>107839208
I like both for different reasons. 4.7 is smarter and tends to stick to the prompt more closely, often to an autistic degree, which sometimes leads to issues with cards that aren't tightly written. 4.6 feels more flexible and creative out of the box.
But I've had a lot of success with 4.7 here as well after adjusting my older cards to cover for that.