/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107614830 & >>107604598

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107614830

--Roleplay model training challenges and context window limitations in LLMs:
>107616970 >107617081 >107617126 >107621066 >107617128 >107617188 >107617223 >107617246 >107617341 >107617368 >107617395 >107617443 >107617475 >107617432 >107617518 >107617224 >107617310 >107617546 >107617183 >107617628 >107617061 >107617075 >107617096 >107617110 >107617129 >107617144 >107619354 >107619397 >107619527 >107619498 >107619573 >107619593 >107619777 >107619784 >107617079
--Enhanced llama.cpp API server integration enables efficient model management:
>107622032 >107622446 >107622897
--LLM framework preferences and comparisons:
>107619845 >107619987 >107620199 >107620209 >107620256 >107620263 >107620534 >107620577 >107621749 >107621779 >107621827 >107621862 >107622153 >107621868 >107620312 >107620685 >107620749 >107621807 >107620618
--Local vs cloud AI model effectiveness in coding tasks:
>107615290 >107615361 >107615418 >107615465 >107615478 >107615508 >107615591 >107615770 >107615797 >107615874 >107616008 >107615913 >107615949 >107615959 >107616299 >107616360 >107616431 >107616555 >107620165 >107615991 >107616096 >107618869 >107619022 >107619063 >107619050 >107619100 >107615503 >107615540
--TTS model landscape: Cloning, performance, and C++ implementation challenges:
>107614872 >107615270 >107615524 >107615701 >107615905 >107615962 >107616846 >107614972 >107614977 >107614994 >107614999
--Inspecting llama.cpp prompt formatting and macro expansion:
>107616912 >107616936 >107616969
--Feasibility of LoRA distillation and hardware requirements for large model finetuning:
>107618711 >107618896 >107618921 >107618948 >107618956 >107618874 >107618916 >107618920 >107618922 >107618930 >107618944
--Miku (free space):
>107616115 >107616265 >107616330 >107616521 >107616542 >107619354 >107620004 >107622089

►Recent Highlight Posts from the Previous Thread: >>107614834

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
where precisely do I put /no_think in sillytavern to disable thinking when using chat completion and GLM air?
>>107623512
iirc it's /nothink, and it goes at the end of your prompt
>>107623512
You know you could figure this out with 1-2 minutes of experimentation, right?
>>107623540
At least with text completion I never felt like I needed /nothink. Just prefilling with <think></think> always worked.
>>107623512
The jinja template looks at that sequence and prefills <think></think>, I think. So, if you aren't using the chat completion API, you might need to do both.
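For reference, a minimal sketch of both approaches against a llama.cpp server (default localhost:8080 assumed); the endpoints are llama-server's standard routes, but the GLM-style tags in the second request are illustrative, so verify them against your model's actual embedded template:
[code]
# minimal sketch, assuming llama-server is running on localhost:8080
import requests

BASE = "http://localhost:8080"

# chat completion: append /nothink and let the model's jinja chat template
# handle it (e.g. by prefilling an empty <think></think> block for you)
r = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Say hi. /nothink"}],
})
print(r.json()["choices"][0]["message"]["content"])

# text completion: no template is applied, so prefill <think></think> yourself
# (the GLM-style tags below are illustrative, check your GGUF's template)
r = requests.post(f"{BASE}/completion", json={
    "prompt": "<|user|>\nSay hi.<|assistant|>\n<think></think>",
    "n_predict": 128,
})
print(r.json()["content"])
[/code]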
Are memory and lorebooks the only things I should use for rp? Or are other features also helpful? Using koboldcpp btw
>>107623552
I have /nothink in the Start Reply With prefix, in Main, and at the end of my prompt simultaneously, and I'm still getting "Assistant response prefill is incompatible with enable_thinking."
>>107623562
>if you aren't using the chat completion API
the llm boomer
>>107623591
Did they finally deprecate it in favor of that other new API that's essentially the same thing?
>>107623588
Have you tried unchecking "request reasoning" in the prompt manager?
>>107623604
https://github.com/ggml-org/llama.cpp/issues/14702#issuecomment-3506645678
No one has even started working on it because the one vibe coder that touched it wants to deprecate the completions API and ggerganov hasn't told him off yet.
>>107623669
Well, that's fun.
Sam is going to announce something big
>(Also, your mom’s approval would be great. But I’m keeping it at 3B for now.)
>>107623746
Premium bants would be Saltman giving a presentation wearing a war chief necklace of RAM sticks instead of skulls or teeth.
what's the tip-top uncensored local text-to-speech model?
>>107623746
he is coming out of the closet
which one of these can run offline on an old android phone
Which voice model is currently the most expressive? meaning it sounds like someone acting instead of hosting a podcast.
Doesn't have to support voice cloning, just something that's pretty lightweight and fast.
>>107623829
That or announcing toss2, the safest one yet. Wasting even more tokens for policy checks.
>>107623887
He already did, pic rel.
>>107623920
damn wtf, I've been got
>>107623920
who's this again?
I was right about the last pull fucking up the t/s, cudadev wyd?
https://github.com/ggml-org/llama.cpp/issues/18258
>>107623951
the Goatse guy and his buddy who took the famous photo
>>107623951
sam hyde
>>107624161
he does look like he's hiding something
>>107623960
I still get the same performance as usual. I believe you're like the issue-spawning nibba and haven't disabled fit, which you should because it's the dumbest feature to have ever been introduced in lcpp
>>107623960
Tried using fit a bit at first, seemed like it might be a convenient feature. But it just kept crashing with GGML_ASSERT(something) and I disabled it. So never noticed any slowdown.
>>107624379
>which you should because it's the dumbest feature to have ever been introduced in lcpp
It's a good feature to get newcomers up and running with decent defaults without having to either give them an Intro to M, set the parameters for them, or point them to ollama.
>>107624423
if you set your model + ctx to fit very tightly (not much vram room left), fit is incapable of doing the right thing and will reduce the number of layers loaded on the gpu
It was so cool when llama.cpp defaulted to ngl 99 behavior, one less flag to care about (moe users would just need to set ncmoe, and users of smaller dense models would have no flag to set)
now we have to add -fit off (or -ngl 99 again, because setting ngl disables fit) to get rid of the nonsense
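For anyone following along, a minimal sketch of the launch line in question, wrapped in Python; the flag names (-fit, -ngl, --n-cpu-moe) are taken from the posts above, so double-check them against your build's llama-server --help before copying:
[code]
# minimal sketch; model path and layer counts are placeholders
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.gguf",
    "-ngl", "99",         # offload all layers; per the post above, setting -ngl also disables fit
    "--n-cpu-moe", "20",  # moe users: keep this many expert layers on the cpu
    # alternatively, keep the defaults and just pass "-fit", "off"
])
[/code]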
>>107614341>"you are not your thoughts, you are the space where thoughts happen" - here. The simplest way to put it. And it took me only a month of talking to AI and thinking about my thoughts to understand it experientially. Understanding that sentence intellectually means nothing so of course a youtube video or me posting here will not change your mind.This idea is not entirely wrong (I think it's incorrect to say that we are this "space"). However, it's unlikely that you really understand it. You need to experience it as directly as possible, again and again, to truly understand it, to make it intuitive. This is the whole point of Buddhist meditation practices and ethical conduct (bad conduct = agitated and muddy mind, which can't be entirely fixed through meditation). The first steps could indeed be to clarify it using your reason, as you did. You could also do cognitive behavioral therapy (CBT) exercises. But at the end of the day, meditating and removing anything troubling your mind will get you further. LLMs are helpful to understand how to meditate. It's not particularly hard, just don't forget to release any tension in your body when you notice them. Add meditation to your habits, at least for a while. I know Reddit is a meme, but there is a good introductory guide on r/streamentry.
>>107624584
Oh, and don't expect to figure out "vacuity" in a month. It usually takes years. It's not an issue, you'll still get great benefits way before any deep realization. Perhaps you're already feeling better. There is an exercise I like for seeing how powerful our thoughts are: I look at people and observe how they are suffering. Perhaps they seem anxious because they are surrounded by strangers, or irritated because something isn't "right". Perhaps they are creating an alternate reality and mistaking it for the real world, generating anxiety out of thin air.
What's the latest on local voice/audio stuff? Can you make anything good locally yet or is it all fuzzy crap?
>>107624700
vibevoice 7b https://voca.ro/11ATlIwHhG8s
other sizes are good too, use 3-5 steps and 2-3 cfg
>>107624724
Why does it sound like it's coming from a 1930s radio?
>>107624762
the voice is cloned from the famous low quality clip