/g/ - Technology

File: 1604345226030.jpg (884 KB, 1340x1000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107604598 & >>107595736

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
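For napkin math without the VRAM calculator above: memory use is roughly the GGUF file size plus the KV cache plus runtime overhead. A minimal sketch — the ~20% overhead factor and the example model shape are assumptions for illustration, not figures from the calculator:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_el=2):
    """K and V caches: one ctx_len x n_kv_heads x head_dim tensor each, per layer (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el

def vram_estimate_gb(model_file_bytes, n_layers, n_kv_heads, head_dim, ctx_len,
                     overhead=1.2):  # assumed ~20% slack for compute buffers
    total = model_file_bytes + kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len)
    return total * overhead / 1024**3

# Example: a hypothetical Q4 7B-class GQA model (~4.1 GB file),
# 32 layers, 8 KV heads, head_dim 128, 8192 context
print(round(vram_estimate_gb(4.1e9, 32, 8, 128, 8192), 1))  # ~5.8 GiB
```

GQA is why KV heads, not attention heads, go into the cache term: a model with 32 query heads but 8 KV heads only caches the 8.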

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1709494521677.jpg (70 KB, 357x480)
►Recent Highlights from the Previous Thread: >>107604598

--MiMo-V2-Flash release and SWA vs GA debate in long-context model training:
>107610359 >107611212 >107611389 >107611538 >107611607 >107611641 >107611726 >107612033 >107612057 >107612103 >107612118 >107612128 >107612346 >107612451
--Evaluating modern language models for roleplay efficiency vs cost:
>107607970 >107608333 >107608570 >107608582 >107608588 >107608691 >107608373 >107608741 >107608449 >107608489 >107608674 >107608716 >107608757 >107608804 >107608943
--Skepticism about undertrained historical LLM project despite potential for time-gated models:
>107606603 >107606628 >107606672 >107606741 >107606844 >107606718 >107606784 >107606757 >107606770 >107606932
--MiraTTS model critique and audio quality debate:
>107607864 >107607903 >107608117 >107609014 >107609123 >107609241 >107609363
--Cost and performance considerations for local AI builds:
>107604956 >107604998 >107605235 >107605247
--Gemma's hiring for multimodal AI research sparks skepticism about job requirements:
>107613937 >107613974 >107614102 >107614027
--Google MedASR release and Llama 4 speculation:
>107605308 >107605884 >107606028
--LLMs as psychological tools: experiences, limitations, and philosophical debates:
>107607433 >107607442 >107613464 >107607458 >107607522 >107607532 >107607610 >107607651 >107607747 >107607825 >107607872 >107607896 >107607918 >107607942 >107607954 >107607972 >107607912 >107607580 >107607725 >107607757 >107607832 >107607734 >107607907 >107607945 >107608083 >107609148 >107613528 >107613599 >107613646 >107613702 >107613792 >107613896 >107613644 >107613717 >107613821 >107613858 >107613912 >107613990 >107614028 >107614195 >107614243 >107614338 >107614050 >107614222
--/lmg/ Book Club:
>107606330 >107606677
--Miku (free space):
>107604899 >107614611 >107614788

►Recent Highlight Posts from the Previous Thread: >>107604607

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
There have been a lot of new TTS models released the past few weeks. Do any of them have a better quality-and-speed-to-system-requirements ratio than GPT-SoVITS yet, with voice cloning? Or a C++ implementation that doesn't need a venv?
>>
>>107614872
Not really, sovits is still the best
>>
So let me get this straight: after a week of anons screenshotting some retard on twitter, google released nothing of note?
>>
Thoughts on Huihui-Qwen3-VL-32B-Thinking-abliterated? Apparently it's the first 100% uncensored model on the UGI leaderboard. Will do literally anything you ask.
>>
>>107614887
As said in the last thread. Just use 4.6 to destroy your identity and then you won't care about new models not releasing.
>>
>>107614887
MedASR and FunctionGemma sir?
>>
>>107614872
chatterbox/vibevoice/sovits are really the endgame.
I enjoyed echo-tts but that faggot won't release the cloning part.
>>
>>107614896
>MedASR
English and Chinese only Whisper for medical texts. Oh yeah, that's exactly what we've been waiting for.
>>
>>107614872
You don't need more than sovits
>>
>>107614887
which retard?
>>
>>107614882
>>107614903
>>107614930
Now if only someone made a llama.cpp/stable-diffusion.cpp-style single binary that doesn't require all the usual python bloat.
>>
>>107614887
BLOODY VISHNU BASTERDS NO REDEEM
>>107614892
the most embarrassing shit is seeing that leaderboard mentioned here. it is literally faker and gayer than math benchmarks, and the guy running it is a globohomo faggot
>>
>>107614972
I thought koboldcpp supports some of them.
>>
>>107614974
>globohomo faggot
Ahhh thank you anon. /lmg/ really sucked today and it needed that.
>>
>>107614977
I think it uses tts.cpp, which unfortunately only works with a handful of models (Kokoro and a few others I'd never heard of that don't seem worth it). No GPU or Vulkan support either.
>>
>>107614972
Understandable. I'm running it in docker
>>
File: 1737715573631218.png (173 KB, 828x803)
>>107614887
It seems that anons forgot the golden rule: nothing ever happens
>>
>>107614999
>docker
Was it called docker because of docking?
>>
>>107615001
that's not true. bad things often happen
>>
>>107614887
I have never run or cared about a google llm, why start now? it's obv m-m-m-monster-cucked out the gate
>>
>>107615001
LLMs didn't exist 10 years ago
>>
>>107615068
Gemmas have the most world knowledge of all similarly sized models. Her main disadvantage is that she can't say "cock" without help.
>>
>>107615079
I'm sure you're always willing to help guide her knowledge along ey?
>>
File: 1757708122311.png (95 KB, 849x787)
>>107615079
Because they did not filter the pretraining data hard enough. Did you think they would make the same mistake again? Especially when the Gemma team was the one bragging that they were able to reduce the model's ability to reproduce its training data. Any Gemma 4 would have just been another gpt-oss.
>>
>>107614903
>I enjoyed echo-tts but that faggot won't release the cloning part.
What are you talking about? echo-tts supports cloning out of the box. It's actually the best cloning TTS for English-only.
>>
I'm just wondering, why are we waiting for Gemma 4? Does Gemma excel at anything in particular? The way I see it, most people want local models mainly for RP, but Gemma is cucked, no? Or do you guys use local models for coding too? If yes, why?
>>
>>107615290
Just general assistant shit desu, as a ramlet/vramlet.
>>
>>107615290
I think the saars are so plentiful they can shill even here. never saw a good google model log; they only release tiny shit for vrammies - /lmg/ membership card denied
>>
>>107615290
>local
>why
because its local
>>
>>107615172
>approx
generalization GOOD
>>
>>107615361
yeah, but when coding you would want to use the best tool for the job. why use a local 27B model, which will also slow down your system, when you can just use gemini 3 pro for free, unlimited?
>>
>>107615377
Approximation is not generalization. God even made two different words to express the distinct concepts.
>>
>>107615418
i only use things i can run myself. small code models are pretty capable, but if you need something bigger, llama 3 70b and devstral 2 123b are quite good.
>>
>>107615455
>he can't see the correlation
I bet you can't even visualize apples, retard. go do more psychiatrist sessions with glm, you fucking low-iq human waste
>>
>>107615290
Gemma is a prude but she's good at RP and chat, and smarter than anything else around that size.
Mistral Small and Nemo are supposed to be good for ERP but they're too retarded for me. Nemo particularly seems braindead. I can't stand to use them to build up to anything.
>>
>>107615465
devstral is fucking dogshit, the tool calling is fucking SHIT (tested fp8 through cline)
>>
>>107615418
That's how you get yourself killed with a drone.
>>
>>107615465
I guess it depends on what you're doing. I couldn't see myself using local LLMs for coding, as I'd at least want it to incorporate google searches for recent things. Like, using ESP-IDF is way better with gemini 3 pro since its API is constantly updating and deprecating older shit.
>>
there's really no decent community for talking about LLMs on the internet, huh
this place is mainly coomers, and leddit... leddit... just saw this nugget:
https://old.reddit.com/r/LocalLLaMA/comments/1pragtf/open_source_llm_tooling_is_getting_eaten_by_big/
pure llm slop thread. of the 99 current comments, only two people even notice it's fucking slop
how retarded of an LLM user do you have to be to spend hours with those tools and not notice when someone gens this kind of slop
> This isn't just my stack. The whole ecosystem is churning.
> That's when it clicked.
> We're not choosing tools anymore. We're being sorted into ecosystems.
I kinda wish someone with guts of steel made a classic vbulletin/phpbb webforum with heavy moderation or something. Instaban slop posters AND people who are too retarded to notice slop.
>>
>>107615478
Someone said they got it working better by reducing the default rep pen; otherwise you need to wait for a fix in llama.cpp, or for cline to add native tool support.
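For reference, a sketch of what "reducing the default rep pen" looks like against a local OpenAI-compatible server. The model name and endpoint are hypothetical placeholders; llama.cpp's server does accept repeat_penalty as an extra body field, where 1.0 disables the penalty:

```python
import json

# Request body with the repetition penalty neutralized (1.0 = off).
# "devstral-local" is a hypothetical model name for illustration.
payload = {
    "model": "devstral-local",
    "messages": [{"role": "user", "content": "Call the read_file tool on main.c"}],
    "temperature": 0.2,
    "repeat_penalty": 1.0,  # defaults >1.0 penalize repeated tokens
}
body = json.dumps(payload)
print(body)
# POST this to e.g. http://localhost:8080/v1/chat/completions
```

The rationale: a repetition penalty downweights tokens already seen in context, but tool-call JSON has to repeat braces, quotes, and key names exactly, so the penalty can push the model into mangled output.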
>>
>>107615270
When it was posted here something was withheld over "safety". Did that change?

>Below are samples, and further below is a more detailed description of the model. We plan on releasing model weights/code, though we are not planning on releasing the speaker-reference transformer weights at this time due to safety concerns.
>>
>>107615478
never tried the tool calling, just coding. devstral 2 suggested a few things that worked where qwen 2.5/3 missed.

>>107615503
for my use i dont need internet search but i'm pretty sure kobold added that a while back. might exist in lcpp too
>>
>>107615504
>want to talk about models
>annoyed when people use models to help put their own thoughts into words
???
>>
>>107615504
I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
>>
>>107615549
>their own thoughts
no, I assure you, no form of human thought was put into this logorrhea
>>107615552
>I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
incredible, people expect human interaction to be with other humans, preferably not the lobotomized kind
>>
>>107615508
forgot to mention I tested the CLOUD model, that's why I specifically said fp8, not Q8.
Anyway, I've been battling with a problem this past week and I've tried a combination of local and remote free models.
Local:
>gemma 27b
>qwen 3 next 80b
>qwen 3 coder 30b
>gpt-oss 120b
Remote:
>grok-code-fast
>glm 4.6
>devstral 123b
>minimax m2
I gave them the same instructions, the same traces/observations, and even the same pointers as to where the problem was located / what to look for.
ALL of them failed. Some had really bad tool support (glm 4.6 and devstral), and when they worked they couldn't find the solution. I iterated the errors/solutions with them between 2 and 6 times, then I gave up. Literal days spent tard wrangling LLMs.
Then I said FUCK IT, let's try Antigravity with gpt3 pro: the 1st iteration failed but got really close, and on the 2nd iteration I gave it the small push it needed and BAM, bug fixed.
Problem is that I'm a capable programmer, but I got fixated on having the AI solve the bug. If I'd instead just decided to look at the source myself, I would've solved it in maybe ~2 hours (gemini took ~30 mins).
This was a really big reality check for me.
Yes, you can tell LLMs to code a lot of stuff for you, and I even spend time validating, but SADLY cloud models are at another level. SAD
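The "same instructions, same traces, same pointers to every model" protocol described above is easy to script against any OpenAI-compatible endpoint (llama.cpp server, vLLM, and tabbyAPI all speak that API). A minimal sketch — the model names, base URL, and prompt contents are hypothetical placeholders:

```python
import json
from urllib import request

MODELS = ["gemma-27b", "qwen3-coder-30b", "gpt-oss-120b"]  # hypothetical names
BASE_URL = "http://localhost:8080"  # assumed local server

def build_body(model, instructions, traces, pointers):
    """Identical prompt for every model so the comparison is apples to apples."""
    prompt = f"{instructions}\n\nTraces:\n{traces}\n\nLook here first:\n{pointers}"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy-ish decoding for reproducibility
    }

def ask(model, instructions, traces, pointers, timeout=300):
    data = json.dumps(build_body(model, instructions, traces, pointers)).encode()
    req = request.Request(f"{BASE_URL}/v1/chat/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# for m in MODELS:  # uncomment with a server running
#     print(m, ask(m, "Fix the bug.", "stack trace...", "src/parser.c")[:200])
```

Pinning temperature to 0 matters here: with sampling on, a model can luck into (or out of) the fix, which muddies any per-model verdict.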
>>
>>107615571
I feel like if you unfried all the "that is the most important question" preference training and added long-term memory, LLMs (4.6 or better) would be much more interesting to talk to than 99% of people.
>>
>>107615591
>Antigravity with gpt3 pro
??????????????????????????????????????????????????????????????
>>
>>107615644
He means gemini.
>>
>>107615619
>I feel like if you unfried all the "that is the most important question" preference training
Another leaderboard would have to take the place of lmarena first.
>>
>>107615524
https://voca.ro/1kdrd2885gib
>>
>>107615591
I write VBA automation scripts at my work and I gave gpt4 a try for like 2 days. Then I realized it would be faster to write it myself. And I didn't ask it for a complete program, just single functions, or rewriting what I'd already written, with explicit pointers I knew it would need. Can't imagine how vibe coders work if they don't know what the model needs to know before it starts writing.
>>
>>107615770
A lot of vibe coders don't care; to them whatever the AI produces is a black box. They just look at the result (not the code) and iterate.
Of course they end up with garbage code, but they don't really care.
>>
>>107615770
You used a year-old model, complain about its quality, and then say you can't imagine how others work?
I'd ask if you're retarded, but geeze.

>>107615591
>gpt3 pro
>>
>>107615770
You're supposed to do it through an agentic framework like claude code or codex or opencode.
I don't know if Excel has some way to run a macro from a file. It's not the best situation to apply vibecoding.
Vibecoding works best with the sloppiest languages (python, js) and not that well with anything slightly obscure.
Also
>gpt4
Nigger, what?
>>
>>107615797
This is the vibemax way, and it will be much of the code in your future
The hellscape where we can do anything but don't do anything well
>>
>>107615701
I got it from their blog: https://jordandarefsky.com/blog/2025/echo/
>>
>>107615770
>>107615797
if LLMs could really boost productivity or code quality it would be known by now. years of this shit, and even the people who make those models are incapable of showing the utility of these things for code lmao
claude code is filled with retarded bugs like this:
https://github.com/anthropics/claude-code/issues/6481
that simply don't get any attention, because their crap was legacy code (legacy code being code no one on the team understands) from day one, and the way they built their TUI is almost unfixable
or how about this gold nugget:
https://github.com/openai/openai-python/issues/2472
that one is legendary for appearing on an openai video stream for the gpt 5 launch, where they asked their model to vibe-fix the issue
months later nothing has happened, the issue still exists, just crickets
if openai and anthropic can't use their own models to produce better code than paying a team of cheap third worlders, what's the use of AI? lmao
>>
>>107615913
The issue is the chair, not the LLM. If you can't get anything out of current LLMs, you're beyond retarded. Even non-coders are able to get fully functional apps by now.
>>
>>107615949
>If you can't get anything from current LLMs you're beyond retarded
so anthropic and openai employees are beyond retarded, huh
>>
>>107615905
I see. That blog post is misleading, then. Even for the block-wise diffusion, the blog post only says:
>It is unlikely that we will include the block-wise fine-tuned weights in our initial release.
>>
>>107615913
In reality, Claude Code feels polished. And so do their web apps. They have good design.
>>
>>107615874
Thing is, it knew the functions, and the snippets I gave it basically handed it the correct syntax. And it had correct syntax, but it would just randomly decide to do something retarded that makes zero sense.
>>
huggingface is aware that people don't care about their website and just like the bandwidth, right?


