/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108416874 & >>108410115

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108416874

--MSA-4B outperforms GPT-4.1 and Qwen models in long-context benchmarks:
>108418758 >108418791 >108418819 >108418842 >108420734
--Advanced virtual companion setup with home automation:
>108417587 >108417605 >108417650 >108417676 >108417683 >108417727 >108417745 >108417769 >108417811 >108417879
--Mistral CEO proposes revenue-based content levy for AI companies in Europe:
>108417643 >108417668 >108417678 >108417740 >108417747 >108421003
--MistralAI CEO proposes AI content levy in Europe:
>108418980 >108420788 >108420874 >108420907 >108421234 >108421283 >108420826 >108420878 >108420879 >108420951 >108421015 >108421176 >108421248 >108421305 >108421306 >108421482 >108422209
--The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI:
>108422422 >108422476 >108422608 >108422615 >108422643 >108422670 >108422734
--Debating prompt format effectiveness for Literotica finetuning data:
>108417215 >108417294 >108417388 >108417663 >108417731 >108417818 >108417885
--Vulkan llama.cpp performance vs ROCm:
>108421311 >108421377 >108421508 >108421570 >108421613
--Phrase banning vs token banning in KoboldCPP and ik_llama.cpp:
>108421847 >108421854 >108421857 >108421873 >108421884 >108421914 >108421928 >108421965 >108421977 >108422023 >108422035 >108422080 >108422096 >108422118 >108421993
--Sarvam 105B benchmark results:
>108422282 >108422382 >108422388 >108422440 >108422451 >108422497
--Qwen 3.5 abliteration issues and Heretic uncensored alternatives:
>108418501 >108418584 >108418609 >108418621 >108418672 >108418686 >108420443 >108418693
--DDR4 vs DDR5 RAM upgrade considerations:
>108419791 >108419799 >108419904 >108420101 >108420151 >108419888
--Tensor parallelism progress in llama.cpp and fork alternatives:
>108421401 >108421433 >108421476 >108421778
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>108417029

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108423198
Required: https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

>>108423177
Are the techniques that make newer models better being explained publicly?

>>108423255
>newer models better
??? Nemo still mogs anything newer.
Mikulove
Do you guys goon to your local models? Is it even possible?

what am i doing wrong? kobold is working but sillytavern can't connect. what to do?
TMW
>>108423333
undress me >>108423323

>>108423323
Missing v1 or v1/ at the end of the API URL?
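For context on the trailing-path advice above: KoboldCpp's OpenAI-compatible endpoints live under /v1, so a frontend pointed at the bare host misses them. A quick sketch of the fix as a hypothetical helper (the function name and port are illustrative, not from any frontend's code):

```python
# Hypothetical helper: make sure an API base URL ends in /v1, where
# OpenAI-compatible servers (e.g. KoboldCpp on port 5001) expose
# /v1/models, /v1/chat/completions, etc.
def normalize_base_url(url: str) -> str:
    url = url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url

print(normalize_base_url("http://localhost:5001"))      # http://localhost:5001/v1
print(normalize_base_url("http://localhost:5001/v1/"))  # http://localhost:5001/v1
```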
https://www.reddit.com/r/LocalLLaMA/comments/1rzyha4/new_ai_policy_by_white_house_us/
>1. Protecting Children — Require age-assurance measures, parental controls, and safeguards against sexual exploitation and self-harm on AI platforms, while affirming existing child privacy laws apply to AI.
>>108423322
I think that was the first thing a lot of people on /lmg/ did after getting one running

>>108423322
I think the best part about it is when you're new to this and don't know how to prompt or what model you're even running.
The novelty combined with the challenge of making the pretty lady stop saying "What you're doing is highly inappropriate, let's respect each other's boundaries" is great.
And when you figure it all out, wherever your brain's reward center is, it gives you a LOT of funny chemicals.
Then you start seeing how dumb the small models are and how much the big ones like shivering and smirking. The magic is lost, you start giving them scripting and summarization tasks, wondering if your AI rig was ever worth the investment...
So yes, it's possible.

>>108423462
This is why photorealistic AI was a mistake.

>>108423462
>>108423585
I have written a legal disclaimer which states that every character I gen, regardless of the model, is always 21 years of age or older. Signed and stamped by me.
There's nothing they can do to me in this case.

>>108423585
Meanwhile ZiT/ZiB just shit out CP like no tomorrow.

>>108423604
Photorealistic sloppers are in trouble.
Qwen3.5-27b (HauhauCS uncensored version) is extremely good. Unironically rivals GLM-4.6: worse in raw intelligence (but usually good enough), better in not being slopped and formulaic.
Qwen3.5-9b (HauhauCS uncensored version) is absolutely fucking retarded. Worse than Mistral-Nemo or Gemma 3 12b.
What gives?
>bro it's a smaller model of course it's worse
The gap is enormous and makes me wonder if something went wrong somewhere.
>>108423646
>makes me wonder if something went wrong somewhere.
Or the one that worked was a fluke.

>>108423623
You can't ban open source models

>>108423646
Sadly it's ruined by shitty architecture, no context shift, and no antislop sampler.

>>108423646
It's literally shit
Blessed miku thread.
https://goombalab.github.io/blog/2026/mamba3-part1/
>>108423462
>>108423585
This isn't even about generation; it's about making sure ChatGPT doesn't tell kids to rope themselves.
>>108423198
>"Experiments" by a schizo called DavidAU.
Look at his model collection, choose one of the older ones with a really long name, then read the model's card. Behold the magnificence.
"Expanding" (upscaling) a smaller model into a larger one and then pretraining the shit out of it, essentially using the original model as a base for a whole new model, is legit. It's just that you need to do proper pretraining with trillions of tokens, not some qlora. Look at SOLAR 10B from back in the day. It's upscaled from Mistral 7B IIRC.

ah thx anon! any 27b upscale that is decent? feels like q5km is just on the cusp of being amazing for a single 24gb gpu. I am testing putting vision and tts on a secondary gpu
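The SOLAR-style upscaling mentioned above (depth up-scaling) is easy to sketch: stack two copies of the base model's layers, trim a few layers at the seam, then continue pretraining the deeper model. A toy sketch over layer indices, assuming the 32-to-48-layer numbers reported for SOLAR (the function name and overlap value are illustrative):

```python
# Toy sketch of depth up-scaling, roughly the SOLAR 10.7B recipe:
# concatenate two copies of the base layer stack, dropping `overlap`
# layers from the end of one copy and the start of the other,
# then continue pretraining the deeper model on trillions of tokens.
def depth_upscale(layers, overlap):
    n = len(layers)
    return layers[: n - overlap] + layers[overlap:]

base = list(range(32))           # a 32-layer base (e.g. Mistral 7B)
deeper = depth_upscale(base, 8)  # 24 + 24 = 48 layers
print(len(deeper))               # 48
```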
>>108420443
Benchmarks would work better, just way more expensive. For any correct answer, there are a multitude of token sequences that reach it (and more so with reasoning).

>>108423646
You know why it seems to be retarded?
I can't enable reasoning in the uncensored 9B model. Vanilla 9B behaves much better when reasoning is enabled, but it seems like this uncensored version doesn't have that at all. What the heck...
I wonder if 27B is the same.
>>108423870
When are they going to figure out that we should just ban kids? Kids make everything unsafe. Damn brats :anger_vein: :anger_vein: :anger_vein:

>>108424028
>we should just ban kids
yes, support your local ID laws, they're for exactly that

>>108424032
I don't mean "ban them from minecraft".

>>108423675
Why does antislop not work with Qwen 3.5?

>>108424032
No, parents are for that. ID laws are for the police state.

>>108424066
hybrid attention makes surgical context modifications impossible; it's all one big block

>>108423462
Seems completely irrelevant since we don't even fucking get models from America anymore
>>108424222
With Qwen3.5-27B in Kobold, uploading an image seems to perpetually disable thinking, with the model always creating an empty block. Same in the last two versions. Is it a model quirk or inference bug?
>>108423177My local model collapsed when I trained it on another model
Working on getting an API service set up to host qwen3-coder so I can use it anywhere. Has anyone done this and is willing to provide an example?
>>108424072
Thanks. I was wondering why I was seeing occasional smirks.
>>108424395
I just ran llama.cpp + an ngrok tunnel. Both locally and on a Kaggle instance.
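A minimal sketch of that llama.cpp + ngrok setup, assuming llama.cpp's llama-server binary and an authenticated ngrok install (the model filename and port are illustrative):

```shell
# Start llama.cpp's OpenAI-compatible server on a local port.
llama-server -m qwen3-coder-q4_k_m.gguf --port 8080 &

# Tunnel it out; ngrok prints a public https URL forwarding to localhost:8080,
# which you can then paste into any OpenAI-compatible client.
ngrok http 8080
```

Note the public URL changes on every restart of the free-tier ngrok agent, so you have to update your client config each time.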
>>108424342
Did she really say this?

>>108424357
After more experimentation, it must be a bug in Kobold. Having uploaded an image wrecks thinking even if you start a new session. The program must be restarted to fix it.
>>108424481
Why it stands out
- It does not happen often
- But when it does, it is very visible because the user already gave:
  - the problem
  - the severity
  - the priority shift
  - the go-ahead
>>108424395
For when you're at home, just setting your ST or other frontend's host to 0.0.0.0 will let you access it anywhere in the house by going to your PC's IP and port. If you don't know what any of this means, just ask your bot.
For doing it across the internet, I can recommend Tailscale Funnel. It's easier to set up than anything else I tried, while still offering the highest security. I was set up in a few minutes. The only issue is that on the free tier it may be choppy at times (so the text streaming is not smooth). I can deal with that, so it's fine for me, but maybe you are different. Also, on the free tier it only allows you to funnel one port at a time, but that's fine if you only want one API anyway.
To set up Tailscale, just go to their website and follow the instructions. Then run
tailscale funnel 8029
or whatever port number you have your API on. This is what I did on Linux at least; idk what it's like on Windows.
>>108424535
>routing your shit through externally managed services
kill yourself

>>108424535
>moments before anon's machine was hacked
It's OVER
>>108424603
I read "nemo says".

>>108424603
accelerate a