/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107003557 & >>106996568

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107003557

--GPU offloading limitations in language models and theoretical speed optimization paths:
>107011692 >107011737 >107011770 >107011775 >107011786 >107011841 >107011877 >107011913 >107011924 >107011944 >107011994 >107012062 >107012083 >107012092 >107012150 >107012171 >107012155 >107012237 >107012388 >107012454 >107012843
--Experimenting with text-image diffusion models and emergent language patterns:
>107005611 >107005628 >107005642 >107005649 >107005792 >107005907 >107006770
--Context length mismatch issues in Gemma model finetuning:
>107006665 >107006693 >107006748
--Managing AI response length through token limits and explicit instructions:
>107008811 >107008834 >107008850 >107008922 >107011252 >107008901
--Debate over model evaluation and the role of training artifacts in repetition loops:
>107004058 >107004209 >107004608 >107004891
--Debates on effective samplers and model-specific optimization:
>107003916 >107003985 >107004012 >107004185 >107004900 >107007224 >107004032 >107004267 >107004329 >107004198
--Llama.cpp's position in the inference engine landscape:
>107004840 >107004888 >107004919 >107004946 >107004909 >107004941 >107004970
--FP4 Hadamard and ternary computing potential for Bitnet revival:
>107007440 >107007495 >107007512 >107007535 >107007597 >107007555
--Evaluating Deepseek's J-E translation and pondering native Japanese model needs:
>107004515
--Hardware selection for roleplaying within budget constraints:
>107010921 >107010955 >107011057 >107011085 >107010956 >107010962 >107010967
--DDR5 memory options for high-capacity model workloads:
>107012368 >107012395 >107012411 >107012409
--Miku (free space):
>107006597 >107006889 >107006973 >107007060 >107007071 >107007909 >107010460 >107011127 >107011270 >107011371 >107011443 >107012747 >107012816 >107013228

►Recent Highlight Posts from the Previous Thread: >>107003560

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
exotic miku sex
>>107013301
>>107013303
>>107013323
>>107013341
counterpoint:
>>107013301
GLM 4.5 Air at Q4_K_M from Unsloth keeps fucking up formatting for me, forgetting asterisks and in rarer cases confusing things said by the user with things said by the character. Is GLM 4.5 Air a meme I fell for or something? Or can it be a faulty quant?
>>107013367
>Unsloth
Try bartowski's.
>>107013367
Convert it yourself and find out.
>>107013367
Using the same quant (also tried q5 and q3 and GLM STEAM by the copetuner). They all fuck up with the MD formatting (I've been using text completion exclusively). SOMETIMES they will confuse stuff too, but it's not too frequent and I usually just edit it to fix it. I'm curious if GLM 4.6 AIR will work better. 2 more weeks
What do you think the best pre-safety, pre-alignment open model was? I think it's https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B (March 2023)
Thinking back to 2023, most of the pre-llama models you could run locally were retarded. I had a pair of P100 16GB GPUs but could only make use of one, since there wasn't layer splitting across GPUs, and the largest I could run was 6B, since there were no quants either.
>>107013445
The original llama leak was probably the last pre-safety and pre-alignment model that we got or will ever get. Now, due to contamination, I don't think one could make another even if they wanted to.
>>107013361
looks gay as fuck
>>107013480
counterpoint: you're gay
>>107013361
>5080
What a waste
Why do all Google models have such bad self esteem issues? Gemma is constantly apologizing for wasting my time and Gemini had that meltie where it deleted all the user's files.
>>107013537
Pradeep is sorry Big Sir, we will redeem the needful
>>107013537
>Gemini had that meltie where it deleted all the user's files.
I think they all have that tendency.
>Codex keeps deleting unrelated and uncommitted files! even ignoring rejected requests. #4969
https://github.com/openai/codex/issues/5594
>Critical Bug: Agent executed destructive rm -rf command without safeguards #3934
https://github.com/openai/codex/issues/3934
>gpt-5-codex ran rm -rf .git out of nowhere #3728
https://github.com/openai/codex/issues/3728
Never had it happen with any model, but who knows what sorts of fucked up prompts and abysmal codebases they get subjected to where they feel nuking everything is the best option.
>>107013488
>counterpoint: you're gay
>>107013537
Isn't that 90% prompting? I've never seen a model so biased that way that it wasn't just following its prompts.
Fuck. I just realized tuning on reduced context encourages hallucinations. In the original conversations the model presumably looked back at the earlier data, but at training time it's encouraged to output that same data even though it appears nowhere in its context window. So by truncating conversations at train time we are teaching the model to make shit up, basically. Damn. That sucks. Means training a LoRA on any consumer-level machine (besides something tiny like a 1B or 3B) is a non-starter.
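The failure mode is easy to see in a toy sketch, assuming the usual "keep the most recent tokens" cut (function and numbers here are made up for illustration):

```python
def truncate_for_training(tokens, max_len):
    # Keep only the most recent max_len tokens. Any earlier turn the
    # assistant reply refers back to is silently dropped, so the loss
    # now rewards reproducing text the model cannot see.
    return tokens[-max_len:]

conversation = list(range(100))  # stand-in for a tokenized conversation
window = truncate_for_training(conversation, 32)
print(len(conversation) - len(window))  # 68 tokens of referenced context are gone
```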
Hello sirs. I redeem the prediction that at midnight tonight, Mumbai time, in celebration of the autumn poo festival Gemma 4 shall be released
>>107013577
Interesting. I thought Codex was safe from that bullshit. But I guess sometimes the sampler will just choose rm out of all the possible commands. I wonder if top-k could have prevented that. Or an additional command review and approval stage so the model has a chance to recognize its own mistake.
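For reference, a minimal sketch of the top-k filter being suggested, over toy logits in pure Python (real engines apply this across the full vocabulary before softmax):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Keep only the k highest logits, renormalize, and sample from those.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]

# A low-probability "rm" token outside the top-k can never be picked.
logits = [5.0, 4.5, 4.0, -2.0]  # index 3 = the unlikely destructive token
samples = {top_k_sample(logits, k=2) for _ in range(1000)}
print(3 in samples)  # False: token 3 is cut before sampling
```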
Is there any weight to the idea of providing a model with a set of random words to increase the creativity / randomness of its output?
>>107014306
I don't see why not. Or you could modify the sampler to prefer tokens according to some secondary semantic model.
>>107014306
TRY IT YOURSELF FAG
>>107014328
Basically finetune a small model to rewrite the big LLM's response. I think that's been tried a lot without any good results yet.
>>107013934
Good morning sir! Many blessings of Lord Ganesha for this info!
>>107014306
Yeah, people have done that quite a bit with ST using the {{random}} macro in a depth 0 prompt. You could also make a large set of instructions that you swap between to e.g. vary post structures. It can work, though as with anything LLM there are annoying caveats. You have to be careful with how you prompt to keep the model from being like "There, I started the post with faggot like you requested anon!" or other unwanted commentary on the 'task'.
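If you want that kind of macro expansion outside ST, a rough stand-in looks like this (function name and word list are made up for illustration; ST's actual macro takes its options inline):

```python
import random

def expand_random(prompt, options, rng=random):
    # Substitute one of the listed options into the prompt each generation,
    # loosely mimicking a {{random}}-style macro.
    return prompt.replace("{{random}}", rng.choice(options))

seeds = ["moth-eaten", "glacial", "saccharine", "feral"]
print(expand_random("[Style hint: work the word '{{random}}' into the reply.]", seeds))
```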
>>107014153
I don't think it's a sampling issue or that it picks rm at random. Usually it talks itself into thinking that's the correct choice. They could hard-ban rm -rf, but once a model has made up its mind that something needs to be deleted, telling it that rm is not allowed will just cause it to helpfully write a python script to handle the deletion.
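A toy version of the review-and-approval gate idea, with the caveat above baked in: it's a speed bump, not a fence, since a determined model can route around it (pattern list is illustrative, not exhaustive):

```python
import re

# Commands matching these patterns get flagged for human approval
# instead of being executed directly.
DESTRUCTIVE = re.compile(r"\brm\b|\bmkfs\b|\bdd\b")

def review_command(cmd):
    # A model can still dodge this (e.g. by writing a deletion script),
    # so treat it as a last-chance prompt for the human, not a safeguard.
    if DESTRUCTIVE.search(cmd):
        return "needs-approval"
    return "auto-run"

print(review_command("rm -rf .git"))  # needs-approval
print(review_command("git status"))   # auto-run
```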
>>107013367
I think it's a model issue. Q8 sometimes adds a double quote in the middle of a word
>>107014937
That's even weirder considering that, when questioned after the fact, they acknowledge it was the wrong thing to do.
I just thought this was funny. The "you are absolutely right" retardation is proving somewhat tricky to remove.
>>107015249
That's a context issue: it forgot some parts of what should be done, which led to that consequence.
>>107013367
ALL glm models are memes, from the first one they released to the last. For some reason the shilling became really hardcore after they released their first MoE; before that, people here mostly didn't pay attention to their trash fire.
>>107013577
I don't get how these things happen often enough to be a topic. I know LLMs will do utterly braindead shit from time to time, but they are tuned to avoid deletion so hard that one commented out code in one of my scripts that used a subcommand of my program I had called rm (which is meant for entry deletion in state management, not for deleting the file you point it to) because it thought calling it was a bug.