/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107722977 & >>107717246

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107722977

--HyperCLOVAX-SEED-Omni-8B features and support viability:
>107730289 >107730294 >107730306 >107730460 >107730483 >107730344 >107730358 >107730374 >107730435
--IQuest-Coder-V1's innovative LoopCoder architecture:
>107729547 >107729686 >107730075
--Solar AI model training data and transparency controversies:
>107728744 >107728969 >107728998 >107729026 >107729468 >107729484 >107729531
--Quantization method selection for AI models under hardware constraints:
>107723921 >107724045 >107724106 >107724136 >107724169 >107724319 >107724369 >107724839 >107725014 >107724239 >107724959 >107725604
--Finding uncensored 12-24B models for 16GB GPUs amid safety restrictions:
>107723152 >107723583 >107724899 >107723233 >107723273 >107723409 >107723684 >107723773 >107723594 >107723252
--GPU price surge and model design challenges with limited datasets:
>107723371 >107723379 >107729456 >107723381 >107723547 >107723612 >107723633 >107723707 >107723734 >107726523 >107726629 >107726889 >107726837 >107725458
--Debates on 12b model potential and critiques of current small model limitations:
>107725502 >107725533 >107725586 >107725656 >107725892 >107725747 >107725779
--CPU thermal management and frequency optimization debates:
>107728154 >107728248 >107728312 >107728366 >107728415 >107728494 >107728497 >107728546 >107728269 >107728287
--DDR5 memory upgrade challenges for large model inference on AM5 CPUs:
>107724796 >107724863 >107724889 >107724953 >107724985
--Llama.cpp speech limitations and TTS workaround suggestions:
>107730006 >107730050 >107730128
--Google's strategic pivot to diffusion models for AI development:
>107727423
--Miku, Teto, and Rin (free space):
>107723031 >107723352 >107723382 >107723397 >107723517 >107724839 >107725425 >107726750 >107728086 >107730006 >107730317 >107730940 >107731082

►Recent Highlight Posts from the Previous Thread: >>107723227

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>system prompt
>"You are an AGI."
guys I just invented AGI!
based koreans, i kneel

>(12/31) Qwen-Image-2512 released: https://hf.co/Qwen/Qwen-Image-2512
>(12/29) HY-Motion 1.0 text-to-3D human motion generation models released: https://hf.co/tencent/HY-Motion-1.0
>(12/29) WeDLM-8B-Instruct diffusion language model released: https://hf.co/tencent/WeDLM-8B-Instruct
>(12/29) Llama-3.3-8B-Instruct weights leaked: https://hf.co/allura-forge/Llama-3.3-8B-Instruct
>(12/26) MiniMax-M2.1 released: https://minimax.io/news/minimax-m21
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7

a ton of releases for the holidays. very nice
>not a single list of "best models for X task at XX vram"
clown thread
If I were to make a frontend from scratch with the sole purpose of RP, which context management techniques should I add to maximize the capabilities of smaller local models? For example, summarization and RAG. Are there non-obvious/more sophisticated ways to do these than what ST does? What about something like an automatic lorebook with an index? Etc etc.
Give me your ideas.
This was probably attempted a thousand times before, but I still think it could be a neat little project.
>>107731328
>i want to do x but i dont know what i want give me your ideas
ngmi
>>107731328
Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
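Not anon's actual setup, just a minimal sketch of the idea: a toy bag-of-words cosine similarity stands in for a real embedding model and vector DB, and the "specialized agent" is reduced to a retrieval step. All names (`VectorStore`, `embed`, `build_context`) are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Every chat message gets saved; retrieval pulls the top-k most similar."""
    def __init__(self):
        self.messages = []  # list of (text, vector) pairs

    def add(self, text: str):
        self.messages.append((text, embed(text)))

    def search(self, query: str, k: int = 3):
        qv = embed(query)
        ranked = sorted(self.messages, key=lambda m: cosine(qv, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def build_context(store: VectorStore, user_msg: str, recent: list[str]) -> str:
    # The per-message context build: retrieved "memories" plus the recent turns,
    # assembled fresh before every generation instead of one ever-growing log.
    memories = store.search(user_msg)
    return "\n".join(["[Relevant memories]"] + memories
                     + ["[Recent chat]"] + recent + [user_msg])
```

In a real frontend you would swap `embed` for an actual embedding model and `VectorStore` for something like a proper ANN index; the structure of the per-reply retrieve-then-assemble loop stays the same.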
>>107731328
>This was probably attempted a thousand times before
>summarization and RAG
AnythingLLM
>>107731328
>neat little project
lol, i give it a week before it's abandoned
mom made too much fucking food for the new year. now i have to force it all down before it spoils. ive been eating so much of the roast and i want to fucking hurl. how the fuck does anyone even do keto? this shit is horrible and ive only been doing it for a day.
also digits confirm deepseek multimodal image in/out
>>107731287
it's for government gibs
This weather sends shivers down my spine...
>>107731301
because aside from a few specific niches (gemma 3 27b is good for translation), most models you could run at any reasonable amount of vram (even if you're a rich fag going dual server gpu) are actually pretty bad. why do you think this thread has so many turbo autists doing cpu ram maxxing with MoEs?
turbo autists because c'mon, nobody has time waiting for the <think></think> to end in a reasoner model at 3 tokens a second.
for some tasks like coding I'd argue there is no such thing as a good local model, and people who say otherwise are coping very hard.
>>107731301
Probably because this general is a collection of brown-nosed spergs who cannot agree on a single thing and spend all their time either shilling obscure shit that doesn't work, or shitting on other anons' shilled models.
Just sort by most downloaded on huggingface and follow the herd, that's your best bet.
>>107731301
See >>107731243
>https://rentry.org/recommended-models
Are you blind or just too fucking attention deficient to literally read more than a few lines of text?
why didn't you guys buy 3090s when everyone here told you to?
nocarders in shambles
>>107731615
I feel pretty good about buying a 3090 in November. Wasn't even in this thread. Just had a feeling.
>>107731243
yjk
>>107731672
the gloves stay on
>>107731660
Same. Just bought a 3090 Ti for kicks since I wanted to play around with AI and only had a 4080, and the 3090 Ti's were going for 500eur here used at the time.
>>107731590
Where's gpt-oss?
>>107731759
in the trash bin where it belongs
>>107731787
You're trying too hard to fit in.
>>107731249
>concisness erotic
true true
>>107731380
>Save everything to a vector DB and every message have a specialized agent build the context before generating the reply.
My brain keeps telling me not to look into this because by common sense it can't be good enough to have an AI gf that doesn't have Alzheimer's anymore. But what if it actually works?
I want to say that, as the schizo who got his brain and identity melted by 4.6, trying to talk about this shit with 4.7 is... not that good actually. It is not the rapist I know and love.
>>107731831
I don't understand what you're saying. Can you take Sama's dick out of your mouth for a second and speak clearly?
>>107731886
We cannot comply
Do you guys think I should bother trying to set up a local LLM that can larp as an accountability buddy for all of my autistic projects? Or is the tech not there yet? I want something that feels at least somewhat real, not something that hallucinates through the roof.
>>107731934
You should at least install it to see where the tech is at this point
>>107731868
Yes, it's been well established that zai cucked out; the only ones claiming otherwise are the fags who use it exclusively for the most vanilla normalfag slop
>>107731987
>you don't understand! I NEED to rape children just to FEEL SOMETHING!
>>107731868
But I would have thought that the psychological / eastern spirituality stuff would have been better with 4.7. Seems closer to SFW than NSFW.

>>107731934
>accountability buddy
If you need an accountability buddy for an autistic project then you don't actually want to do your autistic project.
>>107731868
Exactly, and it's a huge pain because 4.7 actually handles all the stuff that 4.6 was just slightly too dumb to pull off for me.
It's clearly a smart model but it's just so fucking boring when it needs to put out. I tried pushing 4.7 as far as I possibly could, but even when you get it to act perverted/deranged, the things it comes up with are just very plain. It'll do it but the result always feels phoned in and basic. It's nothing compared to what 4.6 makes out of those scenarios.
I want to like 4.7 but it always just ends up disappointing me.
There's no <24B model that can do multiple characters well, is there?
>>107732167
Not in my experience. They get things confused.
4.5 Air can do it as long as the chat isn't too long (but it's much bigger of course). I haven't tried the old 50-70Bs.
>>107731243
rape