/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107838898 & >>107834480

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107838898

--Paper: Prompt Repetition Improves Non-Reasoning LLMs:
>107841511 >107841558 >107841788
--Papers:
>107840737
--Test-time training and beam search potential in open models:
>107839993 >107840870 >107840929 >107840944 >107840948 >107841079 >107841107 >107841130 >107841188 >107841193 >107843956 >107844098 >107844297 >107844760
--Adapting Microsoft TinyTroupe for local multiagent simulation with koboldcpp:
>107840877 >107840941 >107841028 >107841046 >107841313 >107842113 >107843658 >107843909 >107844229
--Context caching and efficiency in SillyTavern/LLM interactions:
>107841026 >107841049 >107841057 >107841086 >107841105 >107841142
--AI character interface development with animation control features:
>107841569 >107841591 >107841609 >107841593 >107841614 >107841636 >107841685 >107841771 >107841794 >107844857 >107841645 >107841648 >107841651 >107841655 >107841751 >107841760 >107841789 >107841844 >107841925 >107842016 >107843335 >107843377
--Cost and hardware considerations for multi-3090 AI rig construction:
>107840180 >107840249 >107840309 >107840596 >107840633 >107840640
--RAG explained as document chunking and embedding for context augmentation:
>107841899 >107841939 >107842005 >107842027 >107844296 >107844327 >107844468 >107842015 >107842046 >107842099 >107842082
--AI flaws vs emotional simulation and 3D model tech discussion:
>107842172 >107843286 >107843328 >107843393 >107843528 >107843592 >107845182 >107845226 >107845255 >107846236 >107843907 >107844059 >107844099 >107844179 >107844262
--llama.cpp memory split regression issue after update:
>107840161 >107840177
--ik_llama.cpp PR adds customizable string/regex token banning:
>107843501
--Miku (free space):
>107840633 >107840665 >107842172 >107843286 >107843393 >107843911 >107845663 >107845698 >107846236 >107844824

►Recent Highlight Posts from the Previous Thread: >>107838903

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107847336
>>107847320
I would drill kasane's tetos.
>>107847349
>Am I retarded?
Probably.
>where the fuck do you find the mmproj for mistral small 3.2 2501?
What stops you from making it yourself? Is it not supported?
>>107847379It's not multimodal.
>>107847396Yeah. I was just checking. He is retarded, then.
>>107847349
>Am I retarded?
Yes. Go to bartowski's 3.2 page, click the files tab, and ctrl+f mmproj.
>>107847379
>>107847396
>>107847409
Either Broken-Tutu isn't actually using 2501 as it says, or this description is pure LLM hallucination. Either way, fuck ReadyArt.
>>107847425
Vision was added in 3.1+3.2. '2501' refers to 3.0, which does not have vision.
Reminder that you shouldn't use abliterations. Just don't be lazy and properly prompt with the BASED models.
>>107847320
fateto
>>107847425
>merge
You should have started there, retard. Next time, link to the model.
>>107847320
Just got a used 3090 with 24GB VRAM.
Any proper in-depth guide to get an LLM setup with image + sound generation? I'd prefer to use deepseek if possible.
And this guide is shit. How does sillytavern communicate with koboldcpp? Is there configuration needed?
>ooba/koboldcpp as your backend
>sillytavern as your frontend
>go to huggingface and download nemo 12b instruct gguf. Start with Q4.
>load into ooba/kobold
>in sillytavern, select Mistral v3 tekken context template and instruct template
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
>>107847458
>Just got a used 3090 with 24GB VRAM
Cool.
>Any proper in depth guide to get LLM setup with image + sound generation?
SillyTavern has options for both I think, but I don't use it. Just click on buttons until something happens.
>I prefer to use deepseek if possible,
kek
>And this guide is shit
>Is there configuration needed?
Yes. It needs to know where to connect to. Just use kobold's built-in webui until you know what you're doing, to see if you even like these things.
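For what it's worth, the "configuration" is just pointing the frontend at the backend's API URL. A minimal sketch of the request SillyTavern-style frontends send to koboldcpp's KoboldAI-compatible endpoint, assuming the default port 5001 and the sampler values from the guide above (the helper function name is made up for illustration):

```python
import json
import urllib.request

# koboldcpp serves a KoboldAI-compatible HTTP API on port 5001 by default;
# SillyTavern only needs this base URL in its API connection settings.
KOBOLD_URL = "http://localhost:5001"

def build_generate_request(prompt, max_length=200, temperature=0.8, min_p=0.02):
    """Build the POST that a frontend would send to /api/v1/generate.
    Parameter names follow the KoboldAI API; defaults mirror the guide above."""
    payload = {
        "prompt": prompt,
        "max_length": max_length,
        "temperature": temperature,
        "min_p": min_p,
        "rep_pen": 1.2,
    }
    return urllib.request.Request(
        KOBOLD_URL + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Once upon a time,")
# urllib.request.urlopen(req) would return {"results": [{"text": ...}]}
# if koboldcpp is actually running with a model loaded.
```

In other words: start koboldcpp, note the URL it prints, paste that into SillyTavern's API connection panel, done.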
>>107847425
Ok so, Broken-Tutu is actually on 2506 instead of the 2501 listed.
>2025
>Japanese LLMs still suck
>>107847503
Mistakes in the model card always bode well for the quality of the model.
>>107847536
Most Japanese consumers are still using Core 2 Duo-era hardware; there's zero incentive for them to release models.
>>107847552
So far it's actually doing pretty good.
>>107847605
Load up regular 3.2 to cure the placebo effect.
Let's compare other UIs you've tried, unless you're all boomers stuck in your ways.
https://github.com/kwaroran/RisuAI
Risu is okay. I tried it cos it supported the charx format, had multiple expression packs, and auto-replaced expression.png with the correct image in the pack. Nicer UI but fewer customization options. It lost my message on refresh tho, silly would never do that.
https://github.com/vegu-ai/talemate
Choose-your-own-adventure style, uses agent-style step-by-step actions; at 15 tk/s it felt like ages to get to my turn. It has a mini auto-generated memory, but I didn't use it long enough to make use of it. Wasn't a big fan of the style personally.
>>107847698
>at 15 tk/s felt like ages to get to my turn
That's the problem with agents. Anyone serious enough about llms already has a multi-gpu rig with shit t/s and will absolutely refuse to use small models. Those who aren't serious wouldn't bother with agents anyway.
We need significantly better hardware to do agentic tard wrangling with the current models, or better models that don't require tard wrangling. Both options are years away. It's a very depressing hobby
>>107847458
You need 10 3090s in a single machine if you want to run a Q2 of deepseek.
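The back-of-the-envelope math, assuming DeepSeek-R1's ~671B parameters at roughly 2.7 bits per weight for a Q2-class GGUF (an assumption: real Q2_K mixes keep some tensors at higher precision, so the exact bpw varies):

```python
def quant_size_gib(n_params, bits_per_weight):
    """Approximate in-VRAM size of quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# DeepSeek-R1 has ~671e9 parameters; Q2-class quants land around 2.7 bpw.
weights_gib = quant_size_gib(671e9, 2.7)
pool_gib = 10 * 24  # ten 3090s at 24 GiB each

print(f"weights: {weights_gib:.0f} GiB, pool: {pool_gib} GiB")
```

That puts the weights around 210 GiB against a 240 GiB pool, leaving only ~30 GiB for KV cache and compute buffers, which is why Q2 is about the ceiling for this setup.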
>>107847458
Download ollama
ollama run deepseek-r1
ollama run deepseek-r1
Still GLMSEX
Still Nemo
>>107847978
sex with russian alcoholic miku