/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107838898 & >>107834480

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107838898

--Paper: Prompt Repetition Improves Non-Reasoning LLMs:
>107841511 >107841558 >107841788
--Papers:
>107840737
--Test-time training and beam search potential in open models:
>107839993 >107840870 >107840929 >107840944 >107840948 >107841079 >107841107 >107841130 >107841188 >107841193 >107843956 >107844098 >107844297 >107844760
--Adapting Microsoft TinyTroupe for local multiagent simulation with koboldcpp:
>107840877 >107840941 >107841028 >107841046 >107841313 >107842113 >107843658 >107843909 >107844229
--Context caching and efficiency in SillyTavern/LLM interactions:
>107841026 >107841049 >107841057 >107841086 >107841105 >107841142
--AI character interface development with animation control features:
>107841569 >107841591 >107841609 >107841593 >107841614 >107841636 >107841685 >107841771 >107841794 >107844857 >107841645 >107841648 >107841651 >107841655 >107841751 >107841760 >107841789 >107841844 >107841925 >107842016 >107843335 >107843377
--Cost and hardware considerations for multi-3090 AI rig construction:
>107840180 >107840249 >107840309 >107840596 >107840633 >107840640
--RAG explained as document chunking and embedding for context augmentation:
>107841899 >107841939 >107842005 >107842027 >107844296 >107844327 >107844468 >107842015 >107842046 >107842099 >107842082
--AI flaws vs emotional simulation and 3D model tech discussion:
>107842172 >107843286 >107843328 >107843393 >107843528 >107843592 >107845182 >107845226 >107845255 >107846236 >107843907 >107844059 >107844099 >107844179 >107844262
--llama.cpp memory split regression issue after update:
>107840161 >107840177
--ik_llama.cpp PR adds customizable string/regex token banning:
>107843501
--Miku (free space):
>107840633 >107840665 >107842172 >107843286 >107843393 >107843911 >107845663 >107845698 >107846236 >107844824

►Recent Highlight Posts from the Previous Thread: >>107838903

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107847336
>>107847320
I would drill kasane's tetos.
>>107847349
>Am I retarded?
Probably.
>where the fuck do you find the mmproj for mistral small 3.2 2501?
What stops you from making it yourself? Is it not supported?
>>107847379It's not multimodal.
>>107847396Yeah. I was just checking. He is retarded, then.
>>107847349
>Am I retarded?
Yes. Go to bartowski's 3.2 page, click the files tab, and ctrl+f mmproj.
>>107847379
>>107847396
>>107847409
Either Broken-Tutu isn't actually using 2501 as it says, or this description is pure LLM hallucination. Either way, fuck ReadyArt.
>>107847425
Vision was added in 3.1+3.2. '2501' refers to 3.0, which does not have vision.
Reminder that you shouldn't use abliterations. Just don't be lazy and properly prompt with the BASED models.
>>107847320
fateto
>>107847425
>merge
You should have started there, retard. Next time, link to the model.
>>107847320
Just got a used 3090 with 24GB VRAM.
Any proper in-depth guide to get an LLM setup with image + sound generation? I'd prefer to use deepseek if possible.
And this guide is shit. How does sillytavern communicate with koboldcpp? Is there configuration needed?
>ooba/koboldcpp as your backend
>sillytavern as your frontend
>go to huggingface and download nemo 12b instruct gguf. Start with Q4.
>load into ooba/kobold
>in sillytavern, select Mistral v3 tekken context template and instruct template
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
>>107847458
>Just got a used 3090 with 24GB VRAM
Cool.
>Any proper in depth guide to get LLM setup with image + sound generation?
SillyTavern has options for both I think, but I don't use it. Just click on buttons until something happens.
>I prefer to use deepseek if possible,
kek
>And this guide is shit
>Is there configuration needed?
Yes. It needs to know where to connect to. Just use kobold's built-in webui until you know what you're doing, to see if you even like these things.
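For what it's worth, the "configuration" is just pointing the frontend at the backend's API URL. A minimal sketch of the request SillyTavern-style frontends send to koboldcpp's KoboldAI-compatible endpoint, assuming the default port 5001 and the sampler values from the guide above (the helper function name is made up for illustration):

```python
import json
import urllib.request

# koboldcpp serves a KoboldAI-compatible HTTP API on port 5001 by default;
# SillyTavern only needs this base URL in its API connection settings.
KOBOLD_URL = "http://localhost:5001"

def build_generate_request(prompt, max_length=200, temperature=0.8, min_p=0.02):
    """Build the POST that a frontend would send to /api/v1/generate.
    Parameter names follow the KoboldAI API; defaults mirror the guide above."""
    payload = {
        "prompt": prompt,
        "max_length": max_length,
        "temperature": temperature,
        "min_p": min_p,
        "rep_pen": 1.2,
    }
    return urllib.request.Request(
        KOBOLD_URL + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("Once upon a time,")
# urllib.request.urlopen(req) would return {"results": [{"text": ...}]}
# if koboldcpp is actually running with a model loaded.
```

In other words: start koboldcpp, note the URL it prints, paste that into SillyTavern's API connection panel, done.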
>>107847425
Ok so, Broken-Tutu is actually on 2506 instead of the 2501 listed.
>2025
>Japanese LLMs still suck
>>107847503
Mistakes in the model card always bode well for the quality of the model.
>>107847536
Most Japanese consumers are still using Core 2 Duo-era hardware; there's zero incentive for them to release models.
>>107847552
So far it's actually doing pretty good.
>>107847605
Load up regular 3.2 to cure the placebo effect.
Let's compare other UIs you've tried, unless you're all boomers stuck in your ways.
https://github.com/kwaroran/RisuAI
Risu is okay. I tried it cos it supported the charx format, had multiple expression packs, and auto-replaced expression.png with the correct image in the pack. Nicer UI but fewer customization options. It lost my message on refresh tho, silly would never do that.
https://github.com/vegu-ai/talemate
Choose-your-own-adventure style, uses agent-style step-by-step actions; at 15 tk/s it felt like ages to get to my turn. It has a mini auto-generated memory, but I didn't use it long enough to make use of it. Wasn't a big fan of the style personally.
>>107847698
>at 15 tk/s felt like ages to get to my turn
That's the problem with agents. Anyone serious enough about llms already has a multi-gpu rig with shit t/s and will absolutely refuse to use small models. Those who aren't serious wouldn't bother with agents anyway.
We need significantly better hardware to do agentic tard wrangling with the current models, or better models that don't require tard wrangling. Both options are years away. It's a very depressing hobby
>>107847458
You need 10 3090s in a single machine if you want to run a Q2 of deepseek.
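The back-of-the-envelope math, assuming DeepSeek-R1's ~671B parameters at roughly 2.7 bits per weight for a Q2-class GGUF (an assumption: real Q2_K mixes keep some tensors at higher precision, so the exact bpw varies):

```python
def quant_size_gib(n_params, bits_per_weight):
    """Approximate in-VRAM size of quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# DeepSeek-R1 has ~671e9 parameters; Q2-class quants land around 2.7 bpw.
weights_gib = quant_size_gib(671e9, 2.7)
pool_gib = 10 * 24  # ten 3090s at 24 GiB each

print(f"weights: {weights_gib:.0f} GiB, pool: {pool_gib} GiB")
```

That puts the weights around 210 GiB against a 240 GiB pool, leaving only ~30 GiB for KV cache and compute buffers, which is why Q2 is about the ceiling for this setup.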
>>107847458
Download ollama
ollama run deepseek-r1
ollama run deepseek-r1
Still GLMSEX
Still Nemo
>>107847978
sex with russian alcoholic miku