/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108416874 & >>108410115

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108416874

--MSA-4B outperforms GPT-4.1 and Qwen models in long-context benchmarks:
>108418758 >108418791 >108418819 >108418842 >108420734
--Advanced virtual companion setup with home automation:
>108417587 >108417605 >108417650 >108417676 >108417683 >108417727 >108417745 >108417769 >108417811 >108417879
--Mistral CEO proposes revenue-based content levy for AI companies in Europe:
>108417643 >108417668 >108417678 >108417740 >108417747 >108421003
--MistralAI CEO proposes AI content levy in Europe:
>108418980 >108420788 >108420874 >108420907 >108421234 >108421283 >108420826 >108420878 >108420879 >108420951 >108421015 >108421176 >108421248 >108421305 >108421306 >108421482 >108422209
--The End of Coding: Andrej Karpathy on Agents, AutoResearch, and the Loopy Era of AI:
>108422422 >108422476 >108422608 >108422615 >108422643 >108422670 >108422734
--Debating prompt format effectiveness for Literotica finetuning data:
>108417215 >108417294 >108417388 >108417663 >108417731 >108417818 >108417885
--Vulkan llama.cpp performance vs ROCm:
>108421311 >108421377 >108421508 >108421570 >108421613
--Phrase banning vs token banning in KoboldCPP and ik_llama.cpp:
>108421847 >108421854 >108421857 >108421873 >108421884 >108421914 >108421928 >108421965 >108421977 >108422023 >108422035 >108422080 >108422096 >108422118 >108421993
--Sarvam 105B benchmark results:
>108422282 >108422382 >108422388 >108422440 >108422451 >108422497
--Qwen 3.5 abliteration issues and Heretic uncensored alternatives:
>108418501 >108418584 >108418609 >108418621 >108418672 >108418686 >108420443 >108418693
--DDR4 vs DDR5 RAM upgrade considerations:
>108419791 >108419799 >108419904 >108420101 >108420151 >108419888
--Tensor parallelism progress in llama.cpp and fork alternatives:
>108421401 >108421433 >108421476 >108421778
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>108417029

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108423198
Required: https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
>>108423177
Are the techniques that make newer models better being explained publicly?
>>108423255
>newer models better
??? Nemo still mogs anything newer.
Mikulove
Do you guys goon to your local models? Is it even possible?
What am I doing wrong? Kobold is working but SillyTavern can't connect. What to do?
TMW
>>108423333
undress me >>108423323
>>108423323
Missing v1 or v1/ at the end of the API URL?
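A quick way to sanity-check what you're pasting into SillyTavern. This sketch assumes a stock local KoboldCPP install on its default port 5001; the host and port are assumptions, so swap in your own.

```shell
# Base URL KoboldCPP listens on (assumption: default local install, port 5001)
BASE="http://127.0.0.1:5001"

# SillyTavern's OpenAI-compatible connection expects the /v1 suffix,
# not the bare base URL
API_URL="${BASE}/v1"
echo "$API_URL"

# With the server actually running, you can probe it with something like:
#   curl "$API_URL/models"
# If that returns JSON, the URL is the one SillyTavern wants.
```

If the bare base URL connects in your browser but SillyTavern still fails, the missing `/v1` suffix is the usual culprit.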
https://www.reddit.com/r/LocalLLaMA/comments/1rzyha4/new_ai_policy_by_white_house_us/
>1. Protecting Children — Require age-assurance measures, parental controls, and safeguards against sexual exploitation and self-harm on AI platforms, while affirming existing child privacy laws apply to AI.
>>108423322
I think that was the first thing a lot of people on /lmg/ did after getting one running.
>>108423322
I think the best part about it is when you're new to this and don't know how to prompt or what model you're even running.
The novelty combined with the challenge of making the pretty lady stop saying "What you're doing is highly inappropriate, let's respect each other's boundaries" is great. And when you figure it all out, wherever your brain's reward center is, it gives you a LOT of funny chemicals.
Then you start seeing how dumb the small models are and how much the big ones like shivering and smirking. The magic is lost, you start giving them scripting and summarization tasks, wondering if your AI rig was ever worth the investment...
So yes, it's possible.
>>108423462
This is why photorealistic AI was a mistake.
>>108423462
>>108423585
I have written a legal disclaimer which states that every character I gen, regardless of the model, is always 21 years of age or older. Signed and stamped by me.
There's nothing they can do to me in this case.
>>108423585
Meanwhile ZiT/ZiB just shit out CP like no tomorrow.
>>108423604
Photorealistic sloppers are in trouble.
Qwen3.5-27b (HauhauCS uncensored version) is extremely good. Unironically rivals GLM-4.6: worse in raw intelligence (but usually good enough), better at not being slopped and formulaic.
Qwen3.5-9b (HauhauCS uncensored version) is absolutely fucking retarded. Worse than Mistral-Nemo or Gemma 3 12b.
What gives?
>bro it's a smaller model of course it's worse
The gap is enormous and makes me wonder if something went wrong somewhere.
>>108423646
>makes me wonder if something went wrong somewhere.
Or the one that worked was a fluke.
>>108423623
You can't ban open source models.
>>108423646
Sadly it's ruined by a shitty architecture, no context shift, and no antislop sampler.
>>108423646
It's literally shit.
Blessed miku thread.
https://goombalab.github.io/blog/2026/mamba3-part1/
>>108423462
>>108423585
This isn't even about generation; it's about making sure ChatGPT doesn't tell kids to rope themselves.