/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108362305 & >>108356979

►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108362305

--llama.cpp reasoning budget sampler breaking tool calling workflows:
>108363630 >108363637 >108363647 >108363707 >108363721 >108363731 >108363741 >108363776
--Reasoning budget sampler for controlling Qwen 3.5 token usage:
>108362684 >108362795 >108363032 >108363053 >108363081 >108363112 >108363151 >108363187 >108363198 >108363229 >108363317
--Google releases WAXAL African language speech dataset amid Gemma 4 delays:
>108362761 >108362813
--High-memory LLM configurations and GPU utilization:
>108364020 >108364064 >108364392 >108364404 >108364422 >108364455 >108364481 >108364549 >108364598 >108364503 >108364926 >108365150 >108366414 >108366536
--Mistral-Large-3-675B-Instruct-2512 model obscurity and technical details:
>108365246 >108365259 >108365294 >108365285 >108365426
--Voice conversion methods and limitations with Qwen3-TTS:
>108363196 >108363211 >108363225 >108363263 >108363267 >108363290 >108363378
--Performance differences between llama-cli and llama-server:
>108363483 >108363549 >108363644 >108364517 >108364542 >108364669
--Qwen3.5-27B performance discrepancy due to quantization confusion:
>108367280 >108367297 >108367305 >108367311 >108367328
--String ban robustness and regex ban PR for ik_llama.cpp:
>108363666
--Comparing bare metal and VM performance benchmarks:
>108364326
--Anthropic and Meta lobbying for AI regulations:
>108362986
--MCP server persistence issues with llama.cpp frontend:
>108363692
--PocketTTS.cpp Windows compatibility fixes shared:
>108365171
--Miku (free space):
>108365163 >108366572 >108366629 >108367228 >108366923 >108367052

►Recent Highlight Posts from the Previous Thread: >>108362965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>ai models are getting more and more intelligent over time
then why can't these smart models describe something as simple as the smell of mikupussy anymore? bring up a 2023 model, ask it to describe the smell of mikupussy, and see for yourself
>>108368243
>tl;dr ozone and leeks with a hint of musk and vanilla
What is it supposed to smell like?
>wow, this model puts out some sweet writing I could never do myself
>writes plot like a woman
can't have everything
>>108368283
>What is it supposed to smell like?
no idea. just wanna see the models describe it
>>108368329
they seem to be struggling to do that, and the newer the model is, the less it suits my taste
>>108368283
like the short circuit from a dumb boomer at starbucks dropping their coffee onto their laptop
>>108368309
nevermind
>(also, please don't write the plot like a woman. The prose is good, but try to stay consistent with the themes. No, X won't come back apologizing the next day (Y will have to reach him), and no, Y won't magically understand everything instantly)
I can't believe it worked
When will they start installing dedicated ai cp on every phone and pc?
>>108368469
>cp
When pedophiles start ruling the world. Wait...
>>108368469
when they want you gone and can't find anything to get you on
>>108368469
>ai rm -rf
>>108368475
Debunked.
where fears and lies melt away music will tie wonk uoy naht noitceffa erom deen i
>>108368243
>>108368672
slop
>>108368679
>slop
Define it.
>>108368469
The FBI and CIA have been doing this to troublemakers for years. If they really want you, then they are going to get you.
Not local, but anyone knows why I can't use GPT-5.4 Pro on openrouter? It says I have insufficient credits but my balance is positive
>>108368733
>>>/g/aicg/
>>108368243
Mikupussy smells like BLACK BULL semen
What's the advantage of saving the cache? The model still needs to reprocess everything, no?
>>108368746
Now this is slop.
>>108368753
>What's the advantage of saving the cache?
Not having to reprocess the whole thing.
>The model still needs to reprocess everything, no?
Not if you have/load a previous cache.
But are you talking about the rnn/ssm state from the new qwen models or the save/restore you can do with the /slots/n/action={save|restore} endpoint? Both should work.
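In case you haven't touched that endpoint before, here's a rough sketch of the save/restore flow against llama-server, assuming it was started with --slot-save-path pointing at a writable directory. The port, slot id, and filename are placeholders, not anything canonical:
[code]
import requests  # third-party: pip install requests

BASE = "http://127.0.0.1:8080"  # assumed llama-server address
SLOT = 0                        # slot id; single-slot servers only have slot 0

# Save slot 0's KV cache to a file under the server's --slot-save-path dir.
# "mikucache.bin" is just a placeholder name.
r = requests.post(f"{BASE}/slots/{SLOT}?action=save",
                  json={"filename": "mikucache.bin"})
print(r.json())  # server reports how much was saved

# ... restart your chat or come back later ...

# Restore the cache into the slot so the shared prefix
# doesn't get reprocessed on the next completion.
r = requests.post(f"{BASE}/slots/{SLOT}?action=restore",
                  json={"filename": "mikucache.bin"})
print(r.json())
[/code]
Note the next prompt you send still has to match the cached prefix token-for-token; on the first mismatch the server reprocesses from there.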
>>108368746
Without the last 3 lines, I like it.
Moonshot will announce Kimi K3 at GTC on March 18th
Hunter Alpha and Healer Alpha are both from Zhipu
lol, the breakages caused by the vibeshitter are endless and still haven't been fully fixed. this one must have flown under the radar because almost none of us run models like Kimi locally. if you use more uncommon models, you're still better off not merging any of the parser-related commits.
this is the power of agentic coding and claude code. this is why we must gatekeep this thread from telling people how to vibecode.
>>108368243
Can't get the ozone out of it.
>>108368835
I wouldn't mind if these were DS, because 3.0 was rather crappy and they tuned it into greatness. Unless they're back to being completely unmemorable like the pre-3.0 era (though I know this is /lmg/ and some anons used their small coder models), 4.0 can be uninspiring but technologically novel and they'll bring it home with 4.1 or R2.
>>108368848
That shit should never have been implemented on the server. That's client-side stuff. The problem started before he got involved, but he's definitely not helping.
how do i make qwen3.5 27B not think for 10000 tokens?
>>108368672
>no ozone
trash
>>108368929
good system prompt + pwilkin's new vibeshitted reasoning budget + end phrase :D
I LOVE VIBEGARBO
>>108368929
turn the reasoning off with an edited template
>>108368929
Prefill <think></think>
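e.g. with a raw /completion call to llama-server, if your frontend doesn't have a "start reply with" field. this assumes qwen3.5 kept qwen3's ChatML template and <think> tags, which I haven't verified; port and prompt are made up:
[code]
import requests  # third-party: pip install requests

# Hand-built prompt ending in an already-closed think block,
# so the model starts its reply without generating reasoning tokens.
prompt = (
    "<|im_start|>user\n"
    "Describe the smell of rain in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n\n</think>\n\n"  # pre-closed think block = no thinking
)

r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 128})
print(r.json()["content"])  # reply begins after the empty think block
[/code]
editing the chat template to always emit the closed pair, as the other anon says, just bakes the same trick in server-side.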
>>108365171
Thanks for the information/update. I almost missed this because I was working on ASR-related AI stuff the other day. I pushed the changes into the main repo with some minor edits. ONNX Runtime should now default to the newer version CMake pulls, so you won't have to pull in the DLL yourself manually.
Very interested to see what the performance looks like on other machines. If you could share a screenshot of the --profile output and include what CPU you have for reference, I'd greatly appreciate it.
https://github.com/VolgaGerm/PocketTTS.cpp
>>108368198
Also thanks anon for the threadly qrd. I would have missed the update otherwise, lel.