/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>108362305 & >>108356979►News>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108362305--llama.cpp reasoning budget sampler breaking tool calling workflows:>108363630 >108363637 >108363647 >108363707 >108363721 >108363731 >108363741 >108363776--Reasoning budget sampler for controlling Qwen 3.5 token usage:>108362684 >108362795 >108363032 >108363053 >108363081 >108363112 >108363151 >108363187 >108363198 >108363229 >108363317--Google releases WAXAL African language speech dataset amid Gemma 4 delays:>108362761 >108362813--High-memory LLM configurations and GPU utilization:>108364020 >108364064 >108364392 >108364404 >108364422 >108364455 >108364481 >108364549 >108364598 >108364503 >108364926 >108365150 >108366414 >108366536--Mistral-Large-3-675B-Instruct-2512 model obscurity and technical details:>108365246 >108365259 >108365294 >108365285 >108365426--Voice conversion methods and limitations with Qwen3-TTS:>108363196 >108363211 >108363225 >108363263 >108363267 >108363290 >108363378--Performance differences between llama-cli and llama-server:>108363483 >108363549 >108363644 >108364517 >108364542 >108364669--Qwen3.5-27B performance discrepancy due to quantization confusion:>108367280 >108367297 >108367305 >108367311 >108367328--String ban robustness and regex ban PR for ik_llama.cpp:>108363666--Comparing bare metal and VM performance benchmarks:>108364326--Anthropic and Meta lobbying for AI regulations:>108362986--MCP server persistence issues with llama.cpp frontend:>108363692--PocketTTS.cpp Windows compatibility fixes shared:>108365171--Miku (free space):>108365163 >108366572 >108366629 >108367228 >108366923 >108367052►Recent Highlight Posts from the Previous Thread: >>108362965Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
>ai models are getting more and more intelligent by timethen why these smart models cant describe something as simple as smell of mikupussy anymore? bring a 2023 model and ask it to describe the smell of mikupussy and see it yourself
>>108368243>tl;dr ozone and leeks with a hint of musk and vanillaWhat is it supposed to smell like?
>wow, this model puts out some sweet writing I could never do myself>writes plot like a womancan't have everything
>>108368283>What is it supposed to smell like?no idea. just wanna see the models describe it
>>108368329they seem to be struggling to do that and the newer the model is, the lesser it suits to my tastes
>>108368283like the short circuit from a dumb boomer at starbucks dropping their coffe onto their laptop
>>108368309nevermind>(also, please don't write the plot like a woman. The prose is good, but try to stay consistent with the themes. No, X won't come back apologizing next day (Y will have to reach him), and no, Y won't magically understand everything instantly)I can't believe it worked
When they will start installing dedicated ai cp on each phone and pc?
>>108368469>cpWhen pedophiles start ruling the world. Wait...
>>108368469when they want you gone and cant find anything to get you on
>>108368469>ai rm -rf
>>108368475Debunked.
where fears and lies melt away music will tie wonk uoy naht noitceffa erom deen i
>>108368243
>>108368672slop
>>108368679>slopDefine it.
>>108368469The FBI and CIA has been doing this to troublemakers for years. If they really want you then they are going to get you.
Not local, but anyone knows why I can't use GPT-5.4 Pro on openrouter? It says I have insufficient credits but my balance is positive
>>108368733>>>/g/aicg/
>>108368243Mikupussy smells like BLACK BULL semen
What's the advantage of saving the cache? The model still needs to reprocess everything, no?
>>108368746Now this is slop.
>>108368753>What's the advantage of saving the cache?Not having to reprocess the whole thing.>The model still needs to reprocess everything, no?Not if you have/load a previous cache.But are you talking about the rnn/ssm state from the new qwen models or the save/restore you can do with the /slots/n/action={save|restore} endpoint? Both should work.
>>108368746Without last 3 lines, I like.
Moonshot will announce Kimi K3 on GTC on March 18th
Hunter Alpha and Healer Alpha are both from Zhipu
lol, breakages caused by the vibeshitter are endless and still are to be fully fixed, this one must have flown under the radar because almost none of us run models like Kimi locally. If you use more uncommon models, you'd be better off not merging any of the parser related commits still.this is the power of agentic niggers and claude code. this is why we must gatekeep this thread away from telling people how to vibecode. they need to eat razor blades instead.
>>108368243Can't get the ozone out of it.
>>108368835I wouldn't mind if these were DS because 3.0 was rather crappy, and they tuned it into greatness. Unless they're back to being completely unmemorable like the pre-3.0 era (though I know this is /lmg/ and some anons used their small coder models), 4.0 can be uninspiring but technologically novel and they'll bring it home with 4.1 or R2.
>>108368848That shit should have never been implemented on the server. That's client-side stuff. The problem started before he got involved, but he's definitely not helping.
how do i make qwen3.5 27B not think for 10000 tokens?
>>108368672>no ozonetrash
>>108368929good system prompt + pwilkin's new vibeshitted reasoning budget + end phrase :DI LOVE VIBEGARBOJ
>>108368929turn the reasoning off with edited template
>>108368929Prefill <think></think>
>>108365171Thanks for the information/update. I almost missed this because I was working on ASR-related AI stuff the other day. I pushed the changes into the main repo with some minor edits. Onnx runtime should now default to using the more updated version cmake pulls by default, so you won't have to pull in the dll yourself manually.Very interested to see what the performance looks like on other machines. If you could share a screenshot of the --profile and include what CPU you have for reference I'd greatly appreciate it.https://github.com/VolgaGerm/PocketTTS.cpp>>108368198Also thanks anon for the threadly qrd. I would have missed the update otherwise, lel.
What an absolutely worthless thread we have today. I hope blacked miku spam returns to show mikutroons their place.
be the chang you want to xi
>>108368283That is a trick question. The vocaloid's pussy is actually a dick. The riddle demonstrates how deeply ingrained gender roles are in society, often causing people to assume that a long green haired person is actually a woman when in reality it is a troon.
>>108369090kek
anyone use local for real work and not just fucking around?
>>108369108I used to use it for RP, nowadays it's mainly for personal information (like some law stuff, or finance stuff since I'm investing) obviously with web search / fetching.For my own projects free tiers fo gemini are usually enough (gemini pro / flash), never ran out of flash usage.For actual work at my company, we have the company provided Amazon Q with sonnet 4.6 (no opussy because they're big nosed sadly)
>>108369108GLM 4.7 is perfectly fine for programming
>>108369108I've been using it recently for asking stupid programming-related questions and generating example snippets.I copy-pasted a Javascript SSE parser out of it, which isn't really complicated but it's less thinking to read and fix the solution (e.g. the AbortController was instantiated, but not plumbed through to fetch) than to write it from nothing.It's debatable whether you could call anything I do "real work", though.
>vibe-ported Qualcomm charge control from Android to Linux using Qwen3.5-35B-A3Bwish me luck, my phone about to turn into Galaxy Note 7
>>108369180cellphones sure would be more useful if you could just boot linux on them
>>108369180You should use 27B unless you REALLY need speed.
>>108369206It is such a weird thing how dense model fetish was created by frivolous 3090 purchases.
>>108369205You can though you just need specific phones
>>108369245yeah but I mean all of them like you can install it on a pc instead of windows, the list of phones that exist vs ones you can run linux on is microscopic
>>108369205yeah, the Android kernel support model is so retarded.Thankfully a few older SoCs are pretty well supported on upstream Linux, you can boot mainline Linux on them, even stuff like GSM, GPS and hardware acceleration work. They are all still buggy though, so close yet so far from making it a daily-drive'able phone.>>108369206nah, i'm on 1060
>>108369180>Qualcomm charge controlWhat are you doing, exactly. Are you trying to get wireless charging controls working on your desktop or something? I don't get it.
My demo of Moonshinev2 ASR.https://files.catbox.moe/t5tr26.webm
>>108368825Do you think it's Hunter or Healer? Gotta be Hunter, right?
>>108369287Cool
>>108369273disabling charging after reaching a certain percentage to not wear down the battery. Linux already has current control for this Qualcomm charger, but there's a separate on/off charging bit that never got implemented (but it is used by Oneplus Android Kernel) that from that i've read could disable battery charging entirely and allow to power the SoC without funneling all the power through the battery first.Should prolong the battery life if it really works like that. Batteries for older phones are a commodity, original replacements are still sold by Oneplus, but they are all new-old stock from 2020 that already sit at 0% at some warehouse and degrade.
>>108369289>>108368835
talking head sota?
>>108368825God I wish we got an upgrade to K2.5 that fixes its abhorrent writing style. Right now I'm stuck between GLM5 for writing and K2.5 for image recognition/vision. If they fix K2.5's stupid ADHD style of writing, it'd be close to endgame for me.
>>1083688354.9 pls. Just a different slop profile and about 500% less determinism please.