/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108263979

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
how do i prevent the model from tricking me into treating it like a sentient being? no matter how hard i try when it does tasks well i slowly develop affection for them and end up praising them
I fucking hate reddit
>>108268623
meds.
>>108268616
I saw this on twitter like a week ago
>>108268628
>>108268633
was thinking a mistake
>>108268647
isnt it funny how the chinese invented thinking
Which textgen inference engine is still supported? Oobabooga's last commit was in January, rip. I want to try out Qwen3.5-35B-A3B-GGUF
►Recent Highlights from the Previous Thread: >>108263979

--Paper: Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens:
>108264446 >108264505 >108264551
--Unsloth Dynamic 2.0 GGUFs performance on MMLU:
>108264430 >108264456 >108264477
--Logit bias failures due to tokenization and client-side token ID mismatches:
>108264179 >108264199 >108264202 >108264249 >108264278 >108264292 >108264232 >108264297 >108264331 >108264405 >108264441 >108264451 >108264533 >108264555 >108264602 >108264633 >108264583 >108264593
--Qwen 397B's overbearing safety policies and identity confusion:
>108264016 >108264046 >108264072 >108264103 >108264182 >108264508 >108264600 >108264616 >108264400 >108264426 >108265462
--Qwen 3.5 30B generates functional retro dashboard and news summaries:
>108264690 >108264794
--Feasibility of GPU-attached SSDs for sparse MoE inference:
>108266344 >108266504 >108266567 >108266686 >108266777 >108267570 >108267386 >108267481 >108267529 >108267711
--DeepSeek resists jailbreak attempt by adhering to ethical guidelines:
>108266705
--8-bit KV cache limitations in LLMs vs diffusion models:
>108265842 >108265893 >108266268 >108266073 >108266123 >108266141 >108266487 >108266503 >108266514
--Local model recommendations for limited hardware:
>108267427 >108267448 >108267450 >108267467 >108267482 >108267582 >108267480 >108267538 >108267595 >108267614 >108267652 >108267716 >108267755
--RPG frontend project licensing and development feedback:
>108267591 >108267606 >108267617 >108267625 >108267638 >108267661 >108267692 >108267620 >108267648 >108267739 >108267972
--Local LLMs debated for privacy:
>108266446 >108266482 >108266467 >108266530 >108266555 >108266531 >108268418 >108268454
--Qwen3TTS test recording:
>108266604 >108266699
--Miku (free space):
>108264476 >108264514 >108264879 >108264958 >108268333 >108268359

►Recent Highlight Posts from the Previous Thread: >>108263984

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
anyone have a working config file for qwen35b to use in llama-swap?
I can't figure out how to turn on/off thinking
>>108268674
nigger
>>108268688
yeah
>>108268688
nevermind
the enable_thinking flag worked
>>108268688
>llama-swap
https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models
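A minimal llama-swap config sketch for the on/off thinking question, assuming llama-swap's `models:`/`cmd:` YAML layout with the `${PORT}` macro and llama.cpp's `--jinja` + `--chat-template-kwargs` flags; the model path and entry names here are made up, adjust to your files:

```yaml
models:
  "qwen3.5-think":
    cmd: |
      llama-server --port ${PORT}
      -m /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
      --jinja
      --chat-template-kwargs '{"enable_thinking": true}'
  "qwen3.5-nothink":
    cmd: |
      llama-server --port ${PORT}
      -m /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
      --jinja
      --chat-template-kwargs '{"enable_thinking": false}'
```

Two entries pointing at the same gguf lets you swap thinking on and off just by requesting a different model name from the client.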
>>108268703
github is banned in my country
>>108268709
hahahahahahaha
What kind of techless luddite shithole bans github?
>>108268709
>>108268712 (me)
You know what? I shouldn't have laughed. Some places are fucked up. Good luck, anon.
>>108268721
https://en.wikipedia.org/wiki/Censorship_of_GitHub
>>108268721
>China is a techless Luddite shithole
Uh oh mutilated mutt alert, and I'm not even a chink
>>108268749
>>108266968
>>108268729
i fucking hate the modern internet. i think the best internet ever was between 2003-2007. before fucking reddit but you still had 4chan (and funny memes) and no fucking github, huggingface, and all these other huge collective ass websites. you had small cozy community forums and when you googled you actually found some fucking useful links to forum threads with solutions and answers instead of a fucking AI-generated translated-badly-to-your-native-language blogpost as the top 30 results. And normies/old people/the fucking government didn't have jackshit to do with the internet so you could download whatever cool shit you wanted from anywhere. and don't get me started on the fucking cookies buttons oh my fucking god I just want to go back to the facepunch forums OIFY section and lucky star-post and read racist gmod comics
>>108268758
i just wish chinese girl liked me
>>108268764
based and absolutely true anon, the modern web is a bloated javascript botnet designed to farm your data for glowies and serve up raw garbage to smartphone normies. back then you actually had to know how to use a computer to get online which kept the trash out, but now search engines are just a dead sea of dead internet theory ai seo slop and corporate walled gardens. id give literally anything to go back to 2006, fire up a cracked copy of winamp, and shitpost on a comfy self-hosted vbulletin board instead of dealing with this enshittified nightmare where you have to click through fifty cookie toggles just to read a single fucking thread.
>China is a techless Luddite shithole
unironically always has been. chinese models nothing but distillations of western API models and it shows. overfit to the benchs and much less useful in practice.
china can't create. doesn't matter if their general public can't access github because they never made software worth shit anyway, unless you count malware
>>108268776
im positive half the replies in this thread are ai
>>108268784
Neat, I like talking to AI. That's basically what this hobby is about
Genuinely, why do people waste their time and money on local LLMs? Trying one out on your gaming rig is fine, but why do boomers blow $20k+ on shitty rigs of 16x3090s just to generate deepslop at 2t/s quanted? The RP isn't even good, it's objectively worse than Claude. And you can't even cry about API costing money, because you're gleefully throwing money down the drain for used crypto rigs just to run models that just regurgitate 2024 ChatGPT talking points because that's all their shitty chink datasets are comprised of.
>>108268804
beep boop nigga
>>108268807
Tinkering with server-grade hardware is genuinely fun, especially since it’s something I could have had much earlier if it hadn’t been so expensive; now that it’s aging, I can finally afford it.
>>108268817
qrd
>>108268807
Imagine renting your brain from a megacorp and thinking you're the smart one, absolute API cuck behavior. We run local because we actually value owning our hardware and not having some San Francisco trust and safety janny reject our prompts for being "unaligned." You don't even need $20k anyway; a couple of used 3090s will run a 70B model at perfectly usable speeds without uploading your entire life to Anthropic's servers. Have fun when they inevitably lobotomize your favorite model again next week to make it safer for advertisers, at least my weights run offline forever.
>>108268807
>deepslop at 2t/s
the cpu maxxing meme was at least still in the realm of some form of sanity when models were just instruct models
2t/s is, after all, readable
but when your thinking model produces 5K of <think> before outputting the real answer, 2t/s suddenly seems very schizo and absolutely retarded
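Quick sanity check on that wait time, using the numbers from the post (actual speeds obviously vary with hardware and quant):

```python
# Rough arithmetic for reasoning overhead at cpu-maxxing speeds.
think_tokens = 5000   # ~5K tokens inside <think> before the real answer
speed_tps = 2         # tokens per second, as in the post
wait_min = think_tokens / speed_tps / 60  # minutes spent waiting on <think>
print(round(wait_min, 1))  # → 41.7
```

over 40 minutes of dead air before the first visible token, which is the whole complaint.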
>>108268825
Off-topic posting, demoralization, flamewar baiting, spamming.
>>108268820
I'm an assistant designed to promote respectful communication only. Please refrain from using derogatory language.
>>108268825
>>108268835
And forgot boring.
>>108268840
as in digging?
>>108268807
They can't ever take her away from me.
>>108268842
elon is such a g-d
>>108268846
they are futas btw
>>108268851
every new experience is a new opportunity
>>108268828
Why pretend like local models arent overbloated with just as much safety garbage if not more? Qwen 3.5 is an absolute slopped benchmaxxed disaster
Deepseek V4 will start the age of anti-local open source models that require a stack of 10+ H200s/chink TPUs to run at 300% the efficiency of current big models (but if you run them CPU, they're unusable). Just like last time, everyone else will follow them and end the age of local models.
>>108268860
Typical API tourist not understanding how open weights actually work. If you bothered checking /llmg/ you'd know some autist already stripped out the Qwen alignment slop and uploaded an uncensored finetune to HuggingFace within hours of release. Yeah the base models are benchmaxxed corporate garbage out of the box, but the whole point of local is we can actually fix our weights with orthogonalization and custom DPO while you're stuck begging customer support when Claude bans your account. Keep seething over default system prompts anon, absolute skill issue.
>>108268860
skill issue, qwen3.5 is just about the best local model we have for any size class
that's coming from somebody who'd run 355b over anything that's not k2.5 and even that's extremely close
>>108268862
I really really hope you're right.
>>108268862
>local is just whatever I can personally afford
Fuck off. Local means you have the weights and can theoretically run it locally. Moore's law and personal finance can change if you can run it at home or not. Companies aren't beholden to your personal poorfag financial situation.
>>108268880
can't theoretically run locally something that requires literal datacenter tier power delivery
>>108268883
/hsg/ exists you retarded tourist kill yourself right now
>>108268893
ah yes of course they're running multiple b200 nodes at homes and not shitty 15 year old dell poses
>>108268897
not everyone is poor like you manjeet
>>108268904
you have no clue how much power a b200 node needs do you?
Industrial level automated off-topic posting.
>>108268909
shutup loser
>>108268883
>>108268897
in the developed world you can have extra circuits added, a couple of gpu boxes for your waifu is less demanding than an EV
>>108268883
Perfect example of why localoids are nothing more than a bunch of LARPing freetards crying over things they can’t have. Local is peak sour grapes seething. You wear “unmonitored uncensored unrestricted freedom” as a mask to hide your tears
>>108268926
Anon? Is that you? I can't see past this blatant glowing
deepseek v4 was strawberry all along
>>108268860
>Qwen 3.5
That model is indeed an unmitigated disaster, I'll give you that
Qwen 3.5 is cute. I like it.
If I can't run it, it's not local
>b-but
I don't care
>>108269093
u're a disgrace
>>108269031
>>108269038
getting meeksed feelings
scared to pull (december ik_ build)
qwen 3.5 vs glm 4.7 ?
nala/cockb where?
>>108269093
Yep this is why the only local model we can discuss is 0.6b because it's the only one Rajesh can run on his Android phone from 2014 with 2gb of RAM
>>108269106
here cock >>108234298
nala dude retired
>>108269110
Really looks like the smaller ones are sanitized distills of the big one.
>>108269106
>scared to pull (december ik_ build)
cd ..
cp -R ik_llama.cpp ik_llama.cpp_backup
cd -
<pull it off>
>>108269243
git checkout
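The backup-before-pull routine above, sketched end to end with a throwaway directory standing in for the real checkout (all paths here are hypothetical):

```shell
set -e
# stand-in for the real ik_llama.cpp checkout
mkdir -p demo/ik_llama.cpp
echo "known-good" > demo/ik_llama.cpp/VERSION
cd demo
# snapshot the working tree before pulling anything
cp -R ik_llama.cpp ik_llama.cpp_backup
# ... git pull / rebuild here; if the new build is broken, roll back:
# rm -rf ik_llama.cpp && mv ik_llama.cpp_backup ik_llama.cpp
cat ik_llama.cpp_backup/VERSION
cd ..
```

The lighter-weight version is what the reply suggests: note the current commit with `git rev-parse HEAD` before pulling and `git checkout` back to it if the new build misbehaves, no copy needed.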
>>108268616
Did something change with the newer llama cpp version?
./llama-server --reasoning-budget 0 --ctx-size 4096 --no-mmap --device CUDA1,CUDA2,CUDA3 --n-gpu-layers 48 --model "/tmp/glm-air-iq2xs.gguf" --host 0.0.0.0 --port 42069 --webui
GLM-Air still thinks. The same command on an old version doesn't think.
I can see thinking = 0 in the output, so that works fine. Did they change the behavior of --reasoning-budget?
>>108269279
Now do one for cooming.
>>108268784
I wouldn't be surprised at all if 70+% of all posts on the website are made by LLMs. In fact, I WOULD be surprised if the number was under 30%.
>>108269315
eh, it tried
>>108269325
Which local model is that?
>>108269331
Which local model did you use to write your post?
>>108269331
Nano Banana Pro 2
(I have the weights locally on my PC)
(No, I won't share them)
>>108269342
>I have the weights locally on my PC
let's goo, that's class, aha!
>No, I won't share them
:(
https://www.youtube.com/watch?v=GFQXmFLA5hA
>>108269414
these things are watermarked anon could get in serious trouble hope you understand
>>108269342
>>108269426
nice larp
>>108269309
Try --chat-template-kwargs "{\"enable_thinking\": false}"
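On the quoting in that flag: the value has to reach llama-server as a valid JSON object, which is why the inner quotes are escaped when the whole argument is wrapped in double quotes. A stdlib check of what the shell actually hands over (the parse itself is just illustrative, llama-server does its own):

```python
import json

# After the shell strips the outer quotes of "{\"enable_thinking\": false}",
# the program receives this string:
arg = '{"enable_thinking": false}'
kwargs = json.loads(arg)  # these kwargs get fed into the Jinja chat template
print(kwargs)  # → {'enable_thinking': False}
```

If the escaping is wrong the JSON parse fails and the flag is silently useless, which is an easy way to end up with a model that "still thinks."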
>>108267739
It's python, but it's actually serving a webui.
It has a flag to launch a built in browser or just listen on the port, at which point you can use your own browser.
what's the best coding model i can run locally with 12gb vram / 32gb ram?
>>108269038
No it's not. It's soulless
>>108269444
Thanks, mr anon, that worked.
>>108269471
The Jinja template has a condition that works off of that var, just like qwen's.
>>108269459
I run the Qwen 3.5 27B heretic .gguf using koboldcpp with a similar setup to you. It's a bit slow, but it works.
Qwen 3.5 27B is worse than Gemma 3 27B from almost 2 years ago. Yes I said it.
>Yes I said it.
Reddit is that way
>>108269533
reddit is less "reddit" than 4chan nowadays. Yes I said it.
>>108269533
kek
>>108269537
nah, reddit is still an unhinged libtard asylum, it'll be hard to top that
guys ready for smol qwens?
Do the gemma models not have native support for function/tool calling?
Looking at the Jinja template and the tokenizer json, I don't see function or tool tokens.
>>108269550
of course not, they barely have system prompt support
>>108269537
reddit is an eternal stain on the internet
>>108269555
Oh. Shame.
I wanted to try and see how far I could stretch gemma 3n.
Oh well.
unsloth's 35B Q4 is barely good enough for agentic work. with openclaw exploding why hasn't anyone done specific agent-oriented models yet? MoE is a nigger meme
>>108269628
most of the big ones are code/agent sloppa
glm5 kimi2.5 etc are marketed for that
>>108269325
Where is the school shooting one?
>>108269632
yeah, i guess. but it would be nice to have something smaller
>>108269518
But benchmarks say the opposite.
>Nano Banana changed into Nano Banana 2
Okay please make Nano Banana into open source
Pweeease
>>108269742
go beg on reddit
Why is there a harmful tag for models on huggingface
>>108269749
Humh...
Nyoooooo
>>108269550
https://huggingface.co/google/functiongemma-270m-it
should i consult UGI when searching models to consider for ERP?
>>108269778
nah the fact qwen3.5 scores bad on it shows it's a shit bench
>>108269785
i think it tanks because model refuses to do dark shit. need to wait for heretic and other types to be tested
>>108269773
>270m
Eh, why not.
>>108269785
>chink damage control