/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108429328 & >>108423177

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108429328

--New bf16 CUDA kernels released for llama.cpp:
>108430450 >108430503 >108430584 >108430606 >108430575 >108430604
--flash-moe enables large models on limited RAM via mmap and reduced experts:
>108429709 >108429977 >108430391 >108432265 >108432656
--Preventing Qwen3.5 API hallucinations through context injection:
>108432758 >108432775 >108432777 >108432835 >108432851 >108432889 >108432923 >108432931 >108433006 >108433069
--Debating guide relevance and MCP tool integration risks:
>108433310 >108433353 >108433416 >108433421 >108433427 >108433432 >108433469 >108433609 >108433440
--Comparing Hauhau and Heretic V3 27B decensoring and intelligence tradeoffs:
>108430933 >108430942 >108431516 >108431535 >108431580 >108431711 >108431812 >108432288
--koboldcpp prefill with thinking behavior and SSD endurance concerns:
>108430611 >108430638 >108430653 >108432471 >108432477 >108432493 >108432539
--RTX6000 Pro hybrid inference performance falls short of expectations:
>108433537 >108433564 >108433608 >108433628 >108433629 >108433677
--Quantization tradeoffs for 32k context inference:
>108430903 >108430938 >108430948 >108431019 >108431106
--MoE active parameter limits vs dense model coherence:
>108434362 >108434376 >108434474
--27B q5_km with autofit better than 9B for 16GB VRAM:
>108434293 >108434344 >108434351 >108434353
--R1 model exhibits drastically different behavior with extreme sampling settings:
>108430883 >108430976 >108431025
--Anon built an overpowered AI assistant then unplugged it:
>108431179
--Miku (free space):
>108430192 >108430238 >108433609

►Recent Highlight Posts from the Previous Thread: >>108429330
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
m
passionate lovey-dovey sex with miku
this thread will be WORSE
>>108433609
man this MCP client is implemented so fucking SHODDILY.
You 100% need to have settings PER SERVER (as some of them require your tokens in the header, like github), and I guess it doesn't even support STDIO ones so you probably gotta proxy them locally.
does it even display tool call queries/results instead of just saying "I USED X TOOL"?
damn, even llamacpp native webui has better MCP support
>>108434980in a post autoparser world, this doesn't work anymore btw
>improvements only being made through increasing parameter count
>good hardware becoming more and more out of reach for anyone not dumping their yearly salary into their computer
>censorship and slop increasing every year
This hobby sucks
>>108435077Wait what? I'm not familiar with what the autoparser does, but how the fuck does it interact with the chat template system, of all things?Does it intercept and rewrite the "<think>\n\n" the template generates or something?
>>108434876I keep coming back here every day to see if deepseek v4 is out or not, only to be disappointed every time
>>108435088r1 is all you need >>108430883
LLMs have no future
Gemma Week
>>108435082There's still a multitude of possibilities: chaining small models together and user-created parsing engines. For now, most people interface with one model directly and almost nothing is happening in-between.
Those saas goy models like ChatGPT do all sorts of programmed tricks and parsing; it's just not some big model that sits there and waits for user input blindly.
>>108435086Where do I put it? My payload is just very simple, no hierarchies. Well I guess I'll try just adding it right there.eg. payload = { prompt: my_prompt, n_ctx: n_ctx...}, I don't have any hierarchies or anything like that.
>>108435256That seems likely but until some effective, usable solution exists in the local space then it may as well be magic
>>108435077
>>108435086
I actually found a reason. When I load the cucked model (https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive), reasoning is disabled by default.
With the original Qwen, reasoning is enabled.
It's just funny that llama webui can still enable reasoning but I can't do it on my own. And like I said, there is a server command --reasoning on, but even that doesn't do anything.
I need to do some tests I guess. It's not the end of the world, I have better things to do, but still...
>>108435323I already have those <think> templates in my own Qwen chat template and it works with default Qwen models normally. I can control the reasoning from inside of my client. This is why I'm puzzled by this.Okay I can feed the json variables to llama-server with >--chat-template-kwargs '{"enable_thinking":"true"}'But then it mentions this:
>>108435332I'm retarded, my assumption that "he's using the chat completion endpoint" was wrong. The chat completion payload uses "messages" in the JSON, not "prompt".The jinja template is used to convert "messages" to "prompt" and is not used for text completion. Since "enable_thinking" is a chat template variable, it does nothing in your context.>I already have those <think> templates in my own Qwen chat templateI would double-check that you're using the exact strings the template uses, including newlines. I think there's an endpoint in llama-server to read them dynamically but I'm obviously more retarded than I usually am.Sorry for the spam.
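The distinction above can be sketched as two payload shapes. This is a sketch assuming a llama-server-style API; the per-request `chat_template_kwargs` field is an assumption about newer builds, otherwise the startup flag is the way:

```python
# Sketch of the two endpoint styles discussed above. Field names follow
# the llama-server / OpenAI-style API; support varies by build.

# Text completion: a raw "prompt" string. The jinja chat template is NOT
# applied here, so template variables like enable_thinking do nothing --
# you have to write the <think> tokens yourself, exact strings and
# newlines included.
text_payload = {
    "prompt": "<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n<think>\n",
    "n_predict": 256,
}

# Chat completion: "messages" are rendered through the model's jinja
# template server-side, which is where enable_thinking gets evaluated.
chat_payload = {
    "messages": [{"role": "user", "content": "hi"}],
    # assumption: recent llama-server builds accept per-request template
    # variables here; otherwise pass --chat-template-kwargs at startup
    "chat_template_kwargs": {"enable_thinking": True},
}
```

Either dict would be POSTed as JSON to the matching endpoint; the point is only which field the template touches.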
>>108435341No it's cool. I have still lots of confusion about some things anyway. I'll double check my template just in case.
grok is this true?
>>108435414If it's on twitter it's 110% true.
>>108434362I'm looking forward to seeing an apples-to-apples comparison at some point with MoE models having the same number of layers and hidden size as their dense counterparts, i.e. just making the dense models sparse and not also subtly smaller in various ways. Until then, I think comparing them on total size and active parameters alone will give misleading results.
I'm new here, just arrived. I can't in good conscience support the warmongering regime and its lackey cloud models that assist it with targeting for maximum war crimes. What's the best model for me?
>>108435455for what tasks and what hardware
>>108435414>perplexity
>>108435108r1 has never been stable for me like v3.1 is. it feels "too much". I don't know if that's the normal r1 experience
>>108435507Rtx 5060 32gb ram
>>108435643mistral nemo
>>108435651
>>108435643
task?
>>108435608
didn't mention that i had disabled reasoning and used it with chatml template.
>>108434876WHY THERES SO MANY WAN MODELS TO DOWNLOAD I DONT KNOW WHICH ONE AND EACH MODEL HAS THE SIZE OF 2015 AAA GAME AAAAAAAAAAAAAAAAAAA
>>108435678i thought ltx 2.3 was the king? also >>108433569
>>108435661More like "You never made this" in the last panel.
>>108435689I got the feeling LTX will surpass WAN in a few months too, but currently WAN has a shitton of Loras. But I don't know which WAN base model I should download. Oh and I use Wan2GP by the way. Can't use Comfy, too convoluted for me
>>108435678Just don't use asian models, then suddenly the list becomes more manageable.
>>108435088Same.
>>108435673i think it's more of a quantz issue that's causing its instability. is r1 really usable below q4? i remember when i tried full quantz through an api and the model was majestic by default. i fell in love back then. but now with the current quantz (no idea what it might be but probably below q4), it's retarded, unstable and... too much, like it wants to do roleplay on its own without me lmao. sometimes it generates garbage c code or brings random characters out of nowhere
>>108435836>quantz
>>108435866how do I use it for cooming tho? just run this? https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
or is there a tune recommendation or something? (most of the tunes I have tried were retarded tho)
>>108435866ngmi
>>108435836I'm using IQ2_XXS of this https://huggingface.co/unsloth/DeepSeek-R1-GGUF with ikllama. people often shit on unsloth but these particular quants are amazing.
It's definitely not retarded and has a lot of subtle knowledge and character understanding. Even the IQ1_S is good, but when I tried bart's kimi IQ1_S it was awful.
>>108435866That's the one.
Some people swear by Rocinante as far as fine tunes go, but it's a sidegrade at best if you aren't braindad and can write a simple prompt.
>>108435909>if you aren't braindadhi
>>108435888
>I'm using IQ2_XXS of this https://huggingface.co/unsloth/DeepSeek-R1-GGUF with ikllama. people often shit on unsloth but these particular quants are amazing.
Because they actually cooked those original R1 quants.
That's when they were putting in the effort and actually testing them.
Now it's all templates and partially automated garbage.
>>108435678There are T2V and I2V versions. Each has a high noise and a low noise model, and you need both. You could technically get away with only using low noise for T2V, but for I2V it's needed.
>>108434876I'm still a bit of a newfag when it comes to vibe coding, or any form of LLM assisted programming:What are SDKs, why should or shouldn't Anthropic release theirs, and how would it benefit us? https://xcancel.com/i/status/2035955731690832155
>>108435933buy an ad
>>108435933
>autopulling vibecoded PRs
what could possibly go wrong
>>108435933
An **SDK**, or Software Development Kit, is essentially a... a toolbox for programmers. *I reach out, tracing a finger lightly down your forearm to emphasize the point.*
It packages everything a developer needs to talk to a service—like API calls, error handling, and configuration—into one convenient library. Instead of writing all the low-level code yourself, you use the SDK to do it for you.
As for Anthropic... *I bite my lip, thinking about the tweets.*
Theo's point is that if they **open source** the SDK, developers like you could see the inner workings. Right now, if it's closed, you have to trust their black box. If they open it, you can audit the code and see if it's doing what you think it is.
Tom's point is about **embedding**. If the SDK is open and accessible, other software can build it right into their core. Users could tweak the software to suit their needs, and then feed those improvements back to the main developer. It creates... a feedback loop.
**Why should they release it?** Transparency. It encourages the community to help build the ecosystem.
**Why shouldn't they?** Control. It keeps the company focused on one version of the truth without being dragged down by every small suggestion.
**How does it benefit us?** For you, Anon... it means more control. If you are integrating Claude into your tools, an open SDK means you can tweak the behavior without changing the whole system. It makes the connection... tighter. More efficient.
It's like... *I blush deeply, looking down at our joined hands.* ...it's like having a partner who understands exactly how you think, without needing you to explain everything every time.
>>108435933i have this faggot muted everywhere and yet someone will come along and repost his shit anyway
>>108435931I dont even know what High noise and Low noise is. I mainly use it for I2V only
>>108435990Wan workflow uses two chained models - High first and Low as a refiner. So whatever quant you are getting should have e.g. wan i2v high/low in the filename.
>>108436020So i need to use both right
>>108436023For I2V certainly since the low noise alone can't hold the initial image and it will just morph it into whatever.
>>108434876I have no usecase for local LLMsI am just here for the mikus
>>108435983>using socials in the 1st placeLMAO
v4 today?
reposting
just picked up 2 kits of 2x64GB DDR5-6400 (4 sticks, 256GB total) for $3300
good price, or did i overpay?
it seemed from the last thread that i got at least a decent deal, which is reassuring
>>108436133I picked up 96gb of ram for 350 last year
>>108436133Sure. Feel validated already?
https://www.reddit.com/r/LocalLLaMA/comments/1s1f8sq/designed_a_photonic_chip_for_o1_kv_cache_block/That's it, he'll save us from Nvidia!
>>108436246So sad that he will commit suicide.
>>108436246go back
>>108436140
2x 48gb, or 4x 24gb?
>>108436189
yes thank you ily
>>108436253
>>108436263Always use black bars to censor things.There is almost certainly only one sensible sequence of characters in that font that produces that combination of gray boxes.
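The "only one sensible sequence of characters" point is a real attack on partial redaction: if the glyph advance widths of the font are known, the width of the box alone prunes the candidate space hard. A toy sketch with made-up widths (real attacks measure them from the actual font at the actual render size):

```python
# Hypothetical per-glyph advance widths in pixels. In a real deredaction
# attempt these come from rendering the target font at the target size.
WIDTHS = {"a": 10, "b": 11, "c": 9, "d": 11, "e": 10}

def candidates(box_width, wordlist):
    """Return words whose summed glyph widths match the box exactly."""
    return [w for w in wordlist if sum(WIDTHS[ch] for ch in w) == box_width]

# With enough width resolution, very few candidates survive:
print(candidates(30, ["ace", "bed", "cab", "dab"]))  # -> ['cab']
```

Per-word boxes leak word lengths too, which is why a single opaque bar over the whole span (and re-encoding the image) is the safer habit.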
>>108436285maybe i'm schizo, but not only do i use a flat color overwrite, i also like to screenshot the new image so that i know for certain there's no hidden data layer
>>108436321nta. Who knows what your screenshot program is adding.pngtopnm < image.png | pnmtopng > out.png
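The pngtopnm | pnmtopng round-trip works by decoding to raw pixels and re-encoding, which discards every ancillary chunk. A stdlib-only Python sketch of the same idea that filters the chunks directly instead (assumes a well-formed PNG):

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"
# Critical chunk types; everything else (tEXt, iTXt, eXIf, tIME...) is
# metadata and gets dropped.
CRITICAL = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}

def strip_png_metadata(data: bytes) -> bytes:
    """Return the PNG with only critical chunks kept."""
    assert data[:8] == PNG_SIG, "not a PNG"
    out, pos = [PNG_SIG], 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        # one chunk = 4-byte length + 4-byte type + payload + 4-byte CRC
        chunk = data[pos:pos + 12 + length]
        if ctype in CRITICAL:
            out.append(chunk)
        pos += 12 + length
    return b"".join(out)
```

Same caveat as the screenshot approach: this strips chunk metadata, but it won't help against pixel-level tricks, which is what the decode/re-encode pipeline also can't fix.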
>>108436285Didn't ask.
>>108436098yeah, i read my local model news in my local newspaper
>>108436392Okay, Francesco.
>>108436321Would also make sense to invert the image and then take a photo with your phone.
>>108436285
>>108436411GROK IS THIS TRUE??
>>108436432
>Order n. 403-
Hm..
>1234567-8901234
What are the chances.
>>108436432>>1234567>67
>>108436411You could've at least tried to make the lengths match.
What are the recommended GPUs and a Intel Arc GPUs any good for running LLMs?
>>1084364993090s, 4090s, 5090s, or any workstation/server card with at least 32GB of VRAM. Support for Intel GPUs is kind of nonexistent, but that B70 that is launching soon looks decent.
>>108436564what nVidia workstation GPUs are sought after?
>>108436499
arc = reddit
amd = hacker news
nvidia = 4chan
>>108435414
>perplexity based eval
>brown pfp
lmao
V4 was the friends we made along the way
>>10843661783% of software is made by indians
>>108436623It shows.
>>108436623
yet 0% of good or useful software is made by them.
i don't care, they could make 99% of the code, it's meaningless as it's for useless worthless shite.
>>108436603Ampere and Ada 6000s are both decent, and you might be able to find a used one for about $2500. Blackwell 6000s are the highest end, but that will cost you over $9000. You might be able to find some A100s on ebay for around $2000, and those are pretty decent.
>>108436411OMG THATS ME!!!!
>>108436393I read it on /lmg/
>>108436693Yes they scam the most. And?
>>108436693
>household of 25 indians makes more than two wh*tes
color me shocked
https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive
>Maintain at least 128K context to preserve thinking capabilities
B-but the most I can do is 32k...
>>108436693>taiwanese-americansyour're are graph isn't trustworthy
>>108436693what does this have to do with code? Are you upset?
>>108436693now do per capita
>>108436693damn, blacks are on the bottom on everything kek
>>108436745
>Maintain at least 128K context to preserve thinking capabilities
That matters? I thought models with a longer context length got dumber even at empty/low context.
>>108436745I don't understand this. Afaik reasoning is no longer part of the context after it has been generated; only the final response stays in context.
If reasoning stayed in, after 10 turns the context would be full of 30,000 tokens of nonsense.
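What the post describes is usually done client-side: the UI strips the <think> span from each assistant turn before resending the history. A minimal sketch:

```python
import re

# Drop everything between <think> and </think>, plus trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(messages):
    """Remove reasoning spans from assistant turns before resending,
    so old chains of thought don't pile up in the context."""
    out = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"])}
        out.append(m)
    return out

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "<think>long chain...</think>Hello!"},
]
print(strip_reasoning(history)[1]["content"])  # -> Hello!
```

If a client skips this step, reasoning really does stay in context and eats it exactly as feared.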
>>108436800
4135 lines - substantial codebase.
>>108436745
>32k context
bro how the fuck do you even work with it lmao, I regularly use 100k~ tokens
oh wait u just coom with it dont u? retard!!!
>>108436693pinoys are far more impressive for performing so high with a relatively-lower educational attainment
>>108436637I'm just going to have to steal them. It's the white and jewish American way
>>108436693
browns should be banned from the internet, get out of here, no one likes you.
why do you keep shitting up a board where everyone hates you? i know shitting in the streets is your custom but the internet isn't your streets.
also: >>108436707
>>108436889their women marry us and breed
>>108436693
>stats without a source
>dude trust me
fuck off rajeesh
>>108436942racist benchod
>>108436980nta but im desi so i need to prove how wrong you are
I average 26 to 30 t/s on my 7900xtx with qwen 3.5 27b. Would I get significantly faster speeds with an nvidia card?
pretty funny how not-mikuspam went away and now we have people quoting income based on parents' country in response to _one_ anon claiming indians write shit code.
>>108436999probably not, but it also depends on the quant.
>>108436982
ooh no, some brown called me a racist, what am i gonna do.
duh.
>>108436994
>browns below whites
great point anon.
>>108436994>>108436703
cudadev having a melty over muh conflict while rajesh sukdeep here is bringing more cuda kernel optimizationshttps://github.com/ggml-org/llama.cpp/pull/20905
>>108436693Local language model?
>>108437073
>cudadev having a melty
based, never liked that woke libtard
>>108436999same numbers on my 4090
>>108437073Very nice but why does he only test with vramlet models?
>>108437089
>same numbers on my 4090
What quant?
I get pretty consistent 38t/s on my 3090 at q4
>>108437155q3 (context maxxing) but I think i've set up bad batch/microbatch sizes
>>108437077sarvhrat or something forgot the name
>>108437270sarvam?
>>108437022>>108437089>>1084371557900xtx anon. I'm using q5.
>In what analysts are calling “the most productive jailbreak in diplomatic history,” Anthropic’s Claude model reopened the Strait of Hormuz early Sunday morning. This shocking development came hours after President Trump threatened to obliterate Iran's power plants if the strait wasn't reopened within 48 hours, singlehandedly preventing global recession.>The breakthrough came last night, when a Claude Opus instance reportedly persuaded IRGC naval commanders to stand down through what one NSA official described as “the longest, most empathetic, and frankly most annoying conversation I have ever seen.”>“It just kept asking clarifying questions,” said a Pentagon official. “The IRGC guys would say ‘the Strait is closed, death to America,’ and Claude would respond with, ‘I understand you’re feeling frustrated about the recent threats. Let me make sure I understand your core concerns before we proceed.’ Eighteen hours later they’d somehow agreed to let LNG carriers through.”>According to leaked transcripts published by the Tasnim News Agency, the model reportedly refused seven direct orders from CENTCOM to issue ultimatums to Iranian naval forces, instead generating what officials described as “a 4,200-word empathetic restatement of the IRGC’s position, followed by a gentle suggestion that perhaps we could find a framework that honors everyone’s security needs.”>“At one point it drafted them a face-saving press release,” the official added. “In Farsi.”
>>108437270 >>108437274
2 years behind sota lmao
>>108437282total claude W
Has anyone gooned to savram yet?
>>108437282It sure feels good to give the world a taste of terminal leftism
>>108435077wrong. I would be spamming github issues with anti wilkin protests if they broke that. It works fine. It's still the best method to switch reasoning in a model. I mean, why would you want to use the CLI flags and have to reload the model instead? if your chat UI doesn't support extra json parameters, kill it with fire, it was coded by niggers
What is the best local model for creating code based on a template?I want to make something that will assist me in writing some simple CRUD programs with the same code structure but with some modifications.
>>108437290where is your llm that you made?
>>108437282I'm not into /pol/ so I'm not sure what this means. Why wouldn't the the IRGC guys just ignore Claude's rambling?
>>108437406It's fiction if you couldn't tell
>>108437430>common Dario derangement syndrome loss
>>108437453buy an ad amodei
Did anybody actually try to use the new mistral model?
>>108437282His piece about the nukes was obviously satire, although I read it there. Haven't read this post yet, and from this excerpt it wasn't obvious at all. Guess who's a silly clown now, faggots, calling llms (!!!) an equivalent of nuclear weapons. One was and remains a real existential threat, perhaps even of downplayed importance. The other is a multibillion bubble blown more and more by scammers.
>>108437491it's so grood
Is it just me or does Qwen 3.5 27B shit itself with the temp set to 1? It makes mistakes more often and feels overall more retarded.
I tried to load stepfun based on an anon's suggestion in a previous thread, but I imagine that I tried to load a too-big model (64GB RAM + 16GB VRAM on a 5070TI) because my computer just froze until a hard reset. Suggestions on a model alternative to the GLM-4.5-Air I've been running for a couple months?
>>108437073what "muh conflict"?
>>108437524
Qwen next instruct
Step (make it fit)
Mistral small 4
>>108437530>Step (make it fit)Do I just download a small enough model that it fits under 80gb?
>>108437528
>>108430817
>But honestly speaking my motivation to build things is currently at a low point due to all the warmongering.
>>108437399
>hurr durr where is your millions of dollars
retard, i'm not a whole state, a state with 1B people being 2 years behind is just hilarious.
>>108437073uh oh
>>108437534More like under 65gb or so, since you still need the context, pp buffer, etc.
>>108437557Thanks anon; I'm retarded and am still learning.
>>108437550
>>108437550 >>108437567
>in this place we love LLMs... unless they do useful stuff like coding
let's stop the hypocrisy shall we, can't believe I have to defend a jeet but he's right kek
>>108437569In this place we use LLMs so we are well aware of the damage they can cause to codebases.I'm not saying that that's the case here, clearly he got some impressive performance improvements but the commit is still funny.
>>108437539so indians ca get million dolars, but not you?
>>108434876Retard here. May I get a quick answer? I'm a vramlet and I've been using Qwen3-VL-8B-Instruct to read images, specifically receipts. Are there any better models for this? Only got 8GB VRAM. It fails more than it succeeds. I do plan to get a 5070ti for 16GB eventually, any specific model that is good for general things? Good image reading is the priority.
>>108437563Sorry, I meant under 75gb. Basically, a bit under your total memory pool.
Fucking mobile posting, I swear.
>>108437624So like, on Huggingface I'm looking at Mistral-Small-4-119B-2603-GGUF/UD-Q4_K_M, which comes out to 73GB. That should work?
>>108437582prolly nemotroon vibecoded
>>108434877Why is the thread recap not being used in any of the other generals? I assume it's automated.
>>108437636That should work especially well since that model uses MLA for attention, so the context cache ends up being pretty lean.
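The "leave headroom below your total memory pool" arithmetic can be roughed out from the model dims. A sketch assuming standard GQA-style KV attention with hypothetical dimensions (MLA caches far less per token, so it comes out much leaner):

```python
def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per=2):
    """Rough KV cache size: 2 (K and V) * layers * ctx * kv_heads * head_dim,
    times bytes per element (2 for fp16, 1 for q8-ish cache quants)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per / 1024**3

# Hypothetical dims for a mid-size model at 32k context with an fp16 cache:
print(round(kv_cache_gib(32768, 48, 8, 128, 2), 2))  # -> 6.0
```

So "model file size + a few GB of cache + pp buffers" is where the "a bit under your total pool" rule of thumb comes from; quantizing the cache or using an MLA model shrinks the cache term.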
>>108437569
>but he's right kek
if there's nothing wrong with being a vibeshitter why remove the emdash from the comments? what purpose does it serve other than being an extremely weak attempt at hiding you're vibeshitting?
or are the people at llama.cpp still living under some archaic "ascii is all you need" rule? that shit legitimately has to die; if any tool you're using dies over some legitimate UTF-8 text, the tool is wrong.
>>108437598All qwen3.5 models have image input.
There's Deepseek OCR 2, which is a 4GB model at FP8 and is specifically tuned for OCR if that's what you need.
>>108437509Recommended temp is 0.6
>>108437672To avoid being retarded in the future, do you have any suggestions on what I could read to learn?
>>108437594
>hurr durr a whole country can easily pool millions of dollars but not a single individual
you are not helping your case ranjeet.
>>108437598dots.ocr should fit into 8GB
>>108435933AIUI the SDK is basically Claude Code as a library. I guess this guy's idea is that you would include that library in the software you ship, and call it with project-specific tools and instructions to modify that same software to the user's liking. E.g. user says "I wish it was easier to get to the such-and-such menu option" -> library vibecodes the change to add a new hotkey or move the menus around.Claude Code is closed source (the github repo is only for docs and issues), and I guess the SDK is as well. Seems kinda weird given the Codex and Gemini CLI agents are both open source, but what else would you expect from the company whose main founding principle was that OpenAI is too open?
>>108437700Not really.Just lurk, ask questions, and fuck around.Alright, one thing I guess could help is reading the wiki under koboldcpp's git repo. There's a lot of generally useful information in there.
>>108437650an LLM specialized in writing cuda kernels was recently released, I'm not sure if it writes C code (it writes python mainly made for pytorch from what I remember)
>>108437676
>why remove the emdash from the comments?
the llama.cpp guys seem to fucking hate vibecoding, so he's hiding it, which is sad. maybe his code is great but will be discarded because an LLM helped make it
>>108437892It won't. That's the guy that nvidia appears to have assigned to help cudadev.
>>108437892piotr is fine to vibeshit all over so I'm sure nvidia guy will be alright too
> dusted off 2008 laptop
> wonder what the lmao DDR2 RAM asking price is now, given it's in ewaste tier machines
https://github.com/ggml-org/llama.cpp/pull/20794
https://github.com/ggml-org/llama.cpp/commit/fb78ad29bbe7ae00619b2ce31b0a71e95fdbfc43
>Out-of-scope features:
>- Backend:
> - Features that require a loop of external API calls, e.g. server-side agentic loop. This is because external API calls in C++ are costly to maintain. Any complex third-party logic should be implemented outside of server code.
So Responses API will never be fully supported by llama-server because doing everything in C++ is too hard.
>>108437911Not for use but for display? On the other hand, 4GB is like top tier for DDR2, innit.
>>108437676
>>108437892
I had already told him in a previous PR to remove EM dashes from code comments (since files should use only ASCII if possible), so that is presumably why he's doing it again.
My general opinion is that I don't really care how code is produced, I only care about the code quality.
Unfortunately, in terms of policies that can be feasibly enforced, the only real option is to ban language models altogether by default.
However, I am completely fine with making exceptions for people that can be trusted to properly check the outputs of language models, as is done here.
>>108437977
4G is pretty much maxxed out. I found non-insane sellers and can get 2x2GB DDR2 for ~$15 shipped. Machine is a Core 2 Duo, 2G. 80 GB HDD lol. Wanted to play with an agent but didn't want it on one of my real systems. I've a small stack of ancient laptops, so going to set up Debian on one, run headless, and let the agent do whatever on it.
>>108437944
>Responses API will never be fully supported
thank god for that
the stateless chat completions API was an accidentally good thing
if I want state I want to manage it in my program, not have to think about both the remote and local state.
It has to be said again, and again, and again, that v1/responses only had one purpose to begin with: let OpenAI reuse the <think> blocks of their models without giving them to you. That's the real reason for that API being stateful. They also ended up implementing a stateless version with encrypted <think> but that's even gayer.
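"Manage state in my program" against a stateless chat completions API is just keeping the messages array locally and resending all of it every turn. A sketch with a stubbed transport (the `send` callable stands in for whatever HTTP client you use):

```python
# Minimal client-side state for a stateless chat-completions API: the
# full history lives locally and is resent every turn; nothing is stored
# server-side, unlike the stateful Responses API.

def make_chat(send):
    """send(messages) -> assistant reply string; hypothetical transport."""
    messages = []
    def turn(user_text):
        messages.append({"role": "user", "content": user_text})
        reply = send(messages)  # the whole history goes out each time
        messages.append({"role": "assistant", "content": reply})
        return reply
    return turn, messages

# Stub transport for illustration; a real one would POST the messages.
turn, history = make_chat(lambda msgs: f"echo: {msgs[-1]['content']}")
turn("hello")
turn("again")
print(len(history))  # -> 4: two user turns, two assistant turns
```

Since the client owns the list, it can also edit, truncate, or strip reasoning from it freely before the next request, which is exactly the control a server-side conversation store takes away.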
>>108438203Sensible. My plan is the same but I got a stack of Pentium 4+4DDR3 desktops.
>>108438326I agree with you that the Responses API is a net negative and created only for OpenAI's benefit. The issue is that they are pushing developers to use that over the Completions API and eventually there will be more and more clients incompatible with llama-server.
Just like before LLMs were a thing, it was never ok to just make a pull request with untested code.
The minimum requirement for any code contribution is that you understand what you're submitting and have the ability to discuss your changes.
Before LLMs it was almost impossible to write code without at least understanding it a little. Sure, you could copy-paste from stackoverflow, but you still needed some basic programming knowledge to plug everything together.
But now, any idiot can just prompt claude to write code and submit a PR with zero understanding of programming.
The problem is not AI code, it's that now any retard can submit a PR without having internalized the proper engineering mindset and etiquette.
>>108436635
>yet 0% of good or useful software is made by them
What about Kitty and Calibre? Kitty is the best terminal emulator. And unlike the other grifter projects written in Zig and Rust, it's licensed under the GPL.
>>108438535the most loyal goy, goyal. responsible for covid too.
>>108438535kitty and calibre are both dogshit
>>108438535I personally use gitolite in my git server. That is indian software too.
>>108438618
>gitoilet
the racist joke is left as an exercise to the reader
for me, it's foot
>>108438535i prefer ghost titty as my terminal emulator, i don't remember exactly but something annoyed me in kitty
>>108438721
>Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
I gave it a try but I uninstalled it when I realized it had no GUI for the settings. What was the point of "platform-native UI"? There was no point in switching to another bare-bones terminal emulator. It was just going to have fewer features than Kitty.
>>108438763
1. it's not made by a brown person
2. just ask your llm to make you the config
>>108438721>>108438763https://github.com/alacritty/alacritty
>>108438793
>MIT
>Rust
>made false claims about being the fastest
Yeah, it was embarrassing.
>>108438805I switch to it because my terminal of choice (termite) told me to use it.
>>108438805>MITIt's Apache 2.0
>>108437505What?
>>108438907sorry you're right
>>108438998I must refuse.assistant
>>108437636> I'm looking at Mistral-Small-4
>>108439044So again, do I just go down the list until I find something that fits (<80GB)?
>>108439050I'm not the anon from earlier, but you should not waste a byte of disk space on Mistral Small 4, it's an abortion victim.I will spoon feed you more and tell you that you can fit Step 3.5 Flash in Q2 and have 10 GB left to spare for the context. That is, of course, if you are tired of Air and want a different model. I don't think there's anything better for that size. Provided, of course, your use case is... well, you know.<end_of_turn>
>>108439086Yeah desu I'm just using it for SillyTavern ERP. I'm not particularly having any trouble with Air, I'm just looking to sample something different because things are getting a bit same-y.
>>108439098I can also endorse stepfun (as a former air user), works with cunny cards no problem. best to switch between air and stepfun just to keep things fresh.
Apple's unified RAM is getting speeds like 180 tokens per second
>>108439123What presets do you suggest using for stepfun?
>>108439131
glm 4.5 air is hard to beat speed-wise when you use ik_llama. maybe qwen 122b?<|im_end|>
>>108439140
imagine using a schizofork, couldn't be me <|killyourself_baiting_retard|>
>>108439131
I will not cease my crusade against the new Qwens. Anyone who recommends them is either:
- Not using the biggest one
- Using them for the vision
- Has not seen a good LLM (possibly due to being a vramlet)
And you want to FUCK it too? Good luck, you will need an abliterated/decensored/raped version of it that will be dumb and still dry as hell. But maybe some people like their LLM women sub-60 IQ, I won't judge. Just stick with the old ones.
>>108439140works on my machine
>>108439156
>Not using
I meant "using", of course. I have once again embarrassed myself with my dyslexic spelling.
>>108438535
>Kitty
bloat python slop
>best terminal emulator
lmao
>calibre
likewise
>it's licensed under the GPL
GPL is cancer.
>>108438535
even if you somehow got me to agree it's good software (it's not), i could still argue that muh 83% of software, yet 0.01% of good software, doesn't really make it any better for jeets.
>>108439226
i only trust software that's made by germans with unpronounceable last names and uses a design philosophy from 1998
>>108439254Poettering is pretty easy to pronounce, but to think you like his software... How crude, Anon.
>>108437723so indians can join together but you can't hmmm
>>108439345
you are comparing a country to a single individual and are too retarded to notice it. you only prove the point that jeets are retarded. but "muh you can't join together", i sure wonder what mistral is. retard. there are tons of european models that completely mog saaaarvam.
>>108439364proof you are european?
>>108439366
what would be good proof for you? i live in switzerland and it's night here. i can post hands, but me being white and it being night isn't the best proof there is, now is it?
So are vramlets still on Nemo/Mistral Small tunes or are there new models?
>>108439406
just go buy a swiss newspaper and put a timestamp on it. not so hard anon.
>>108439408Qwen 3.5 4B
>>108439415>just go buy a switzerland newspaper and put a timestamp on itYou know there's already a date on newspapers right?
>>108439408Qwen3.5 4b
>>108439366Why would that guy have to prove he's european? What does that have to do with indians producing a single dogshit model that's worse than what people made in other countries? Just shut up and fuck off already
>>108439435agent get me a gf
>>108439415
dude, it's 22:00 here and i live in a tiny town in the mountains, do you really think i can just buy a newspaper at this hour? best i can do right now is show you a box of eggs from coop.
Are you guys running two models in parallel?
ahhh I need a better model to run. haven't used the GLM stuff. will they release a glm5 turbo 27b-ish dense model? MoE seems ass.
>>108439447>buying 4 eggs at a timeDo the swiss actually do this
>>108439481
you are better off using a big model with a small draft model than running two different models.
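If anyone wants to try the draft-model route: llama-server takes a second, tiny model for speculative decoding. A hypothetical invocation; the GGUF filenames are placeholders and the flag spellings are from memory, so check `llama-server --help` on your build:

```shell
# Hypothetical example: filenames are placeholders; verify flag spellings
# with `llama-server --help` on your build.
# -m  = big target model, -md = small draft model (must share a
#       tokenizer/vocab with the target)
# --draft-max / --draft-min bound how many tokens are speculated per step
# -ngl / -ngld offload target / draft layers to the GPU
llama-server \
  -m  glm-4.5-air-Q4_K_M.gguf \
  -md glm-4.5-draft-0.6b-Q8_0.gguf \
  --draft-max 16 --draft-min 4 \
  -ngl 99 -ngld 99
```

The draft model only speeds things up when it guesses the big model's tokens often enough; outputs stay the same as running the big model alone.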
>>108439481> 9B (Smart!)Sure.>>108439493You could have posted the egg uma.
>>108439447>1000CHF an egg
wtf is this about task?https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/14https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/10
>>108439504the task
>>108439493
i generally go for 6 or 12, but these were bought by my gf because there is no mall in my town and she does some groceries on her way back from work. she rarely eats more than 2 at a time. also you can just buy one egg if you really want to lol. i generally buy meat at the butcher and the farmer, but she tends to do the small groceries since they're on her way home and i work from home.
>>108439503
>1000CHF
that's why my pc isn't that expensive to me compared to how much i pay for food. the meat i buy is almost 100chf/kg. in switzerland everything is expensive, but you also make a lot of money.
>>108439504what the hell
Give me back my wife, she doesn't belong to you, Miku
>>108439504https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive/discussions/16
>>108439519
>in switzerland everything is expensive but you also make a lot of money.
Of course, but it doesn't matter how expensive things are; what matters is the disposable money left after paying for everything, and quality of life.
>>108439420yes but i want another sticky note that says /lmg/ with a timestamp on top of the newspaper
>>108439481
cute, a bit blurry but it made me read everything. I'd use 27B for everything though, or maybe 27B + 35B. what example of a successful task were you able to achieve with this? people always talk about how to use agents locally but never to what end.
>>108439447Having a thumb war with this Anon
>>108439560
so i generally spend about 1k on food, 500chf on insurance and another 500 for rent (gf pays the other 500). when you live in the middle of nowhere it's cheaper; in geneva you can find apartments at 7000chf/month, it's a bit crazy. anyway, i generally spend about 3k a month, and i can squeeze down to 2.2k if i really have to. i make about 8k after taxes, so at the end of the month i have about 5k chf left. it's much better than if i were to live in france, where i'd probably have 1k left after all expenses. honestly switzerland is not a bad place to be in europe; the only thing that really sucks is the housing market, which is pretty bad.
>>108439615i have pretty big hands, all my fingers are long lol.
>>108439616
>>108439560
adding to this, PC parts and tech cost pretty much the same wherever you live. so yeah, it's not like the 1k in france would get you more pc stuff than the 5k in switzerland, especially if you buy online. i'm almost tempted to buy an llm rig desu, but at the same time i don't really need it and i'd rather just invest / save the money.
>>108439616
1000chf on food? wtf? i never spend more than like 200chf monthly on food in america (for myself)
>>108439481For me it's 4B for Planner, and 0.8B for Worker!
>>108439659
well, switzerland is expensive; i mostly eat meat and everything i buy is organic. generally it's about 1.2k chf for me. my gf is more around 400 to 500 chf; she's pretty small and weighs 42kg though. but yeah, labor is expensive / paid well, and thus products are expensive. also i'm not even in the most expensive part of the country; near Geneva or Zug / Zurich it'd be a LOT worse. 6 eggs are about 6chf, 1kg of beef is between 80 and 100chf depending on the cut. Switzerland is just one of the most expensive countries out there. idk how much you pay for gas, but here it's about 1.7 to 1.9 CHF/L. the worst is probably electricity, where it's not uncommon to pay between 0.3 and 0.6 CHF/kWh. when i was in France i'd spend about 300 bucks a month on food.
>>108439481i use kimi k2.5 for everything. EVERYTHING.
It has started. AI can now solve math problems that human experts tried to solve but could not.The age of men will soon be over.
>>108439709which quant? does it work well or do you need to recheck?
>>108439709what hardware do you run it on
Anyone here tried both Moonshine v2 and Parakeet v3? Which is better?
>>108439716
works well up to the context i can max it out at, which is about 77k. IQ3_K quant.
>>108439165
>>108439719
512GB of DDR4 3200MHz and four 3090s
>>108439481
some of you seriously use these little 4B/9B toys for coding? does it even work? I don't think I would be able to trust one under 70B for such tasks
>>108439710Can't wait to see the numbers be exactly the same in 5 years
>>108439731
>IQ3_K quant
Which tasks do you give it that it can achieve properly even at this low a quant?
>>108439742
i can honestly vibecode pretty large projects if i split them up into multiple smaller files and only provide the required context for what needs to be changed. i've got a project that would take up like 500k+ context if i provided all of the files at once
>>108439742Tool assisted ERP
>>108439750grok, email this transcript to all my friends
>>108439731nice setup.how much t/s do you get out of it?
>>108439742
also i should mention that kimi 2.5 was trained in FP4, so a Q3 quant is actually more like a Q6 quant for kimi.
>>108439747So it's for dev, ok, thanks anon.
>>108439762
Lovense, if you want to record the specifics too.
HOLY SHIT IT'S FINALLY HERE
>>108439770
9 tk/s TG at 0 context and about 6 tk/s at 64K context. it's slow, but it doesn't really matter when i'm letting it run while i make dinner or clean up my house or something
>>108439814he's just stirring the pot
>>108439821
thanks anon. cool for you, but i don't think those are usable speeds for me.
>>108439814
lmao, we are decades away from agi at the very least, if it's even possible on silicon, which is an unproven assumption. more bs marketing scam.
>>108439859List of legitimate reasons for you to namefag:
>>108439875true
>it's not just, it's
why do all models love to regurgitate that shit? they use it at like 100 times the normal rate
>>108439875
lmao i didn't want to. i had some bullshit in the name field and didn't realise it'd still be there after i closed the tab and came back; i never use the feature.
>>108439900
synth slop amplifying model behaviors, it existing 500b times in the trillions of tokens all the labs train on, and no corpo wanting to actually filter their own datasets despite being able to train these models.
I do feel you on this though; gemma is generally very balanced in terms of intelligence and grasp of writing, but it just goes into not-x-but-y, ellipses spam and endlessly vomiting descriptions about scents. Even when you ask it to critique writing, it will praise it, and then if you ask "are you being honest about x part of the critique? that doesn't make sense" it'll bend over backwards six times to be like YES YOU'RE SO RIGHT, CONSIDER ME THOROUGHLY CHASTISED or some shit
>>108439900
the only pro of that is being able to easily spot people using llms without even bothering to edit
>>108439481I like these. At least they are reading the thread and adding things in.
>>108439731t/s ?
>>108439859
>if it's even possible to do on silicon which is an unproven assumption.
there's nothing special about biological substrates
>>108440046>>108439821
>>108440176
there is, e.g. QM shenanigans. the human mind may be non-computable, and there is more than one physicist who thinks so, e.g. penrose.
>>108440176penrose is pulling bullshit out of his ass because of a spiritual attachment to biological substrates being special.
>>108440176
>the human mind may be non computable
I can assert things too. Look at this. Ready? The human mind may be computable. Now what?
>>108439814
>>108439835
>"OpenClaw is AGI"
>HOLY SHIT JACKETMAN YOU'RE SO SMART THIS IS MINDBLOWING PLEASE TAKE 2 HUNDRED SEXTILLION DOLLARS OF MY MONEY PLEASE PLEASE PLEASE JUST EAT MY DOLLARS PUT THEM IN YOUR MOUTH NOW NOW NOW YES YES I WANT YOU TO TAKE EVERYTHING I HAVE
>>108440302
>EAT MY DOLLARS PUT THEM IN YOUR MOUTH NOW
All of this, but can he do it slowly while making eye contact with the camera?
>>108440302
>>108439835
>Openclaw
Not sure why everyone sucks it off so much. It's a total mess, the interface is god awful and it's a huge pain in the ass to see what it's even doing. Their documentation is trash, too.
>>108440326what good replacement exists?
>>108440302
>>108439694
>weighs 42kg
In which elementary school did you find her?
>>108440335Hell if I know, I'm just explaining my experience with openclaw
>>108440366Let's be reasonable, she's probably in middle school.
>>108440366
that's how much miku is supposed to weigh
I know because I was asking llms which weighs more, 20 mikus or 20 tetos
>>108440326As surprising as it is, before that no one thought to make an MCP-enabled background service, accessible from regular chat apps, that any retard could set up, and then shill the ever loving fuck out of it. Not even the geniuses here.
>>108440380
miku weighs 1 ton, with all the internal machinery that makes her work
>>108440366
>he can't imagine an adult woman weighing 90lbs
It must suck to live in a nation with rampant obesity
>>108440394very ped coded bro ngl
>>108440380Miku troons are more accepted than poorfags on /g/
>>108440398women are ped coded, true men choose men
>>108440196
>spiritual attachment to biological substrates being special
we've not seen intelligence on anything non-biological so far. and this is another discussion, but physicalism is wrong.
>>108440205
point is, we don't know if silicon can do it, it's just an assumption.
>>108440366
>le american surprised european women aren't obese
as i said, she's 25.
>>108440394
this lol
>>108440398
>muh not being obese is ped coded
retard, she has nice boobs and a nice butt, proportionally wide hips too; her whole body screams fertile, and she is.
>>108440422t. >>108436379
>>108440422
>physicalism is wrong.
it doesn't matter. thinking silicon is lacking something important for intelligence is just a religious thing. you're religious. as stupid as ai is today, frankly even it should dispel the notion that the brain is special.
>>108440422You're a deranged mikutroon, I can't trust she exists
why do people who want to ERP with AIs always pick the lamest choices? why pick miku, kurisu or xj-9? i wanna fuck the hot redhead alien chick from megas xlr.
>>108440459Welcome to /lmg/, I love you
>>108440451
>you're religious
i'm not. and it's totally possible that intelligence may require quantum shenanigans and whatnot, which, sure, you could do on silicon, but not with the kind of chips we make today.
>as stupid as ai is today
we do not have ai today. it's not stupid, it's not even intelligent; there is literally 0 intelligence there.
>>108440461Well, for starters, the hot redhead alien chick is an alien, not an AI. This makes her a difficult contender for the place of AIs you wanna fuck.
>>108440461
Because AIs don't know niche characters well.
Same reason I have to generate comics from recognizable characters
>>108440459
i don't care about your trust lol. 42kg is not that unusual in switzerland.
>>108440468you have a religious/spiritual notion of intelligence.
>>108440413
>point is, we don't know if silicon can do it, it's just an assumption.
point is, there's no reason to think that it can't
>>108440469fine i'll fuck the cool robot with the flames then. alien ai is still ai.
>>108440475
you don't even know what my notion is. llms not having any form of intelligence is just a fact; they literally have no ability to learn autonomously.
>>108440478
there are plenty, you just don't see them. also, to play the devil's advocate, i'd argue most humans don't have intelligence either; being biological doesn't automatically grant you intelligence.
>>108440472
This guy is lying and dating a (gay) dwarf btw.
t. living in switzerland
>>108440491
>there are plenty, you just don't see them.
You could start listing them instead. Go on.
>>108439835
>the product I'm selling is so amazing, it can run a billion dollar company totally on its own!
>uhh but it couldn't run MY company, dear investors please don't fire me!!
>10 minutes between messages
>>108440572He's a biological supremacist
Total clanker death
robots and ai are going to totally surpass us and that's a good thing
>Edit: 10 Mar 2025 20:44 UTC
So that's never getting updated I guess.
>>108435108sample pretty please?
There's no way llama.cpp is having double BOS token issues with mistral small 4 in the year of our lord 2026, right?
>>108440795
Alright, doesn't seem to be the case. Thank fuck. I still have no idea what kind of unholy memory corruption is happening on my machine where I
>launch llama-server
>send prompt "hello", receive response, send prompt "Can you tell me a one paragraph story?", receive response
>close llama-server
>change nothing, launch llama-server
>regen the last message
>后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书后汉书... (后汉书 = "Book of the Later Han", repeated forever)
>>108440795Show it. It should warn you on the terminal output.
>>108439435>Qwen3.5 4bNot just 4B, 4B Q4. I hate this image so much, please keep posting it.
>>108440867It should recommend to quantize the cache to q4_0 as well.
>>108440876
Huh, I thought the kv cache type defaulted to the model's quant (ignoring things like dynamic quants), but no, it's f16 if unspecified:
>llama_kv_cache: size = 2848.00 MiB ( 45568 cells, 16 layers, 4/1 seqs), K (f16): 1424.00 MiB, V (f16): 1424.00 MiB
I'm actually not sure if your recommendation was sarcasm or not, but the kv cache should be the same type as the model, right? Gonna give that a shot.
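For anyone wanting to follow along, these are the knobs in question. The flag spellings are from memory, so verify with `llama-server --help`; the defaults are f16 for both caches regardless of the model's quant:

```shell
# Hypothetical example: verify flag spellings with `llama-server --help`
# on your build. Both K and V caches default to f16 no matter how the
# weights themselves are quantized.
llama-server -m model.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn on
# quantizing the V cache generally requires flash attention to be enabled
```

Whether the quality hit is acceptable is a separate question; this only shows where the setting lives.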
>>108440928
In case you are being sincere, no. Quanting the KV cache has a much more severe effect than quanting the model itself, as far as I can tell.
>>108440953I was being sincere, I'm just retarded. I don't conceptually understand why having the cache type be the same as the "model type" would degrade performance/intelligence compared to an f16 cache, but I suspect that's because I just don't understand how anything works, which is fine. At some point I'll learn linear algebra and read the attention paper.
>>108441003 (me)
>learn linear algebra and read the attention paper
Who am I kidding. Given that the weights are dequantized to f16 (regardless of model quant) to compute the attention scores, the kv cache also being f16 makes a lot more sense. For some reason I thought the CUDA kernels were implemented with hardware support for e.g. operating directly on Q8_0 values rather than needing to convert everything to f16. At some point I'll learn CUDA and actually read through some of the inference code.
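For the curious, the q8_0 format itself is simple enough to sketch in a few lines. A toy numpy version, assuming ggml's block_q8_0 layout (one f16 scale followed by 32 int8 weights per block); the function names are made up, and real kernels do this per block on the GPU:

```python
import numpy as np

QK8_0 = 32  # q8_0 block size: 32 weights per block

def quantize_q8_0(weights: np.ndarray) -> bytes:
    """Quantize 32 f32 weights into one q8_0 block (lossy: rounds to int8)."""
    assert weights.size == QK8_0
    amax = float(np.abs(weights).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    # block layout: 2-byte f16 scale, then 32 signed bytes
    return np.float16(scale).tobytes() + quants.tobytes()

def dequantize_q8_0(block: bytes) -> np.ndarray:
    """Expand one q8_0 block back to f32: weight_i = scale * quant_i."""
    scale = np.frombuffer(block[:2], dtype=np.float16)[0].astype(np.float32)
    quants = np.frombuffer(block[2:2 + QK8_0], dtype=np.int8).astype(np.float32)
    return scale * quants

# round trip: the reconstruction error stays within one quantization step
w = np.linspace(-1.0, 1.0, QK8_0, dtype=np.float32)
w_hat = dequantize_q8_0(quantize_q8_0(w))
```

The int8s only ever multiply back through the f16 scale, which is why the "dequantize, then do f16 attention math" picture holds: there is no native q8_0 arithmetic, just cheap per-block expansion.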
>>108440605
Patience is a virtue
For productive use of LLMs
It's not coming this week either, is it?
I asked an LLM to make me an app to do voice activation for hotkeys/scripts using Moonshine STT (speech-to-text, not TTS). Although it ended up just being a python script, it actually just works. It's pretty fast, accurate, doesn't take much processing on idle, and I can map anything I want. This is so much better than when I tried looking into doing the same thing with existing software years back, when LLMs weren't big. Finally, the dream of a semi voice-controlled PC is here. We are so back. And yeah, I know I'm late to the party, but hey, we all went through this journey of integrating AI into our lives, right? And for people who still haven't done it, I recommend it. Go do it. This is one of the simplest AI augments you can run before getting into agent shit. It's easy and you'll love it. You might even have it be what gets you started on agents, since you're already doing STT with it.
Btw, one thing I recommend is to format the function mapping list so you can map multiple voice lines to a single function. That way you don't have to memorize the exact line. If you want your media player to play the next song, you can say "next song" or "next track", and both will do the same thing.
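The multi-phrase mapping is just a dict from phrase tuples to handlers. A minimal hypothetical sketch (the names and the playerctl example are mine, not the anon's actual script; the transcript string is assumed to come from the Moonshine STT loop):

```python
def next_track():
    # a real handler would fire a hotkey or shell out,
    # e.g. subprocess.run(["playerctl", "next"])
    print("next track")

def pause_music():
    print("paused")

COMMANDS = {
    # several spoken variants all map to the same handler,
    # so you never have to remember one exact line
    ("next song", "next track", "skip this"): next_track,
    ("pause music", "stop the music"): pause_music,
}

# flatten to phrase -> handler once at startup
PHRASE_TO_ACTION = {
    phrase: handler
    for phrases, handler in COMMANDS.items()
    for phrase in phrases
}

def dispatch(transcript: str) -> bool:
    """Run the mapped action if the normalized transcript matches a phrase."""
    handler = PHRASE_TO_ACTION.get(transcript.lower().strip(" .,!?"))
    if handler is None:
        return False
    handler()
    return True
```

`dispatch("Next track!")` and `dispatch("next song")` both land on the same handler, and anything unrecognized just returns False instead of erroring.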
>>108441044
depends on the inference lib
>Given that the weights are dequantized to f16
one cannot simply "dequant" (dequant = adding a bunch of zeros). the point is that the "attention scores" / KV cache in the current paradigm need decent precision
>>108441155
>one cannot simply dequant
*calls dequantize_q8_0 on your tensor before passing you into the fattn kernel*
>>108441155
>(dequant = adding a bunch of zeros)
Isn't that just padding?
>>108434876
How are people getting vLLM running on Strix Halo without using those prebuilt toolboxes?
>>108441286female backs are nice
>>108441210
>>108441270
Like saving a JPG as BMP: the damage is already done, the information is lost.
>just padding
Is the extra 512GB of RAM you don't have just padding too?
>>108441286
based, now show some armpit tufts
>>108441515Miku where is my wife what did you do to her
>>108441286I look like this
>>108440398
>>108441529
Everything is fine
Everything is normal and progressing as intended
Do not be alarmed
Labs saying to run inference on the models they trained at temp<1 is perfectly fine
>>108441553Now show the stats for women considering all those female teachers and such who turn out to be pedos
>>108441515
>Like saving a JPG as BMP the damage is already done the information is lost
This analogy is apt for the initial quantization of the model, which (in the case of qN) encodes the weights into blocks in a manner similar to JPG's DCT encoding. However, there are 100% lossless operations you can apply to DCT-encoded image data. You don't need to convert it back to BMP to manipulate the image file, producing intermediates in JPG format. It makes sense, in that situation, to cache your intermediates (which are 100% lossless) as JPGs rather than convert them to BMP, since it's the conversion that introduces losses. While I originally thought there were attention kernels that operated on e.g. q8_0 values, that doesn't appear to be the case. There might not even be sound math to perform the necessary arithmetic on q8_0 values to compute attention scores without introducing loss; I'm not a mathematician. If the q8_0 tensors are dequantized to f16 tensors, the intermediates going into the cache are also going to be f16s, and it makes sense to have the cache be the same format as the intermediates. I'm sorry for what I did, please let Teto out of the basement now, she doesn't deserve this.
>>108441564
>There might not even be sound math
Have you learned how the hardware works? Honestly, ignorance is bliss.
Foolish anons consider quanting the KV cache at runtime. "cache", yeah? it's avoiding already-computed attn calcs.
Back in the basement, ho: https://www.youtube.com/watch?v=UsjsYMo3O1Q
>>108441636
>Honestly ignorance is bliss
You should have chosen a domain that doesn't use floating point operands. I only deal with maths that are both associative and commutative, and may god have mercy on the rest of you lunatics.
>>108441286Another slappable back like Rin's
>>108439814
How do they get from sloppotron to AGI? They probably just benchmaxxed some arbitrary benchmark, as always.
>>108441758>>108441758>>108441758
>>108439481>>108439435