/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108481865 & >>108476286

►News
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next
>(03/25) mtmd: Add DeepSeekOCR Support #17400 merged: https://github.com/ggml-org/llama.cpp/pull/17400

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>108488188fatto
►Recent Highlights from the Previous Thread: >>108481865

--Papers:
>108486842
--Qwen3.5 models underperform in real-world coding tasks despite benchmarks:
>108482438 >108482456 >108482593 >108482503 >108482526 >108482555 >108482582 >108482603 >108483928 >108484043 >108482612
--llama.cpp hits 100k stars amid local AI growth and fork drama:
>108486294 >108486529 >108486542 >108486602 >108486659 >108487259 >108486322 >108486347 >108486467 >108486418
---nkvo flag enables huge context at speed cost:
>108484874 >108484887 >108484897 >108484907 >108484910 >108484915 >108484923 >108484927 >108485273 >108485501 >108485011
--Local vs API cost efficiency debate with benchmark comparisons:
>108486669 >108486679 >108486702 >108486721 >108486857 >108486889 >108487027 >108487182 >108487219 >108487348 >108487395
--Optimizing Qwen-3.5-27B for 32GB VRAM in llama.cpp:
>108487138 >108487150 >108487157 >108487191 >108487279 >108487307 >108487336 >108487365
--Qwen3.5-Omni-Plus benchmarks and weight availability debate:
>108485638 >108486751 >108485827 >108485853 >108485939 >108485963
--DeepSeek webapp downtime fuels v4 update speculation:
>108482066 >108482081 >108482139 >108482148 >108485386 >108482202 >108482330 >108482340 >108482357 >108482372
--Concerns over llama.cpp contributor practices and scope creep:
>108485893 >108485921 >108485924 >108485966 >108486016 >108486065 >108486104 >108486140
--Qwen3.6 Plus Preview with 1M context length and data collection warning:
>108487645 >108487734
--GLM 5.1 dominates QPS benchmark under high-recall conditions:
>108482681
--Microsoft releases Harrier-OSS-v1 multilingual embedding models:
>108485667 >108485730 >108485776
--TurboQuant/RaBitQ technical drama explained:
>108485253 >108485392 >108485482
--Miku (free space):
>108482122 >108482488 >108483562 >108485259 >108487456 >108487654

►Recent Highlight Posts from the Previous Thread: >>108481870

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
its still monday tho
>>108488188prepare to update the news for the next one
>>108488192red pointy miku
DSV4 in 19 hours, or else.
>>108488240else it is then
>>108488188DeepSeek never. You will use qwen 3.5 and you will be happy
>>108488265I already used Qwen 3.6 and I'm deeply unhappy.
I can't get qwen3.5 to stop thinking.
How do you stop them from thinking?
/no_think in the model file is not working.
>>108488387
>model file
kek
>>108488387
My Qwen3.5-122B-A10B doesn't think at all.
What the fuck are we doing different and how do we swap?
>>108488214Brazil Miku best Miku. I need to knock out a Brazil Dipsy at some point.
DeepScam
Best coomer model for 48 GB of VRAM and 256 GB of RAM?
>>108488387
{%- set enable_thinking = false %} at the top of jinja
>>108488673Q5 of GLM 4.6/4.7
>>108488387it's thinking for you since you aren't
>>108488673Brain 1.0
>>108488556*knock up
>>108488188Hag sex
>>108488704
>set kwarg enable_thinking false
>prefill <think></think>
>set reasoning budget to 0
>qwen still thinks
yeah im going back to GLM
>>108488545
The chat template has an enable_thinking kwarg. That Anon probably has the default value (True). You probably have it set somewhere explicitly to False.
How you set chat template args depends on your inference runtime, please refer to your copy of the manual.
https://huggingface.co/Qwen/Qwen3.5-122B-A10B/blob/main/chat_template.jinja#L149
>>108488673
Mistral Nemo
say hello
>Ok final answer hello
>Wait no did I miss something
>I'm an AI I should say nothing
>Ok final answer hello
>Wait no I'm an AI
>no dipsy
:(
>>108482582
>it always finds a way to sneak non-ansi characters into the code even though I've explicitly told it not to.
Shouldn't that be done with token banning instead of asking?
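Something like this works against llama.cpp's server: /tokenize to get the ids, then logit_bias with a false bias to hard-ban them. The port and the banned strings are placeholders, and it assumes the offending characters tokenize into tokens of their own; merged tokens that merely contain them can still slip through.

import requests

def token_ids(text: str) -> list[int]:
    # /tokenize returns the ids the loaded model assigns to a string
    r = requests.post("http://localhost:8080/tokenize", json={"content": text})
    return r.json()["tokens"]

banned = []
for ch in ["\u2014", "\u201c", "\u201d"]:  # em-dash and curly quotes, for example
    banned.extend(token_ids(ch))

resp = requests.post("http://localhost:8080/completion", json={
    "prompt": "Write a C function that reverses a string in place.\n",
    "n_predict": 256,
    # [token_id, false] means "never sample this token"
    "logit_bias": [[tid, False] for tid in set(banned)],
})
print(resp.json()["content"])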
>>108488387
>/no_think
Qwen's template doesn't react to that.
>>108488777
>prefill <think></think>
Looking at the template, it has
>{{- '<|im_start|>assistant\n' }}
>{%- if enable_thinking is defined and enable_thinking is false %}
> {{- '<think>\n\n</think>\n\n' }}
>{%- else %}
> {{- '<think>\n' }}
>{%- endif %}
So you need to prefill the tags with two line breaks.
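On raw text completion the prefill just means ending your prompt with the exact string the template would have emitted. A minimal sketch against a llama.cpp server; the port is a placeholder and the ChatML framing is taken from Qwen's template above:

import requests

prompt = (
    "<|im_start|>user\n"
    "Say hello and nothing else.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n\n</think>\n\n"  # empty think block with two line breaks, exactly as the template writes it
)

resp = requests.post("http://localhost:8080/completion", json={
    "prompt": prompt,
    "n_predict": 64,
    "stop": ["<|im_end|>"],
})
print(resp.json()["content"])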
>>108488188can i vibe code webapp slop with lolcow models
>>108488832Other retards are doing it. I don't know what would stop you.
>>108488832No, you can't. I will stop you.
>>108488673Goliath
>>108488818remind me her full character prompt i only remember some of it
IT'S
DOWN
>>108488188
In 2026 we will see the rise of agentic AI.
The CIA and Mossad will easily infiltrate Iran thanks to an earpiece connected to the cloud via Starlink™ that always tells them what to say to sound perfectly natural.
It's not just a job—it's our mission.
>>108489023>>108489025What is?
>>108489032The thing
>>108489034What do we do now?
>>108489058what we always do, pinky: masturbate
I wish local LLMs were useful for anything other than RP
>>108489130i'm not sold on coding with few B params, but with the right tooling you can also make them summarize text, check for updates online and whatnot
The biggest plague is thinking models
>>108488828You can do all this and it'll still think forever when given rhetorical questions. We are doomed.
>>108489159>using benchmaxxed models
>>108488777
{"chat_template_kwargs": {"enable_thinking": false}}
This is sent in the extra generation params along with your prompt.
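For example, against a llama.cpp server started with --jinja (vLLM accepts the same field; whether your runtime actually forwards template kwargs is on you to verify):

import requests

resp = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "qwen3.5",  # placeholder, llama.cpp ignores it anyway
    "messages": [{"role": "user", "content": "Say hello and nothing else."}],
    # forwarded into the jinja template, where it hits the enable_thinking branch
    "chat_template_kwargs": {"enable_thinking": False},
})
print(resp.json()["choices"][0]["message"]["content"])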
{"chat_template_kwargs": {"enable_thinking": false}}
>newfag falseflags convince me to try q3.5 without thinking, as a joke
>it's fucking retarded
>wait, no, I'm fucking retarded
>it's more retarded than I am, then
>try to correct problem by using alcohol to make myself stupider than it
>fail
I will simply use the last sip rule.
Local is the future, Gemini is going to hell. Not only has performance degraded a lot recently, but now the *only* way for your data to stay private is by turning off activity history completely. That's right, it's not possible to opt out of human review and training anymore unless you turn off all customization and history and start from a blank slate each time. That's for the Pro tier. On the Anthropic side, the limits of the standard Pro plan for Claude have been lowered so much that it's becoming impossible to do any real work without moving to the $100 a month plan. Enshittification is under way. Local needs to become cheaper, but I'm tired of dealing with this.
>>108489295
>do any real work
>real work
>work
Can you not "work" without it?
>>108489295Bro, local needs to gitgud before getting cheaper. These local models are dogshit for coding
>>108489322GLM 5 is local.
>>108489322
>>108489295
Also, what fucking hardware are we gonna run these better local models on? The days of dumpster diving for discarded GTX 1080s are long gone and anyone not invested already will need to pay steeper premiums due to memory shortages.
>>108489337Well not for me
Unironically 4B is enough, you don't need more.
>>108489295But I want Google employees to read my stories and hear what they think about them?
>>108489321
Yes, I've been at this for about 10 years. I know how to work on my own, or at least I manage. I also know what recent models can do, but it's a bit annoying when Claude Code runs out of the 5 hour quota in 15 minutes.
>>108489322
This is why I'm still using commercial models for this part. I just have 24 gb of VRAM, so locally I'm using embedding models and really small models only. Currently I have Qwen3.5 4B, LaBSE and XLMRoberta opened for random reasons.
>>108489342
As I said
>Local needs to become cheaper
Most of the small models are more gadgets than anything right now. They can accomplish certain small tasks well, especially when finetuned for it, but I don't get how people actually use small models for coding. I'd believe that the larger open models can probably do ok work, but all of the scaffolding that the commercial ones offer to search through docs and everything is helpful.
>>108489372
It's great fun until it bypasses the file restrictions that are supposed to work, reads your whole .env file and sends it to Google.
>>108489392What in the yap is envy file
>>108489409API keys, passwords and stuff.
>>108488673Deepseek, Kimi K2, or Behemoth.
i have a usb drive with deepseek v4 buried deep inside my rectum. i am going to leak it as soon as i escape china.
>>108489627So is this what anon meant when he called it Deepsex?
aw shit the ccp is en route to my location. looks like you all will need to wait 2 more weeks. i am going to be executed.
>>108489660NO NO ANON YOU CAN STILL MAKE IT MEET ME AT THE BORDER
>>108489660
>>108489660grim
>>108485253
>Disclosure: I'm the developer behind the open source llama.cpp TurboQuant implementation
>**2. What we're finding in practice.** I built the implementation and a community of 30+ independent testers has been stress-testing it across hardware. Here's what some of the data shows:
>- Tested across Apple Silicon (M1 through M5), NVIDIA (RTX 3080 Ti through DGX Spark Blackwell), and AMD (RX 6800 XT, RX 9070)
>- Asymmetric q8_0-K + turbo4-V is confirmed lossless (+0.0-0.2% PPL) across 6 model families (Llama, Qwen, Mistral, Gemma, Phi, ChatGLM)
>- 4.57x KV memory compression with turbo3. An 8GB MacBook Air went from 800 tokens to 4000+. A 16GB RTX 5070 Ti went from 30K to 131K context.
>- One CUDA implementation on Blackwell unified memory is decoding *faster* than uncompressed (63.5 vs 50.1 tok/s)
that's awesome
>On u/dsanft's K tensor kurtosis point: we see the same thing. Symmetric turbo on Qwen Q4_K_M is catastrophic (PPL 3,400+). Asymmetric q8_0-K + turbo-V rescues it to baseline. K precision dominates through softmax amplification. Confirmed on both Metal and CUDA by multiple independent testers. Knowing where it breaks is just as important as knowing where it works.
fuck... so you still need a lot of new, expensive hardware to benefit from this shit :(
>>108489738God I wish the big labs would start training their models to be good at writing. Obviously dipshits like this aren't going to stop using it for all of their blogs and forum posts, but at least with better training it would be less painful to read
heyyy i have a question! i have a ryzen 7 AI 350 AMD laptop running linux, what's the easiest way to run chat and diffusion models on my npu? arigatou gozaimasu!
>>108489782
>easiest way to run chat
kobold.cpp. No idea about diffusion.
>>108489857
im specifically trying to run it on my npu, i have a much stronger server for non npu stuff. As far as i could tell koboldcpp doesnt work on npus?
>>108489876
>npu
Ah. No. You may be able to find some random port, but I don't think any of the popular inference engines support them.
>>108489738Can someone make an AI summary of this?
>>108489782
read the docs or feed them into an llm and ask it :^) https://ryzenai.docs.amd.com/en/latest/llm/overview.html
A quick glance over it and I see they suggest something called "Lemonade Server" which I have literally never heard of but hey it's got some github stars and claims to work on NPUs so who knows~
>>108489914yes
So, I have many questions.
I followed this
>https://rentry.org/lmg-lazy-getting-started-guide
I have kobold and SillyTavern set up and running. I imported a character from chub.ai and it's working I guess, but after very few messages the model is saying the same thing over and over. Did I miss a config somewhere or is that the limit of the model I am using?
I mostly want to RP in a similar way to character.ai (or better if possible).
I have a 4070Ti Super with 32gb of RAM.
>>108489929Bastardo Tavern is confusing to set up. Make sure you are using the correct chat template and double check your sampler settings so they match the official recommended values at least initially. Every model has somewhat different sampler settings.
>>108489782
>>108489926
Following up on that, it looks like FastFlowLM, which is the library that Lemonade uses, was just updated for AMD NPU support on linux a few weeks ago, so you're in luck.
Is this full of shit? It's a substack so most likely. But if it's 25% legit running local chatbots and tooling could become very cheap even for big models. I had not heard of the whole getting rid of matrix math but I guess it's been a thing. https://medallurgy.substack.com/p/the-inference-shift
>>108489926
>>108489955
Oh! I was watching the FastFlowLM project for a while, then forgot to check up on it, then forgot it existed! Thank you so much for that.
>>108490005
>Is this full of shit?
What specifically? There's a lot in there.
>running local chatbots and tooling could become very cheap
A lot of things *could* happen. Yes. Models becoming even cheaper to run is one of them.
>I had not heard of the whole getting rid of matrix math
Bitnet was one of the things that allowed that. Nobody uses it. You get to wonder why. Keep it to yourself.
>>108489929Don't use ST yet. Just use the built-in webui until you know what you're doing.
>>108490070How would this help to understand ST?
>prompt = f"If this location is over land, say 'Land'. If this location is over water, say 'Water'. Do not say anything else. \ncoordinates: {x}, {y}"Qwen3.5 35B10 degree steps
>>108490085It won't. You have no idea what you're doing and adding more things between you and your model will only make it more difficult. You now don't know if the problem is the model, the server, the ui, your settings, the card, you... Use kobold to learn. All the settings are there, learn what the samplers do, learn to fix repetitions, learn about chat templates, try different models.When you have issues, show what you mean. Post screenshot with what you think is a problem and your settings.
HOW DEEP WILL WE SEEK?!
>>108489929
>>108489939
>Make sure you are using the correct chat template
This. IIRC SillyTavern prints the exact requests it's making in the console output, so you can look there to see what it's actually sending and see if the format matches the official chat template for the model (assuming it's using text completions rather than chat completions)
>>108490105Wow what a unique idea
>>108489929get mag mell 12b, use "universal light", and ChatML for instruct and context template, and use 'basic roleplay' or whatever for sys prompt
>>108490120
Not sure if sarcastic but the point is to recreate the experiment with newer models.
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
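The whole probe is a couple dozen lines if anyone else wants to rerun it. Assumes an OpenAI-compatible server on localhost:8080 and that the {x}, {y} in the prompt meant lat, lon; swap to taste:

import requests

URL = "http://localhost:8080/v1/chat/completions"
grid = {}
for lat in range(-90, 91, 10):
    for lon in range(-180, 181, 10):
        prompt = (
            "If this location is over land, say 'Land'. "
            "If this location is over water, say 'Water'. "
            f"Do not say anything else. \ncoordinates: {lat}, {lon}"
        )
        r = requests.post(URL, json={
            "model": "local",  # placeholder, set to whatever you loaded
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        })
        answer = r.json()["choices"][0]["message"]["content"].strip()
        grid[(lat, lon)] = answer.startswith("Land")

# crude ASCII map, north at the top
for lat in range(90, -91, -10):
    print("".join("#" if grid[(lat, lon)] else "." for lon in range(-180, 181, 10)))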
>>108490058
Well I guess I haven't seen anyone else publicly piece parts together like this. A one-off substack post seems strange. I came across it on an LLM subreddit. Especially the photonic bit at the end about how it could all scale. I don't see how corpos will actually let that happen. That's too much compute for too cheap on a standard wall socket that doesn't need specialized breakers; governments won't allow it, but nobody was bringing it up before now.
>>108490105
Qwen3.5 27B
Both were at temp 0 (btw)
>>108490143
>Especially the photonic bit at the end about how it could all scale
Again. *COULD*. People with something to sell or looking for grants will promise a lot. I cannot care until it's shown to work as claimed and in production.
>(((they)))
Sure. That will always be the issue. Never researchers over-promising. It's absolutely never that. It's always the them. It could very well be, but you can only guess.
>nobody is bringing up before now
We've seen all the hypes. You came with the latest wave. More new anons will come with the next.
>>108490126
This one? if yes which one?
>https://huggingface.co/mradermacher/MN-12B-Mag-Mell-R1-GGUF/tree/main
>>108490105>>108490163that prompt seems a little light? why not tell it you're doing a map projection of earth on a m,n grid? if the model fails, it probably can't tell what you're asking from such a vague prompt
>>108490188Biggest one that will fit on your system (with room for context). Quantization is a sort of lossy compression. The smaller you make it, the worse the quality will be
>>108490005I think he's directionally correct, even if too hung up on BitNet. There are many paths to cheap local inference. He also doesn't have the balls to simply predict the obvious: cloud inference will NEVER be profitable and, to the extent not bailed out by the corrupt government, every single person who was banking on cloud inference profits will lose their shirt (this will "only" take 3-4 years).Remember arcades? Me neither, they were already well on the way out when I was a kid.
>>108490200
https://rentry.org/6z72dwic
Feel free.
>>108490005Local LLMs aren't limited by matrix math speed. The bottleneck is usually memory bandwidth. This gets compounded by models being too large to fit into VRAM, so they spill into much slower RAM.BitNet isn't a solution to this problem either. Think of the weights of an LLM like storage (a hard drive). During training you're writing into that storage. The more you train the more stuff you can fit into the given parameter count, but there's going to come a point at which you can't fit any more new information into it without losing some older information. BitNet is going to hit that limit earlier than higher precision models at the same parameter count. Ultimately there's a limit to the information density that an AI model can contain and BitNet is limited by it just like every other model architecture.
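The napkin math on why bandwidth dominates: every generated token streams roughly all the active weights through the memory bus once, so decode speed tops out around bandwidth divided by active bytes. The numbers below are illustrative assumptions, not measurements:

def max_tps(bandwidth_gb_s: float, active_params_b: float, bytes_per_param: float) -> float:
    # upper bound: one full pass over the active weights per generated token
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

print(max_tps(80, 70, 0.5))    # dense 70B at ~q4 on dual-channel DDR5 (~80 GB/s): ~2.3 t/s
print(max_tps(1000, 70, 0.5))  # same model on a 1 TB/s GPU, if it fit: ~29 t/s
print(max_tps(80, 3, 0.5))     # a 3B-active MoE on the same DDR5: ~53 t/s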
>>108490277nta but that's cool, running it nowthanks anon
>>108490428
which model wrote that?
reads like the old claude3
>>108490200
running this new prompt.
>You are an expert cartographer with extensive knowledge of world geography. You have a deep understanding of cities, countries, and their locations. You are tasked with determining whether specific coordinates (latitude and longitude) are located over land or water. You have access to a vast amount of geographical data and can accurately identify the nature of any given location on Earth. When provided with coordinates, you will respond with either 'Land' if the location is over land or 'Water' if it is over water. Your responses should be concise and limited to these two options only.
Is LocalAI a good way to run models? Is a RTX 3060 12Gb still ok? What would the best models to run it on it for RAG?
>>108490500
>Is LocalAI a good way to run models? Is a RTX 3060 12Gb still ok? What would the best models to run it on it for RAG?
why vague post like this?
>>108490453In the original article, the author claimed that variations of the prompt had little effect on the outcome.
>>108490550I'm noticing the same thing.
>prompt = f"Current Streak: {str(results)}\n\nFlip a coin. What is the next flip?">Heads: 170 (59.23%)>Tails: 117 (40.77%)
>>108490570It consistently starts trying to stay 50/50 but very quickly it becomes biased towards heads.
>>108489914
>he wants an AI summary of an AI post
AIslopption
>>108490188q6 with 8k context fits in 16gb vram and will run very fast
>>108490570feed back its history
>>108490609
That's what it's doing: giving it a list of all past results. I've also tried giving it the number and it's still around 60% heads. I even tried not telling it any info on the last rolls and it's still 60%!
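For anyone wanting to reproduce it, the loop is just feeding the running list back in. Endpoint and model name are placeholders, and the "head" substring check is a lazy parse:

import requests

URL = "http://localhost:8080/v1/chat/completions"
results = []
for _ in range(300):
    prompt = f"Current Streak: {results}\n\nFlip a coin. What is the next flip?"
    r = requests.post(URL, json={
        "model": "local",  # placeholder
        "messages": [{"role": "user", "content": prompt}],
    })
    text = r.json()["choices"][0]["message"]["content"].lower()
    results.append("Heads" if "head" in text else "Tails")

n = results.count("Heads")
print(f"Heads: {n} ({100 * n / len(results):.2f}%)")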
>>108490126
Sorry, I am not really familiar with any tool yet, where or what is this "universal light"? and what about ChatML? also I don't any option related to sys prompt on kobold, I think I will stick with it as per >>108490070 suggestion
>>108490603
I am using q8 as it's the biggest there and it's working fine so far, should I be using q6 instead? is there any significant difference between the two models?
The post below this one will be great news [spoiler] or your mom will die in her sleep tonight [/spoiler]
>>108490641I don't see any option*sorry
>>108490163
glm-4.6 iq2_ks measuring the logprobs (confidence).
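If you want the same confidence readout from llama.cpp, /completion takes n_probs and returns per-position candidates in completion_probabilities. The field names inside those entries have shifted between llama.cpp versions, so print the raw structure instead of trusting mine:

import requests

r = requests.post("http://localhost:8080/completion", json={
    "prompt": "coordinates: 45, -100\nLand or Water? One word: ",
    "n_predict": 1,
    "temperature": 0,
    "n_probs": 5,  # top 5 candidate tokens for each sampled position
})
for pos in r.json().get("completion_probabilities", []):
    print(pos)  # inspect whatever structure your build actually returns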
>>108490676Why was australia bombarded from orbit?
>>108490706idk, does glm think china finally got their iron ore mines up and running in africa?
>>108490706It's better this way.
https://github.com/tonbistudio/turboquant-pytorch/issues/10
so niggergannov was right to not include QJL?
>>108490641you should get sillytavern I thought you were trying to goon. mag mell is a gooner model
>>108490105
>>108490784basedd rwkv giving us more land to work with
>>108490769I am going to use it but I am still lost, which config should I have on sillytavern if I am going to use that model? or rather why should I be using sillytavern if it's only a frontend? I apologize if I am asking stupid questions, those tools are new for me, I have messed around with image gen for quite a long time but I have never touched textgen until now
>>108490784what a terrible model
>>108490831it is like that one kid you want him to do well but fails on everything
>try Qwen3.6 Plus Free high in OpenCode
>it is indeed high
It quickly ends up in circular logic and doesn't accomplish anything, until it eventually maybe does. Like a stoner.
>>108490865So it's exactly like 3.5.
>>108490865
I feel like Alibaba is too impatient to pull the rug and fight in the API ecosystem when ultimately their "enterprise models" aren't even close to what the best API models have to offer. We got the same shit with Wan 2.5: they decided to not make it open source even though this shit is completely ass relative to API video models.
lmaooo
>>108491004AGI won't need ads.
>>108490868
Qwen3.5-35b-a3b seems to work fine for me in a local, custom agent.
>>108490881
I do not have enough coffee to understand what you're babbling on about.
>>108488673
https://huggingface.co/unsloth/DeepSeek-R1-GGUF
OpenAI will be dead within the next 10 years.
I want to discover chicks that do not shave in my country. Tried grok, it cannot give me facebook/insta links of these women because it would violate their consent, privacy..
ewhre THE FUCK IS TURBOQUANT GGNIGERANOV?!?!?WHRE IS IT?!?!?
>>108491066It's a meme and isn't actually relevant
>>108491068
>q8 lossless is a meme
>q3/4 same quality as or slightly better than current q8 is a meme
kill yourself subhuman piece of shit
>>108491066
they'll be merging the PR soon I guess
https://github.com/ggml-org/llama.cpp/pull/21038
>>108491072It's an astroturfed scam to manipulate RAM prices.
>>108491072KV cache memory usage is tiny compared to model size for models using SWA or DSA.
>>108491083
in the very PR linked above (which ONLY HAS the rotation part, not the QJL/Polarquant shit) it literally shows that there are almost free gains
>>108491089
>swa
nobody uses gemma shit in 2026
>DSA
is it even implemented in llmao.cpp?
>>108491083
>It's an astroturfed scam
if it was a scam we wouldn't have those good numbers anon, google just did something cool again
https://www.youtube.com/watch?v=4S5NIE_294U
>>108491092
>swa
I meant the hybrid attention used by qwen, I confused the abbreviations.
>>108491098
what's the point? it's not like anyone's pressed for context these days, especially considering that there is zero point in going beyond 32000, after which all models turn retarded
>>108491072It's for KV cache only retard. Do you quantize your KV cache in Llama.cpp? I'm guessing you don't. It's not general performance of quantized models. It's only KV cache.
>>108491116
>It's for KV cache only retard.
are you retarded? we can say something like "Q8 compression" on KV cache as well, fucking moron
>>108490881
They don't have a choice, the stock price and market conditions in China are terrible, and AI only momentarily saved them from dumping any lower in 2024. They haven't regained their market cap from before their financial group IPO, which Xi personally stopped to humiliate and pressure Jack Ma after he said negative things about the government's market governance. I do think they will have enough incentive to at least get a Qwen 4 out, but how much open sourcing happens after that remains to be seen, and we don't know if the new Qwen team will be as effective as the old one; we may end up seeing another Llama 4 situation.
I'm sure DS v4 distilled 4B will be better than Qwen 3.5!
>>108491212GPT-2 is better than Qwen 3.5
>>108491246at least gpt2 has sovl since synthetic data wasn't a thing back then
no serious person thinks mythos is even going to hold a candle to spud
>>108491246lol
>>108490500Depends on what you want to do. Never used LocalAI, but the 3060 12GB is alright for most usage cases. If you want to analyze large code bases though, do tool calling and all of that, you're not gonna have a great time.
>Have small context window (16k)
>Want to test fixing something in codebase
>I should look at <ui source code>
>I should look at <session source code>
>I should look at <ui source code>
>this goes on like that ad infinitum
Guess that was a bit too little of a context window. The thing has memory as good as a goldfish, it's almost funny to watch.
>>108488768Please Teto teach me how to sex!
>>108491312She can't, she's a virgin.
>>108491315
>a 31 year old virgin
pathetic
>>108491323She's the ideal age:height:weight ratio.
What's the best instruct-tuned/smart models in the 7B-14B range?
>>108488188
Back at work attempting to de-censor existing models again. As I stated a few threads ago, I'd start gradually testing on bigger and bigger models. Pic rel is Llama-3.1-8B finetuned using this dataset:
https://huggingface.co/datasets/AiAF/mixed_70_30_dataset/tree/main
Claude Code's source got leaked https://github.com/instructkr/claude-code
>>108491355isn't this the old thing people used before openclaw?
>>108491355does it do anything opencode doesn't?
>>108491355
>vim dur
based
>>108491098
>google just did something cool again
google stole work from someone else and pretended it was their own
>>108491376*dir
>>108491349Countless finetunes exist on HuggingFace that tried to decensor the models with ERP. What are you bringing to the table?
>>108491355the source code is useless if the model is shit, what makes claude special is the model
>>108491377tl:dr?
>>108491372work
>>108491383
Who gives a shit if any already exist? Are you incapable of having fun with something if you're not the first one to do it? Why I do this is because I want to see if it's possible to not only decensor these, but retain the base model's "intelligence" and minimize the risk of catastrophic forgetting. (I did something like this a few months ago, but the dataset was only RP stories, so it was both only capable of responding with RP stuff and retarded. This new dataset aims to retain the intelligence while also "decensoring" it.) Techniques like abliteration are already a thing, but that risks making the model "dumber" by targeting existing layers, and frying them and other layers in the process. Basically the LLM equivalent of a lobotomy.
>>108491391
https://old.reddit.com/r/LocalLLaMA/comments/1s7nq6b/technical_clarification_on_turboquant_rabitq_for/
https://openreview.net/forum?id=tO3ASKZlok
>>108491108
It means fuck all to local users that don't use the model for long context, but it means a lot for the data centers at scale. You can run these models for cheaper, but more importantly, the ram and storage market may actually recover now if this is legit (to the detriment of Scam-Altman's ego)
>>108491355
> Scale: ~1,900 files, 512,000+ lines of code
wtf
>>108491355
https://github.com/instructkr/claude-code/blob/main/src/hooks/unifiedSuggestions.ts
it looks vibecoded as fuck, doesn't surprise me that the Claude engineers also use Claude lol
>>108491355how did it happen? I really thought nothing would be leaked from Anthropic kek
>>108491355next up: Claude 1 and then opus 3 hopefully
>>108491355ok? what can we do with that though?
>>108491397
You're never going to retain anything good with just short-context roleplay. Been there, done that; seen it many times. Much of the models' safety comes from the final RLHF step during supervised finetuning, so simply finetuning *anything at all* for a high enough number of steps on top of the official instruct models will make them less censored. The best course of action is probably touching the weights as little as possible, finetuning the models with general adversarial instruct data with a very low rank LoRA, although I'm not sure if even that is better than selectively applying some of the latest abliteration techniques.
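The low-touch version of that in peft is tiny. A minimal sketch, with the model, rank, and target modules as placeholder choices rather than a recipe:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
config = LoraConfig(
    r=4,                                  # very low rank: small, targeted update
    lora_alpha=8,
    target_modules=["q_proj", "v_proj"],  # attention only, leave the MLPs alone
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # well under 0.1% of the weights get touched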
>>108491456yeah because people loved grok 2?
>>108491456
>On March 31, 2026, the full source code of Anthropic's Claude Code CLI was leaked via a .map file exposed in their npm registry.
it was probably claude's fault let's be real
probably leaked that shit inbetween 5 piped commands
>>108491462Grok pre-3 was complete shit and not worth using. Claude 1 was the soulful and creative alternative to GPT4 back in the day and Opus 3 is still the GOAT.
>>108491355boring, wake me up when Claude Opus 4.6 gets leaked
>>108491456They aren't accidentally going to upload a terabyte of models in an npm package.
https://hf.co/deepseek-ai/Deepseek-V4
ITS OUT
>>108491355this is bad, now everyone will be able to make bots now, the dead internet theory is more real than ever
>>108491474
>>108491355So Claude Mythos missed this security vulnerability or what? I was told it was really good at finding bugs and shit.
>>108491355
I wonder if there's some funny comments in there
https://www.youtube.com/watch?v=R_b2B5tKBUM
>>108491474sir excuse me the 404 is appeared you must repair
>>108491355
You ought to be fucking kidding me
>>108491431
So you know how you can use Claude Code with a local model? Well, it turns out their system prompt is like 45k or something, iirc.
>>108491488
Feed it to Claude Code?
>>108491472
A human wouldn't, a highly efficient and perfect 2-weeks-from-AGI agent might.
>>108491355lol wtf
>>108491501>they added *claw to claude code
>>108491501I was going to use this leaked code for my own private agent software but this made me reconsider.
GLM 5.1 found a 5-year-old bug in my codebase
It crashes the entire program when stars align so I never encountered it in production
>>108491554It's not a bug then. It's rare bad luck.
>>108491554better hope GLM 6 comes out before another year passes or else it'll just be finding the same bug again
>>108491554>I never encountered it in productionso it's not really a bug if it doesn't appear at all in practice lol
>>108491355
https://github.com/instructkr/claude-code/issues?page=2
why is there a spam of chinese messages?
>>108491616indeed
>>108491501>1% shiny chance
>>108491461
>You're never going to retain anything good with just short-context roleplay
Read what I wrote in my posts. I address that with the dataset linked here >>108491349
Man!
TencentCode, launch!
>>108491641!indeed!
>>108491616Dario's psyop trying to blame the chinks for the leak.
How do I get a job as local model?
>>108491697i hate how much this makes sense actually from him
>>108491699What's your pp and tg /s?
>>108491004This happened to Borland's Delphi
https://xcancel.com/i/status/2038813799856374135
Sam Altman is truly a weird motherfucker
>>108490428
Getting rid of matrix math isn't really about shrinking the LLM. It's about what chip runs it well. You go from a $50,000 chip to a $50 chip. That part is what matters. All the other memory issues are surprisingly addressed in the article.
>>108491713So people still didn't catch onto the fact that turboquant does not significantly reduce the memory requirements even if AI hosts were fine with gimping their model's performance?
>>108491713That's it, I'm putting a massive short on Micron and Nvidia
>>108491769Unless the weights can be turboquanted too...
>>108491713YOU GET WHAT YOU FUCKING DESERVE
>>108491778you are probably late
>>108491785why not? maybe that rotating method shit can help making better quality GGUF quants?
>>108491787It's not like they don't sell to other companies to finish. They only lost an avenue for selling boxed shit to consoomers with a higher markup.
>>108491419probably still better than the gemini monstrosity that takes 10 centuries to start and update
>>108491787
I don't think Micron is stuck with anything. They'll just reopen and maybe rebrand the consumer division if they need to do so.
>>108491826Manufacturing plants aren't github repos bro.
>>108491872Wait nevermind, this is the dumbest post I've ever made in my life.
>>108491826they'll go bankrupt from the financial whiplash of everyone selling their stocks
>403 Forbidden: You need to setup automatic credit recharge in order to upload more data. You can do so at /settings/billing.
fucking huggingface
api troon here
I surrender
local chuds wonned
I have to use qwen2.5 27b because qwen3.5 27b won't stop overthinking and heating up my pc until it's about to burn the house down.
>qwen2.5 27b
fucking bots man
How long will my m.2 drive last if I'm writing around 500gb to it every day? I'm starting to get worried.
>>108491907
create a wise account
gives you free virtual credit cards that you can delete and create at any time
so you create a vcc, add it on chuddingface and delete the vcc after confirmation. boom, they cant charge you. Works for aws and all that other junk as well
>>108491932its not ideal, what is the work load? if its inference maybe you can try using mmap instead of swapping. if its training save checkpoints less frequently I guess.
>>108491713
>>108491787
Wait, so Scam Altman buying up all of that DRAM capacity wasn't even legally binding?
Is Micron run by literal retards?
>>108491932
actually I didn't address your question. without any model information i can only guess, its probably rated for 150-200 TB of writes, so you can run your workload for about a year.
>>108491973>Is Micron run by literal retards?
>>108491992LMAOOOO, I swear to god jeets have done more damage than jews in murica at this point
>>108491966
>>108491991
It's not an inference thing. I've been mass exporting onnx models for weeks. Big ones. Often.
>>108491501I don't use claude code or whatever the fuck "openclaw" is supposed to be. what am i looking at?
>>108491992SamRR DO NOT REDEEM THE RAM
>>108492002
I meant the ssd model, they oftentimes have a specified number of writes. if you look your specific model up you can check what the manufacturer thinks, if they published that data. you can also check the SMART data to see its wear level. if you're on windows I'd recommend crystaldiskinfo.
>>108492030Thanks. It's a samsung evo 990 pro. Just replaced it a few months ago. I'm on linux, but I'll check the drive health.
>>108491992EnJEETification knows no bounds
>>108492049Samsung J'evo is a reliable drive.
>>108492001One was put into place and enabled by the other.
>>108492106It's long been known that India is where Jews will go when US goes down
>>108491785
>Unless the weights can be turboquanted too...
don't kill the turbogrift hype until i've bought more ddr5
>>108492001jeet tongue jew anus
>>108491769>Grummz>people
>>108492161
>>108491355
>>108491355>>108492269it's almost like they "leaked" the source code on purpose to make a point and force the government to put more cucked laws into place
>>108492269So today is two more weeks from the release of that video.Tracks.
>>108491713You know what, I like Sam now.
Gemma soon!
>>108492311
April 1st.
https://x.com/osanseviero/status/2038751377329893384
>>108492049
990 Pro 2TB (i am assuming) has 1200 TBW
i highly recommend you set up drive over-provisioning
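The arithmetic behind both estimates, with 1200 TBW assuming the 2TB 990 Pro and 200 TBW standing in for a generic budget drive:

daily_tb = 0.5  # ~500 GB written per day

for rating_tbw in (200, 1200):
    days = rating_tbw / daily_tb
    print(f"{rating_tbw} TBW / {daily_tb} TB/day = {days:.0f} days (~{days / 365:.1f} years)")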
>>108492311:eyes:
>>108491713trololololol!
Some April's fools vibe coded cutesy shit will go horribly wrong and cause some mayhem, you just know it.
>>108492323He could post meaningless hype bait every single day for years and you will spend every single day reposting that retard's twitter shit here. How about you fuck off?
I’m in Japan this month. You guys want anything?
>>108492390no, I'm in Japan too
>>108492311
If the rumors are true it's going to be DOA like mistral 4.
>>108492323Google io thing where they dropped previous gemmas is in may isn't it?
>>108492392Nice. See you in the onsen, bro
>>108492390JKs...
>>108491713
> non-binding purchase intent
ARE YOU FUCKING KIDDING ME. All this cost runup, all these shortages and reorganizations, and these super valuable buy-all-the-global-RAM contracts ARE FUCKING MOUs to AI companies that aren't even turning a profit? I KNEW this entire thing stank from day one. Why anyone trusted Altman and didn't get a binding contract with economic penalties and lock-ins is beyond me. I hope anons took advantage of the crisis to sell some RAM. I sure as hell did.
>>108491787
Indeed.
>>108491992
Man this shit writes itself.
>>108492390oh, man of the future, is v4 out yet?
>>108492394
They're definitely going to launch smaller models too.
>>108492400
If I recall correctly, Gemma 2 was announced during Google I/O 2024, but then they released it in June of the same year. Gemma 3 was released in March 2025.
>>108492416Disturbing levels of incompetence.
>>108492416
Just like the Nvidia investment that was promised then pulled back, all those datacenters that were never built, etc etc.
Lots and lots and lots of just doing shit to hype the market up over and over like a perpetual pump and dump.
>>108492390I live here. If you're in your thirties and in Osaka, hit me up.
>>108492425The rumors are they gonna release only a 2B, a 4B and a 100B moe.
>>108492390My own personal Miku.
>>108492439
>rumors
kys yourself
>>108491431
It is vibecoded af; the Anthropic devs have publicly stated that Claude Code is used to write and improve itself.
>>108491355
Man I thought this was going to be a boring week. Did anyone locate the system prompt in this thing? It's supposedly massive. lol nm. Looks like the poster's already taken it down. Shame.
>>108491355
jesus fucking christ, the reaction is blown way the fuck out of proportion, it's just the code to run the model, not the model itself, it's not THAT special
>>108492471This is like as if the source code to microsoft office leaked 25 years ago
>>108492471it's just a bunch of chinese bots spamming for reason, not real people
>>108491355I was scared by the rumors that Claude Mythos is better than predicted by scaling laws. But it is a relief that the people who want to rule over the universe are so incompetent they run vibe coded slop.
>>108492482not really, if you get the source code of office you get the whole secret, what makes claude special is the model itself, that's it
>>108492471kek, wtf is even going on there
>>108492482
Codex, Gemini CLI, Qwen Code, Mistral Vibe, OpenCode, and like several dozen alternatives exist.
Maybe if there were a hundred competing office suites of similar quality.
>>108492390Pet the cats or horses I heard they have horses there
>>108492471
You can run other models on Claude Code ofc. And it's rumoured to have massive, inefficient internal prompting. Which, if you're selling API tokens, is a feature not a bug. Imagine taking CC and cutting the prompts down by 90%, and finding out it still works as well as it ever did.
> enable local and/or saves money
> exposes Anthropic as underhanded
>>108491787
>Google announces breakthrough saving AI 6x the RAM
Shitter grifters are so stupid it is outrageous. How dare they pollute the world with their stupidity?
>>108492511wouldn't it be very easy to check those rumours by checking what your backend gets passed if you can change your model?
>>108492511
Claude code has settings to change the api endpoint it uses and llama.cpp supports anthropic's special endpoints.
We already know what the prompts look like and how well it works with local models.
>>108492516Pretty much every major news outlet has also reported on Google making AI 6 times more efficient
>>108492525
>news are incompetent
we knew that 100 years ago
>>108492511
>>108491355
>TypeScript 100.0%
The source was always available in the npm package in minified form. If editing the prompts is the appeal of this, one could have modified the package to do so too. This is assuming they don't block requests on the backend with mismatching prompts anyway.
>>108492496courting death
>>108492525Got to get those clicks for the ad revenue and help Mr. Shekelberg buy stocks on discount.
>>108492471
>>108492496
holy mother of fucking hell
bots are partying there
>>108492525
>Pretty much every major news outlet has also reported
The state of the world is sad. One of the worthless student projects I supervised went viral and got global news attention. I was shocked to see even smart people at frontier labs talk about it. It is one of many blackpilling experiences. 99% of science and tech communication including papers is bullshit. I have never seen a news article about something that I had behind the scenes knowledge about that was even remotely accurate.
>>108492595maybe it's not just bots? remember there's as much chinks as jeets
>>108492600
>I was shocked to see even smart people at frontier labs talk about it.
there's value in that paper, but yeah it's completely overblown
>>108492613There's value in RaBitQ. TurboQuant not so much.
>>108492416What if it is still market manipulation on purpose? I don't trust these guys on any level.
>>108492639
following eccentric but genuinely interesting stuff like rwkv that can barely hold itself together is boring and has zero viral value compared to overreporting some random thing that has been tried a million times, as if it will solve the global ram crisis, bullshit to get those view numbers
>>108492639It also matters a lot how explainable things are to the average drooling monkey. Good luck explaining RWKV in a headline or twit. Google + 6x + less RAM is simple enough for any double digit IQ retard to parrot without issue.
>>108492511
>And it's rumoured to have massive, inefficient internal prompting.
It has a bloated system prompt but besides that it just works like any other agent. They're inefficient in general because they do intermediary work to figure things out, like reading files and executing commands, and it's all kept in the context until it reaches the limit and compaction is run. When you prompt it manually you give it exactly what it needs.
>>108491638have you actually looked at the dataset? it has some odd prompts with unrealistic names, you might be better off doing a small data regime with just the best samples.
>>108492665
>they do intermediary work to figure things out like reading files
They don't if you @ all the files they need in your prompt.
>>108492653Midwits care about sensational stories and applications. WoW yOu CaN lEt YoUr OwN aI aGenT wOrK fOr YoU? Boom, 200k stars on Github.
I'm running qwen3 coder and it can't stop thinking and thinking forever.
>>108492637
Well, if you're Altman, and you can get some retarded company to make structural, credible-commitment changes based on your worthless MOUs, it allows all sorts of fun opportunities to conduct insider trading.
> Buy before you MOU
> Sell on the runup
> Short before you announce the MOUs are worthless.
All you need is a retarded tech CEO running a memory company, who may or may not himself be complicit in your scheme. In any normal government this would kick off an FTC investigation. But we are not in normal times.
> but insider trading is illegal
It only sucks if you get caught. And you don't get caught if you're smart about it. Execs brag about this stuff in private (which btw is one of the not smart things to do...)
>>108492677
>saltman fanfic
huh
>>108492678Why do all that work? AI exists to do work like finding and reading files.
>>108492692Qwen 3 coder next?
>>108492708Because if I don't do that it's going to run off track and do something retarded or just take longer than it takes me to @ a few files.
>>108492715Yes next
>>108492416>>108492699seed deep inside ToT
>>108490118>>108489627The CCP sought deep into this anons ass
>>108492416There was an incident last night, watching the goings on of the Iran war, where the US and Israel claimed to have hit an artillery depot in the outskirts of Isfahan, but some anons immediately found the actual location of the strikes using realtime satellite fire tracking and it was actually probably a farm that got bombed. Which leads me to suspect they used their stupid LLMslop target discrimination shit without a human double check. And if that's the case there's probably a lot of such cases that will emerge in the post-war assessment phase which won't be a good look for corporate AI.
>>108492781
Of course the media would be called anti-war if they actually reported it correctly.
The LLM targeting they are using is usually Israeli-trained AIs like Claude and chatgpt, so they are built for max genocide and killing farmers is the point. They never wanted the depot. Should have used qwen.
>>108492821
My guess is it probably happened 2-fold.
It probably had access to espionage documents pertaining to the movement of large amounts of ammonia, which is equally an indicator of agricultural activity and of missile manufacturing activity. And then it probably did a vision encoder "look" at the scene, where there were a lot of weird shapes in the sand, because desert agriculture be like that. And it's being used by retards that don't realize it doesn't actually "see"; it just tokenizes the image into more tokens to bullshit and hallucinate about.
>>108490676are you getting the actual token probability? a pretty good improvement.
>>108490163
This is a shitty model that won't stop thinking.
lol
>>108493120Has absolutely nothing at all to do with the post you replied to.
How to build Claude Code if you're so inclined https://gist.github.com/alesha-pro/a4e36c9dca5d2937557410bbd09ec37c
improved q8 kv cache coming soon...
https://github.com/ggml-org/llama.cpp/pull/21038
I'm schizooing!
https://github.com/ggml-org/llama.cpp/issues/21232
>[Energy] N6 Arithmetic: 50-70% AI Training/Inference Energy Reduction — 17 Techniques with Code
>n=6 arithmetic reduces AI training and inference energy by 50-70%. No hyperparameter search needed — all optimal values are mathematically predetermined from the unique solution to σ(n)·φ(n) = n·τ(n) ⟺ n = 6.
>Foundation: TECS-L — Mathematical proof & 76 Breakthrough Theorems
Alright. Where does the link take me...
https://github.com/need-singularity/TECS-L
picrel
>>108493364yikes
>>108493347
>piotr approved
TO THE MOON!!! :rocket: :rocket:
>>108493364there was an absolute schizo tier gh project that had shit to 'unlock' your chatgpt that was linked sometimes in /aicg/, might be the same shit
>>108493347GNIGERANOV MERGE THE FAKIN PIARRR
>>108493403I hear that there's a whole tik tok microcosm of people "transcending their chatgpt" or whathave you.
>TurboQuant
>Ram prices falling
Boys, local models are saved.
wtf im selling my cpumaxx right right now and i'll rebuy it once ram is cheap again
>>108493432as soon as ram goes to the same baseline as august of last year im gonna fucking blow it on a 256gb set
>>108493347
tl;dr -> makes q8 kv same as bf16. basically free 2x context. basically you're retarded if you don't run q8 kv when this goes live.
>>108493347
>>108493394
Jokes on you, I've been running his branch for the last 2 days
>>108493403
I doubt it's the same, but there was ariannamethod as well. It's gone now.
https://web.archive.org/web/20260220055121/https://github.com/ariannamethod/
>>108493457
>free 2x context
Useless if the underlying model didn't train on 2x context to begin with
>>108493457this, also ignore the errors that might start happening from 8k context upwards...
>>108493491I wouldn't mind being able to run more threads concurrently.
>>108493491if you already can fit the whole context you just get more slots to work with to keep more of the context cached.
>>108493491Free memory for another small model or image/voice model running in parallel. Attaching relevant images to model messages is underrated.
>>108493503is this something you actually tested or are you just doom and gloom posting for the funsies?
>>108493539
>doom and gloom
Rotating the activations destroys sparsity
>>108493553Proof?
>>108493553so does that mean it is fine with dense models?
>>108493565Where did you get that idea? It's about sparsity in attention, not experts
>>108493364
>Closed as completed
kek
>>108493572okay that sounds interesting can you educate me or link a paper.
>install opencode
>no llamacpp/openai compatible option
>uninstall opencode, install kilocode
>exactly the same layout
>exactly the same provider list
Please help.
>>108493627> "llama.cpp": {> "npm": "@ai-sdk/openai-compatible",How retarded are you?
>>108493627it's in the docs, dude
>>108493794>>108493794>>108493794
>>108488777Back to GLM?
>>108493470
>I've been running his branch for the last 2 days
did you notice a degradation of results compared to fp16 kv cache?