>Expanding context from 32k to 128k tokens can require 16× more attention compute
>16× more memory for attention matrices

Sparing all the politics of why Anthropic lost their government contract, the result is that they lost billions of dollars and can't afford to run so many servers. Everyone seems to be noticing, but a lot of users seem to be fairly ignorant as to why Claude has suddenly been gumped. This seems like a good opportunity to educate people about the quadratic scaling nature of how LLMs work.

Large language models built on transformer self-attention require compute and memory that scale quadratically with context length because every token must attend to every other token. Reducing available resources (GPU VRAM, compute throughput, or memory bandwidth) forces the system to reduce sequence length, model size, batch size, or attention precision to stay within hardware limits.
Because the attention matrix grows as O(n²), even small increases in context require disproportionately large increases in resources; conversely, cutting resources requires aggressively shrinking the context window to avoid exceeding memory and compute constraints. A smaller context window restricts how much prior text the model can reference simultaneously, which directly degrades tasks requiring long-range reasoning, document understanding, multi-step planning, and conversation continuity, resulting in a measurable reduction in overall LLM capability.

This process of cutting resources to an LLM has been referred to as "gumping", a reference to Forrest Gump. It describes aggressively reducing the computational resources allocated to an LLM in a way that significantly limits its functional capability. The analogy comes from the character's intentionally simplified cognitive portrayal; similarly, when GPU memory, compute budget, or context length are cut, the model must shrink its context window and operational capacity, resulting in reduced reasoning depth, weaker long-range attention, and overall diminished performance. Gumping is a last-resort effort of a struggling AI company.
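For anons who want actual numbers, here's a back-of-the-envelope sketch of the 32k → 128k claim. Illustrative python only: the fp16 score size is an assumption, and real inference stacks use fused kernels (FlashAttention-style) that avoid materializing the full n×n matrix, so treat this as the naive upper bound, not what any provider literally allocates.

```python
# Naive attention matrix size for a single head, assuming fp16
# scores (2 bytes each). Every token attends to every other token,
# so the score matrix is n x n.
def attn_matrix_bytes(n_tokens, bytes_per_score=2):
    return n_tokens * n_tokens * bytes_per_score

small = attn_matrix_bytes(32_000)   # 32k context
large = attn_matrix_bytes(128_000)  # 128k context

# 4x the tokens -> 16x the memory (and the same ratio for
# score-computation FLOPs, since both scale with n^2)
print(large // small)  # -> 16
```

Same math in reverse is the gumping story: cut the budget by 16× and the naive quadratic term only lets you keep a quarter of the context.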
They don't care about free-tier plebs. Anthropic is the only AI company making large profits because a huge number of enterprise customers are paying for expensive subscriptions to Claude Code etc. When it comes to software dev it is massively useful, not a gimmick like most of AI. It's why the DoD picked Anthropic in the first place.
>>16928994
For Yakub's sake go to >>>/g/ nigga
>>16929000
You don't understand. The paid accounts are gumped. People are paying $100/mo and getting ripped off now. They're panicking.

>>16929010
/g/ is for gadgets and desktop ricing. Not actual computer science.
>>16928994
>quadratic scaling
but isn't this only true in a naive sense? firstly, transformers don't have to be n^2 anyway. also, once you get models that are 'good enough', many techniques can be applied recursively to improve the processing at closer to linear scaling, even when the transformer itself is n^2
>>16929045
There is nothing in your post that can't be handled in >>>/g/
Who gives a shit what AI companies do with their products in the age of open-weight models?
>>16929045
The "nerf" wasn't exactly a nerf. What happened is that a hell of a lot of people stopped using ChatGPT and switched to Claude because of the whole OpenAI agreeing to help develop weapons for the military thing. Anthropic initially struggled to handle the load and had to rush to increase capacity.