/sci/ - Science & Math

File: gump-AI.png (1.71 MB, 945x1424)
>Expanding context from 32k to 128k tokens can require 16× more attention compute
>16× more memory for attention matrices
Sparing you all the politics of why Anthropic lost their government contract, the result is that they lost billions of dollars and can't afford to run as many servers.
Everyone seems to be noticing, but a lot of users are fairly ignorant as to why Claude has suddenly been gumped. This seems like a good opportunity to educate people about the quadratic scaling nature of how LLMs work.
Large language models built on transformer self-attention require compute and memory that scale quadratically with context length because every token must attend to every other token. Reducing available resources (GPU VRAM, compute throughput, or memory bandwidth) forces the system to reduce sequence length, model size, batch size, or attention precision to stay within hardware limits.
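To make the quadratic cost concrete, here's a minimal NumPy sketch of single-head self-attention; the explicit (n, n) score matrix is the thing that blows up with context length. Shapes and sizes are toy values, not any real model's configuration:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head self-attention. The (n, n) score matrix is the
    quadratic cost discussed above. Toy sketch, not an optimized kernel."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) -- O(n^2) memory
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d)

n, d = 256, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)                    # (256, 64)
# The attention matrix alone holds n*n floats; doubling n quadruples it.
print(n * n * 4 // 1024, "KiB at fp32")
```

Every token produces a score against every other token, which is why the matrix, and the memory bandwidth to read and write it, grows with the square of the sequence length.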
>>
Because the attention matrix grows as O(n²), even small increases in context require disproportionately large increases in resources; conversely, cutting resources requires aggressively shrinking the context window to avoid exceeding memory and compute constraints. A smaller context window restricts how much prior text the model can reference simultaneously, which directly degrades tasks requiring long-range reasoning, document understanding, multi-step planning, and conversation continuity, resulting in a measurable reduction in overall LLM capability.
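A back-of-envelope check of the 16× figure from the OP (plain Python; the per-head matrix size is illustrative, real serving stacks use fused kernels that avoid materializing it):

```python
# Attention-matrix growth when context goes from 32k to 128k tokens.
old_ctx, new_ctx = 32_768, 131_072
ratio = (new_ctx / old_ctx) ** 2
print(ratio)   # 4x the tokens -> 16x the attention entries

# Naively materialized fp16 attention matrix for ONE head at 128k tokens:
bytes_per_entry = 2
gib = new_ctx * new_ctx * bytes_per_entry / 2**30
print(f"{gib:.0f} GiB")  # per head, per layer, before any tricks
```

This is why a provider under resource pressure cuts the context window first: it's the only knob with a quadratic payoff.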
This process of cutting resources to an LLM has been referred to as "gumping", a reference to Forrest Gump. It describes aggressively reducing the computational resources allocated to an LLM in a way that significantly limits its functional capability. The analogy comes from the character's intentionally simplified cognitive portrayal; similarly, when GPU memory, compute budget, or context length are cut, the model must shrink its context window and operational capacity, resulting in reduced reasoning depth, weaker long-range attention, and overall diminished performance. Gumping is a last-resort effort of a struggling AI company.
>>
They don't care about free-tier plebs. Anthropic is the only AI company making large profits because a huge number of Enterprise customers are paying expensive subscriptions for Claude Code etc. Because when it comes to software dev it is massively useful and not a gimmick like most of AI. It's why the DoD picked Anthropic in the first place.
>>
>>16928994
For Yakub's sake go to >>>/g/ nigga
>>
File: claude-gumped.png (159 KB, 933x742)
>>16929000
You don't understand. The paid accounts are gumped. People are paying $100/mo and getting ripped off now. They're panicking.
>>16929010
/g/ is for gadgets and desktop ricing. Not actual computer science.
>>
>>16928994
>quadratic scaling
but isn't this only true in a naive sense? transformers don't have to be n^2 anyway. and once you get models that are 'good enough', many techniques can be applied to bring the processing closer to linear scaling, even when the core transformer is n^2
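One of the sub-quadratic techniques anon is alluding to is local (sliding-window) attention, where each token attends only to its last w neighbors, so cost drops from O(n²) to O(n·w). A toy sketch assuming a causal Longformer-style window; real implementations use banded/fused kernels, not a Python loop:

```python
import numpy as np

def sliding_window_attention(Q, K, V, w=64):
    """Each token attends only to the previous w tokens (itself included),
    so compute and memory scale as O(n*w) instead of O(n^2)."""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo = max(0, i - w + 1)
        s = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # at most w scores per token
        s = np.exp(s - s.max())
        out[i] = (s / s.sum()) @ V[lo:i + 1]
    return out

n, d = 512, 32
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
res = sliding_window_attention(Q, K, V)
print(res.shape)  # (512, 32)
```

The trade-off is exactly the one the thread is about: a token can no longer see the whole context directly, so long-range references degrade unless you add global tokens or other compensating tricks.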
>>
>>16929045
There is nothing in your post that can't be handled in >>>/g/
>>
Who gives a shit what AI companies do with their products in the age of open-weight models?
>>
>>16929045
The "nerf" wasn't exactly a nerf. What happened is that a hell of a lot of people stopped using ChatGPT and switched to Claude because of the whole OpenAI agreeing to help develop weapons for the military thing. Anthropic initially struggled to handle the load and had to rush to increase capacity.


