/sci/ - Science & Math

File: gump-AI.png (1.71 MB, 945x1424)
>Expanding context from 32k to 128k tokens can require 16× more attention compute
>16× more memory for attention matrices
Sparing you all the politics of why Anthropic lost their government contract, the result is that they lost billions of dollars and can't afford to run as many servers.
Everyone seems to be noticing, but a lot of users are fairly ignorant as to why Claude has suddenly been gumped. This seems like a good opportunity to educate people about the quadratic scaling nature of how LLMs work.
Large language models built on transformer self-attention require compute and memory that scale quadratically with context length because every token must attend to every other token. Reducing available resources (GPU VRAM, compute throughput, or memory bandwidth) forces the system to reduce sequence length, model size, batch size, or attention precision to stay within hardware limits.
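To make the quadratic cost concrete, here's a minimal NumPy sketch of single-head self-attention; the explicit (n, n) score matrix is the thing that blows up with context length. Shapes and sizes are toy values, not any real model's configuration:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head self-attention. The (n, n) score matrix is the
    quadratic cost discussed above. Toy sketch, not an optimized kernel."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) -- O(n^2) memory
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d)

n, d = 256, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)                    # (256, 64)
# The attention matrix alone holds n*n floats; doubling n quadruples it.
print(n * n * 4 // 1024, "KiB at fp32")
```

Every token produces a score against every other token, which is why the matrix, and the memory bandwidth to read and write it, grows with the square of the sequence length.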
>>
Because the attention matrix grows as O(n²), even small increases in context require disproportionately large increases in resources; conversely, cutting resources requires aggressively shrinking the context window to avoid exceeding memory and compute constraints. A smaller context window restricts how much prior text the model can reference simultaneously, which directly degrades tasks requiring long-range reasoning, document understanding, multi-step planning, and conversation continuity, resulting in a measurable reduction in overall LLM capability.
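A back-of-envelope check of the 16× figure from the OP (plain Python; the per-head matrix size is illustrative, real serving stacks use fused kernels that avoid materializing it):

```python
# Attention-matrix growth when context goes from 32k to 128k tokens.
old_ctx, new_ctx = 32_768, 131_072
ratio = (new_ctx / old_ctx) ** 2
print(ratio)   # 4x the tokens -> 16x the attention entries

# Naively materialized fp16 attention matrix for ONE head at 128k tokens:
bytes_per_entry = 2
gib = new_ctx * new_ctx * bytes_per_entry / 2**30
print(f"{gib:.0f} GiB")  # per head, per layer, before any tricks
```

This is why a provider under resource pressure cuts the context window first: it's the only knob with a quadratic payoff.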
This process of cutting resources to an LLM has been referred to as "gumping", a reference to Forrest Gump. It describes aggressively reducing the computational resources allocated to an LLM in a way that significantly limits its functional capability. The analogy comes from the character's intentionally simplified cognitive portrayal; similarly, when GPU memory, compute budget, or context length are cut, the model must shrink its context window and operational capacity, resulting in reduced reasoning depth, weaker long-range attention, and overall diminished performance. Gumping is a last-resort effort of a struggling AI company.
>>
They don't care about free-tier plebs. Anthropic is the only AI company making large profits because a huge number of Enterprise customers are paying expensive subscriptions for Claude Code etc. Because when it comes to software dev it is massively useful and not a gimmick like most of AI. It's why the DoD picked Anthropic in the first place.
>>
>>16928994
For Yakub's sake go to >>>/g/ nigga
>>
File: claude-gumped.png (159 KB, 933x742)
>>16929000
You don't understand. The paid accounts are gumped. People are paying $100/mo and getting ripped off now. They're panicking.
>>16929010
/g/ is for gadgets and desktop ricing. Not actual computer science.
>>
>>16928994
>quadratic scaling
but isn't this only true in a naive sense? transformers don't have to be n^2 anyway. and once you get models that are 'good enough', many techniques can be applied to bring the processing closer to linear scaling, even when the core transformer is n^2
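One of the sub-quadratic techniques anon is alluding to is local (sliding-window) attention, where each token attends only to its last w neighbors, so cost drops from O(n²) to O(n·w). A toy sketch assuming a causal Longformer-style window; real implementations use banded/fused kernels, not a Python loop:

```python
import numpy as np

def sliding_window_attention(Q, K, V, w=64):
    """Each token attends only to the previous w tokens (itself included),
    so compute and memory scale as O(n*w) instead of O(n^2)."""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo = max(0, i - w + 1)
        s = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # at most w scores per token
        s = np.exp(s - s.max())
        out[i] = (s / s.sum()) @ V[lo:i + 1]
    return out

n, d = 512, 32
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
res = sliding_window_attention(Q, K, V)
print(res.shape)  # (512, 32)
```

The trade-off is exactly the one the thread is about: a token can no longer see the whole context directly, so long-range references degrade unless you add global tokens or other compensating tricks.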
>>
>>16929045
There is nothing in your post that can't be handled in >>>/g/
>>
Who gives a shit what AI companies do with their products in the age of open-weight models?
>>
>>16929045
The "nerf" wasn't exactly a nerf. What happened is that a hell of a lot of people stopped using ChatGPT and switched to Claude because of the whole OpenAI agreeing to help develop weapons for the military thing. Anthropic initially struggled to handle the load and had to rush to increase capacity.


