Why doesn't a single mainstream LLM chatbot provider (GPT, Claude, Gemini, Grok) offer a rolling-memory feature for an "infinite" context window?
Because it's misleading and makes models retarded. GPT Codex auto-compacts (summarizes) when you're about to run out of context and keeps going; sometimes it's halfway through a job and instantly gets dementia after compacting. It also automatically prunes its own logs as you go, so sometimes it just randomly forgets what you were doing a moment ago.
>>108571128
Surely the summarizing/compression scheme can be gradually improved over time, with some A/B testing like the normal chatbot? I've been managing the context window "manually" by asking it to summarize and copy-pasting the result into a new conversation, and it works so-so. I just wish I didn't have to do it by hand.
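That manual loop can be wired up in a few lines. A minimal sketch, with `call_model` stubbed out (a real version would hit your provider's chat API and ask it to summarize):

```python
def call_model(messages):
    # Stub for a real chat-completions call (hypothetical).
    # A real version would ask the model to summarize the conversation.
    total = sum(len(m["content"]) for m in messages)
    return f"[summary of ~{total} chars of conversation]"

def add_message(history, user_msg, budget_chars=2000):
    """Append a user message; if the transcript blows past the budget,
    collapse it into a model-written summary plus the new message."""
    history = history + [{"role": "user", "content": user_msg}]
    if sum(len(m["content"]) for m in history) > budget_chars:
        summary = call_model(history[:-1])
        history = [
            {"role": "system",
             "content": "Summary of earlier conversation: " + summary},
            history[-1],
        ]
    return history
```

Same idea as the copy-paste routine, just triggered automatically at a size threshold instead of by hand.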
>>108571118
Because then it's not infinite? What kind of buzzword gobbler are you?
>>108571118
Because LLMs are a shitty gimmick and not real AI.
>>108571118
Isn't that the main selling point of opus 4.6?
wdym OP? they don’t flatten context to save storage/money, they do it because too much context breaks the bot. that’s why it refuses to do any software task that isn’t replicated 10 million times on github.
>>108571138
I just asked Claude Code to do a big step in my 1000-line implementation plan while it was already at 80% context window usage. It auto-compacted 30 seconds in, ran for 15 minutes, and produced a perfect result. I haven't had any problems with compaction so far.
>>108571118
LLMs can't into memory properly
>>108573739
Why?
>>108574788
Because you have to pass all of it back into the model with every token it generates, along with the system instructions and whatever shit it's already outputted.
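Rough count of what that costs: every generated token attends over the whole prompt plus everything generated so far, so work scales with context length times output length. A toy tally (ignoring prompt caching and the initial prompt pass):

```python
def attention_reads(context_len, new_tokens):
    """Positions attended over while generating `new_tokens` tokens
    on top of a `context_len`-token prompt: step i reads context_len + i
    previous positions (assuming a plain KV cache, no compaction)."""
    return sum(context_len + i for i in range(new_tokens))

# The same 500-token answer is ~80x more attention work
# at 100k context than at 1k context:
small = attention_reads(1_000, 500)    # 624,750 reads
big = attention_reads(100_000, 500)    # 50,124,750 reads
```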
>>108574788
LLMs were built around the attention mechanism as a core design. With a smaller context window (memory), the model can focus that attention well. A bigger window just dilutes that attention, with more risk of noise from irrelevant data.
>>108574844
Not really.
>>108574846
You need good compression
>>108571118
ChatGPT (the web interface) already has a rolling window. Codex has compaction because they copied CC.
The best is probably a mixed approach where you compact some of the context at the beginning and keep ~2/3 of the context as a rolling window. But AFAIK nobody has implemented that.
If I had to choose I'd pick rolling window, which is what the code assistant I've coded uses.
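The mixed approach described above is easy to sketch. This is a toy version (summarizer stubbed out; `window` counts messages rather than tokens, purely for illustration):

```python
def hybrid_context(messages, window=12, keep_fraction=2/3):
    """Once the transcript exceeds `window` messages, keep the newest
    ~2/3 of the window verbatim and collapse everything older into
    one summary slot (a real version would summarize with the model)."""
    if len(messages) <= window:
        return messages
    keep = int(window * keep_fraction)
    old, recent = messages[:-keep], messages[-keep:]
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent
```

The summary slot gets rewritten as more messages fall out of the rolling portion, so old context degrades gracefully instead of vanishing all at once.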
>>108571118
The big providers do caching of input tokens. It works on prefixes. So, if you input ABCD and then you input ABCDE, the processing cost of A->D is very cheap and the only "new" work is on "E". So, ABCDE is actually cheaper to process than BCDE, which needs to be computed from scratch.
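A toy model of that accounting: bill only the tokens not covered by the longest cached prefix. (Real providers cache KV states server-side; this just reproduces the arithmetic.)

```python
class PrefixCache:
    def __init__(self):
        self.prompts = []  # previously processed token sequences

    def process(self, tokens):
        """Return how many tokens must actually be computed: everything
        past the longest shared prefix with any earlier prompt."""
        hit = 0
        for old in self.prompts:
            common = 0
            for a, b in zip(old, tokens):
                if a != b:
                    break
                common += 1
            hit = max(hit, common)
        self.prompts.append(list(tokens))
        return len(tokens) - hit

cache = PrefixCache()
cache.process("ABCD")   # 4 tokens of fresh work
cache.process("ABCDE")  # only "E" is new -> 1
cache.process("BCDE")   # no shared prefix -> 4, from scratch
```

Note the cache is keyed on the exact prefix, so a different system prompt just means a different prefix: it misses the cache and pays full price.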
>>108575434
How do you even do that when people have different system prompts?
>>108571138
Claude Code does it using their retarded model, Haiku. Codex handles context compaction way better in my experience. It doesn't remember every detail, but it remembers enough to recheck relevant files and continue without you having to babysit it.
>>108571118
Because anyone who knows anything about LLMs knows big context windows result in worse LLM performance.
>>108575615
Without a big context window you get no performance at all, because you can't hold all of the relevant information.
>>108575775
Read about Ralph loops