Why doesn't a single mainstream LLM chatbot provider (GPT, Claude, Gemini, Grok) offer a rolling-memory feature for an "infinite" context window?
Because it's misleading and makes models retarded. GPT Codex auto-compacts (summarizes) when you're about to run out of context and keeps going; sometimes it's halfway through a job and instantly gets dementia after compacting. It also automatically prunes its own logs as you go, so sometimes it just randomly forgets what you were doing a moment ago.
>>108571128
Surely the summarizing/compression scheme can be gradually improved over time, with some A/B testing like the normal chatbot? I've been managing the context window "manually" by asking it to summarize and copy-pasting the result into a new conversation, and it works so-so. I just wish I didn't have to do it by hand.
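That manual loop can be wired up in a few lines. A minimal sketch, with `call_model` stubbed out (a real version would hit your provider's chat API and ask it to summarize):

```python
def call_model(messages):
    # Stub for a real chat-completions call (hypothetical).
    # A real version would ask the model to summarize the conversation.
    total = sum(len(m["content"]) for m in messages)
    return f"[summary of ~{total} chars of conversation]"

def add_message(history, user_msg, budget_chars=2000):
    """Append a user message; if the transcript blows past the budget,
    collapse it into a model-written summary plus the new message."""
    history = history + [{"role": "user", "content": user_msg}]
    if sum(len(m["content"]) for m in history) > budget_chars:
        summary = call_model(history[:-1])
        history = [
            {"role": "system",
             "content": "Summary of earlier conversation: " + summary},
            history[-1],
        ]
    return history
```

Same idea as the copy-paste routine, just triggered automatically at a size threshold instead of by hand.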
>>108571118
Because then it's not infinite? What kind of buzzword gobbler are you?
>>108571118
Because LLMs are a shitty gimmick and not real AI.
>>108571118
Isn't that the main selling point of opus 4.6?
wdym OP? they don’t flatten context to save storage/money, they do it because too much context breaks the bot. that’s why it refuses to do any software task that isn’t replicated 10 million times on github.
>>108571138
I just asked Claude Code to do a big step in my 1000-line implementation plan while it was already at 80% context window usage. It auto-compacted 30 seconds in, ran for 15 minutes, and produced a perfect result. I haven't had any problems with compaction so far.
>>108571118
LLMs can't into memory properly
>>108573739
Why?
>>108574788
Because you have to pass all of it back into the model with every token it generates, along with the system instructions and whatever shit it's already outputted.
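Rough count of what that costs: every generated token attends over the whole prompt plus everything generated so far, so work scales with context length times output length. A toy tally (ignoring prompt caching and the initial prompt pass):

```python
def attention_reads(context_len, new_tokens):
    """Positions attended over while generating `new_tokens` tokens
    on top of a `context_len`-token prompt: step i reads context_len + i
    previous positions (assuming a plain KV cache, no compaction)."""
    return sum(context_len + i for i in range(new_tokens))

# The same 500-token answer is ~80x more attention work
# at 100k context than at 1k context:
small = attention_reads(1_000, 500)    # 624,750 reads
big = attention_reads(100_000, 500)    # 50,124,750 reads
```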
>>108574788
LLMs were built around the attention mechanism as a core design. With a smaller context window (memory), the model can focus that attention well. A bigger window just dilutes that attention, with more risk of noise from irrelevant data.
>>108574844
Not really.
>>108574846
You need good compression
>>108571118
ChatGPT (the web interface) already has a rolling window. Codex has compaction because they copied CC.
The best is probably a mixed approach where you compact some of the context at the beginning and keep ~2/3 of the context as a rolling window. But AFAIK nobody has implemented that.
If I had to choose I'd pick rolling window, which is what the code assistant I've coded uses.
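The mixed approach described above is easy to sketch. This is a toy version (summarizer stubbed out; `window` counts messages rather than tokens, purely for illustration):

```python
def hybrid_context(messages, window=12, keep_fraction=2/3):
    """Once the transcript exceeds `window` messages, keep the newest
    ~2/3 of the window verbatim and collapse everything older into
    one summary slot (a real version would summarize with the model)."""
    if len(messages) <= window:
        return messages
    keep = int(window * keep_fraction)
    old, recent = messages[:-keep], messages[-keep:]
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent
```

The summary slot gets rewritten as more messages fall out of the rolling portion, so old context degrades gracefully instead of vanishing all at once.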
>>108571118
The big providers do caching of input tokens. It works on prefixes. So, if you input ABCD and then you input ABCDE, the processing cost of A->D is very cheap and the only "new" work is on "E". So, ABCDE is actually cheaper to process than BCDE, which needs to be computed from scratch.
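A toy model of that accounting: bill only the tokens not covered by the longest cached prefix. (Real providers cache KV states server-side; this just reproduces the arithmetic.)

```python
class PrefixCache:
    def __init__(self):
        self.prompts = []  # previously processed token sequences

    def process(self, tokens):
        """Return how many tokens must actually be computed: everything
        past the longest shared prefix with any earlier prompt."""
        hit = 0
        for old in self.prompts:
            common = 0
            for a, b in zip(old, tokens):
                if a != b:
                    break
                common += 1
            hit = max(hit, common)
        self.prompts.append(list(tokens))
        return len(tokens) - hit

cache = PrefixCache()
cache.process("ABCD")   # 4 tokens of fresh work
cache.process("ABCDE")  # only "E" is new -> 1
cache.process("BCDE")   # no shared prefix -> 4, from scratch
```

Note the cache is keyed on the exact prefix, so a different system prompt just means a different prefix: it misses the cache and pays full price.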
>>108575434
How do you even do that when people have different system prompts?
>>108571138
Claude Code does it using their retarded model, Haiku. Codex handles context compaction way better in my experience. It doesn't remember every detail, but it remembers enough to recheck relevant files and continue without you having to babysit it.
>>108571118
Because anyone who knows anything about LLMs knows big context windows result in worse LLM performance.
>>108575615
Without a big context window you get no performance at all, because you can't hold all of the relevant information.
>>108575775
Read about Ralph loops