I have a quad 3090 setup running Qwen Coder 30B with 30k context. I send one message with the Qwen Code CLI telling it to review my files and explain the project as a test. It gets through 3 files before the context runs out. This was with maybe 500 lines of code total. What gives? A chatbot can go on forever, but the moment I try to use an agent it's pretty much worthless. I'm at the point of trying to set up context management and RAG to mimic even a fraction of what Gemini's app builder does. I don't want the cloud, I just want my LLM on my hardware. Hundreds of gigs of RAM and VRAM. I should be able to at least have it review some files. Is this shit just ass or is it me who is the brainlet?
first of all, 30k context is absolutely minuscule for what you're asking of it; much smaller models can stay coherent with much bigger contexts. second of all, the big chatbots have code that intervenes when the context fills up and summarizes what's happened so far (compaction). that's why after a while they'll still forget who said what and gaslight the shit out of you.
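the compaction loop is conceptually something like this. rough python sketch, assuming a local OpenAI-compatible server like llama-server on :8080; the model name, budget, and prompt are all made up:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
CTX_BUDGET = 24000  # leave headroom below a 30k window

def approx_tokens(messages):
    # crude estimate: ~4 chars per token
    return sum(len(m["content"]) for m in messages) // 4

def compact(history):
    # when near the budget, collapse the old turns into a one-message summary
    if approx_tokens(history) < CTX_BUDGET:
        return history
    old, recent = history[:-4], history[-4:]
    summary = client.chat.completions.create(
        model="qwen-coder",
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping file names and decisions:\n"
                       + "\n".join(m["content"] for m in old),
        }],
    ).choices[0].message.content
    return [{"role": "system", "content": "Summary so far: " + summary}] + recent

the summary is lossy, which is exactly why they start misremembering who said what.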
>>107652113
I mean the context options go like 16k, 30k, 60k, then 128k. Idk if 30k is really that small for reviewing a few .cs files.
>>107652098
>quad 3090
Ayo nigga gimme one, you don't need all 4.
>>107652098
yea this happens with legacy cards
>>107652098
I guess you want to increase the context to something closer to what the remote LLMs give you.
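if you're serving with llama.cpp that's just the context flag, e.g. llama-server -m qwen-coder-30b.gguf -c 65536 (filename made up, and check what your model actually supports before cranking it). the KV cache eats more VRAM the bigger you set it, but with 4x3090 you have the room.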
>>107652098
Try to find out how much of the context window it's using as it goes, and/or what it's actually putting in there. I know the first is possible because I've seen tools that show that stat, but I can't tell you exactly how since I only use proprietary internal tooling at work.
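if you just want a rough count yourself, tokenize the files and see how fast they eat the window. sketch, assumes the transformers package; the tokenizer repo here is a guess, point it at whatever you're actually running:

from pathlib import Path
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
total = 0
for f in sorted(Path("src").rglob("*.cs")):
    # count tokens per file so you can see which ones blow the budget
    n = len(tok.encode(f.read_text(errors="ignore")))
    total += n
    print(f"{f}: {n} tokens")
print(f"total: {total} of your 30k window, before the agent's own prompts")

the agent's system prompt plus tool-call chatter can be tens of k on its own, which is probably where your 30k actually went.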