memory prices coming back down to normal sooner than you think
>>108460578 That's just the cache though. Newer models have already been able to do 1 GB per 10k tokens for cache. If it comes down further, great, but it's not going to change the biggest LLMs requiring 1.5 TB of VRAM to run.
>>108460591 anon, can you tell the class where cache is being stored and how that might affect memory prices?
>>108460618 It's a series of tubes
>>108460578
>memory prices coming back down to normal sooner than you think
https://en.wikipedia.org/w/index.php?title=Jevons_paradox
>>108460618 On whatever medium happens to be faster than whatever you're originally loading it from. If you're loading it from memory, the cache is on the CPU. If you're loading it from an NVMe, the page cache is in memory. If you're loading it from the network, you could cache it on mass storage.
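A toy sketch of that idea, for illustration only: each tier sits in front of a slower one, a miss falls through to the next tier, and a hit gets copied into the faster tiers. The `TieredCache` class and the tier names are hypothetical, and there is no eviction or size limit, which any real cache would need.

```python
class TieredCache:
    """Check the fastest tier first; on a miss fall through and promote."""

    def __init__(self, tier_names):
        # Tiers are ordered fastest first; each one is just a dict here.
        self.tiers = [(name, {}) for name in tier_names]

    def get(self, key, load_from_origin):
        for i, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Promote the value into every faster tier.
                for _, faster in self.tiers[:i]:
                    faster[key] = value
                return value, name
        # Missed everywhere: pay for the slow load once, then fill all tiers.
        value = load_from_origin(key)
        for _, store in self.tiers:
            store[key] = value
        return value, "origin"


cache = TieredCache(["CPU cache", "page cache (RAM)", "mass storage"])
first = cache.get("weights.bin", lambda k: f"bytes of {k}")
second = cache.get("weights.bin", lambda k: f"bytes of {k}")
print(first[1], "->", second[1])
```

The first lookup is served from the origin and the second from the fastest tier, which is the whole point of putting the cache on the faster medium.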
>>108460661 It's going to get much worse in the future whenever LLMs get replaced by something better.
>>108460578 As if that's going to fix anything. Even if memory use was made 100x more efficient, they'd just understand that they can keep their orders in place and use 100x as much bullshit.
>>108460578
>sooner than you think
less than two weeks?
>>108460578 quantizing KV cache to 8 bit completely mindrapes the model and makes it hallucinate and go off the rails even worse than it does normally, 4 bit amplifies that even more. There is no way you're getting 3 bit KV quant with "no accuracy loss"
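For a rough feel of why fewer bits means more round-off, here is a minimal sketch using naive symmetric absmax quantization on random numbers standing in for KV values. This is an assumption-heavy toy: it is not the scheme from the paper under discussion, real KV quantizers use per-channel or per-group scales and other tricks, and raw round-off error does not by itself settle how much model quality degrades.

```python
import random


def quantize_roundtrip(values, bits):
    """Symmetric absmax quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 3 for 3-bit
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


random.seed(0)
# Hypothetical stand-in for one slice of a KV cache: 4096 Gaussian values.
fake_kv = [random.gauss(0.0, 1.0) for _ in range(4096)]
errors = {bits: mean_abs_error(fake_kv, bits) for bits in (8, 4, 3)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

With a single scale for the whole vector, the error grows quickly as the bit width drops, which is why low-bit schemes that claim little accuracy loss have to do something smarter than this.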
>>108460591 it's KV cache not CPU/GPU cache, and you know it, you are one of these SK Hynix / Samsung bag holders screeching at your stocks dumping
>>108460578 what if Jevons paradox applies to this?
>>108460635 Bags of sand
>>108461028 Shoop do woop with milk and pennies, my mudkip
this paper is from q1 of last year and its effects are already in place
>>108460578
Even if this works
>Implying they won't run more and bigger shit on same hardware
>Implying they'll stop buying up supply which sucks out oxygen from competition
>Implying the plan isn't to bully out personal computing as a concept and force everything to be hardware as a service
>>108460591>>108460618so is this very good news or not???>>108460981>Jevon's paradoxyou know what, 100% a nothingburger. with these greedy techniggers we can only lose.
>>108461139 It's not.
>>108460578 does this mean anything for coomer image and video gen
i am begging people to stop reading pc gay men consumer blogs for ai news
>>108461139
>so is this very good news or not???
It's good news, but not something revolutionary for home use. The biggest benefit is when you run concurrent requests that all need their own context. Even at 1 GB per 10k tokens it adds up pretty quick. One 100k-token context is 10 GB, but if you have 20 concurrent requests at that length then that's 200 GB.
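The arithmetic in that post, written out as a small sketch. The 1 GB per 10k tokens ratio is the thread's rough figure, not a measured value for any particular model, and `kv_cache_gb` is a hypothetical helper name.

```python
def kv_cache_gb(tokens_per_request: int,
                concurrent_requests: int = 1,
                gb_per_10k_tokens: float = 1.0) -> float:
    """Total KV cache in GB when every request holds its own context."""
    per_request = tokens_per_request / 10_000 * gb_per_10k_tokens
    return per_request * concurrent_requests


single = kv_cache_gb(100_000)
batch = kv_cache_gb(100_000, concurrent_requests=20)
print(f"one 100k-token request: {single:.0f} GB")  # 10 GB
print(f"20 concurrent requests: {batch:.0f} GB")   # 200 GB
```

The cache scales linearly with both context length and the number of simultaneous requests, so a server feels a cache-size reduction far more than a single home user does.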
>>108461258>Even at 1 GB per 10k tokens it adds up pretty quick. 10 GB for 100k tokens, but if you have 20 concurrent requests then that's 200 GB.what? does this make any sense? I think he speaks gibberish like a tard and has no idea whatsup.
>>108460578 That's just wrong. It means they will be able to do even more with the memory they already have, and can have.
>>108460578 Wait... over UNQUANTIZED bits??? What about comparison to the existing quantization?
>>108460981 Jevons paradox applies to everything that consumes resources.
>>108460578
>google
>not even 10x
Prices are never coming down.
>>108460578 okay can I locally run a 1 trillion qubit ai locally yet?
>>108461621
>Jevons paradox applies to everything that consumes resources
Even me? Wait, every time I get better I consume more. When my gf gets better penis, she wants more. This checks out. AI could 10x tomorrow and it would only mean more porn and more slop generation. Humans are a virus.
>>108460578
>google's
Yeah no, the prices are gonna go up thanks to them.
>>108460661
this
>thanks to LEDs outdoor lighting is going to consume much less electricity
>proceeds to install 100x more outdoor lights
>oh for some reason outdoor lighting consumes even more electricity than before