I'm not an AI bro, just someone who likes fucking around with locally run models. I have a decent laptop (Intel Core i9-14900HX, NVIDIA GeForce RTX 4060 Laptop GPU, 32 GB DDR5-5600), and it struggles with a lot of models (15 tok/sec on nvidia/nemotron-3-nano). Is there anything that can be done about this? And why are LLMs so heavy?
>>108741087
cause they are giant Markov chains that represent a wide range of paths
>>108741087
that's a big model for 8GB of VRAM. you should get a MacBook with Apple Silicon and unified memory if you want to run LLMs on your machine.
>>108741087
30B is a big model for consumer hardware. try a 7-14B model: qwen3.6, qwen3.5, qwen3
>>108741087
>Why are Large Language Models so heavy
What the fuck do you think "Large" means? Run a small, local model if you want something lightweight and portable. Lightning-fast small models tailored to specific domains are going to be the future anyway.
>>108741087
>Is there anything that can be done about this?
faster hardware, preferably not a laptop
>And why are LLMs so heavy?
heavy? get real. those sizes are reasonable considering how much data these models are able to vomit up IF your hardware is fast enough.
tl;dr: stop being poor and get good
>>108741087
I'd say 15 tok/sec for a free model of that quality on a sub-$2000 mobile device is actually pretty impressive and you should be grateful
>>108741087
Because they get better as they get bigger, so naturally people will make them as big as fits on enterprise hardware, not as big as fits on your laptop.
Look man, they can already do a fucking lot. I run this Gemma 4 E4B or some nonsense, and what I'm finding is that it performs not much worse on a piece of dogshit like the Mac mini M4 16GB than the (non-local) DeepSeek I was using back when I first discovered this stuff. It works stupidly well for basic shit. I'm asking it "hey look, I took this picture, what do you see" and it's accurately telling me that my photographic skills are fucking terrible. I don't need someone on /p/ to tell me anymore, and best of all, I didn't spend thousands, and since I'm not running this online the critique stays private.
>>108741087
If Q4_K_M versions are too heavy for your hardware, try Q3_K_M. Perplexity will go up a bit, but it can be a worthwhile tradeoff if the model is just too slow.
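if you want to eyeball the tradeoff yourself, here's a minimal sketch using llama-cpp-python (assuming you have a CUDA build installed; the model path is just a placeholder for whatever Q3_K_M GGUF you actually grab):
[code]
# pip install llama-cpp-python (CUDA build for GPU offload)
from llama_cpp import Llama

# placeholder path: point it at whatever Q3_K_M GGUF you downloaded
llm = Llama(
    model_path="models/your-model-Q3_K_M.gguf",
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU
    n_ctx=8192,       # context window; bigger eats more VRAM for KV cache
    verbose=False,
)

out = llm("Explain what a KV cache is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
[/code]
run it once with the Q4_K_M path and once with the Q3_K_M path and compare tok/sec against output quality yourself.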
>>108741087
run with full GPU offload if you want speed. right now the qwen3.5 4B/9B are the best fit for you I think. for example, I typically run a Q3.5-9B with 64k context since it fully fits in my 4070 with some spare room left in GPU memory for browser windows and shit like that
>>108745017
4-bit quants are so popular because they're the exact sweet spot between size and quality in almost all models of the size a typical gaming PC can run (~32B and under)
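napkin math if you want to sanity-check what fits in VRAM (standard transformer assumptions; the layer/head numbers below are a hypothetical 9B-class config, check your actual model's):
[code]
# rough VRAM estimate: quantized weights + fp16 KV cache
# ballpark only; real runtimes add overhead on top

def weights_gb(params_b, bits):
    # 32B at 4-bit -> ~16 GB, which is why ~32B is the ceiling for gaming PCs
    return params_b * bits / 8

def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for the K and V tensors, fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# hypothetical 9B-class model with grouped-query attention:
# 28 layers, 4 KV heads, head_dim 128, 64k context
print(f"weights:  {weights_gb(9, 4):.1f} GB")                 # ~4.5 GB
print(f"kv cache: {kv_cache_gb(28, 4, 128, 65536):.1f} GB")   # ~3.8 GB
[/code]
call it ~8-9 GB total, which lines up with a 9B quant at 64k context fitting in a 12 GB 4070 with room to spare like the anon above said.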
Your prompt is ready. Speed weak, weights are heavy. The model's making slop already, code spaghetti.
>>108741087
It's almost as if it's just bruteforcing shit.
>>108747591
>It's almost as if it's just bruteforcing shit.
Exactly.