/g/ - Technology


Thread archived.
You cannot reply anymore.




I'm not an AI bro, just someone who likes fucking around with locally run models. I have a decent laptop (Intel Core i9-14900HX, NVIDIA GeForce RTX 4060 Laptop GPU, 32 GB DDR5-5600), and it struggles with a lot of models (15 tok/sec for nvidia/nemotron-3-nano). Is there anything that can be done about this? And why are LLMs so heavy?
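Back-of-the-envelope context for the 15 tok/sec number: single-batch decoding is mostly memory-bandwidth bound, since every generated token streams all active weights through the GPU. A rough speed ceiling can be sketched like this (the bandwidth and model-size numbers below are illustrative guesses, not measured specs for this laptop):

```python
# Sketch: upper-bound estimate of decode speed for a local LLM.
# Assumption: decode is bandwidth-bound, so tok/sec ~ bandwidth / bytes
# read per token. Real speeds are lower (overhead, CPU offload, etc.).

def est_tok_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Ceiling estimate: tokens/sec ~ effective bandwidth / model size."""
    return bandwidth_gb_s / model_gb

# e.g. a ~16 GB quantized model on a GPU with ~256 GB/s effective bandwidth
print(est_tok_per_sec(16, 256))  # ceiling of ~16 tok/s
```

If part of the model spills out of VRAM into system RAM, the effective bandwidth drops sharply, which is why a 30B-class model crawls on an 8 GB card.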
>>
>>108741087
cause they are giant markov chains that represent a wide range of paths
>>
>>108741087
that's a big model for 8gb vram. you should get a macbook with apple silicon and unified memory if you want to run LLMs on your machine.
>>
>>108741087
30b is a big model for consumer hardware. try a 7-14b model. qwen3.6, qwen3.5, qwen3
>>
>>108741087
>Why are Large Language Models so heavy
What the fuck do you think "Large" means nigga? Run a small or local model if you want something lightweight and portable. Lightning-fast small AIs tailored to specific domains are going to be the future anyway.
>>
>>108741087
>Is there anything that can be done about this?
faster hardware, preferably not a laptop
>And why are LLMs so heavy?
heavy? get real, faggot. those sizes are reasonable considering how much data these models are able to vomit up IF your hardware is fast enough.

tl;dr: stop being poor and get good
>>
File: 1597285826246.png (181 KB, 383x396)
181 KB PNG
>>108741087
I'd say 15 tok/second for a free model of that quality on a sub-$2000 mobile device is actually pretty impressive and you should be grateful
>>
>>108741087
Because they get better as they get bigger, so naturally people will make them as big as they fit on enterprise hardware, not as big as they fit on your laptop.
>>
Look man, they already can do a fucking lot.
I run this Gemma 4 E4B or some nonsense and the shit I'm finding is that on a piece of dogshit like the mac mini m4 16gb it works barely worse than the (non-local) deepseek I was using when I first got into this. It works stupidly well for basic shit. I'm asking it "hey look, I took this picture, what do you see" and it's accurately telling me that my photographic skills are fucking terrible. I don't need someone on /p/ to tell me anymore, and best of all, I didn't spend thousands and I'm not running this online, so the critique stays private.
>>
>>108741087
If Q4_K_M versions are too heavy for your hardware, try Q3_K_M. Perplexity will go up a bit, but it can be a worthwhile tradeoff if the model is just too slow.
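To make the Q4_K_M vs Q3_K_M tradeoff concrete, you can estimate file size from bits per weight. The bits-per-weight figures below are rough averages (K-quants mix block formats, so real GGUF sizes differ a bit); treat them as illustrative assumptions, not llama.cpp's exact numbers:

```python
# Sketch: approximate quantized model size from parameter count.
# BITS_PER_WEIGHT values are rough averages, not exact GGUF figures.

BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

def quant_size_gb(params_billion: float, quant: str) -> float:
    """Approximate on-disk / in-VRAM size of the weights in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# a ~30B model: Q4_K_M ~18 GB vs Q3_K_M ~14.6 GB
for q in ("Q4_K_M", "Q3_K_M"):
    print(q, round(quant_size_gb(30, q), 1), "GB")
```

Either way a 30B model blows past 8 GB of VRAM, which is the real reason dropping a quant level only helps so much here.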
>>
File: 1748706665208209.jpg (118 KB, 1531x1068)
118 KB JPG
>>108741087
run full gpu offload if you want speed. right now the qwen3.5 4B/9B are the best fit for you i think. for example, i typically use a Q3.5-9B with 64k context since it fully fits in my 4070 with some spare room in gpu memory for browser windows and shit like that
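Context size matters as much as the weights when checking whether a model "fully fits": the KV cache grows linearly with context length. A hedged sketch of that growth, using hypothetical architecture numbers for a ~9B GQA model (layer count, KV heads, and head dim below are placeholders, not any real model's config):

```python
# Sketch: estimate KV-cache VRAM for a given context length.
# Architecture numbers in the example call are hypothetical placeholders.

def kv_cache_gb(ctx: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: 2x (K and V), fp16 elements by default."""
    return 2 * ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

# e.g. 64k context, 36 layers, 8 KV heads, head dim 128 -> ~9.7 GB
print(round(kv_cache_gb(65536, 36, 8, 128), 2))
```

This is why the same model that fits fine at 8k context can spill out of VRAM at 64k, and why quantized KV caches exist.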
>>
File: 1746514863022984.png (107 KB, 1600x1143)
107 KB PNG
>>108745017
4-bit quants are so popular because they're the exact sweet spot between size and performance in almost all models of the size a typical gaming pc can run (~32B and under)
>>
Your prompt is ready. Speed weak, weights are heavy.
The model's making slop already, code spaghetti.
>>
File: clueless.jpg (34 KB, 750x1000)
34 KB JPG
>>108741087
It's almost as if it's just bruteforcing shit.
>>
>>108747591
>It's almost as if it's just bruteforcing shit.
Exactly.




