/g/ - Technology



File: 3830497_orig.gif (5 KB, 351x276)
>"Dude, just use your own local AI! It's better and not censored!"

>actually listen to /g/ and install local AI (gemma4, 26b)
>try it out
>fans spin up like I'm playing Crysis in 2007
>several minutes pass
>its response is barely better than grok, nevermind Claude/Gemini/ChatGPT

lmao, I was bamboozled again
>>
>>several minutes pass
stopped reading there
>>
>>108885822
>26b
lol
>>
Specs? fastfetch? speccy (bleh)?
>>
local AI is only free if your daddy pays for your expensive gpu and electricity
>>
>>108885822
Seriously, is there an open source model with enough billions of parameters that's decent for 24-32 GB GPUs?
>>
>>108885822
>several minutes
>on a 26b3 moe
maybe you should get a job so you can buy a computer from this decade instead of playing with ai
>>
>>108885822
"I don't know how to use computers"
>>
>>108885992
>>108886382
>"Run AI locally bro, just buy a $5000 card and do it yourself, bro."
>>
>>108885822
skill issue
>>
File: 1564953104693.jpg (209 KB, 1607x617)
does the model fit in vram? is there space left over in vram for context? if not youre doing it wrong
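A rough way to answer both questions before downloading anything is to add up the quantized weights and the KV cache and compare the total with your VRAM. This is only a back-of-the-envelope sketch: the layer count, KV head count, head size, the 4.5 bits per weight for a q4-type quant, and the 1 GiB overhead are all made-up illustrative numbers, not the real specs of any model in this thread.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """K and V tensors for every layer and every token held in context.

    bytes_per_elem: 2.0 for an fp16 cache, roughly 1.0 for a q8 cache.
    """
    total = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem
    return total / 2**30


def fits_in_vram(vram_gib: float, weights: float, kv_cache: float,
                 overhead_gib: float = 1.0) -> bool:
    """True if weights + KV cache + a guessed runtime overhead fit in VRAM."""
    return weights + kv_cache + overhead_gib <= vram_gib


# Hypothetical 26B model at ~4.5 bits/weight with a 32k-token fp16 cache.
w = weights_gib(26, 4.5)
kv = kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128,
                  context_tokens=32768, bytes_per_elem=2.0)
print(f"weights ~{w:.1f} GiB, kv cache ~{kv:.1f} GiB")
print("fits in 24 GiB:", fits_in_vram(24, w, kv))
print("fits in 16 GiB:", fits_in_vram(16, w, kv))
```

If the sum does not fit, the usual levers are a smaller quant, a shorter context, or a quantized KV cache (halving `bytes_per_elem`).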
>>
>>108887897
Btw I did 0 research and now I'm whining on /g/
>>
File: 1748383026443316.jpg (16 KB, 260x282)
I have an RX 7900 XTX/64GB DDR5 RAM and the results aren't great on local AI either. Maybe local AI is only worth doing on an Nvidia GPU specifically? Tried with gemma4 and phi4, could be bad models too, I dunno fuck about the intricacies of AI, I just wanted to see if it was really better than online ones and it wasn't. Only plus was being able to run uncensored models, which is okay I guess? Most prompts don't need to be uncensored anyway, so who cares.
>>
ive found the uncensored prompts to just be for loli and saying nigger, but will refuse racism against jews, it fills the context with <IM END>
>>
>>108888148
>listening to /g/'s recommendations doesn't count as research
>>
File: GGzQBrnaAAEI5TK.jpg (34 KB, 600x360)
>>108885822
>fans spin up like I'm playing Crysis in 2007
>>
Bigger the model, the better it is.
Depends on use case, but for me m2.7 at q4 is the minimum I'd ever use locally and that fits into about 180 gb of vram
>>
have you tried not having a dog shit GPU?
>>
>>108885822
>26b Moe
Lmao
What did you even try to do, OP? 24b is fine for an RP session or gooning, but if you tried to vibecode your dream game, forget it: you need to fit a 120b+ model on your cards and still have space left for context.
>>
>>108885822
You found us out! Clever OP we were trolling you
>>
>>108885822
>its response is barely better than grok, nevermind Claude/Gemini/ChatGPT

Getting around 60-70 tps with MoE models like Gemma 4 or Qwen 3.6, using q4 or iq4_xs quants.

Yes, responses are as good as or better than the 'frontier' models in a lot of categories. For high-stakes work, like legal drafting, I'd stick with frontier models. For coding, Gemma 4 or Qwen 3.6 are both very competent even at q4 quants that will fit in 24 GB vram with full context (q8 quantized kv cache).

The dense models are slower but better quality overall, especially as context length grows.
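One way to see why MoE models generate faster than dense ones: token generation is mostly limited by how many bytes of weights have to be read from VRAM per token, and an MoE only reads its active experts. The sketch below gives a theoretical ceiling only (real throughput is lower, and it ignores KV cache reads and compute). The 4B active parameters, 27B dense size, 4.5 bits per weight, and 900 GB/s bandwidth are illustrative assumptions, not measured specs.

```python
def decode_tps_ceiling(active_params_billion: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical MoE with 4B active parameters vs a hypothetical 27B dense model,
# both at ~4.5 bits/weight on a card with ~900 GB/s of memory bandwidth.
moe = decode_tps_ceiling(4, 4.5, 900)
dense = decode_tps_ceiling(27, 4.5, 900)
print(f"MoE ceiling ~{moe:.0f} tok/s, dense ceiling ~{dense:.0f} tok/s")
print(f"MoE is at most ~{moe / dense:.2f}x faster under these assumptions")
```

The ratio depends only on active parameter counts, so the same comparison holds on slower or faster cards.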
>>
>have shit hardware
>use shit model
>get shit results
wow
>>108887897
A 3060 barely costs 300 bucks, bitch.
>>
>>108888183
What model did you use? With that amount of VRAM you should try Qwen 3.6 27B quanted at Q5, or Gemma 4 31B quanted at Q4. Anyways, it won't be better than cloud models, but it'll be yours to do as you please.
>>
>>108887897
>"Run AI locally bro, just buy a $5000 card and do it yourself, bro."

You can build a very good system with two 16 GB VRAM cards like the 5060 Ti. Prices on 3090s are dropping, and one 3090 has 24 GB. They also support NVLink, which gives the two cards a fast direct link and can further boost performance when a model is split across them. Even if you pay the overpriced $1500+ prices for 2 3090s you're still getting a far better deal than a single 48 GB card that will likely cost $5K+
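To put numbers on that comparison, here is the cost per GB of VRAM using the figures from this post. It is ambiguous whether "$1500+" means the pair or each card, so both readings are computed; all prices are the post's rough estimates, not quotes.

```python
def dollars_per_gb(total_price: float, total_vram_gb: float) -> float:
    """Price paid per GB of VRAM across the whole setup."""
    return total_price / total_vram_gb


pair_cheap = dollars_per_gb(1500, 2 * 24)       # $1500 for both 3090s
pair_pricey = dollars_per_gb(2 * 1500, 2 * 24)  # $1500 per 3090
single_48 = dollars_per_gb(5000, 48)            # one 48 GB card at $5K

print(f"2x 3090, $1500 total: ${pair_cheap:.2f}/GB")
print(f"2x 3090, $1500 each:  ${pair_pricey:.2f}/GB")
print(f"1x 48 GB card, $5K:   ${single_48:.2f}/GB")
```

Either way the pair comes out cheaper per GB, though this ignores power draw, the second PCIe slot, and a bigger PSU.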
>>
local AI is not there yet
trannies are lying to you again


