/g/ - Technology

File: deepresize1.jpg (38 KB, 1200x801)
when will I be able to run 120b+ open source ai models at home?
>>
Your government won't let you.
>>
>>106640370
When it's all obsolete.
>>
>>106640370
You've been able to run deepseek r1 671b itself on a mid-tier $1.5k gaming rig with a 3090 and 128GB of RAM almost since the beginning
https://unsloth.ai/blog/deepseekr1-dynamic
The dynamic quants of huge models are very good.

And tiny 120b models you can run at decent Q3_K_S quants with just 128GB of cheap RAM on a poorfag gaming rig

>inb4 muh speed
3-8t/s is far from slow for that money, retard
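A minimal sketch of what that setup looks like, assuming the llama-cpp-python bindings and one of the unsloth dynamic quant GGUFs (the file name and layer split below are placeholders, tune n_gpu_layers to your VRAM):
[code]
from llama_cpp import Llama

# Placeholder path: point this at the first shard of a downloaded dynamic quant.
llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_gpu_layers=20,   # offload what fits in the 3090's 24GB, rest stays in RAM
    n_ctx=4096,        # KV cache also eats VRAM, keep context modest
    n_threads=16,      # roughly match physical core count
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
[/code]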
>>
>>106640370
When you buy two or three rtx 6000 pros. Stop being poor
>>
>>106642266
>Stop being poor
Just buy an H100, they're only a bit over $30k
>>
>>106640370
M3 Ultra with 512GB of unified memory can run anything
>>
>>106640627
>with a 3090 and 128gb of ram
>3-8t/s
seriously doubt this, proof?
it's a 100GB+ model and you have 24GB of VRAM, so you'd need to constantly stream the other ~76GB into the GPU, which is bottlenecked by PCIe 5.0's ~64 GB/s
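A rough sanity check of that ceiling (all numbers approximate); note llama.cpp computes RAM-resident layers on the CPU rather than shipping them over the bus each token, so for a dense model the tighter bound is RAM bandwidth:
[code]
# Back-of-envelope only, all numbers rough.
weights_in_ram_gb = 76    # model portion that doesn't fit in 24GB of VRAM
pcie5_x16_gbs     = 64    # PCIe 5.0 x16, ~64 GB/s per direction
ddr5_dual_gbs     = 90    # dual-channel DDR5-5600, theoretical peak

# If RAM-resident weights streamed over PCIe every token (llama.cpp doesn't do this):
print(f"PCIe streaming ceiling: ~{pcie5_x16_gbs / weights_in_ram_gb:.2f} t/s")
# llama.cpp runs those layers on the CPU instead, so the dense bound is RAM bandwidth:
print(f"CPU-side dense ceiling: ~{ddr5_dual_gbs / weights_in_ram_gb:.2f} t/s")
[/code]
Either way a dense 100GB+ model would crawl; the 3-8t/s claim only works because of what the next reply points at.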
>>
Apple is the best bet right now due to unified memory, but there are specialized AI boxes coming out with lots of unified memory that cost even less.

Remember when having a discrete GPU was better than shared memory? No longer the case.
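Token generation is mostly memory-bandwidth-bound, which is why unified memory matters; a rough comparison with approximate spec-sheet numbers (the 16GB/token figure assumes a ~37B-active MoE at ~3.5 bits/weight):
[code]
# Approximate peak memory bandwidth, GB/s (spec-sheet ballpark figures).
configs = {
    "dual-channel DDR5-5600 desktop": 90,
    "Strix Halo (256-bit LPDDR5X)":   256,
    "M3 Ultra unified memory":        819,
    "RTX 3090 GDDR6X (only 24GB)":    936,
}
active_gb_per_token = 16   # e.g. ~37B active params at ~3.5 bits/weight

for name, bw in configs.items():
    print(f"{name}: ~{bw / active_gb_per_token:.0f} t/s ceiling")
[/code]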
>>
>>106642535
Costs $10k, but given you can do anything you want fully offline at speed, it's actually pretty reasonable. Anyone with money who's concerned about privacy should be doing that.
>>
>>106642577
you can't run the full model with it
1TB is needed for that
>>
>>106642558
>what are mixture of experts models
>>
>>106642580
If you buy two can you cluster?
>>
>>106642583
deepseek r1 671b is such a model?
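It is: DeepSeek R1 activates roughly 37B of its 671B parameters per token, which is what makes the earlier 3-8t/s claim plausible on dual-channel RAM (rough numbers):
[code]
active_params_b = 37      # DeepSeek R1: ~37B of 671B params active per token
bits_per_weight = 3.5     # ballpark for a Q3-ish dynamic quant
ram_bw_gbs      = 80      # realistic dual-channel DDR5 throughput

gb_per_token = active_params_b * bits_per_weight / 8
print(f"~{gb_per_token:.0f} GB read per token -> ~{ram_bw_gbs / gb_per_token:.1f} t/s")
# ~16 GB/token -> ~5 t/s, squarely in the claimed 3-8t/s range
[/code]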
>>
>>106642569
What boxes
When
>>
>>106642569
Apple is developing mobile HBM with much higher bandwidth than DDR and much lower power usage than desktop HBM
>>
File: 128GB APU.png (76 KB, 1022x472)
>>106642796
>>
>>106642569
prompt processing speed is abysmal though
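Right: prompt processing (prefill) is compute-bound batched matmuls while generation is bandwidth-bound, so big unified-memory boxes with modest compute feel it on long prompts. A crude way to measure it yourself, assuming llama-cpp-python (model path is a placeholder; llama.cpp's llama-bench separates prefill and decode properly):
[code]
import time
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=8192, verbose=False)  # placeholder path

long_prompt = "lorem ipsum " * 500       # ~1000+ tokens of prefill work
t0 = time.time()
out = llm(long_prompt, max_tokens=64)    # crude: times prefill + decode together
dt = time.time() - t0

u = out["usage"]
print(f'{u["prompt_tokens"]} prompt tokens, {u["completion_tokens"]} generated, {dt:.1f}s')
[/code]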
>>
>>106643280
128gb is not nearly enough
Mac can do 512gb
>>
>>106640370
I have a server with 400gb+ of RAM, given enough drive space I bet I could run it on CPU at a snail's pace
>>
>>106643353
Yes you can!
>>106643346
True, these make no sense until they offer more RAM.
>>
>>106640370
When you'll stop being poor
>>
>>106643280
>amd
>ai
>>
>>106643360
Strix Halo can't even support more than 128GB of RAM
So we'll have to wait for a new architecture
>>
>>106640370
Right now? Don't even need that expensive hardware.
>>
>>106642266
>>106643363
>>106643597
I'm not buying 4 $10k gpus. when can I buy
>>106643280
this cpu for my am5 motherboard?
>>
>>106643908
No, it's an SoC
>>
>>106643908
>I'm not buying 4 $10k gpus. when can I buy
M4 Mac Mini? Need cheaper? Discounted Xeon server.
>>
>>106643280
>random chinese mfg
no thanks. and why the fuck did AMD limit strix halo to 128? stop lagging behind the curve. it needs 256 and a pcie slot for a gpu minimum.
>>
>>106642653
Yep.
>>
so what do you guys do with this thing? Generate porn or run a personal search engine with no tracking?
>>
>>106647938
I don't use Deepseek specifically but LLMs are good at NLP, so anything from OCR to light translation work, etc.
/lmg/ exists but it's mostly people ERPing.
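A minimal sketch of that kind of light NLP use, assuming llama-cpp-python and any instruct-tuned GGUF (path is a placeholder):
[code]
from llama_cpp import Llama

llm = Llama(model_path="instruct-model.gguf", n_ctx=4096, verbose=False)  # placeholder

resp = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a concise translator."},
    {"role": "user", "content": "Translate to English: 'El modelo corre localmente.'"},
])
print(resp["choices"][0]["message"]["content"])
[/code]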
>>
i just want a small form factor gpu that has 256 gb of vram, is that too much to ask for
>>
>>106640627
I have a 3090. You're telling me I just need 128GB RAM and I can run Deepseek? At which quantization?
>>
>>106649341
That anon is overselling it. The super low deepseek quants are braindead compared to q4 Qwen or GLM 4.5. It also gets much slower once you start filling the context; after a few thousand words it'll crawl at like 2-4t/s.
I have a 7900xtx with 64GB DDR5 and I can run GLM4.5 Air at around 12t/s until I hit around 5k context then it starts dropping fast. Dual channel ram bandwidth is the main bottleneck. I could probably get double the speed if I had an epyc system with 8 or 12 channel ram.
You can do a lot with a 3090 and some fast ram but 128GB is pointless when you're already bottlenecked by bandwidth. 64GB is the sweet spot.
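The context slowdown follows from the KV cache: every new token also reads the whole cache, which grows linearly with context. A toy model with made-up but plausible shapes (not GLM-specific):
[code]
n_layers, n_kv_heads, head_dim = 48, 8, 128        # illustrative GQA config
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # K+V, fp16
weights_read_gb = 8         # active weights read per generated token (fixed cost)
ram_bw_gbs = 80

for ctx in (1_000, 5_000, 20_000):
    kv_gb = ctx * kv_bytes_per_token / 1e9          # extra GB read per token
    print(f"ctx {ctx:>6}: +{kv_gb:.1f} GB/token -> ~{ram_bw_gbs / (weights_read_gb + kv_gb):.1f} t/s")
[/code]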
>>
>>106640370
When there's actual competition for nvidia
>>
>>106649532
I thought it might be something like that. I remember when R1 came out there was a lot of hype about "technically" being able to run it locally (but it was just retarded quants and distills).
Do you happen to have any recommendations for scientific and technical questions with search assist? I'm looking into Unsloth's dynamic quant of Gemma 3.
>>
>>106649532
>That anon is overselling it
>I have a 7900xtx with 64GB DDR5
lmao, you need 128gb ram and nvidia gpu, retard
and the quant is not braindead, not that you would know when you cant run it
>>
>>106643280
This
The Strix Halo mini PCs are probably the best option, and likely to get better as the driver quality improves.
>>106643346
Yes it can, at a whopping 10 grand. I don't consider 10 grand to be regular consumer money. But upwards of 2 grand certainly is.
>>
>>106643280
>$1700
Who the fuck would buy this? For what purpose?
>>
>>106642577
>speed actually pretty reasonable
I'm actually on one of these beefy mac studios, and it doesn't compare to NVIDIA at all. We're talking a handful of tokens per second.
>>
File: 1739323834464764.png (387 KB, 2964x1338)
>>106647938
Help me with random tech shit, OCR (not with Deepseek yet unfortunately), translations, (E)RP, creative writing, data formatting, etc. There's quite a range it's capable of. I've been working on making a tabletop RPG-lite simulator with it lately.
>>
File: oaijfoijsafd.png (1.05 MB, 1024x683)
>>106640370

you can, you just need a lot of ram
>>
>>106640370
remember when they said this shit would kill chatgpt? lol
>>
>>106649616
Ollama fucked everything up by calling all the deepseek distills "deepseek". So every retard thought they were running full Deepseek R1 on their laptop. Now there's idiots saying they can run R1 just fine if they quant out 99.999% of the data.
Qwen 3 and GLM 4.5 are pretty good for technical questions. Gemma 3 isn't bad but I only use the abliterated 27B model on rare occasions. Idk about search assist since I never run that locally but Qwen3 supposedly has great tool calling. Unsloth is a good source for models but check the community notes too, sometimes they fuck up the prompt templates and have to patch in a fix.
>>106649993
>lmao, you need 128gb ram and nvidia gpu, retard
No one gives a shit pajeet, 128GB is a waste when you're stuck on dual channel. Go ask your lobotomized deepseek model what ram bandwidth is, I'll come back tomorrow when it's ready to give me the wrong answer.
>not that you would know when you cant run it
I've run a hundred different quants of deepseek and not just on my own hardware. I've made my own quants of deepseek too. Anything under q4 is unusable and the only reason you don't see it is because you are probably just as retarded.
>>
>>106650863
Thanks. I appreciate all the advice. I'll give those three models a try.
>Idk about search assist since I never run that locally but Qwen3 supposedly has great tool calling.
I'll look into it. Most of my prompts are about seeking out and summarising online information so that's a big help if search is well integrated.


