/g/ - Technology

File: deepresize1.jpg (38 KB, 1200x801)
when will I be able to run 120b+ open source ai models at home?
>>
Your government won't let you.
>>
>>106640370
When it's all obsolete.
>>
>>106640370
You've been able to run deepseek r1 671b itself on a mid-tier $1.5k gaming rig with a 3090 and 128GB of RAM almost since the beginning
https://unsloth.ai/blog/deepseekr1-dynamic
The dynamic quants of huge models are very good.

And tiny 120b models you can run at decent Q3_K_S quants with just 128GB of cheap RAM on a poorfag gaming rig

>inb4 muh speed
3-8t/s is far from slow for that money, retard
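A minimal sketch of what that setup looks like, assuming the llama-cpp-python bindings and one of the unsloth dynamic quant GGUFs (the file name and layer split below are placeholders, tune n_gpu_layers to your VRAM):
[code]
from llama_cpp import Llama

# Placeholder path: point this at the first shard of a downloaded dynamic quant.
llm = Llama(
    model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_gpu_layers=20,   # offload what fits in the 3090's 24GB, rest stays in RAM
    n_ctx=4096,        # KV cache also eats VRAM, keep context modest
    n_threads=16,      # roughly match physical core count
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
[/code]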
>>
>>106640370
When you buy two or three rtx 6000 pros. Stop being poor
>>
>>106642266
>Stop being poor
Just buy an H100, they're only a bit over $30k
>>
>>106640370
M3 Ultra with 512GB of unified memory can run anything
>>
>>106640627
>with a 3090 and 128gb of ram
>3-8t/s
seriously doubt this, proof?
it's a 100GB+ model and you have 24GB of VRAM, so you'd need to constantly stream the other ~76GB into the GPU, which is bottlenecked by PCIe 5.0's ~64 GB/s
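A rough sanity check of that ceiling (all numbers approximate); note llama.cpp computes RAM-resident layers on the CPU rather than shipping them over the bus each token, so for a dense model the tighter bound is RAM bandwidth:
[code]
# Back-of-envelope only, all numbers rough.
weights_in_ram_gb = 76    # model portion that doesn't fit in 24GB of VRAM
pcie5_x16_gbs     = 64    # PCIe 5.0 x16, ~64 GB/s per direction
ddr5_dual_gbs     = 90    # dual-channel DDR5-5600, theoretical peak

# If RAM-resident weights streamed over PCIe every token (llama.cpp doesn't do this):
print(f"PCIe streaming ceiling: ~{pcie5_x16_gbs / weights_in_ram_gb:.2f} t/s")
# llama.cpp runs those layers on the CPU instead, so the dense bound is RAM bandwidth:
print(f"CPU-side dense ceiling: ~{ddr5_dual_gbs / weights_in_ram_gb:.2f} t/s")
[/code]
Either way a dense 100GB+ model would crawl; the 3-8t/s claim only works because of what the next reply points at.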
>>
Apple is the best bet right now due to unified memory, but there are specialized AI boxes coming out with lots of unified memory that cost even less.

Remember when having a discrete GPU was better than shared memory? No longer the case.
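Token generation is mostly memory-bandwidth-bound, which is why unified memory matters; a rough comparison with approximate spec-sheet numbers (the 16GB/token figure assumes a ~37B-active MoE at ~3.5 bits/weight):
[code]
# Approximate peak memory bandwidth, GB/s (spec-sheet ballpark figures).
configs = {
    "dual-channel DDR5-5600 desktop": 90,
    "Strix Halo (256-bit LPDDR5X)":   256,
    "M3 Ultra unified memory":        819,
    "RTX 3090 GDDR6X (only 24GB)":    936,
}
active_gb_per_token = 16   # e.g. ~37B active params at ~3.5 bits/weight

for name, bw in configs.items():
    print(f"{name}: ~{bw / active_gb_per_token:.0f} t/s ceiling")
[/code]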
>>
>>106642535
Costs $10k, but given you can do anything you want fully offline at speed, it's actually pretty reasonable. Anyone with money who's concerned about privacy should be doing that.
>>
>>106642577
you can't run the full model with it
1TB is needed for that
>>
>>106642558
>what are mixture of experts models
>>
>>106642580
If you buy two can you cluster?
>>
>>106642583
deepseek r1 671b is such a model?
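It is: DeepSeek R1 activates roughly 37B of its 671B parameters per token, which is what makes the earlier 3-8t/s claim plausible on dual-channel RAM (rough numbers):
[code]
active_params_b = 37      # DeepSeek R1: ~37B of 671B params active per token
bits_per_weight = 3.5     # ballpark for a Q3-ish dynamic quant
ram_bw_gbs      = 80      # realistic dual-channel DDR5 throughput

gb_per_token = active_params_b * bits_per_weight / 8
print(f"~{gb_per_token:.0f} GB read per token -> ~{ram_bw_gbs / gb_per_token:.1f} t/s")
# ~16 GB/token -> ~5 t/s, squarely in the claimed 3-8t/s range
[/code]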
>>
>>106642569
What boxes
When
>>
>>106642569
Apple is developing mobile HBM with much higher bandwidth than DDR and much lower power usage than desktop HBM
>>
File: 128GB APU.png (76 KB, 1022x472)
>>106642796
>>
>>106642569
prompt processing speed is abysmal though
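Right: prompt processing (prefill) is compute-bound batched matmuls while generation is bandwidth-bound, so big unified-memory boxes with modest compute feel it on long prompts. A crude way to measure it yourself, assuming llama-cpp-python (model path is a placeholder; llama.cpp's llama-bench separates prefill and decode properly):
[code]
import time
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=8192, verbose=False)  # placeholder path

long_prompt = "lorem ipsum " * 500       # ~1000+ tokens of prefill work
t0 = time.time()
out = llm(long_prompt, max_tokens=64)    # crude: times prefill + decode together
dt = time.time() - t0

u = out["usage"]
print(f'{u["prompt_tokens"]} prompt tokens, {u["completion_tokens"]} generated, {dt:.1f}s')
[/code]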
>>
>>106643280
128gb is not nearly enough
Mac can do 512gb
>>
>>106640370
I have a server with 400gb+ of RAM, given enough drive space I bet I could run it on CPU at a snail's pace
>>
>>106643353
Yes you can!
>>106643346
True, these make no sense until they offer more RAM.
>>
>>106640370
When you'll stop being poor
>>
>>106643280
>amd
>ai
>>
>>106643360
Strix Halo can't even support more than 128GB of RAM
So we'll have to wait for a new architecture
>>
>>106640370
Right now? Don't even need that expensive hardware.
>>
>>106642266
>>106643363
>>106643597
I'm not buying 4 $10k gpus. when can I buy
>>106643280
this cpu for my am5 motherboard?
>>
>>106643908
No, it's an SoC
>>
>>106643908
>I'm not buying 4 $10k gpus. when can I buy
M4 Mac Mini? Need cheaper? Discounted Xeon server.
>>
>>106643280
>random chinese mfg
no thanks. and why the fuck did AMD limit strix halo to 128? stop lagging behind the curve. it needs 256 and a pcie slot for a gpu minimum.
>>
>>106642653
Yep.
>>
so what do you guys do with this thing? Generate porn or run a personal search engine with no tracking?
>>
>>106647938
I don't use Deepseek specifically but LLMs are good at NLP, so anything from OCR to light translation work, etc.
/lmg/ exists but it's mostly people ERPing.
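A minimal sketch of that kind of light NLP use, assuming llama-cpp-python and any instruct-tuned GGUF (path is a placeholder):
[code]
from llama_cpp import Llama

llm = Llama(model_path="instruct-model.gguf", n_ctx=4096, verbose=False)  # placeholder

resp = llm.create_chat_completion(messages=[
    {"role": "system", "content": "You are a concise translator."},
    {"role": "user", "content": "Translate to English: 'El modelo corre localmente.'"},
])
print(resp["choices"][0]["message"]["content"])
[/code]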
>>
i just want a small form factor gpu that has 256 gb of vram, is that too much to ask for
>>
>>106640627
I have a 3090. You're telling me I just need 128GB RAM and I can run Deepseek? At which quantization?
>>
>>106649341
That anon is overselling it. The super low deepseek quants are braindead compared to q4 Qwen or GLM 4.5. It also gets much slower once you start filling the context; after a few thousand words it'll crawl at like 2-4t/s.
I have a 7900xtx with 64GB DDR5 and I can run GLM4.5 Air at around 12t/s until I hit around 5k context then it starts dropping fast. Dual channel ram bandwidth is the main bottleneck. I could probably get double the speed if I had an epyc system with 8 or 12 channel ram.
You can do a lot with a 3090 and some fast ram but 128GB is pointless when you're already bottlenecked by bandwidth. 64GB is the sweet spot.
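The context slowdown follows from the KV cache: every new token also reads the whole cache, which grows linearly with context. A toy model with made-up but plausible shapes (not GLM-specific):
[code]
n_layers, n_kv_heads, head_dim = 48, 8, 128        # illustrative GQA config
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # K+V, fp16
weights_read_gb = 8         # active weights read per generated token (fixed cost)
ram_bw_gbs = 80

for ctx in (1_000, 5_000, 20_000):
    kv_gb = ctx * kv_bytes_per_token / 1e9          # extra GB read per token
    print(f"ctx {ctx:>6}: +{kv_gb:.1f} GB/token -> ~{ram_bw_gbs / (weights_read_gb + kv_gb):.1f} t/s")
[/code]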
>>
>>106640370
When there's actual competition for nvidia
>>
>>106649532
I thought it might be something like that. I remember when R1 came out there was a lot of hype about "technically" being able to run it locally (but it was just retarded quants and distills).
Do you happen to have any recommendations for scientific and technical questions with search assist? I'm looking into Unsloth's dynamic quant of Gemma 3.
>>
>>106649532
>That anon is overselling it
>I have a 7900xtx with 64GB DDR5
lmao, you need 128gb ram and nvidia gpu, retard
and the quant is not braindead, not that you would know when you cant run it
>>
>>106643280
This
The Strix Halo mini PCs are probably the best option, and likely to get better as the driver quality improves.
>>106643346
Yes it can, at a whopping 10 grand. I don't consider 10 grand to be regular consumer money. But upwards of 2 grand certainly is.
>>
>>106643280
>$1700
Who the fuck would buy this? For what purpose?
>>
>>106642577
>speed actually pretty reasonable
I'm actually on one of these beefy mac studios, and it doesn't compare to NVIDIA at all. We're talking a handful of tokens per second.
>>
File: 1739323834464764.png (387 KB, 2964x1338)
>>106647938
Help me with random tech shit, OCR (not with Deepseek yet unfortunately), translations, (E)RP, creative writing, data formatting, etc. There's quite a range it's capable of. I've been working on making a tabletop RPG-lite simulator with it lately.
>>
File: oaijfoijsafd.png (1.05 MB, 1024x683)
>>106640370

you can, you just need a lot of ram
>>
>>106640370
remember when they said this shit would kill chatgpt? lol
>>
>>106649616
Ollama fucked everything up by calling all the deepseek distills "deepseek". So every retard thought they were running full Deepseek R1 on their laptop. Now there's idiots saying they can run R1 just fine if they quant out 99.999% of the data.
Qwen 3 and GLM 4.5 are pretty good for technical questions. Gemma 3 isn't bad but I only use the abliterated 27B model on rare occasions. Idk about search assist since I never run that locally but Qwen3 supposedly has great tool calling. Unsloth is a good source for models but check the community notes too, sometimes they fuck up the prompt templates and have to patch in a fix.
>>106649993
>lmao, you need 128gb ram and nvidia gpu, retard
No one gives a shit pajeet, 128GB is a waste when you're stuck on dual channel. Go ask your lobotomized deepseek model what ram bandwidth is, I'll come back tomorrow when it's ready to give me the wrong answer.
>not that you would know when you cant run it
I've run a hundred different quants of deepseek and not just on my own hardware. I've made my own quants of deepseek too. Anything under q4 is unusable and the only reason you don't see it is because you are probably just as retarded.
>>
>>106650863
Thanks. I appreciate all the advice. I'll give those three models a try.
>Idk about search assist since I never run that locally but Qwen3 supposedly has great tool calling.
I'll look into it. Most of my prompts are about seeking out and summarising online information so that's a big help if search is well integrated.


