/g/ - Technology

File: images.jpg (8 KB, 277x182)
8 KB JPG
I want to run 120b+ models (Llama 3/Nemotron) locally but dropping $4,700 on a DGX Spark seems like peak consumerist brainrot. I'm looking for a real alternative for VRAM-heavy setups that doesn't involve paying the "AI Workstation" premium for a fancy case and 128GB of LPDDR5X.

Should I just go the 6x used RTX 3090 route and deal with the 2000W power draw and industrial fan noise in my room, or is there a better way to handle P2P without NVLink being a bottleneck? The Mac Studio M3 Ultra with 256GB unified memory is an option, but it feels like paying a massive onions tax for slower tokens and being locked into the Apple ecosystem. I've also looked at scavenging eBay for used A100s or a refurbished Supermicro server, but I don't want to get scammed on dead enterprise silicon.
>>
>https://www.amd.com/en/products/processors/consumer/ryzen-ai/ryzen-ai-halo.html
Maybe this will be an option, it's a month away. No clue about price.
>>
just rent a cloud api or server
>>
>>108742726
No, I want to run local models
>>
>>108742767
pedo
>>
>>108742725
$3,500 minimum, these AI mini PCs are expensive as hell.
>>
>>108742830
get a life
>>
china has cheap 4090 48gb
>>
File: 1777437812869198.jpg (41 KB, 512x384)
41 KB JPG
>>108742895
Not really, a 4090 with 48GB costs $4,100 on Chinese sites.
>>
>>108742720
The Macs are unironically the best hardware for this use case, and they are well supported by llama.cpp, which is what you'd most likely be using. The only downside is that if you want to venture into anything more adventurous than LLM inference you'll be shit out of luck without Nvidia hardware.
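For reference, the whole Mac workflow is basically this once llama.cpp is set up (shown here through the llama-cpp-python bindings built with the Metal backend; the GGUF filename is just a placeholder):
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.3-70b-instruct-q4_k_m.gguf",  # placeholder GGUF
    n_gpu_layers=-1,   # offload every layer to the Metal backend / unified memory
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "say hi in one line"}]
)
print(out["choices"][0]["message"]["content"])
[/code]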
>>
>let's find out the best alternative to dgx spark because I don't like it and I pretend it's not the best for local llm
I'm tired of this game
>>
my uncle works for an ai hardware company and he said he got one of these for free. kinda jelly
>>
>>108742987
But unified memory bandwidth is slow for 120B models compared to a multi-GPU stack. The DGX Spark is much faster for prompt processing and actually lets me run CUDA-only tools from GitHub without waiting for Mac ports. I’m mostly worried the Mac becomes a $5k paperweight the moment I want to do more than just basic chat inference.
>>
>>108742990
Lol, I never said I didn't like it, I just said it's too expensive.
>>
>>108743059
Ask him if he's hiring, I'd take a buggy prototype for free too
>>
>>108742720
The Spark sucks. I have one from my job for testing, and there's basically nothing I've been able to use it for that wasn't better done with a desktop workstation and a discrete GPU.

p.s. if you actually want one for some reason, get the Asus-branded one. It's inexplicable that people are paying more for the nvidia FE version with the same hardware / worse cooling.
>>
>>108742990
Gotta kill time when you're unemployed.
>>
File: 1777356064251936.jpg (2.23 MB, 5006x3464)
2.23 MB JPG
>>108743160
Yeah, that's kind of what I suspected: most of the value seems to just be the form factor + unified memory, not actual throughput.

What are you running on your desktop setup though? My main concern isn’t raw compute, it’s fitting 120B+ models without everything choking on PCIe.

Have you tried multi-3090 (or A100) setups without NVLink? I keep seeing mixed answers on whether tensor parallel over PCIe is actually usable or just a stuttery mess.

Also curious if you’ve found a decent middle ground between “6x 3090 space heater” and “sell kidney for enterprise gear.”
>>
Just use an API from Claude or DeepSeek, LOL. Why spend the effort and money to run local LLMs? I really don't understand.
>>
>>108742720
STOP USING AI
>>
>>108743580
this is the stupidest thing i've ever heard
>>
>>108742720
>>108742832

I bought a Framework Desktop on launch, Ryzen AI MAX 395+ with 128GB unified RAM for $2,300. For that price it was worth it, but prices are fucked now. Still, here's the current landscape:

sub $2k budget:
>Beelink SER10 MAX HX 470

$3k ish
>Framework Desktop or GMKtec EVO-X2 128GB

$4k+
>wait for new M5 Ultra Mac Studios and buy one with as much RAM as you can afford

Keep in mind Strix Halo is 256GB/s of memory bandwidth, while the new Mac Studio will probably be over 1,000GB/s (the M3 Ultra was 819GB/s), so inference will be a LOT faster, though still slower than a discrete GPU setup (a 5090 is 1,792GB/s).
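Napkin math on why those bandwidth numbers are basically the whole story for decode speed (my own rough estimate, assuming a dense ~120B model at ~4 bits/weight):
[code]
# Single-stream decode is memory-bound: every new token re-reads roughly the whole
# (active) weight set, so tok/s is capped at about bandwidth / model size.
model_gb = 120e9 * 0.5 / 1e9   # ~60 GB of weights at ~4 bits/weight
for name, bw_gbs in [("Strix Halo", 256), ("M3 Ultra", 819), ("RTX 5090", 1792)]:
    print(f"{name}: ~{bw_gbs / model_gb:.0f} tok/s ceiling")
# prints roughly: Strix Halo ~4, M3 Ultra ~14, RTX 5090 ~30
[/code]
Real numbers land below those ceilings once you add KV-cache reads and overhead, but the ordering matches what anons report.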

imo the discrete GPU route isn't worth it due to the combination of cost, physical space, and power draw, unless you're burning through a fuckton of tokens every month

if I had money to blow I'd definitely opt for the upcoming Mac Studio M5 Ultra.
>>
>>108743060
>I’m mostly worried the Mac becomes a $5k paperweight the moment I want to do more than just basic chat inference.
you can just sell it
There will still be people buying those in a decade
a bunch of ewaste like 6x 3090, probably not
>>
>>108743190
>tensor parallel over PCIe
>3090
just buy some shitbox with 128gb ram, you clearly have no idea what you’re doing
>>
>>108743075
Have you considered that running high parameter counts is just always going to be expensive?
>>
>>108744998
My midrange desktop that I bought 128GB of RAM for last year because it was cheap (lol lmao) runs 120B models just fine
The actually expensive stuff begins when you go above 128GB RAM + 24GB VRAM, because a consumer mobo is no longer good enough or you're blowing a fortune on GPUs
>>
>>108743933
snail cat
>>
dgx spark was mocked by the basement 3090ers
how the tables have turned
>>
>>108744401
You're contemplating the cost of ludicrously specced PCs. If these prices are something you're stressing over, you need to get some self-awareness. They will all be completely obsolete in 5 years, especially the mini PCs.

They should have made the dgx spark fit in a 5.25 bay slot.
>>
>>108742720
Use 3x GMKtec Strix Halo 128GB machines linked via fabric
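If anyone actually tries that, llama.cpp's RPC backend is the usual way to stitch boxes together. Rough sketch below; the IPs and model file are placeholders and the flag names may differ by build:
[code]
import subprocess

# on each of the other two boxes, start a worker first (run in a shell there):
#   rpc-server --host 0.0.0.0 --port 50052
# then on the box you prompt from, point llama-cli at the remote workers:
subprocess.run([
    "llama-cli",
    "-m", "some-120b-q4_k_m.gguf",                      # placeholder model file
    "--rpc", "192.168.1.11:50052,192.168.1.12:50052",   # placeholder IPs of the other boxes
    "-ngl", "99",                                       # offload/split all layers
    "-p", "hello from three mini PCs",
])
[/code]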
>>
Bump
>>
>>108742720
interconnect speed between GPUs is basically irrelevant for single-user inference: with the usual layer split each GPU just runs its own slice of the model and passes small activations along, so PCIe isn't the bottleneck people assume it is
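In llama.cpp terms that's just the default layer split. A sketch of what it looks like through the llama-cpp-python bindings (filename, card count, and the exact constant name are placeholders/assumptions):
[code]
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./nemotron-120b-q4_k_m.gguf",      # placeholder filename
    n_gpu_layers=99,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,   # layer split: PCIe barely matters
    tensor_split=[1.0] * 6,                        # spread weights evenly over 6 cards
)
[/code]
Row/tensor split is the mode where inter-GPU traffic actually grows, which is why people bring up NVLink at all.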
>>
>>108742720
A year ago the answer was a Xeon Scalable with AMX and refurb DDR5, combined with any recent NVIDIA GPU, to run ktransformers.

Now fuck you.
>>
What do you need all this stuff for?
>>
>>108744997
>ram instead of vram
clearly you're the one who doesn't know anything about this, not the OP
>>
>>108742720
What's your usecase?
The recent mid-sized Qwen and Gemma models (27B-35B) are quite capable and can fit comfortably in 32GB of VRAM.
That much VRAM can be had for ~$1,300 with the Radeon AI Pro R7900 if you're near a Microcenter. Slightly more annoying than Nvidia cards and not as fast as a 5090, but if it's just for inference, ROCm or Vulkan can get competitive speeds out of it; two of those cards put you at 64GB of VRAM for comparable cost to a 5090.

Quanted, you can make these models fit in 24GB, and with the MoE models you can offload some layers to the CPU and still get tolerable speed.
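The partial-offload idea in llama-cpp-python terms looks roughly like this (model name and layer count are placeholders; tune n_gpu_layers to whatever fits your card):
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-q4_k_m.gguf",  # MoE: only a few B params active per token
    n_gpu_layers=28,                           # whatever fits in 24GB; remaining layers run on CPU
    n_ctx=16384,
    n_threads=16,
)
[/code]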


Strix Halo was a good deal at launch, but the prices have gone nuts. AMD just announced a first-party box; if you want to try it, I'd wait and see if that one can be had for closer to MSRP.

A Mac is a solid choice, but you're locked into it and macOS, and you're at the mercy of the community. That said, the community around it is strong, and there's stuff that lands there before it lands in other places (dflash, recent example).

Hate to say it, but X is a good source of info and experiences with this stuff.

A pile of used 3090s is popular in the community, but you pay for it in power and heat.
>>
>>108746297
>>108742720
>I want to run 120b+ models

read
>>
>>108746427
i want to run 120b+ models
>>
>>108746448
and then?
>>
>>108746399
It depends on what he wants it for. CPUMAXXING was considered a legit strategy: you can get tolerable t/s out of a Rome, Milan, or Xeon CPU if you've got all the memory channels populated, and you'll have enough PCIe lanes to start adding GPUs later if you decide you need more.
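Quick back-of-envelope on why the channel count is the whole game (example configs assumed, theoretical peaks only):
[code]
# Peak per-socket bandwidth = channels * MT/s * 8 bytes. Populating every channel
# is what makes CPUMAXXING work at all.
for name, channels, mts in [
    ("desktop, dual-channel DDR5-5600", 2, 5600),
    ("Xeon SP, 8-channel DDR5-4800", 8, 4800),
    ("Epyc Genoa, 12-channel DDR5-4800", 12, 4800),
]:
    print(f"{name}: ~{channels * mts * 8 / 1000:.0f} GB/s theoretical peak")
# ~90, ~307 and ~461 GB/s respectively
[/code]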

>>108746427
R9700* not R7900

>>108746448
Have you tried the 27B dense or the 35B A3B? Just "I want this many beaks" is not an answer. Are you coding, are you openclawing, are you AI-waifuing? What's your usecase?
>>
>>108746457
coding and agentic stuff maybe some stable diffusion
>>
>>108746472
>are you coding, are you openclawing, are you ai-waifuing? what's your usecase?
all of them :}
>>
>>108746478
Ok.
What kind of agentic stuff?
Genuinely interested.
>>
>>108742990
>ARM Linux
Hell-box
>>
>>108742924
maybe 4 ur gweilo ass lMAOO
>>
>>108746673
The DGX Spark is probably the best ARM Linux experience available that's not Android.
>>
>>108742720
M5 Ultra maybe? It will supposedly have 1.2TB/s of memory bandwidth and better perf. Not sure how well it will score in PP (prompt processing).
I heard about some tech that helps speed up the prefill, but I haven't dug into it.
Perhaps it's best to wait it out until there's decent consumer hardware. Also, I think you will need around 512GB for the best models at Q4/5.
>>
>>108746675
>maybe 4 ur gweilo ass lMAOO
Saaar!
>>
>>108742720
Buy 3x R9700.
If you're poor you can also go with SXM2 cards, either 6x 16GB or 3x 32GB.
>>
>>108746457
this retard has no idea what he's doing
>>
>>108742720
Unlike graphics cards, you can find good deals on AI mini-PCs in places like eBay. It's worth a look if you're mostly interested in AI.
>>
>>108742720
The DGX Spark is a CUDA prototyping machine. If you're not prototyping in the CUDA ecosystem ahead of an enterprise rollout, it's not for you.

maybe check out the strix halo options.
>>
>>108743190
A lot of the cost of the Nvidia Spark systems is in the networking hardware. If you're just going local you don't need that hardware.
>>
if the DGX Spark could connect a fast dGPU it would be perfect
>>
File: 2.png (1.43 MB, 1184x864)
1.43 MB PNG
>>108747427
>>
>>108748383
>A lot of the cost of the Nvidia Spark systems is in the networking hardware. If you're just going local you don't need that hardware.

Not really. The networking is mainly for multi-node scaling. The real value of the DGX Spark is the 128GB unified memory, which is exactly what you need for running 120B models locally.
>>
File: file.png (2.18 MB, 1024x1024)
2.18 MB PNG
>>108750701
I would just wait a few years and store muns up like a squirrel till something bespoke and purpose-built for fast, high-density inference comes along; it's inevitable. The 6xxx Nvidia consumer chips are rumored to hit 5.5-6k+ TOPS, so it will be worth the wait. Local LLMs are dogshit right now even if you can have a pack of them working at code together. To commit to buying now would be foolish because of the coming power step-up, even at the low end. And there's also that whole cooking-your-GPU thing: even with best-case preventative care you're burning your hardware. I'm 4.8 million comfy gens in on my 5070 Ti and the system sound output just died on it, and I've only been running gens on it for about 7 months.
>>
>>108751202
lmao


