/g/ - Technology






I'm planning to get picrel, will be using it mostly for inference but I'll eventually get into training and development.
Requesting /g/'s opinion on this. I'd greatly appreciate it if anons who own this hardware chime in and tell me about their experiences. Thanks in advance.
inb4
>anon that's not for inference, it's slower than a 4090
I know the memory bandwidth is lower, I'm ok with that
>>
>>108537357
if you're going this far just spend more to future proof yourself
>>
>>108537376
please elaborate on how I can future proof myself
>>
>>108537381
spend more money
>>
>>108537384
yeah but spend more money in what? 2x 5090?
>>
>>108537389
if you are serious about trying to self host LLMs you should look into getting a decommed server, don't bother with this mac mini shit that's designed to sit in a datacenter for 2 years before becoming irrelevant
>>
>>108537461
that's really no different from getting a high end workstation board with a lot of memory, so I'm not interested in that. I'm requesting experiences with this specific device
>>
>>108537357
Last I heard this thing thermal throttles the shit out of itself to the point of being a waste of money.
>>
>>108537479
Interesting, do you know if the ASUS one also thermal throttles?
>>
>>108537357
my understanding of those is that they're for prototyping before rolling out onto enterprise level nvidia based systems, so they're maximized for that rather than brute force inference power. that said, nvidia seems to be prioritizing its own products over the hardware it sells to vendors, so the price is probably going to be more stable.

i understand the last batch of updates improved it for gaming. might be good.
>>
>>108537496
Thanks for the insight anon. I'm really more interested in the perf per watt of this device compared to the 3090 I currently have for inference. The fact that I could easily carry this mini PC with me and gen wherever I move is really appealing.
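A back-of-the-envelope version of that perf-per-watt comparison, for reference; the throughput and wattage figures below are placeholder assumptions, not benchmarks:

    # Napkin math for tokens/s per watt. All numbers are placeholder
    # assumptions; substitute your own measured results.
    systems = {
        "RTX 3090":  {"tok_per_s": 35.0, "watts": 320.0},
        "DGX Spark": {"tok_per_s": 20.0, "watts": 160.0},
    }

    for name, s in systems.items():
        print(f"{name}: {s['tok_per_s'] / s['watts']:.3f} tok/s per W")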
>>
Unless you're already a data center admin who is going to be managing dgx servers, or you need this to learn the OS for a cert/job, just spend that 4k on tokens.
The Spark will be deprecated in a year when Rubin comes out, and you can always have the latest model/hardware in the cloud instead.

Unless you're going to be running it for the next 2 years straight 24/7 (you won't) then it's not economical.
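Rough math behind that claim, assuming a hypothetical API price (the dollars-per-million-tokens figure is made up for illustration):

    # How many tokens $4,000 buys at an assumed API rate.
    # The price per million tokens is a placeholder, not a real quote.
    hardware_cost_usd = 4_000
    usd_per_million_tokens = 0.50
    tokens = hardware_cost_usd / usd_per_million_tokens * 1_000_000
    print(f"~{tokens / 1e9:.0f} billion tokens before breaking even")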
>>
>>108537516
>perf per watt
well i understand they're arm systems with a custom linux distro stuck on top? i'm not sure how much power that will save with a fucking blackwell welded into it, but there it is for what it's worth.

in case you missed it
https://www.tomshardware.com/pc-components/gpus/nvidia-dgx-spark-review
>>
>>108537523
>suggesting to buy AI tokens
opinion discarded
>>108537607
>they're arm systems with a custom linux distro stuck on top?
It's plain ubuntu with nvidia drivers built in and likely some other library / optimizations added
Thanks for the article. This idles at 30W (which seems to be related to the high speed NICs it has onboard) and draws 160W under GPU load. Still, that's half the power consumption of my 3090.
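For anyone who wants to reproduce those numbers, the GPU's reported draw can be polled through NVML (pip install nvidia-ml-py); a minimal sketch, with the caveat that on a unified-memory box NVML may not reflect whole-system wall power:

    # Poll the GPU's self-reported power draw once per second.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    for _ in range(10):
        mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
        print(f"{mw / 1000:.1f} W")
        time.sleep(1)
    pynvml.nvmlShutdown()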
>>
>>108537649
>still that's half the power consumption of my 3090
well then you're winning. apparently it comes with all the software to network your generations with your x86 as well.
>>
>>108537649
It only has 128gb of ram, so any model you could fit on it is cheap as fuck via API, and the money would last forever if you're paying for the tokens.
>>
>>108537357
i bought 2x asus ascent gx10 boxes
haven't set them up yet bc i've been dealing with migraines, but i will probably post about them in a week or two once i finally get them all set up
>>
>>108537357
is there ever a reason to actually use this or the strix halo things for local AI over a macbook pro with 5x the unified ram?
>>
>>108537703
please create a thread with your observations anon, and let me know any keywords you plan to use so I can monitor for it.

>>108537721
porn sloppa gen
>>
>>108537721
Not unless you need it specifically for work
>>
>>108537721
cuda. the dgx spark is meant as a workstation for prototyping enterprise shit that uses the nvidia ecosystem. it comes with a lot of networking hardware and software to facilitate that use case.
>>
>>108537721
oops, I meant mac studio, forgot the macbooks only go to 128gb too apparently. but yeah, the studio seems like the direct competitor, and if you want to run the big models like deepseek/kimi/glm it'd be one of those vs 4+ of anything else, assuming you're going for these low power all in one devices
>>
>>108537743
that makes sense, I guess you have to go nvidia for a lot of use cases outside of standard inference
>>
>>108537726
i'll probably just make sure "asus ascent gx10" is in the thread OP
most likely it will be the weekend of the 10th, or if not then, the weekend of the 17th
>>
>>108537761
yeah pretty much. even if all this shit falls apart, nvidia would still be standing as a major software vendor by that alone. their software runs in most non-ai data centers as well and it all comes with long term support contracts. jensen isn't stupid.
>>
>>108537770
thanks anon, appreciate it
>>108537687
I don't care, I still want to run locally even if it's expensive
>>
>>108537748
>>108537761
basically if this means something to you then you're in the pimps know crowd
>onboard ConnectX 7 NIC running at up to 200 Gbps.
>>
>>108537865
not only that but it comes with 10Gbit copper networking as well
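If anyone wants to sanity-check such a link without installing iperf3 (the proper tool for this), a crude TCP throughput sketch; the port and transfer size are arbitrary placeholders:

    # Crude one-way TCP throughput test. Run "python net.py server" on
    # one box and "python net.py client <host>" on the other.
    import socket, sys, time

    PORT, CHUNK, TOTAL = 5201, 1 << 20, 1 << 30  # 1 MiB chunks, 1 GiB total

    if sys.argv[1] == "server":
        srv = socket.create_server(("", PORT))
        conn, _ = srv.accept()
        received, start = 0, time.time()
        while received < TOTAL:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
        elapsed = time.time() - start
        print(f"{received * 8 / elapsed / 1e9:.2f} Gbit/s")
    else:
        cli = socket.create_connection((sys.argv[2], PORT))
        buf = b"\x00" * CHUNK
        sent = 0
        while sent < TOTAL:
            cli.sendall(buf)
            sent += CHUNK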
>>
Not to go too far off topic, but I was looking into an entry level solution before going for something more expensive like the DGX Spark. I've been running some small LLMs locally on different 8 to 12GB VRAM GPUs (Nvidia and AMD) and want something a bit better. Basically it's either a cheaper ($1000-$1600) AMD or Mac box with unified memory for VRAM maxxing, or a used 3090 on ebay. I mostly use LLMs for logic questions, translating, and code, but would like to be able to train and have it use tools for web searching or random shit I think of.
>>
>>108537726
I have a Spark, currently running some 15 services in a mix of GPU and traditional home server applications.
No complaints - it’s small, quiet, ideal to run 24/7 at home as a personal server.
On AI, I have a fine tuned llama3 LLM that knows a lot about my personal finances and provides investment advisory. Also ComfyUI for some elaborate, custom pr0n.
The loud, power-hungry old x86 tower is now a gaming machine that's only on when running a game.
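For the self-hosted setup described above, assuming the fine-tuned model is exposed through an OpenAI-compatible endpoint (as llama.cpp's server or Ollama can provide), querying it from another machine is a few lines; the hostname and model name here are made up:

    # Query a locally hosted model over an OpenAI-compatible API.
    # URL and model name are placeholders for whatever you serve.
    import requests

    resp = requests.post(
        "http://spark.local:8080/v1/chat/completions",
        json={
            "model": "llama3-finance-ft",
            "messages": [{"role": "user", "content": "Summarize my ETF allocation."}],
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])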
>>
>>108537990
Thank you anon. How's the performance from your (subjective) perspective, does it feel snappy?
>>
>>108537357
https://old.reddit.com/r/LocalLLaMA/comments/1scf1x8/dont_buy_the_dgx_spark_nvfp4_still_missing_after/
>>
>>108538803
seems NVFP4 works on the Spark
https://build.nvidia.com/spark/nvfp4-quantization/overview
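For reference, NVFP4 post-training quantization through NVIDIA's TensorRT Model Optimizer looks roughly like the sketch below; treat the config name, API, and model choice as assumptions to verify against the linked docs:

    # Sketch of NVFP4 post-training quantization with nvidia-modelopt.
    import modelopt.torch.quantization as mtq
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "meta-llama/Llama-3.1-8B"  # placeholder model
    model = AutoModelForCausalLM.from_pretrained(name)
    tok = AutoTokenizer.from_pretrained(name)

    def forward_loop(m):
        # Calibration pass over a handful of prompts (placeholder data).
        for prompt in ["Hello", "The quick brown fox"]:
            m(**tok(prompt, return_tensors="pt"))

    model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)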
>>
>>108537357
I have the Jetson nano. It works, but the last version of Linux for Tegra (L4T) they officially support is based on Ubuntu 20.04. You have to go with Armbian or unofficial images to be able to run 24.04 or later.

From what I read online the DGX spark situation seems a bit better but Nvidia's tooling is absolute dogshit and I'm certain you're going to have a little bit of a bad time. Watch out.
>>
>>108542257
thanks anon, interesting feedback, is the nano fully functional with armbian / unofficial images?
>>
>>108542597
It is functional but you're going to miss out on some modern software. For example I was unable to run plezy on it because the GLIBC it supports is too old.
>>
I have the Spark (Asus GX10 version). I can't recommend it, but it's hard for me to judge fairly because I have a TR workstation with 384GB.

My company bought the Spark for testing, but it was too slow for the particular application we were working on, and now it's idle. I thought it might be good to use it to run local agents that would test code on my workstation, leaving the workstation's memory available, but I haven't been able to find any useful models in the 128GB class. The focus is on either much smaller models that would work on a regular GPU, or much larger ones that want a cluster.

I also have a Strix Halo board that I'd take over the Spark, although it's not very good either. It's somewhat slower than the Spark, especially at PP (prompt processing), but they're both slow enough that it's hard to care. The Strix Halo, though, is much cheaper, can have a faster GPU added rather easily, and is x86 (the ARM processor in the Spark complicates things).
>>
>>108537357
I have a gx10 and it works great. This thing is essentially a 5070 with big vram. I've used it for testing stuff like flux2 dev and multi agent setups. If I had the money I would definitely buy 2.
As for power efficiency, there's not much you can do to undervolt or power cap. The only config worth using is the PowerMizer mode (something like that, I forget the name) which lets the gpu frequency drop when idle (the default is max even when idle).
An underrated use is running multiple services together. If you rarely run them simultaneously then the spark is a good fit because the models are already in vram. MoE runs great, too.
And no, I tried overclocking the vram but it's actually system ram, so you can't overclock it with nvidia-smi. The BIOS doesn't expose vram or system ram overclocking settings either.
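To confirm that idle clock drop actually kicks in, the graphics clock can be polled through NVML (pip install nvidia-ml-py); a quick sketch:

    # Watch the GPU graphics clock to verify it drops at idle.
    import time
    import pynvml

    pynvml.nvmlInit()
    h = pynvml.nvmlDeviceGetHandleByIndex(0)
    for _ in range(10):
        mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_GRAPHICS)
        print(f"graphics clock: {mhz} MHz")
        time.sleep(2)
    pynvml.nvmlShutdown()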
>>
>>108542883
I don't understand, how is it missing modern software if it's using an unofficial/alternative OS? Or is it locked to a particular glibc / kernel version and doesn't work with anything else?
>>
File: maxresdefault.jpg (94 KB, 1280x720)
>>108537357
>Requesting /g/'s opinion on this
idk about DGX Spark but I have DJI Spark
>>
>>108543015
I see, yours is a particular scenario but interesting for sure. I wish I could just get an AMD and be done with it.
>>108543107
Interesting, will take a note of the frequency setting. Fucking nvidia makes it so hard to fine-tune their hardware
>>108543245
>DJI Spark
Sounds jeet coded anon, I'm sorry
>>
I suppose if you don't do training then mac studio + nvidia egpu is an option worth considering
>>
>>108543228
NVIDIA officially supports it up to ubuntu 18.04 (!). I found prebuilt images for 20.04, and if you want a 24.04 image you have to build it yourself.

GLIBC on the 20.04 image is ancient and thus I can't compile the dependencies modern plezy uses. You need to upgrade the OS to do that.
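A quick way to check what a given image ships before trying to compile anything; platform.libc_ver() is in the Python standard library:

    # Report the glibc version this Python interpreter was linked against.
    import platform
    print(platform.libc_ver())  # e.g. ('glibc', '2.31') on Ubuntu 20.04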
>>
>>108543353
I'm thinking you'll need cross compiling, interesting
>>
>>108537357
Spark owner here as well. I got the Lenovo variant. I used it mainly to build processing pipelines and for experimenting. The 128gb of memory allows you to load multiple models concurrently, but it's a tight squeeze and the unified memory is somewhat of a foot-gun if you don't size things properly.
>>
>>108543373
I actually just failed to build the 24.04 image due to some limitations. Trying to use this: https://github.com/pythops/jetson-image

It seems it's ubuntu 20.04 tops. It's pretty much ewaste now; it wouldn't be if Manjaro ARM were available.
>>
>>108537389
just buy a superpod
>>
>>108543683
are you sure your board is orin nano? I just built the 24.04 image successfully out of curiosity
>>
>>108544233
I have the regular B01 jetson nano. No good.
>>
At its $3000 price tag it's hard to beat for value.
>>
completely pointless product, waste of ram
>>
>>108549789
not all of us are gaymers anon



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.