I'm planning to get picrel. I'll be using it mostly for inference, but I'll eventually get into training and development. Requesting /g/'s opinion on this; I'd greatly appreciate it if anons that own this hardware chime in and tell me their experiences. Thanks in advance.
inb4
>anon that's not for inference, it's slower than a 4090
I know the memory bandwidth is slower, I'm ok with that
>>108537357
if you're going this far, just spend more to future proof yourself

>>108537376
please elaborate on how I can future proof myself

>>108537381
spend more money

>>108537384
yeah, but spend more money on what? 2x 5090?

>>108537389
if you are serious about trying to self-host LLMs you should look into getting a decommed server, don't bother with this mac mini shit designed to sit in a datacenter for 2 years before becoming irrelevant

>>108537461
that's really no different than getting a high-end workstation board with a lot of memory, so I'm not interested in that. I'm requesting experiences with this device
>>108537357
Last I heard this thing thermal throttles the shit out of itself to the point of being a waste of money.

>>108537479
Interesting, do you know if the ASUS one also thermal throttles?
>>108537357
my understanding of these is that they're for prototyping before rolling out onto enterprise-level nvidia-based systems, so they're optimized for that rather than brute-force inference power. that said, nvidia seems to be prioritizing its own products over the hardware it sells to vendors, so the price is probably going to be more stable.
i understand the last batch of updates improved it for gaming. might be good.

>>108537496
Thanks for the insight anon. I'm really more interested in the perf per watt of this device compared to the 3090 I currently have for inference. The fact that I could easily carry this mini PC with me and gen wherever I move is really appealing.

Unless you're already a data center admin who is going to be managing DGX servers, or you need this to learn the OS for a cert/job, just spend that 4k on tokens. The Spark will be deprecated in a year when Rubin comes out, and you can always have the latest model/hardware in the cloud instead.
Unless you're going to be running it for the next 2 years straight 24/7 (you won't), it's not economical.

>>108537516
>perf per watt
well, i understand they're arm systems with a custom linux distro stuck on top? i'm not sure how much power that will save with a fucking blackwell welded into it, but for what it's worth anyway.
in case you missed it:
https://www.tomshardware.com/pc-components/gpus/nvidia-dgx-spark-review
>>108537523
>suggesting to buy AI tokens
opinion discarded

>>108537607
>they're arm systems with a custom linux distro stuck on top?
It's plain ubuntu with the nvidia drivers built in, and likely some other libraries / optimizations added.
Thanks for the article. It idles at 30W and draws 160W under GPU load, which seems partly due to the high-speed NICs it has onboard. Still, that's half the power consumption of my 3090.
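Putting rough numbers on the perf-per-watt question: a back-of-envelope sketch where the Spark's ~160W load figure comes from the review linked above, but the 3090 wattage and both token rates are made-up placeholders, not benchmarks.

```python
# Tokens-per-joule comparison. Spark load power (~160 W) is from the review;
# every other number here is a hypothetical placeholder, not a measurement.
spark_watts = 160.0
rtx3090_watts = 350.0        # typical 3090 board power limit, assumed
spark_tok_s = 20.0           # placeholder decode throughput, same model on both
rtx3090_tok_s = 35.0         # placeholder decode throughput

spark_tok_per_joule = spark_tok_s / spark_watts        # 0.125 tok/J
rtx3090_tok_per_joule = rtx3090_tok_s / rtx3090_watts  # 0.1 tok/J
print(spark_tok_per_joule, rtx3090_tok_per_joule)
```

With those placeholder rates the 3090 is faster in absolute terms but the Spark does more work per joule; plug in real measured tok/s before drawing any conclusion.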
>>108537649
>still that's half the power consumption of my 3090
well then you're winning. apparently it comes with all the software to network your generations with your x86 box as well.

>>108537649
It only has 128GB of RAM; any model you could fit on it will be cheap as fuck and last you forever if you're paying for the tokens.
>>108537357
i bought 2x asus ascent gx10 boxes
haven't set them up yet bc i've been dealing with migraines, but i will probably post about them in a week or two once i finally get them all set up

>>108537357
is there ever a reason to actually use this or the strix halo things for local AI over a macbook pro with 5x the unified ram?

>>108537703
please create a thread with your observations anon, and let me know any keywords you plan to use so I can monitor.
>>108537721
porn sloppa gen

>>108537721
Not unless you need it specifically for work

>>108537721
cuda. the dgx spark is meant as a workstation for prototyping enterprise shit that uses the nvidia ecosystem. it comes with a lot of networking hardware and software to facilitate that use case.
>>108537721
oops, I meant mac studio; forgot the macbooks only go up to 128GB too, apparently. but yeah, the studio seems like the direct competitor, and if you want to run the big models like deepseek/kimi/glm it'd be one of those vs 4+ of anything else, assuming you're going for these low-power all-in-one devices

>>108537743
that makes sense, I guess you have to go nvidia for a lot of use cases outside of standard inference

>>108537726
i'll probably just make sure "asus ascent gx10" is in the thread OP
most likely it will be the weekend of the 10th, or if not then, the weekend of the 17th

>>108537761
yeah, pretty much. even if all this shit falls apart, nvidia would still be standing as a major software vendor by that alone. their software runs in most non-AI data centers as well, and it all comes with long-term support contracts. jensen isn't stupid.

>>108537770
thanks anon, appreciate it
>>108537687
I don't care, I still want to run locally even if it's expensive
>>108537748
>>108537761
basically, if this means something to you then you're in the in-the-know crowd:
>onboard ConnectX-7 NIC running at up to 200 Gbps

>>108537865
not only that, but it comes with 10Gbit copper networking as well
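To put those link speeds in perspective, a small sketch converting line rates to GB/s and timing a hypothetical model-shard transfer between two boxes. The 60GB shard size is made up, and real throughput will sit below line rate.

```python
# Convert link speeds to GB/s and estimate best-case transfer time for a
# hypothetical 60 GB model shard. Line rate is an upper bound; protocol
# overhead will eat into it.
def gbps_to_gbytes_per_s(gbps):
    return gbps / 8.0  # 8 bits per byte

cx7_gbs = gbps_to_gbytes_per_s(200)    # ConnectX-7 port -> 25.0 GB/s
copper_gbs = gbps_to_gbytes_per_s(10)  # 10GbE copper -> 1.25 GB/s

shard_gb = 60.0  # hypothetical shard size
print(shard_gb / cx7_gbs, "s over CX7 vs", shard_gb / copper_gbs, "s over 10GbE")
```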
Not to go too off-topic, but I was looking into an entry-level solution before going for something more expensive like the DGX Spark. I've been running some small LLMs locally on various 8 to 12GB VRAM GPUs (Nvidia and AMD) and want something a bit better. Basically it's either a cheaper ($1000-$1600) AMD or Mac box with unified memory for VRAM-maxxing, or a used 3090 on ebay. I mostly use LLMs for logic questions, translating, and code, but I'd like to be able to train one and have it use tools for web searching or random shit I think of.
>>108537726
I have a Spark, currently running some 15 services in a mix of GPU and traditional home-server applications. No complaints: it's small, quiet, and ideal to run 24/7 at home as a personal server.
On the AI side, I have a fine-tuned llama3 LLM that knows a lot about my personal finances and provides investment advisory. Also ComfyUI for some elaborate, custom pr0n.
The loud, power-hog old x86 tower is now a gaming machine that is only on when running a game.

>>108537990
Thank you anon. How's the performance from your perspective (subjective)? Does it feel snappy?

>>108537357
https://old.reddit.com/r/LocalLLaMA/comments/1scf1x8/dont_buy_the_dgx_spark_nvfp4_still_missing_after/

>>108538803
seems NVFP4 works on the Spark:
https://build.nvidia.com/spark/nvfp4-quantization/overview
>>108537357
I have the Jetson Nano. It works, but the last version of Linux 4 Tegra they officially support is Ubuntu 20.04. You have to go with Armbian or unofficial images to be able to run 24.04 or later.
From what I read online the DGX Spark situation seems a bit better, but Nvidia's tooling is absolute dogshit and I'm certain you're going to have a bit of a bad time. Watch out.

>>108542257
thanks anon, interesting feedback. is the nano fully functional with armbian / unofficial images?

>>108542597
It is functional, but you're going to miss out on some modern software. For example, I was unable to run plezy on it because the GLIBC it supports is too old.
I have the Spark (Asus GX10 version). I can't recommend it, but it's hard for me to judge fairly because I have a TR workstation with 384GB.
My company bought the Spark for testing, but it was too slow for the particular application we were working on, and now it's idle. I thought it might be good for running local agents that test code on my workstation, leaving the workstation's memory free, but I haven't been able to find any useful models in the 128GB class. The focus is on either much smaller models that would work on a regular GPU, or much larger ones that want a cluster.
I also have a Strix Halo board that I'd take over the Spark, although it's not very good either. It's somewhat slower than the Spark, especially at prompt processing, but they're both slow enough that it's hard to care. The Strix Halo is much cheaper, though, can have a faster GPU added rather easily, and is x86 (the ARM processor in the Spark complicates things).
>>108537357
I have a GX10 and it works great. This thing is essentially a 5070 with big VRAM. I've used it for testing stuff like flux2 dev and multi-agent setups. If I had the money I would definitely buy 2.
As for power efficiency, there's not much you can do to undervolt or power cap. The only config worth using is the PowerMizer mode (something like that, I forget the name), which lets the GPU frequency drop when idle (the default is max clocks even when idle).
An underrated use is to run multiple services together. If you rarely run them simultaneously then the Spark is a good fit because the models are already in VRAM. MoE runs great, too.
And no, I tried to overclock the VRAM, but it's actually system RAM, so you can't overclock it with nvidia-smi. The BIOS also doesn't expose VRAM or system RAM overclocking settings.
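One hedged way to check whether that idle-downclock mode is actually kicking in: query clocks and power draw and compare current against max. The sample line below stands in for real `nvidia-smi --query-gpu=clocks.current.graphics,clocks.max.graphics,power.draw --format=csv,noheader` output; whether the Spark's nvidia-smi exposes those fields the same way desktop cards do is an assumption.

```python
# Parse a CSV line shaped like the nvidia-smi query above and decide whether
# the GPU is clocking down at idle. The sample line is made up for illustration.
sample = "210 MHz, 2500 MHz, 29.5 W"

def parse_gpu_line(line):
    clock, max_clock, power = (field.strip() for field in line.split(","))
    return {
        "clock_mhz": int(clock.split()[0]),
        "max_clock_mhz": int(max_clock.split()[0]),
        "power_w": float(power.split()[0]),
    }

stats = parse_gpu_line(sample)
# Call it "idling down" if the clock sits well below max.
idling_down = stats["clock_mhz"] < 0.5 * stats["max_clock_mhz"]
print(stats, idling_down)
```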
>>108542883
I don't understand, how is it missing modern software if it's using an unofficial/alternative OS? Or is it locked to a particular GLIBC / kernel version and doesn't work with anything else?

>>108537357
>Requesting /g/'s opinion on this
idk about the DGX Spark but I have a DJI Spark

>>108543015
I see, yours is a particular scenario, but interesting for sure. I wish I could just get an AMD and be done with it.
>>108543107
Interesting, will take note of the frequency setting. Fucking nvidia makes it so hard to fine-tune their hardware.
>>108543245
>DJI Spark
Sounds jeet coded anon, I'm sorry
I suppose if you don't do training, then a mac studio + nvidia eGPU is an option worth considering
>>108543228
NVIDIA officially supports it up to Ubuntu 18.04 (!). I found prebuilt images for 20.04, and if you want a 24.04 image you have to build it yourself.
The GLIBC on the 20.04 image is ancient, and thus I can't compile the dependencies modern plezy uses. You need to upgrade the OS to do that.
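A quick way to see what you're up against before trying to compile on one of those old images: Python's platform module reports the glibc the interpreter was linked against. The (2, 31) floor below is an illustrative assumption, not plezy's actual requirement.

```python
# Report the local glibc and compare against a hypothetical minimum version.
# The (2, 31) floor is a made-up example, not any real app's requirement.
import platform

def parse_version(v):
    return tuple(int(p) for p in v.split(".")[:2])

libc_name, libc_version = platform.libc_ver()
needed = (2, 31)  # hypothetical minimum

# For reference: Ubuntu 20.04 ships glibc 2.31, 18.04 ships 2.27.
print(libc_name, libc_version)
print("18.04's glibc 2.27 ok?", parse_version("2.27") >= needed)
```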
>>108543353
sounds like you'll need cross-compiling, interesting

>>108537357
Spark owner here as well. I got the Lenovo variant. I use it mainly to build processing pipelines and for experimenting. The 128GB of memory lets you load multiple models concurrently, but it's a tight squeeze, and the unified memory is somewhat of a foot-gun if you don't size things properly.

>>108543373
I actually just failed to build the 24.04 image due to some limitations. Trying to use this: https://github.com/pythops/jetson-image
It seems it's Ubuntu 20.04 tops. It's pretty much e-waste now; it wouldn't be if Manjaro ARM were available.
>>108537389
just buy a superpod

>>108544233
are you sure your board is an Orin Nano? I just built the 24.04 image successfully out of curiosity

>>108544233
I have the regular B01 Jetson Nano. No good.
At a $3000 price tag, it's hard to beat its value.
completely pointless product, waste of ram
>>108549789
not all of us are gaymers anon