HP ZGX Nano G1n (DGX Spark)
Posted by contactkv@reddit | LocalLLaMA | 27 comments
If anyone is interested, HP's version of the DGX Spark can be bought at a 5% discount using coupon code HPSMB524.
KooperGuy@reddit
Dogshit purchase
valentt@reddit
what is a better purchase and why is this shit?
KooperGuy@reddit
No thanks
Kubas_inko@reddit
You can get an AMD Strix Halo for less than half the price, or a Mac Studio with 3x faster memory for 300 USD less.
aceofspades173@reddit
The Strix doesn't come with a built-in $2000 network switch. As a single unit, sure, the Strix or the Mac might make more sense for inference, but these things really shine when you have 2, 4, 8, etc. in parallel, and they scale incredibly well.
colin_colout@reddit
Ohhh, and enjoy using transformers, vLLM, or anything that requires CUDA. I love my Strix Halo, but llama.cpp is the only software I can use for inference.
The world still runs on CUDA, unfortunately. The HP Spark is a great deal if you're not just counting tokens and you value compatibility with Nvidia libraries.
If you just want to run llama.cpp or ollama inference, look elsewhere though.
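For what it's worth, a minimal sketch of that llama.cpp route on a Strix Halo box via the llama-cpp-python bindings; the build flag and GGUF path are assumptions for illustration, not something from this thread:

```python
# Minimal sketch, assuming llama-cpp-python was built with the Vulkan backend
# (e.g. compiled with CMAKE_ARGS="-DGGML_VULKAN=on"); the model path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct-q4_k_m.gguf",  # illustrative local file
    n_gpu_layers=-1,  # offload all layers to the iGPU
    n_ctx=8192,
)
out = llm("Q: What does unified memory buy you on Strix Halo? A:", max_tokens=64)
print(out["choices"][0]["text"])
```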
Kubas_inko@reddit
You can run vLLM with Vulkan on Strix.
colin_colout@reddit
Ok... can you help me understand how? vLLM mainline has no Vulkan support.
I'm pulling my hair out here... I've heard others on Reddit say vLLM supports Vulkan, but I can't find that anywhere.
Maybe you're confusing it with the ROCm or HIP implementation, or maybe with llama.cpp, which has a Vulkan backend?
...but the good news is vLLM on ROCm supports sooo many models now (gpt-oss and Qwen3-Next!).
Months ago it was nearly useless unless you like Llama 2, so I'll walk back _some_ of my compatibility concerns (it's still a huge issue, but at least support is trending in the right direction).
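For context, this is roughly what that looks like through vLLM's offline Python API, assuming a ROCm build of vLLM is installed on the Strix box; the model name is just an illustrative choice:

```python
# Minimal sketch, assuming a ROCm build of vLLM on a Strix Halo machine.
# Model name is illustrative; any model supported by that build works.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain why prompt processing speed matters."], params)
print(outputs[0].outputs[0].text)
```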
colin_colout@reddit
Thanks! Just learned this (gonna try it out).
Last I tried, I think I was using ROCm directly and no modern models were supported.
bobaburger@reddit
Depends on what OP is gonna use the box for; if it's anything that needs CUDA, that's what the price is for.
anyway, OP, merry xmas!
The pricing is not much different from the Spark; is a $200 discount worth it though? :D
Kubas_inko@reddit
They are posting this on LocalLLaMA, so I don't expect that.
stoppableDissolution@reddit
People on LocalLLaMA also train their own models, which is slow but doable on the Spark and virtually impossible on Strix, for example.
Kubas_inko@reddit
Why is it impossible on Strix? Are all training frameworks CUDA-only?
stoppableDissolution@reddit
Pretty much, yes. You can train on CPU, but it's going to take a few eternities.
bobaburger@reddit
Aside from local LLMs, r/localllama is actually a place where ML/DL enthusiasts without a PhD gather to talk about ML/DL stuff as well.
MontageKapalua6302@reddit
Can the AMD stans ever stop themselves from chiming in stupidly?
waiting_for_zban@reddit
I think the DGX Sparks are rusting on the shelves. I know of very few professional companies using them (I live near an EU startup zone); many bought one to try following the launch hype and ended up shelving it somewhere. It's nowhere near as practical as Nvidia claims it to be. Devs who need to work on CUDA already have access to cloud CUDA machines, and locally, for inference or training, it doesn't make sense for the kinds of tasks many require. Like for edge computing, there is zero reason to get this over the Thor.
So I am not surprised to see prices fall, and they will keep falling.
Aggravating_Disk_280@reddit
It's a pain in the ass with an ARM CPU and a CUDA GPU, because some packages don't have the right builds for the platform, and all the drivers work inside a container.
aceofspades173@reddit
Have you actually worked with these before? Nvidia packages and maintains repositories to get vLLM inference up and running with just a few commands.
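Once a vLLM server is running (however it was started), the usual way to talk to it is its OpenAI-compatible endpoint. A minimal sketch, assuming the server is listening on the default port 8000; the model name is illustrative:

```python
# Minimal sketch: querying a running vLLM server through its OpenAI-compatible API.
# Assumes the server is already up on localhost:8000; model name is illustrative.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello from the Spark"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```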
Aggravating_Disk_280@reddit
Yes, I got one from my employer. It's okay if you just want to spin up some (v)LLMs, but if you want to do some training and need some older packages, it's a nightmare. Often they only have the Mac ARM build.
Miserable-Dare5090@reddit
Dude, the workbooks suck and are outdated. The containers referenced are 3 versions behind their OWN vLLM container. It's ngreedia at its best. Again, check the forums.
It has better PP (prompt processing) than the Strix or Mac. I can confirm; I have all 3. GLM-4.5 Air slows to a crawl on the Mac after 45,000 tokens (PP at 8 tk/s!!) but stays around 200 tk/s on the Spark.
KvAk_AKPlaysYT@reddit
Why not halo? Just curious.
aceofspades173@reddit
Made a similar comment above, but these have a ~$2000 ConnectX-7 card built in, which makes them scale really well as you add more. Comparing one of these vs. one Strix Halo doesn't make a whole lot of sense for inference. There aren't a ton of software and hardware options to scale Strix Halo machines together, whereas the Spark can network at almost 375 GB/s semi-easily between each of them, which is just mind-boggling if you compare speeds between PCIe links for GPUs in a consumer setup.
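To make the scaling claim concrete, a rough sketch of the software side of running one model across two networked Sparks with vLLM, assuming a Ray cluster already spans both boxes; the model name and parallelism settings are assumptions, not something confirmed in this thread:

```python
# Rough sketch, assuming two Sparks already joined into one Ray cluster over the
# ConnectX-7 link. tensor_parallel_size=2 splits the model across both GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.5-Air",          # illustrative model choice
    tensor_parallel_size=2,               # one GPU per Spark in this example
    distributed_executor_backend="ray",   # multi-node execution backend
)
print(llm.generate(["ping"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```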
Sufficient_Prune3897@reddit
Lol. If you have the money for multiple, why not just RTX 6000s?
KooperGuy@reddit
$2000 LOL
Miserable-Dare5090@reddit
I have one. Check the Nvidia forums... the connection between them sucks, not currently going above 100G, and it's a pain to set up. They promised "pooled memory" but that's BS. It won't do RDMA.
fallingdowndizzyvr@reddit
The Asus one is $3K for the 1TB SSD model.