[-]

ywis797@reddit

if i have $3,999, i could buy a laptop workstation with 192GB RAM (4 slots), and with RTX 5090 24GB.

[-]

beef-ox@reddit

Unified memory ≠ system RAM

They’re not even remotely close in terms of AI inference speeds.

AMD APU and M-series machines use unified memory architecture, just like the DGX Spark. This is actually a really big deal for AI workloads.

When a model offloads weights to system RAM, inferencing against those weights happens on the CPU.

When the GPU and CPU share the same unified memory, inference happens on the GPU.

A 24GB GPU with 192GB system RAM will be incredibly slow by comparison for any model that exceeds 24GB in size, and faster on models that are below that size. The PCIe-attached GPU can only use VRAM soldered locally on the GPU board during inference.

A system with, say, 128GB unified memory may allow you to address up to 120GB as VRAM, and the GPU has direct access to this space.

Now, here’s where I flip the script on all you fools (just joking around). I have a laptop with a Ryzen 7 APU from three years ago that can run models up to 24GB at around 18-24 t/s and it doesn’t have any AI cores, no tensor cores, no NPU.

TLDR, the DGX Spark is bottlenecked by its memory speed, since they didn’t go with HBM, it is like having an RTX Pro 6000 with a lot more memory. It’s still faster memory than the Strix, and both are waaaaay faster than my laptop. And the M-series are bottlenecked primarily by ecosystem immaturity. You don’t need a brand new impressive AI-first (or AI only) machine if what you’re doing either: a) fits within a small amount of VRAM b) the t/s is already faster than your reading speed

[-]

Loose-Sympathy3746@reddit

There are lots of mini pc type machines with comparable inference speeds for less money. However, the advantage of the Spark is the much higher processing speeds due to the Blackwell chip, and the fact it’s pre loaded with the a robust ai tool set for developers. If you are building AI apps and models it is a good development machine. If all you want is inference speed there are better options.

[-]

beef-ox@reddit

I think there’s a far, far stronger argument to be made about CUDA compatibility.

If you have experience with both AMD and Nvidia for AI, you’ll know using AMD is an uphill battle for a significant percentage of workflows, models, and inference platforms.

[-]

IcyEase@reddit

I think a lot of folks are completely missing the point of the DGX Spark.

This isn't a consumer inference box competing with DIY rigs or Mac Studios. It's a development workstation that shares the same software stack and architecture as NVIDIA's enterprise systems like the GB200 NVL72.

Think about the workflow here: You're building applications that will eventually run on $3M GB200 NVL72 racks (or similar datacenter infrastructure). Do you really want to do your prototyping, debugging, and development work on those production systems? That's insanely expensive and inefficient. Every iteration, every failed experiment, every bug you need to track down - all burning through compute time on enterprise hardware.

The value of the DGX Spark is having a $4K box on your desk that runs the exact same NVIDIA AI stack - same drivers, same frameworks, same tooling, same architecture patterns. You develop and test locally on the Spark with models up to 70B parameters, work out all your bugs and optimization issues, then seamlessly deploy the exact same code to production GB200 systems or cloud instances. Zero surprises, zero "works on my machine" problems.

This is the same philosophy as having a local Kubernetes cluster for development before pushing to production, or running a local database instance before deploying to enterprise systems. The Spark isn't meant to replace production inference infrastructure - it's meant to make developing for that infrastructure vastly more efficient and cost-effective.

If you're just looking to run local LLMs for personal use, yes, obviously there are better value options. But if you're actually developing AI applications that will run on NVIDIA's datacenter platforms, having the same stack on your desk for $4K instead of burning datacenter time is absolutely worth it.

[-]

Aroochacha@reddit

It's not even good at that. You can develop on an actual GB200 for 1.5 to. years for the same price. That point is moot. Especially with docker and code start instances where you can further extend that cloud time.

[-]

IcyEase@reddit

In what world is a GB200 $0.22/hr? I appreciate the counterpoint, but your math doesn't quite work out here.

$4,000 ÷ $42/hr = ~95 hours of GB200 time, not 1.5-2 years. To get even 6 months of 8-hour workdays (about 1,040 hours), you'd need roughly $43,680. For 1.5-2 years, you're looking at $500K-$700K+.

Now, you're absolutely right that with zero-start instances and efficient Docker workflows, you're not paying for 24/7 uptime.

Iteration speed matters. When you're debugging, you're often doing dozens of quick tests - modifying code, rerunning, checking outputs. Even with zero-start instances, you're dealing with:

Spin-up latency (even if it's just minutes) Network round-trip times Upload/download for data and model weights Potential rate limiting or availability issues

With local hardware, your iteration loop is instant. No waiting, no network dependencies, no wondering if your SSH session will drop. Total cost of ownership. If you're doing serious development work - say 4-6 hours daily - you'd hit the $4K cost in just 23-30 days of cloud compute. After that, the Spark is pure savings.

Yes, cloud development absolutely has its place, especially for bursty workloads or occasional testing. But for sustained development work where you need consistent, immediate access? The local hardware math works out.

[-]

corkorbit@reddit

I think you're quite right, but that's not how it was marketed by that man in the black jacket

[-]

PhilosopherSuperb149@reddit

I don't have experience with Halo Strix but man, my Spark runs great. The key is to run models that are 4 bit or especially, nvfp4. I've quantized my own Qwen coder (14B), ran images using SD and Flux. Video with wan 2.2. Currently running oss-gpt:120b and its plenty fast. Faster than I'm gonna read the output. I dunno, this post sounds like FUD

[-]

Tai9ch@reddit

You'd expect it to minimally work and to hopefully work better than trying to run 70B models on a CPU with dual-channel RAM or a GPU with 12GB of VRAM.

The questions are whether it lives up to the marketing and how it compares to other options like Strix Halo, Mac Pro, or just getting a serous video card with 96 or 128 GB of VRAM.

Currently running oss-gpt:120b and its plenty fast.

I just benchmarked a machine I recently built for around $2000 running gpt-oss-120B at 56 tokens/second. That's about the same as I'm seeing reported for the Spark.

Sure, it's "plenty fast". But the Spark performing like that for $4k is kind of crap compared to other options.

[-]

Sfaragdas@reddit

Hi, what machine did you built for around $2000? Can you share specification? Currently I have build under $1k but 16vram on 5060ti in small case c6, Ryzen 5 3600, 16GB RAM. For gpt-OSS-20b is perfect, but now I’m hungry to run oss-120b ;)

[-]

Tai9ch@reddit

Refurb server with 3 Radeon Instinct MI 50's in it, which gives 96GB of VRAM total.

It's great for llama.cpp. Five stars, perfect compatibility.

Compatibility for pretty much anything else is questionable; I think vLLM would work if I had 4 cards, but I haven't gotten a chance to mess with it enough.

[-]

PhilosopherSuperb149@reddit

Good for something like Stable Diffusion? I've got a line on a lot of mi50s, and I need some cheap image gen servers

[-]

Tai9ch@reddit

I haven't managed to get any image generation stuff to work on the MI 50s.

I've only spent a couple hours messing with it so far. There's a lot of stuff I haven't tried (e.g. mixing old versions of libraries with current versions of apps), but certainly just following the instructions to install ComfyUI and then hitting "go" doesn't work.

There's a reason the cards are getting dumped on the secondary market cheap. If AMD was still supporting the MI50 32GB in ROCM 7, they'd sell for significantly more money.

[-]

Sfaragdas@reddit

Nice ;) Thanks for tip ;)

[-]

manrajjj9@reddit

Yeah, for $4k it should definitely outperform a $2k build, especially given the hype. Running large models on subpar hardware is just frustrating, and the value prop needs to be clear. If it can't deliver, folks might start looking elsewhere for better bang for their buck.

[-]

xternocleidomastoide@reddit

The integrated ConnectX was a huge selling point for us at that price.

These are not for enthusiasts with constrained disposable income. But if you are in an org developing for deployment at scale in NVDA back ends, these boxes are a steal for $4K.

[-]

corkorbit@reddit

Because the Spark can plug into your network at native speeds?

[-]

PhilosopherSuperb149@reddit

For me there are other appealing things too. I'm not really weighing in on the price here - just performance. But that connectx7 NIC is like $1000 alone. 20 core CPU and 4TB nvme in a box I can throw in my backpack, runs silent... its pretty decent.

I advise a few different ceos on AI, and they are expressing a lot of interest in a standalone, private, on prem desktop assistant that they can chat with, travel with and not violate their SOC2 compliance rules, etc.

[-]

Eugr@reddit

prompt processing is much higher on Spark though...

[-]

Serprotease@reddit

FUD…. … It’s an underwhelming piece of hardware, not a speculative investment to be scalped/flipped.

[-]

m31317015@reddit

It should've been expected since the day it was announced and be doubted the moment the slides were leaked.

- It's a first gen product
- It's cooling design is purely aesthetics (like early gens macbook). It's quiet but toasty.
- They (IMO) definitely delayed the product on purpose to avoid heads-on collision with Strix Halo.
- 3 3090s costs around $1800-2800 and will still be better than the spark in TG because of the bandwidth issue. It's more power hungry but if you need the performance the choice is there.
- There's little hope 1 PFLOPS is going to show up on something with 273 GB/s memory bandwidth. It's not practical when you can simply raise it up like 70% and get much better results.
- One possible way it could get 1PFLOPS could be model optimizations for NVFP4, but that's for the future.

There is no bad news. The "bad news" was always the news, it's just some people that are too blind to see.
Plus making a proprietary format that requires training from scratch to have better performance on a first gen machine, this idea alone to me is already crazy.

[-]

Kutoru@reddit

It is 1 PFLOP on the rough equivalent of 4T/s (x16) memory bandwidth for compute intensity calculations, which more than maps out.

The 30 TFLOPs on FP32 is more than enough for 273 GB/s.

Unless it's only for solo inference which generally is not compute intensive anyway.

[-]

m31317015@reddit

I mean it's marketed as a "personal supercomputer", hints it's a "developer kit for applications on DGX". Judging on these two use cases I can more than confident to say that it targets solo inference.

I agree 30 TFLOPs on FP32 is enough for 273 GB/s, that's why it feels so lacking though. It's fucking $3k+, and for that two unit which people may think are worth it for the 200G QSFP, I'd rather get a PRO 6000 at that point, Max-Q or downclock if power consumption is a concern.

[-]

Kutoru@reddit

Inference is a strict subset of training. There's no way it can not target inference.

If the choice is between 2x DGX Spark and a RTX PRO 6000 then an individual should get a PRO 6000.

If it's between 2 people getting 1 DGX Spark and 1 person getting a 6000 (or 4 people sharing a 24GB MIG GPU), it depends on the circumstances, but if the end result is the cloud then right now it's heavily tilted towards the 2 DGX Spark.

[-]

bethzur@reddit

I got one, but I haven't opened it yet. I think I'm just going to return it. $4K is a lot for a mediocre product.

[-]

atape_1@reddit

Jesus, not only it costs twice what the AMD version does, it's also has half the claimed performance (probably due to inadequate cooling?)

[-]

Insomniac55555@reddit

Which AMD version are you talking about ? It will be super helpful for me, thanks!

[-]

BogoTop@reddit

Framework Desktop is 1999 USD at 128GB and Ryzen AI Max+ 395, might be this one

[-]

hw999@reddit

Framework supports fascists. Maybe consider a more ethical company.

[-]

townofsalemfangay@reddit

r/LocalLLaMA please keep the discussion respectful without veering into politics.

[-]

Insomniac55555@reddit

Thanks. Actually I am in Europe and primarily a Mac user but for some specific development work that involves x64 dlls, I am bound to intel now.

So, I thought of buying an intel pc that can be used for running LLM for future so I short listed:

GMKtec EVO-T1 Intel Core Ultra 9 285H AI Mini PC. This is inferior to the one you recommended but I am thinking with eGPU in future sometime can really help. Any guidance on this is deeply appreciated!

[-]

Dramatic_Entry_3830@reddit

Framework is from the Netherlands

[-]

fallingdowndizzyvr@reddit

Ah... what? Framework is American. Based in SF.

"Headquarters San Francisco, California , United States"

https://en.wikipedia.org/wiki/Framework_Computer

[-]

Dramatic_Entry_3830@reddit

My bad in the frame.work website from Europe Impressum states: Company: Framework® Computer BV Headquarters Address:

Flight Forum 40 Ground Floor, 5657 DB Eindhoven, The Netherlands

E-Mail: support@frame.work
Managing Director: Nirav Patel
VAT number (EU): NL864030204B01

Trademark: Framework®, Registered in U.S. Patent and Trademark Office

Maybe that's just their EU office then

[-]

petuman@reddit

Mixed them up with Fairphone?

[-]

Rich_Repeat_22@reddit

"work that involves x64 dlls"

huh? Intel is using AMD64 for the x86-64 support last 20 years not their own Itanium.

[-]

droptableadventures@reddit

Itanium is ia64, never heard of someone calling that x64.

[-]

Rich_Repeat_22@reddit

Which is why makes no sense why been restricted to Intel for x64 workloads, when actually AMD64 is basically what Intel is using. 🤔

[-]

droptableadventures@reddit

I'm not sure what you mean, x64 is the same thing as x86-64 and amd64.

[-]

Rich_Repeat_22@reddit

Yes which makes no sense why need exclusively Intel to develop for x64.

[-]

droptableadventures@reddit

Oh, you're saying /u/Insomniac55555 should also consider an AMD PC, not just Intel. I see.

[-]

bitzap_sr@reddit

x64 is what Microsoft/Windows calls x86-64/AMD64.

[-]

wen_mars@reddit

x64 is both Intel and AMD.

[-]

Zyj@reddit

Is it? When ordering it in Germany it‘s like 2600€ or more and you‘ll have to wait a long time.

[-]

ghostopera@reddit

They likely mean AMD's Halo Strix platform. The Ryzen AI MAX+ 395 is their current top of the line for that IIRC.

There are several mini-pcs with this with up to 128gb of ram and such.

I've been considering the Minisforum MS-S1 Max personally. But there are several manufacturers making them. Framework, Geekom, etc.

Pretty sick little systems!

[-]

nmrk@reddit

Check out the MS-02 that was just shown at a trade show in Japan.

[-]

ghostopera@reddit

Ah, I saw that the other day! It does look interesting, though very much a different beast from the MS-S1. It's also an intel system, and with much lower memory bandwidth. Could be a nice little desktop system though, assuming it can keep a GPU cool enough :).

[-]

nmrk@reddit

Three PCIE slots but only 350w power supply. I am awaiting the first power supply hacks to run a high end GPU.

[-]

Ravere@reddit

The USB 4 v2 (80Gb/s) port that supports thunderbolt 5 really makes the Minisforum version the most tempting to me.

[-]

Zyj@reddit

The Bosgame M5 also has 2x USB4

[-]

perelmanych@reddit

What are you going to use it for, external video card?

[-]

Insomniac55555@reddit

Actually I am confined to intel based pc due to some development work so I can’t go for AMD.

[-]

SuperMazziveH3r0@reddit

DGX isn't a GPU so it's not something you are looking for anyways.

[-]

VegaKH@reddit

AMD Ryzen AI Max+ 395

[-]

pCute_SC2@reddit

Strix Halo

[-]

fallingdowndizzyvr@reddit

It's actually more like 50% more than a AMD Max+ 395. You have to get the "low spec" version of the Spark though. That being a 1TB PCIe 4 SSD instead of a 4TB PCIe SSD. Considering that domr 4TB SSDs have been available for around $200 lately, I think that downgrade is worth saving $1000. So the 1TB SSD model of the Spark is only $3000.

[-]

Freonr2@reddit

Bosgame 395 is $1839 with a 2TB drive

https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395

[-]

arko_lekda@reddit

> Bosgame M5

Very poor naming choice, considering that it's going to compete with the Apple M5.

[-]

terminoid_@reddit

it's either a poor naming choice or a genius one

[-]

PersonOfDisinterest9@reddit

It's enough to make me not even look into if it's worth buying.

I can't tolerate a company that appears to be trying to confuse people or trick the careless into buying their thing.
The "trick grandma into buying our stuff for the grandkids" marketing strategy is heinous.

[-]

muntaxitome@reddit

It's just a letter and a number, Just in the 80s alone both DEC, Olivetti and Acorn had M[number] series of devices.

Bossgame also have a p3 and a b95. Probably just a coincidence. Apple already tried to take the trademark for the name of the most popular fruit across industries. You want to give them a letter of the alphabet too? I know they tried with 'i' already.

Apple should just use more distinct naming if they don't want to collide with other manufacturers.

[-]

PersonOfDisinterest9@reddit

I would prefer that all of the companies give unique names to their products.

Don't even get me started on Apple, I'll be here all day ranting about their bullshit.

I don't have to like Apple or their garbage practices, to hate all the companies that ride Apple coattails and flood the market with products that are clearly biting off Apple.

[-]

muntaxitome@reddit

Bosgame M5 AI Mini Desktop Ryzen AI Max+ 395 96GB/128GB+2TB

Is the title of the product. Sorry but nobody in the market for a 2000 dollar mini pc will be confused here. It starts with the brand 'bosgame', it says 'Ryzen AI Max', it looks completely different. Like this is just not confusing? It's not like it's called Appl Max Mini M5

[-]

PersonOfDisinterest9@reddit

The "trick grandma into buying our stuff for the grandkids" marketing strategy.

It's literally that. It's marketing garbage so that the ignorant are fooled into associating their stuff with the big brand stuff.

[-]

muntaxitome@reddit

I am sorry to hear you were so confused thinking that 'Bosgame M5 AI Mini Desktop Ryzen AI Max+ 395 96GB/128GB+2TB' was the same device as a Mac Mini M4.

[-]

PersonOfDisinterest9@reddit

I'm sorry to hear that you are emotionally attached to corpo garbage that you feel the need to defend their dubious honor.

[-]

muntaxitome@reddit

Corporate garbage? Of the massive megacorp 'bosgame'?

I will defend the right of people to use the letter M in their product name despite apple also having a product that uses the letter M.

IBM has had an M5 server line since the 1960s and to this day. BMW has M5 cars. Why are apple famboys not angry at apple over that infringement but then when apple uses another letter of the alphabet they think apple owns it?

There is no confusion. Ryzen is right in the title.

[-]

PersonOfDisinterest9@reddit

The IBM thing is a server, and the car is a car. They are all idiotic corpo names.

Bosgame is particularly heinous because they are a garbage company biting off a bigger company.

[-]

muntaxitome@reddit

At least those exist, there is no M5 apple device

[-]

Standard-Potential-6@reddit

A coincidence probably, but one that Apple benefits from as a giant company with more mindshare, due to the attitude of people like the one you’re replying to. I think different regions may also have different reactions to matching brand names.

[-]

Frankie_T9000@reddit

Or the bmw M5

[-]

Charming_Support726@reddit

There is already a Apple Ultra lookalike from Beelink called GTR5. I ordered one, but sent it back because of brand specific hardware issues of the board. You might encounter discussions about on reddit as well.

As a replacement I ordered a Bosgame M5, which does look like a gamers unit and works perfectly well. Nice little workstation for programming, office, ai-research. Also runs Steam/Proton well under ubuntu.

[-]

kyralfie@reddit

Can you tell me more about specific issues with it? I'm considering it because of its 2x10GB/s networking.

[-]

Charming_Support726@reddit

Sorry, still early here. Yes I am referring to the GTR9 Pro.

I was really unhappy because I encountered constant crashes. Here you find a report from a blog, about what was going wrong, but AFAIK there is no final solution to it and Beelink had set the device "out of stock" https://craigwilson.blog/post/2025/2025-09-25-beelink395bsod/

This issue seems a HW issue to me because the GbE mostly are crashing under load. There were also some confusion with the BIOS options - some prevent disabling iommu.

I personally think device based on the sixunited mainboard like the Bosgame M5 and others are a much safer choice.

[-]

kyralfie@reddit

Wow, thanks, that's huge and a dealbreaker. I'm glad I just stumbled upon your comment. Thanks again for mentioning it. I'll dig into it myself now. It genuinely looked like the best option for me. And ngl I liked the Mac mini ripoff design as well.

[-]

Charming_Support726@reddit

Have a look here : https://strixhalo-homelab.d7.wtf/

And here: https://github.com/kyuz0/amd-strix-halo-toolboxes

[-]

fallingdowndizzyvr@reddit

It was $1699 2 weeks ago. That doesn't change the fact the Spark is "more like 50% more" rather than "costs twice".

[-]

Zyj@reddit

But if you‘re in the EU it‘s now shipped locally. Thus still 170€ cheaper than previously.

[-]

Freonr2@reddit

Is anyone actually shipping a $3k Spark yet?

[-]

fallingdowndizzyvr@reddit

Yeah. Here are a couple of places you can buy it right now.

https://www.cdw.com/product/asus-ascent-gx10-personal-ai-supercomputer-with-nvidia-gb10-grace-blackwell/8534235

https://www.centralcomputer.com/asus-ascent-gx10-personal-ai-supercomputer-with-nvidia-gb10-grace-blackwell-superchip-128gb-unified-lpddr5x-memory-1tb-pcie.html

[-]

Freonr2@reddit

Fair enough, thanks for the links.

[-]

Educational_Sun_8813@reddit

and you will not be able to easily expand hard drive, since nvidia screw everywhone with custom not standard size of nvme xD while on strix halo you can easily fit 8TB, and R/W performance is faster (around 4.8GB/s) than on dgx spark (compared with framework desktop with samsung 990pro drive, and you can fit two of them)

[-]

Nice_Grapefruit_7850@reddit

Why would the ssd speed matter so much once it's loaded into ram?

[-]

wen_mars@reddit

It wouldn't. Nobody is claiming that.

[-]

night0x63@reddit

My day one opinion:

half performance because non sparse ( the numbers are for sparse processing... No one does that).
Half again because most do FP8 processing

But I didn't want to rain on my coworkers claiming it's best thing since sliced bread

So I didn't email him with that

[-]

Status_Contest39@reddit

this box is useless, the bandwidth of VRAM is the bottleneck, only RTX 3050 level, prefill performance sucks.

[-]

paphnutius@reddit

What would be the best solution for running models that don't fit into 32gb VRAM locally? I would be very interested in faster/cheaper alternatives.

[-]

Silver_Jaguar_24@reddit

It's been a gimmick all along. Supercomputer in a box my a$$

[-]

xxPoLyGLoTxx@reddit

This is an extreme case of schadenfreude for me. Nvidia has seen astronomical growth in their stock and GPUs over the last 5 years. They have completely dominated the market and charge outrageous prices for their GPUs.

When it comes to building a standalone AI product, which should be something they should absolutely crush out of the park, they failed miserably.

Don’t buy this product. Do not support companies that overcharge and underdeliver. Their monopoly needs to die.

[-]

asfsdgwe35r3asfdas23@reddit

This is not “an AI product”. It is meant to be a development kit for their Grace supercomputers. Although since it has a lot of VRAM it has created a lot of hype. That is exactly why Nvidia has nerfed it in every way possible to make it is useless as the could for inference and training. Why would they launch a $3K product that compete with their $10K GPUs that sell like hot cakes?

[-]

Tarekun@reddit

Nvidia marketed this as a personal AI supercomputer from the very first presentation. This is not something the public came up with only because it had a lot of memory

[-]

ibhoot@reddit

Have to disagree, 128GB VRAM is not alot in the AI space, for a Dev box I think DGX substandard. £3k for 128GB is crap, AMD Halo can be had for £2k or under. People might point to the performance, performance means little when you have to go lower quants. 192GB or 256GB should of been the minimum at £2.5k price point. Right now I'd go for Halo 128GB or a pair if I need small AI lab or look at rigging up multiple 3090s depending on cost availability space heat/ventilation. I know DGX stack has the nvidia stack which is great but DGX is a year late in my eyes.

[-]

asfsdgwe35r3asfdas23@reddit

128gb is enough for inference of most models. Sure, you can buy second hand RTX3090 and wire them together. But:

1) No company/university buying department will allow you to buy GPUs from eBay. 2) You need to add the cost of the whole machine, not just the GPUs. 3) You need to find a place in which you can install your +3000Watts behemoth that at peak power is more noisy than a rammstein concert. Also find an outlet which can provide enough power for the machine.

In contract DGX Spark is a tiny and silent computer that you can have in your table.

[-]

xxPoLyGLoTxx@reddit

Right but what you are failing to realize is that for a small form factoryou can get a Ryzen Max AI mini pc or a Mac Studio for better price to performance.

[-]

InDreamsScarabaeus@reddit

But not CUDA development, which is the stated purpose of a DGX dev box

[-]

xxPoLyGLoTxx@reddit

Lol sure. I keep hearing that. Sounds like sour grapes to me.

[-]

asfsdgwe35r3asfdas23@reddit

I am not saying that this is a good product. In fact I am saying that Nvidia nerfed it to force people to buy more expensive GPUs. But I don’t think that you can cómprate a Spark with a RTX3090 cluster.

[-]

ibhoot@reddit

Business is vastly different. DGX are suppose to be personal Dev boxes, tinkering learners. Business wise, what models & quants would a business be happy with & how many instances do you need running concurrently, for any service offering DGX is not going to cut it with 128GB. There might some SMBs where DGX makes sense but as you scale the service as an SMB, would 3k DGX vs 2x Halo 256GB meet your needs based a single unit of deployment? 1k difference in cost? As a business you will want a minimum of 2x for HA so 6k DGX vs 6k vs 3x Halo, at certain price points different option open up. Just think DGX would of be awesome a year ago, now? Not so much. Must admit it does look super cool.

[-]

xxPoLyGLoTxx@reddit

Disagree. It’s marketed as such by Nvidia themselves. You claiming they purposefully “nerfed” it is giving Nvidia too much credit. I think they can clearly make powerful large GPUs but when it comes to a small form factor they are far behind Apple and AMD.

Also, if you recall, they hid the memory bandwidth for a very long time. And now it is clear why. They knew it wouldn’t be competitive.

[-]

fullouterjoin@reddit

NVDA did this 100% on purpose. Why would they make a "cheap" device that would compete with their cash cows? Hell even the 5090 is too cheap.

[-]

xxPoLyGLoTxx@reddit

They spent hours and hours and hundreds of thousands of dollars developing a product that performs poorly…on purpose?

I have to disagree. What actually happened is this is the best they could do with a small form factor. Given their dominance in the field of AI, they assumed it would be the only good option when finally released.

But then they dragged their feet releasing this unit. They hid the memory bandwidth. They relied on marketing. They probably intended to release this long ago and in the meantime apple and AMD crushed it.

It makes no sense to think they spent tons of resources on a product for it to purposefully fail or be subpar.

[-]

ionthruster@reddit

They spent hours and hours and hundreds of thousands of dollars developing a product that performs poorly…on purpose?

It sounds far-fetched, but the Coca-cola company did exactly this "kamikaze" strategy against Crystal Pepsi, developing Tab Clear. Coca-cola intentionally released a horrible product to tarnish a new product category that a competitor is making headway on. They could do this because they were dominating thr more profitable, conventional product category. Unlike Nvid- oh, wait...

[-]

xxPoLyGLoTxx@reddit

Coca-Cola? The company that changed their recipe for no apparent reason and it failed miserably? Companies fuck up sometimes. A lot, actually. And it’s not usually on purpose.

The simpler explanation is they underestimated the competition. Just my opinion. I really don’t think they wanted this to fail. They could have made it much better and still not a threat to their higher-end GPUs.

[-]

ab2377@reddit

they are probably raising couple more 100s of billion dollars to fund openai.

[-]

Ok_Warning2146@reddit

Well, to be fair, when it was announced to be 273GB/s, it is already out of consideration for most people here.

[-]

IrisColt@reddit

Exactly.

[-]

forte-exe@reddit

What tests can be run and what to look for?

[-]

johnkapolos@reddit

Furthermore, if you run it for an extended period, it will overheat and restart.

Le fuk? Long runs is literally the use case.

[-]

Zomboe1@reddit

We're living in the era when the enclosure case matters more than the use case.

[-]

Boring-Ad-5924@reddit

Kinda like 5070 having 4090 performance

[-]

-Akos-@reddit

So far from what I’ve seen in every test is that the whole thing is a letdown, and you are better off with a Strix Halo AMD PC. This box is for developers who have big variants running in datacenters and they want to develop for those systems with as little changes as possible. For anyone else, this is an expensive disappointment.

[-]

SkyFeistyLlama8@reddit

Unless you need Cuda. It's the Nvidia tax all over again, you have to pay up if you want good developer tooling. The price of the Spark would be worth it if you're counting developer time and ease of use; we plebs using it for local inference aren't part of Nvidia's target market.

[-]

thebadslime@reddit

The Halo Strix boxes support ROCM 7.9 which has many improvements. AMD is catching up IMO.

[-]

fallingdowndizzyvr@reddit

ROCm 7.9 doesn't seem to be any different than 7.1 from what I can tell.

[-]

noiserr@reddit

7.9 the development branch. So it's just slightly ahead of whatever the latest (7.1) is.

[-]

fallingdowndizzyvr@reddit

7.1 is also a development branch. 7.0.2 is the release branch.

[-]

MoffKalast@reddit

AMD try not to make confusing version/product number challenge (impossible)

[-]

fallingdowndizzyvr@reddit

The difference is between development of the current major release and that of the next major release. It was also the same with ROCm 6. 6.4.X was still being developed even though 7 preview had been released. So 7.1 is the development of ROCm 7. 7.9 is the preview of ROCm 8.

[-]

MoffKalast@reddit

mfw

[-]

fallingdowndizzyvr@reddit

First came ROCm 7 preview.

"2025-07-24"

https://rocm.docs.amd.com/en/docs-7.0-beta/preview/release.html

Then came ROCm 6.4.4.

"September 24th, 2025"

https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-LINUX-ROCM-6-4-4.html

One doesn't preclude the other. You have to continue development on the current release to maintain it while you do development on the next release.

[-]

wsippel@reddit

ROCm 7.9 is the development and testing branch for TheRock, the new ROCm build system. It's whatever the current ROCm branch is, just built with TheRock.

[-]

PersonOfDisinterest9@reddit

I'm glad there's finally some kind of progress happening there, but I will be mad at AMD for a long time for sleeping on the decade+ long delay. People had been begging AMD to compete since like 2008, and AMD said "Mmm, nah". All through the bitcoin explosion, and into the AI thing.

Now somehow Apple is the budget king. Apple. Cost effective. Apple.

AMD needs to hurry up.

[-]

uksiev@reddit

ZLUDA exists though, it may not run every workload but the ones that do run, run pretty well with little overhead

[-]

Ok_Income9180@reddit

I’ve tried to use ZLUDA. It covers only a subset of the API and (for some reason) doesn’t include a lot of the calls you need for ML/AI. They seem focused on 3d rendering for some reason. It has been a while since I’ve looked at it though. Maybe this has changed?

[-]

shroddy@reddit

Did you use the "new" version or the old pre-nuke version? Has the new version already caught up to what was lost?

[-]

Flachzange_@reddit

ZLUDA is an interesting project, but its atleast 5 years away from being even somewhat viable.

[-]

Freonr2@reddit

If you have access to HPC I don't know why you need one of these.

You should be able to just use the HPC directly to fuzz your code, Porting from a pair of Sparks to a real DGX powered HPC environment where you have local ranks and global ranks is going to take extra tuning steps anyway.

However, for university labs that cannot afford many $300k DGX boxes along with all the associated power and cooling they're probably perfect. You don't want one student stealing one of your four DGX boxes for 8 hours to fuzz code, but if you have 128 nodes on a major provider it's not a massive deal, you can schedule via Slurm or Ray or whatever, and allowing researchers to use a few nodes to fuzz new code for a few hours at a time is NBD.

[-]

randomfoo2@reddit

Most HPC environments don't give researchers or developers direct access to their nodes/GPUs and use slurm, etc - good for queuing up runs, not good for interactive debugging. I think most dev would use a workstation card (or even a GeForce GPU) to do your dev before throwing reasonably working code over the fence, I could see an argument for the Spark more closely mirroring your DGX cluster setup.

[-]

asfsdgwe35r3asfdas23@reddit

You can launch an interactive slurm job, that opens a terminal and allows you to debug, launch a job multiple times, open a Jupyter notebook… Also almost every HPC system has a testing queue in which you can send short jobs with very high priority.

I would find more annoying having to move all the data from spark to the HPC, create a new virtual environment, etc… than using an interactive slurm job or the debug queue.

[-]

Freonr2@reddit

This all aligns with my experience as well for the most part. Everyone is using VS Code over SSH.

I think some of the reserchers I've worked with before do own consumer GPUs at home, but that's of questionable value.

I can see the spark being great for a post grad working on research who would like to apply for compute grants from HPC providers or commercial partners, they can say they have real experience with FSDP/NCCL and want a HPC compute grant to scale up their model. But I think once you get a job at a real lab with HPC you will just constantly run into issues.

[-]

asfsdgwe35r3asfdas23@reddit

This all aligns with my experience as well for the most part. Everyone is using VS Code over SSH.

Taking into account this, I would love a laptop with a tiny CPU, just enough for VS Code and Chrome, that is extremely thin, weighs like 500gr and has a 3-day battery life. I use my laptop as an SSH machine; I run no code or do any tasks on it.

In my company, everybody is requesting to switch from the MacBook Pro to the MacBook Air.

[-]

randomfoo2@reddit

I'm going to need to complain to my slurm admin lol

[-]

Freonr2@reddit

From first hand experience, this isn't accurate.

You can use srun (instead of sbatch) to reserve instances for debugging.

I think most dev would use a workstation card

Nope.

[-]

Ok_Top9254@reddit

It has 10x FP64 of a 5090 with 1/3 the power dissipation (or 1/6th as in the post).

[-]

Iory1998@reddit

Your point being? Could you please explain to me how 10x FP64 will help us run a model 10x faster than a 5090?

[-]

Ok_Top9254@reddit

Wording. Even though it overheats, it will still help you run 90GB fluid, physics, protein folding, LLM, CV models faster than the fastest consumer gpu, the 5090 and even the workstation cards. It's not a disappointment, it's exactly what you'd expect with that form factor and TDP unless you are delusional.

For all intents and purposes it's a very very cut down version of the GB200 superchip, part of the Blackwell datacenter line for supercomputers, with the same arch, SoC layout, datacenter grade networking and software support, making it a very tiny and weak but still genuine AI supercomputer node that can be clustered.

Strix Halo is a glorified laptop APU that has nothing to with Amd's main Instinct datacenter gpu line. That's it.

[-]

Iory1998@reddit

I understand better now, thank you. It seems that most of us misunderstood the purpose of this machine. But. In our defense, Nvidia's marketing didn't help either; they focused so much on inference capabilities that we simply thought it was an inference machine. To be honest, I was always perplexed to would Nvidia launch a new product that would obviously cannibalize its pro GPUs.

[-]

wen_mars@reddit

It has 10x FP64 of a 5090

Source for this?

Even if true, it's only faster on fp64 workloads. LLMs are bandwidth-bound so 5090 and rtx pro 6000 blackwell will run them much faster.

[-]

Eugr@reddit

It actually has different arch than GB200. For instance, it uses MediaTek CPU and not scaled down Grace CPU...

[-]

Ok_Top9254@reddit

True. The X925 actually has 50% wider SIMD pipeline (6x 128bit vs just 4x 128 bit in the Neoverse v2) so the cores are actually faster than the GB200 ones IPC wise and it has +200MHz extra clock speed on top so it's almost as fast as Core Ultra 285K/R9 9950X.

(but GB200 obviously has way more cache and 72 of them compared to just 10 in Spark)

CPU / System	Single-core	Multi-core	Link
Intel Core Ultra 9 285K	3,262	22,702	Link
NVIDIA DGX Spark (CPU)	3,237	19,794	Link
Apple M3 Ultra (Mac Studio 2025)	3,247	28,169	Link
AMD Ryzen 9 9950X	3,385	21,428	Link

[-]

auradragon1@reddit

It’s not design for local LLM inference for you. It’s designed for CUDA devs.

[-]

Iory1998@reddit

Oh, I see. So, that makes a lot of sense. This means that most reviewers are wrong about it, too, because they kept testing inference with it.

[-]

noiserr@reddit

and you are better off with a Strix Halo AMD PC.

And you actually get a pretty potent PC with all the x86 compatibility.

[-]

vinigrae@reddit

Confirm we received our reservation to purchase the spark but on last minute decided to wait for some more benchmarks and we went with the Strix Halo, no regrets! You will simply run models you couldn’t before locally, at less than half the price of the spark for basically the same performance for general use case.

[-]

Eugr@reddit

I see many people comparing Spark to Strix Halo. I made a post about it a few days ago: https://www.reddit.com/r/LocalLLaMA/comments/1odk11r/strix_halo_vs_dgx_spark_initial_impressions_long/

For LLMs, I'm seeing 2x-3x higher prompt processing speeds compared to Strix Halo and slightly higher token generation speeds. In image generation tasks using fp8 models (ComfyUI), I see around 2x difference with Strix Halo: e.g. default Flux.1 Dev workflow finishes in 98 seconds on Strix Halo with ROCm 7.10-nightly and 34 seconds on Spark (12 seconds on my 4090).

I also think that there is something wrong with NVIDIA supplied Linux kernel, as model loading is much slower under stock DGX OS than Fedora 43 beta, for instance. But then I'm seeing better LLM performance on their kernel, so not sure what's going on there.

[-]

randomfoo2@reddit

For llama.cpp inference, it mostly uses MMA INT8 for projections+MLP (\~70% of MACs?) - this is going to be significantly faster on any Nvidia GPU vs RDNA3 - the Spark should have something like 250 peak INT8 TOPS vs Strix Halo at 60.

For those interested in llama.cpp inference, here's a doc I generated that should give an overview of operations/precisions (along w/ MAC %s) that should be useful at least as a good starting point: https://github.com/lhl/strix-halo-testing/blob/main/llama-cpp-cuda-hip.md

[-]

Eugr@reddit

Thanks, great read!

Any ideas why HIP+rocWMMA degrades so fast with context in llama.cpp, while it performs much better without it (other than on 0 context)? Is it because of bugs in rocWMMA implementation?

Also, you doc covers NVIDIA up to Ada - anything in Blackwell that worth mentioning (other than native FP4 support)?

[-]

randomfoo2@reddit

So actually, yes, the rocWMMA implementation has a number of things that could be improved, I'm about to submit a PR after some cleanup that in my initial testing improves long context pp by 66-96%, and I'm able to get the rocWMMA path to adapt the regular HIP tiling path for tg (+136% performance as 64K on my test model).

The doc is up to date. There is no Blackwell specific codepath. Also, I've tested NVFP4 w/ trt-llm and there is no performance benefit currently: https://github.com/AUGMXNT/speed-benchmarking/tree/main/nvfp4

[-]

Eugr@reddit

Yeah, I've tested TRT-LLM too and got the same impression.

[-]

randomfoo2@reddit

Here's the PR: https://github.com/ggml-org/llama.cpp/pull/16827

[-]

Eugr@reddit

Thanks, great work! Although based on the comments, this entire thing is going to be overhauled soon anyway. But it would be nice to have something in the meantime.

[-]

eleqtriq@reddit

I think the supplied disk might just be slow. I haven't seen a benchmark on it, though, to confirm.

[-]

Eugr@reddit

I provided some disk benchmarks in my linked post above. The disk is pretty fast, and I'm seeing 2x model loading difference from the same SSD and same llama.cpp build (I even used the same binary to rule out compilation issues) on the same hardware. The only difference is that in one case (slower) I'm running DGX OS which is Ubuntu 24.04 with NVIDIA kernel (6.11.0-1016-nvidia), and in another case (faster) I'm running Fedora 43 beta with stock kernel (6.17.4-300.fc43.aarch64).

[-]

eleqtriq@reddit

Very interesting. Thanks.

[-]

constPxl@reddit

[-]

BaseballNRockAndRoll@reddit

Look at that leather jacket. What a cool dude. So relatable.

[-]

EXPATasap@reddit

Lololololololololol fucking heroic!

[-]

Awkward-Candle-4977@reddit

the more you buy,
the more you pay

[-]

SameIsland1168@reddit

The more you pay, the more I shave (Jensen never has a beard)

[-]

Tacx79@reddit

Nvidia always advertises flop performance on sparse computations, dense computation is always half of it. You never* use sparse computations.

* - unless your matrix is full of zeros or it's heavily quantized model with weights full of zeros, you also need to use special datatype to benefit from that, even in torch sparse tensors have barely any support so far

[-]

DarkArtsMastery@reddit

The unit looked like crap from Day #1. Just wake up finally and realize this is nothing but a money grab.

[-]

ManufacturerSilver62@reddit

I really wanted a spark, but thanks for telling me their c***, I'll just buy a 5090 😕. This is honestly really disappointing, as I was totally willing to shell out the 4k for one. Oh, well, I can make one hell of a custom pc for that price too.

[-]

Sarum68@reddit

CES is only a few months away now,, will be interesting if they announce a Spark 2.0... like everything else in life, never good to buy the first production model.

[-]

Dr_Karminski@reddit (OP)

John Carmack's post: https://x.com/ID_AA_Carmack/status/1982831774850748825

[-]

sedition666@reddit

I have just cut and pasted the post so you don't have to visit the Xitter hellscape

DGX Spark appears to be maxing out at only 100 watts power draw, less than half of the rated 240 watts, and it only seems to be delivering about half the quoted performance (assuming 1 PF sparse FP4 = 125 TF dense BF16) . It gets quite hot even at this level, and I saw a report of spontaneous rebooting on a long run, so was it de-rated before launch?

[-]

dogesator@reddit

His assumption is wrong, the FP16 performance he measured is what is expected from Nvidias reported numbers, the blackwell ratio of fp16 to fp4 flops is double of his assumption as confirmed in official nvidia documentation here

[-]

smayonak@reddit

I wonder how they are charging so much for these things if they are only providing half of the advertised performance.

[-]

MoffKalast@reddit

They more people buy, the more performance they save.

[-]

eloquentemu@reddit

less than half of the rated 240 watts

TBF when I tried to figure out what the "rater power draw" was, I noticed nvidia only lists "Power Supply: 240W" so it's obviously not a 240W TDP chip. IMHO it's shady that they don't give a TDP, but it's also silly to assume that the TDP of the chip is more than like 70% of the PSU's output rating.

As an aside, the GB10 seems to be 140W TDP and people have definitely clocked the reported GPU power at 100W (which seems the max for the GPU portion) and total loaded at >200W so I don't think the tweet is referring to system power.

[-]

Moist-Topic-370@reddit

I have recently seen my GB10 GPU at 90 watts while doing video generation. Is the box hot, yes, has it spontaneously rebooted, no.

[-]

NoahFect@reddit

https://xcancel.com/ID_AA_Carmack/status/1982831774850748825

[-]

BetweenThePosts@reddit

Framework is sending him a strix halo box fyi

[-]

Ok_Top9254@reddit

I'm sorry but "I saw" (from someone else) and "it seems" are trash benchmarks. I believe him but at least do something rigorous when proving it, so we see the numbers or screenshots so we see it's not even worse or just misconfiguration.

[-]

theUmo@reddit

I'm glad he shared what he has even if the quality level of his data requires weasel words. Others can follow up with rigor.

[-]

night0x63@reddit

That Twitter also is in line with my opinion... He takes one step further and halves a third time because bf16.

My day one opinion:

half performance because non sparse ( the numbers are for sparse processing... No one does that).
Half again because most do FP8 processing

But I didn't want to rain on my coworkers claiming it's best thing since sliced bread

So I didn't email him with that

[-]

dogesator@reddit

Its disappointing that nearly everyone in the comments is just accepting what this post says at face value without any source.

The reality is that neither Awni or John carmack ever actually tested the FP4 performance, they only tested fp16 and then incorrectly assumed the ratio of FP16 to FP4 for blackwell, but the blackwell documentation itself shows that the FP16 performance figures is what you should expect in the first place, John even acknowledged this documentation in his tweet thread:

[-]

SlapAndFinger@reddit

The thing that kills me is that these boxes could be tweaked slightly to make really good consoles, which would be a really good reason to have local horsepower, and you could even integrate Wii/Kinect like functionality with cameras. Instead we're getting hardware that looks like it was designed to fall back to crypto mining.

[-]

ButterscotchSlight86@reddit

NVidia is more marketing than hardware..5090 ..30% …4090 🙃

[-]

john0201@reddit

Apple should redo the get a Mac comparing but instead of an iMac and a PC it’s the M5 studio and this thing.

[-]

corkorbit@reddit

There is competition in the wings from Huawei and a bunch of outfits building systolic array architectures. Ollama already has PR for Huawei Atlas. If the Chinese are serious about getting into this market segment things could get very interesting.

[-]

adisor19@reddit

So basically when Apple releases the M5 MAX and M5 ULTRA based devices, we will finally have some real competition to Nvidia.

[-]

Cergorach@reddit

Not really. Or more accurately: Depends on what you use it for and how you use it. A M4 Pro already has the same memory bandwidth as the Spark, a 64GB version costs about $2k. The problem is the actual GPU performance, that's not even close to Nvidia GPU performance, not really important with inference, unless you're working with pretty large context windows.

And let's be honest, a 64GB or 128GB solution isn't going to run models anything close to what you can get online. Heck even the 512GB M3 Ultra ($10k) can run the neutered version of DS r1 671b, results are still not as good as what you can get online.

No solution is perfect, speed, price, quality, choose two, and with LLM you might be forced to choose one at the moment... ;)

[-]

Serprotease@reddit

M3 ultra can run glm4.6@q8 and useable speed. It will handle anything below 400b@q8 and decent context - which is a large part of the open source models.

But I agree with your overall statement. There is no perfect solution right now, only trade-offs.

[-]

adisor19@reddit

The key word here is "now". In the near future, once the M5 MAX and M5 ULTRA devices are released, we will have a damn good alternative to the Nvidia stack.

[-]

Serprotease@reddit

You’re positive, that’s a good thing.

MPS is good for Llm inference, but do not have the kind of support CUDA has for the rest. Raw power is good, but sometimes, things just don’t work.

[-]

Cergorach@reddit

There is no perfect solution right now, only trade-offs.

And unless you have a very specific usecase in LLM, buying things like these is madness unless you have oodles of money.

The reason why I bought a MacMini M4 Pro 64GB RAM, 8TB storage is because I needed a Mac (business), I wanted something extremely efficient, I needed a large amount of RAM for VMs (business), that it runs relatively large LLMs in it's unified memory is a bonus, not the main feature.

[-]

tomz17@reddit

Depends on the price... don't expect the Apple options to be *cheap* if they are indeed comparable in any way.

[-]

adisor19@reddit

It depends how you look at cheap. If you compare it with what is available from Nvidia etc, chances are that it will be cheap if the current prices for the M3 ULTRA for example will be pretty much the same as for the M5 ULTRA thought I have some doubts about that seeing that RAM prices have skyrocketed recencently for example.

[-]

tomz17@reddit

will be pretty much the same as for the M5 ULTRA

They will price based on whatever the market will bear.... if the new product is anticipated to have a larger demand due to a wider audience (e.g. local LLM use) and a wider range of applicability then they will price it accordingly.

Apple didn't get to be one of the richest companies on the plant by being a charity. They know how to price things.

[-]

beragis@reddit

I am hoping to see some really extensive reviews of LLMs running on both the M5 Max and M5 Ultra. Assuming prices don't change much, for the same price as the DGX you can get a M5 Max with over 2x the memory bandwidth for the same price and for 1200 to 1500 more you can get an Ultra with 256 GB memory and over 4x the bandwidth.

[-]

voplica@reddit

Did anyone try running ComfyUI with Flux image generation or Wan2.2 video gen or any other similar tasks to see if this machine is usable for these tasks?

[-]

Clear_Structure_@reddit

I is true.🤝Jetson AGX Thor is cheaper and at least around 135TFLOPs FP 16 and memory faster🤟🫶

[-]

nasduia@reddit

Annoyingly I've still not seen a like-for-like benchmark for the Thor vs Spark.

[-]

Zyj@reddit

The Evo-T1 is AI only by name.

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

harrro@reddit

Worst most spamming bot

[-]

TiL_sth@reddit

1 PFlops is with sparsity. Is 480 measured with sparsity? Using numbers with sparsity has been the standard (terrible) way Nvidia reports tflops for generations

[-]

entsnack@reddit

yawn Find me an alternative: small, CUDA, Grace ARM CPU, Blackwell GPU. Not saying it isn’t overpriced, that’s the Nvidia tax (which employers pay).

You’d be silly to buy this if you’re just an inference monkey though.

[-]

aikitoria@reddit

Nvidia probably classified this as a GeForce product, which means it will have an additional 50% penalty to fp8/fp16/bf16 with fp32 accumulate, and then the number is as expected.

[-]

Hambeggar@reddit

That's exactly what's happened. So Nvidia hasn't lied. It does have 1 PF of Sparse FP4 performance. The issue here is that Carmack is extrapolated it's Sparse FP4 performance from BF16 incorrectly...

[-]

DerFreudster@reddit

So people are surprised at this coming from the same guy that told us the 5070 was going to have 4090 performance at $549? I don't understand wtf people are thinking.

[-]

Appropriate-Wing6607@reddit

AI the era of snake oil salesman is upon us. They have to in order to keep the shareholder happy and stocks up.

[-]

Nice_Grapefruit_7850@reddit

Yea that one had a huge grain of salt, they completely ignored how currently frame gen is mostly for smoothing out already good framerates. However in the future when more video games logic is decoupled from the rendering you could use that plus Nvidia reflex and get 120fps responsiveness with only 80fps cost.

[-]

HiddenoO@reddit

Yea that one had a huge grain of salt

No, it was a plain lie. You don't have the same performance just because you interpolate as many frames as it takes to have the same FPS shown in the corner. Performance comparisons in gaming are always about FPS (and related metrics) when generating the same images, and just like with direct image quality settings, you're no longer generating the same images when adding interpolated images.

[-]

R_Duncan@reddit

Benchmarks are out.

395 has similar performances until 4k context, then slows horribly down.

[-]

Hambeggar@reddit

Did Nvidia market it as 1 pflop of fp4, or 1pflop of sparse fp4?

If it's still half under sparse, then...yeah, I dunno what nvidia is doing. How is that not a lie?

[-]

Comrade-Porcupine@reddit

To me the interesting thing about these machines is not necessarily their potential use for LLMs (for which it sounds like.. mixed results) but the fact that outside of a Mac they're the only generally consumer-accessible workstation class (or close to workstation class) Aarch64 computer available on the market.

Apart from power consumption advantages of ARM, there are others ... I've worked at several shops in the last year where we did work targeting embedded ARM64 boards of various kinds, and there are advantages to being able to run the native binary directly on host and "eat your own dogfood."

And so if I was kitting out a shop that was doing that kind of development right now I'd seriously consider putting these on developers desks as general purpose developer workstations.

However, I'll wait for them to drop in price ... a lot ... before buying one for myself.

[-]

MoffKalast@reddit

I hope Jensen Huang fixes this soon.

I demand Jensen sits down and codes the fix himself, I will not accept any other solution! /s

[-]

candre23@reddit

I'm shocked. Shocked!

Well, not that shocked. Turns out that you can't get something for nothing, and "it's just as fast as a real GPU but for a quarter the power!" was a really obvious lie.

[-]

tiendat691@reddit

Oh the price of CUDA

[-]

ScaredyCatUK@reddit

https://www.youtube.com/watch?v=82SyOtc9flA

[-]

kasparZ@reddit

AMD so far is winning in this round. Nvidia may have become complacent.

[-]

mitchins-au@reddit

It’s only bad news if you actually bought one

[-]

tarruda@reddit

Imagine spend $4k on this only to find out you were robbed by the most valuable company in the world.

[-]

SilentLennie@reddit

I think it can do it, it's just bandwidth constrained.

[-]

FrostAutomaton@reddit

Nitpicking:
While a *legendary* programmer, Carmack did not write the fast inverse square root algorithm: https://en.wikipedia.org/wiki/Fast_inverse_square_root. It was likely introduced to Id by a man named Brian Hook.

[-]

Signal_Fuel_7199@reddit

so im buying gpd win 5 max 395 rather than dgx spark

2000 dollar does everything 4000 can do and way better?

glad i waited

[-]

randomfoo2@reddit

I think this is expected? When I was running my numbers (based on Blackwell arch). The tensor cores are basically comparable to an RTX 5070 GB205 right (see Appendix C)? https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf

493.9/987.8 (regular/sparse) TFLOPS Peak FP4 Tensor TFLOPS with FP32 Accumulate (FP4 AI TOPS)
123.5 Peak FP8 Tensor TFLOPS with FP32 Accumulate
61.7 Peak FP16/BF16 Tensor TFLOPS with FP32 Accumulate

FP8 and FP16/BF16 perf can be doubled w/ FP16 Accumulate (useful for inference) or with better INT8 TOPS (246.9) - llama.cpp's inference is mostly done in INT8 btw.

I don't have a Spark to test, but I do have a Strix Halo. As a point of comparison, Strix Halo has a theoretical peak of just under 60 FP16 TFLOPS as well but the top mamf-finder results I've gotten much lower (I've only benched \~35 TFLOPS max) and when testing with some regular shapes with aotriton PyTorch on attention-gym it's about 10 TFLOPS.

[-]

Mr_gmTheBest@reddit

So buying a couple of rtx5090s will be better?

[-]

Rich_Repeat_22@reddit

Maybe 2 RTX4080S 48GB from china will be cheaper and better purchase 🤔

[-]

mrjackspade@reddit

a name every programmer should know from the legendary fast inverse square root algorithm

John Carmack didn't invent the fast inverse square root.

Greg Walsh did, and he was only one of a long line of authors in its creation going back as far as 1986

https://www.netlib.org/fdlibm/e_sqrt.c

/* Other methods (use floating-point arithmetic) ------------- (This is a copy of a drafted paper by Prof W. Kahan and K.C. Ng, written in May, 1986)

[-]

NandaVegg@reddit

To be fair, it's done slightly differently in his code than fdlibm's implementation.

[-]

Dr_Karminski@reddit (OP)

My bad

[-]

mr_zerolith@reddit

Wow, i thought it would be half the perf for twice the money but it's apparently much worse than that

[-]

IrisColt@reddit

...and this is not even a joke, sigh...

[-]

Rich_Repeat_22@reddit

Ouch.

And costs over 3xR9700 (96GB) some models as much as 4xR9700s (128GB).

[-]

Double_Cause4609@reddit

Uh....

The 1PFLOP wasn't a lie. That was the sparse performance. You do get it with sparse kernels (ie: for running pruned 2:4 sparse LLMs, support's in Axolotl btw), but the tests were run on commodity dense kernels which are more common.

Everybody knew that the PFLOPs wouldn't be accurate of typical end-user inference if they read the specs sheet.

[-]

NoahFect@reddit

"1 PFLOP as long as most of the numbers are zero" is the excuse we deserved, but not the one we needed.

[-]

Double_Cause4609@reddit

Uh, not most. Half. It's 1:2 sparsity. And it's actually pretty common to see that in neural networks. ReLU activation functions trend towards 50% or so, for example.

There's actually a really big inequality in software right now because CPUs benefit from sparsity a lot (see Powerinfer, etc), while GPUs historically have not benefited in the same way.

Now, in the unstructured case (ie: raw activations), you do have a bit of a problem on GPUs still (GPUs still struggle with unbounded sparsity), but I'm guessing that you can still use the sparsity in the thing for *something* somewhere if you keep an eye out.

Again, 2:4 pruned LLMs come to mind as a really easy win (you get full benefit there really easily), but there's probably other ways to exploit it, too (possibly with tensor restructuring algorithms like hilbert curves to localize the sparsity appropriately).

[-]

Darth_Ender_Ro@reddit

So it's just half of a supercomputer? What if we buy 2?

[-]

Darth_Ender_Ro@reddit

<>

[-]

eleqtriq@reddit

It just sounds like it might be defective. Haven't seen these issues from other reviewers.

[-]

gachiemchiep@reddit

10\~20% reducing in performance because of heating is acceptable, but in half 50%. That is too much. I also remember the day when GTX 4090 burned their own power cable because of over-heating. Did Nvidia test their product before releasing?

[-]

Tyme4Trouble@reddit

It’s one petaFLOPS of sparse FP4. It’s 500 teraFLOPS dense FP4 which is almost certainly what was being measured. If 480 teraFLOPS measured is accurate, that’s actually extremely good efficiency.

Sparsity is notoriously difficult to harness, and anyone who has paid attention to Nvidia marketing will already know this.

[-]

MyHobbyIsMagnets@reddit

I don’t know anything about this product, but how is this not fraud?

[-]

ArchdukeofHyperbole@reddit

I'd buy one for like $500. Just a matter of time, years, but I'd be willing to get a used cheap one on eBay some day.

[-]

PermanentLiminality@reddit

It may be quite a wait. 3090's are still $800.

[-]

_lavoisier_@reddit

Wasn't this obvious in the first place? Cooling of these mini pcs are never adequate due to physical constraints. You won't get max performance out of such design...

[-]

Sicarius_The_First@reddit

At first, I thought the DGX was cucked, but now I know.

[-]

JLeonsarmiento@reddit

But, can it run Crisis?

[-]

smithy_dll@reddit

Yes, there’s a video on YouTube

[-]

stacksmasher@reddit

Yikes!

[-]

Informal-Spinach-345@reddit

Don't worry all the enlightened fanbois on linkedin will explain how it's for professionals to mimic datacenter environments (despite having a way slower nvlink and overall performance) and not for inference.

[-]

Upper_Road_3906@reddit

It's intentionally slow, they can do higher bandwidth memory for similar costs but they lie about poor yield and increase the cost because "complexity"

[-]

-dysangel-@reddit

Oof. So glad I just bit the bullet and got a Studio

[-]

FullstackSensei@reddit

In other news, a Ferrari is a bad option for a family car.

I know a lot of us were excited when it was first announced at CES, but by May it was very clear this was not targeted at people running LLMs locally. TFLOPS is not a very relevant metric for the target audience. This is a dev kit for businesses running full-fat DGX B200 boxes. A half dozen Sparks for developers is peanuts compared to even a single DGX B200.

[-]

VegaKH@reddit

This, sir, is no Ferrari.

[-]

FullstackSensei@reddit

Not all Ferraris were the fastest. The Spark is the 308 GTB of Nvidia. But I guess knee-jerk down voting is what people are here for.

[-]

llama-impersonator@reddit

everyone's a critic here.

[-]

BestSentence4868@reddit

Yeah which is why all the NVIDIA released benchmarks are LLMs and VLMs running locally /s
Its safe to say its a product in search of a market, and telling people its a small cluster on your desk to run cuda before running it a datacenter is copium

[-]

Iamisseibelial@reddit

No wonder the email I got saying my reservation is only good for 4 days kinda blew my mind. Like really I have to rush to buy this now or I have no guarantee at getting one.

Glad I told my org that I don't feel comfortable making a $4k decision so fast to make my WFH life easier when it's essentially my entire Q4 budget for hardware. Despite the hype leadership had around it and hell including my original thoughts as well.

[-]

egomarker@reddit

Bad news: DGX Spark may have only half the performance claimed.

[-]

MitsotakiShogun@reddit

Story of my life. Also Nvidia's life.

[-]

joninco@reddit

You mean the 5070 isn’t as fast as a 4090? Say it aint so Jensen!

[-]

innovasior@reddit

So basically nvidia is doing false advertisement. Good local inference will propably never happen

[-]

levian_@reddit

can it be fixed with updates or is it doomed?

[-]

FullOf_Bad_Ideas@reddit

Measured image diffusion generation on DGX Spark was around 3x slower than on 5090. Roughly the level of 3090 which was 568 INT4 dense and 1136 sparse INT4, but had 71 TFLOPs dense BF16 with FP32 accumulate and 142 FP16 TFlops dense with FP16 accumulate.

So performance is as expected in there. Maybe Spark has the same 2x slowdown with BF16 with FP32 accumulate as 3090 has.

[-]

Cergorach@reddit

Gee... Who didn't see that coming a mile away...

Kids, don't preorder, don't drink the coolaid!

[-]

Unlucky_Milk_4323@reddit

.. at twice the price initially mentioned.