Mac Studio vs GB10
Posted by TaylorHu@reddit | LocalLLaMA | 22 comments
I can get a used Mac Studio with 128gb of memory for about the same price as a GB10 (DGX Spark) based system. Which would you all recommend? Mac wins on pure horsepower and memory bandwidth, but GB10 allows for all of the CUDA specific workflows and tools and compatibility.
StardockEngineer@reddit
GB10 wins on prompt processing by a long shot. If you're a coder or tend to upload docs/images, PP is probably >66% of all workloads, which makes the GB10 faster.
Looking at my coding sessions, it's as high as 90% on some days because agents are constantly opening files for reads.
TaylorHu@reddit (OP)
Even with the drastically lower memory bandwidth? That's interesting.
StardockEngineer@reddit
PP is compute, not memory bandwidth.
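A rough arithmetic-intensity sketch of why that split happens (hypothetical round numbers, not measured specs of either machine):

```python
# Back-of-envelope sketch: why prefill (PP) tends to be compute-bound while
# decode (TG) is memory-bandwidth-bound. Figures below are illustrative round
# numbers, not benchmarks of the Mac Studio or GB10.

PARAMS = 70e9                  # model parameters (e.g. a 70B model)
BYTES_PER_PARAM = 1.0          # 8-bit quantized weights
FLOPS_PER_TOKEN = 2 * PARAMS   # rough transformer forward-pass cost per token

def step_time(batch_tokens, tflops, bandwidth_gbs):
    """Time for one forward pass, lower-bounded by either the FLOPs needed
    or by streaming the full weight set from memory once."""
    compute_s = batch_tokens * FLOPS_PER_TOKEN / (tflops * 1e12)
    memory_s = PARAMS * BYTES_PER_PARAM / (bandwidth_gbs * 1e9)
    bound = "compute" if compute_s > memory_s else "memory"
    return max(compute_s, memory_s), bound

# Decode: one new token per step, so reading all the weights dominates.
_, decode_bound = step_time(batch_tokens=1, tflops=100, bandwidth_gbs=800)

# Prefill: thousands of prompt tokens share a single weight read, so FLOPs dominate.
_, prefill_bound = step_time(batch_tokens=4096, tflops=100, bandwidth_gbs=800)

print(decode_bound, prefill_bound)  # memory compute
```

The crossover is simply the point where batch_tokens times the per-token FLOPs outweighs the cost of streaming the weights once, which is why large prompts reward raw compute and single-token generation rewards bandwidth.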
webii446@reddit
I’m actually running both setups side by side right now: two Mac Studio M3 Ultras and two DGX Sparks (MSI Edge Expert). For me, it comes down to a very clear split in how I use them.
The Mac Studio has a much more general computer vibe, which makes it my go-to for general chat and direct inference. I can just use it like my regular PC, fire up LM Studio, and interact with the models naturally. Once generation actually starts, the throughput on the Mac is incredibly smooth and consistent, making it a highly responsive daily driver. The big catch with the Mac though is the time to first token and prefill speed. On larger prompts or agentic workloads, the Mac can feel painfully slow to get started.
This is exactly where the Spark just destroys it. The DGX Spark acts strictly as my dev machine for building and testing before I eventually move my AI workloads over to the cloud. It really shines in prompt processing and heavy concurrency, which is exactly what you need for multi agent setups or any fine-tuning. Honestly, the Spark is also way better as an overall work machine for AI dev because the CUDA ecosystem is still completely unmatched, making it a lot easier to experiment.
Where the Spark really becomes invaluable for me is when I'm traveling. Because it doesn't need a monitor or any peripherals to function, I just bring it along and treat it like a portable headless server. I can plug it in wherever I am and just access it from my laptop to run all my heavy AI workloads on the go.
Ultimately, they complement each other so well, but it heavily depends on your specific use case. If you are okay with slower prompt processing but want fast token generation, don't need much of the CUDA ecosystem, and want a machine you can interact with naturally like a normal PC, go for the Mac Studio. Otherwise, if you need fast prompt processing for heavy workloads, want full access to CUDA tools, and need a solid dev machine for training, the Spark is definitely the way to go.
TaylorHu@reddit (OP)
I have a main computer. It's a Windows box, and while WSL2 is usually good enough, there are definitely times I wish I had macOS. And it's beefy: 9950X3D, 4070 Ti Super, 64GB of DDR5. Coding, gaming, Lightroom, it handles it all well.
So this machine would be more of a box to run AI models on, run background jobs, and definitely learn more about AI/LLM concepts to stay relevant. It would be nice if it could replace my $200/mo Codex sub though, then I could justify the cost a little more. I was worried that the slow memory bandwidth/token generation of the GB10 would be painful, but it sounds like that's less of a concern than a lot of the early reviews and naysayers on Reddit made it out to be when it launched?
I've also considered a Strix Halo as an even cheaper option, of course.
GroundbreakingMall54@reddit
mac studio and it's not even close imo. the cuda stuff on gb10 sounds nice in theory, but the memory bandwidth difference is massive for inference, and the software ecosystem on mac with mlx is way more mature than whatever nvidia ships for that thing
webii446@reddit
The software infrastructure of Spark is significantly better than MLX, especially for fine tuning, image generation, and video generation tasks.
While memory bandwidth on Spark can be a limitation, I feel it’s largely offset by the relatively slow prefill performance on Mac Studio.
Easy-Unit2087@reddit
Not exactly. Mac wins on inference, GB10 wins on PP. Who wins overall depends on the workload. Multiple parallel agents, large prompts, ... typical for intensive agentic use --> GB10 wins (on vLLM). Single prompt, Mac wins due to higher TG. It's true that when you get into stuff like fine-tuning models, there's no substitute for being in the CUDA ecosystem.
Recommendation really depends on your use case and technical skills. Doesn't get easier than Mac with LM Studio. GB10 is more complicated, although AI can do most of that for you.
TaylorHu@reddit (OP)
When the DGX Spark was announced, everyone was bemoaning its low memory bandwidth compared to an actual desktop GPU, as if that were the most important thing to factor in outside of just total memory size. I thought the Studio's 800GB/s would be a massive gain for anything LLM related then, hence the "horsepower" remark. Sounds like that's not the case?
Easy-Unit2087@reddit
Again, it depends on the usage. If you prompt "Write me a short story about two cats in a barn", the Mac will be much faster and "win" hands down. If you prompt "Do a multi-subagent audit on my code base with expert skills on UX/UI, concurrency, memory leaks, ..." the DGX Spark will be finished before the Mac even gets to inference. Reviews of the DGX Spark from when it first launched are completely outdated: not only has the ecosystem improved a lot (e.g. firmware, vLLM, and ease of deployment thanks to community effort), but AI usage with multiple agents, each with dozens of tools, is now a common use case, and last year's benchmarks were all about TG without concurrency.
inthesearchof@reddit
This is actually one of those "wait till the new version comes out" moments for the Mac Studio. The M5 addresses the prompt processing/prefill issues that are holding Mac Studios back.
spky-dev@reddit
The Studio (Max or Ultra) has significantly higher memory bandwidth than a GB10, and Metal support for models is quite good.
You’ll have higher PP at minimum as it’s a function of memory bandwidth.
Ecosystem-wise, both are fine; you're just picking between CUDA and MLX. Otherwise, the GB10 is minimally useful as an actual computer, whereas the Studio is, well… just another Mac.
br_web@reddit
thoughts on the Asus ROG Flow Z13?
TaylorHu@reddit (OP)
I have a main computer. It's a Windows box, and while WSL2 is usually good enough, there are definitely times I wish I had macOS. So this machine would be more of a box to run AI models on and run background jobs. It would be nice if it could replace my $200/mo Codex sub, though.
StardockEngineer@reddit
PP is not a function of memory bandwidth, it's pure compute.
iMrParker@reddit
Yep, PP is compute-bound, and the GB10 can crunch much faster than something like an M3 Ultra
GroundbreakingMall54@reddit
mac studio easily imo. the memory bandwidth alone makes it way better for inference and you actually get a usable desktop out of it. gb10 sounds cool on paper but the software ecosystem is still catching up, plus you're locked into nvidia's tooling for everything. with 128gb unified memory you can run pretty much any model that fits
TaylorHu@reddit (OP)
I have a main computer. It's a Windows box, and while WSL2 is usually good enough, there are definitely times I wish I had macOS. So this machine would be more of a box to run AI models on and run background jobs. It would be nice if it could replace my $200/mo Codex sub, though.
tarpdetarp@reddit
I'd wait for the M5 Mac Studio which will massively reduce prompt processing time. Rumours are it'll be announced near WWDC in June.
TaylorHu@reddit (OP)
Yeah but at what cost XD. And some current Mac Studio models have a 4-5 month backlog right now.
catplusplusok@reddit
AFAIK none of the unified memory platforms are super fast, and you need MoE models for usable coding/agent setups. NVIDIA would have faster prompt processing/fine-tuning and a recent-ish Mac Studio faster generation. Either way, install a custom vLLM fork (varok/dgx-vllm-nvfp4-kernel for NVIDIA, vllm-mlx for Mac) to make the most of the unique compute.
jacek2023@reddit
Mac wins on what...?