RTX Spark does not have 600GB/s Bandwidth
Posted by rpiguy9907@reddit | LocalLLaMA | 66 comments
Check the slides from Computex.
Every outlet that reported 600GB/s is completely wrong. That is the NvLink speed like everyone here said.
FullstackSensei@reddit
I think it was only notebookcheck and others copied brainlessly. Another possibility is that everyone used LLMs to get their articles out ASAP, and we all know how that works.
Either way, it was very obvious this wasn't going to happen.
The N1X is basically a GB10 with a different thermal profile. The underlying silicon is otherwise the same. Despite what the image leads people to believe, the GB10/N1/N1X are two dies connected via nvlink using TSMC CoWoS. The GPU side of the chip has no IO, much less any memory controllers. That leaves the CPU side to handle everything. Since the two are next to each other, the edge of the CPU chip next to the GPU cannot be used for anything else but nvlink. That leaves 3 sides of a medium-sized die to handle all IO, including memory controllers. If you look at how much "shore" real estate each 32-bit memory channel takes on any chip, you don't have to be a chip design expert to realize there's no way this was going to have any more than 4 memory channels, same as the GB10. The only change is a memory speed bump from 8533MT/s to 9500MT/s, about a year after the GB10 came out.
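As a rough sanity check on the speed bump: theoretical peak DRAM bandwidth is just bus width times transfer rate. The sketch below assumes a 256-bit total bus, which is not stated in the comment; it is what the 273 GB/s figure quoted further down the thread implies at 8533 MT/s. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Assumption: 256-bit total bus width (inferred, not given in the comment).
BUS_WIDTH_BITS = 256

gb10 = peak_bandwidth_gbs(BUS_WIDTH_BITS, 8533)  # GB10 memory speed
n1x = peak_bandwidth_gbs(BUS_WIDTH_BITS, 9500)   # N1X memory speed

print(f"GB10 @ 8533 MT/s: {gb10:.0f} GB/s")
print(f"N1X  @ 9500 MT/s: {n1x:.0f} GB/s")
print(f"Speed bump: {(n1x / gb10 - 1) * 100:.1f}%")
```

Under that assumption the bump lands at roughly 273 GB/s to 304 GB/s, nowhere near 600 GB/s.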
SkyFeistyLlama8@reddit
All of which makes it a pretty good premium laptop chip, comparable to an M5 Max with the added benefit of having an Nvidia GPU. I've seen figures showing max TDP of less than 100 W; quick spikes up to 150W might be possible.
I'm old enough to remember when Nvidia marketed Tegra as hot shit way back when but the Surface RT was an incredible disappointment. The N1X hopefully won't repeat the past.
FullstackSensei@reddit
Nvidia and ARM were way smaller back when the RT came out. Tegra itself was a good chip, given available technology at the time.
N1X, like GB10, was supposed to come out last year, months before the RAM-pocalypse. Had it been released then, and priced around the same as Strix Halo, maybe it would've had a chance. But now, I have some serious doubts.
Its only resemblance to the M5 Max is the max memory configuration (128GB). Otherwise it has half the memory bandwidth, and the X925 performance cores have around M4 IPC, except the M4 clocks 15% higher. That's still no slouch, but given how badly Windows 11 runs on anything, I'm not holding my breath for even an M1-like experience.
I also don't understand why Microsoft is sidelining Qualcomm like this. At least the optics don't look good. The only reason Windows on ARM is a thing now is because Qualcomm spent the past 7 years working with MS getting Windows to work on ARM. Maybe Qualcomm will get their own event moment sometime later when an X3 Elite or whatever is released.
Anyways, spec sheets matter little. If Windows on ARM is to have a fleeting chance, MS will have to get its shit together and fix all that's currently wrong with Windows 11 and WoA, and that's a tall order.
SkyFeistyLlama8@reddit
The N1X was partially designed by Mediatek. That company has plenty of experience with low and midrange smartphone chips but not with laptop chips. This is Mediatek's first mainstream laptop chip.
Qualcomm took a shortcut by acquiring Nuvia's ARM server designs (with custom ARM-compatible cores) and retooling them for laptops after years of disappointing 8cx performance (these used licensed ARM IP cores).
I'll wait for Microsoft's own keynote in a few days. I don't think it's sidelining Qualcomm, it's an effort to break into the high end and workstation end of the market with an ARM architecture while quietly leaving out previous ARM-related issues.
What's currently wrong with WoA in your view? I've used it for years since the old 8cx days on both personal and work machines and the improvement has been massive.
Welp, what an ignorant statement. Sheesh.
FullstackSensei@reddit
It's a long response. Sorry, but if you're going to call me ignorant, you're going to get the full fat list of what's wrong with W11.
> What's currently wrong with WoA in your view? I've used it for years since the old 8cx days on both personal and work machines and the improvement has been massive.
Typing this from an 8cx gen 2 laptop (HP Elite Folio). Single-core performance, even on this, isn't that bad. It's more or less on par with Skylake. I also have Skylake hardware and that still runs Windows better than this, despite both having 16GB RAM and 1TB NVMe.
Windows has over 3 decades of optimizations for x86. WoA just isn't at that level of optimization yet, and there's still . You can also see it in the number of Windows' own processes that are still running in x86 emulation. x86 emulation itself leaves quite a bit to be desired. It's still buggy and performance leaves a lot to be desired.
> Welp, what an ignorant statement. Sheesh.
You can disagree, but no need to be rude.
Windows 11 is objectively bad, because of an accumulation of intentional bad choices by MS. They intentionally ruined search to shove bing into everyone's face. They ruined the start menu for the same BS. The start menu, file manager, and even freaking notepad now run webviews, which make them 100x heavier and 100x less responsive than they need to be. They intentionally removed S3 (suspend to RAM, the old, reliable sleep mode) and shoved connected standby down everyone's throats, which doesn't work on most computers.
Microsoft's own head of Windows cites things like Filepilot and the Steam Deck as benchmarks of how Windows should be. Filepilot is a file explorer alternative written and maintained by a solo developer in Croatia. MS is trying to beat a solo dev! They're literally saying Windows should run its native apps as well as Wine emulation under Linux?!!!!
They collect a crapton of telemetry from everyone, but still can't make connected standby behave as it should, and not fully wake up a laptop and drain its battery while the user thinks it's in standby.
20 years ago, my ThinkPads could spend a week on standby and still have 50% battery left. Today, I'm lucky to get 3 days of standby before the battery drains, on laptops that have 4-6x the battery life during use.
My 8cx gen 2 convertible laptop and x86 Lenovo tablet (Surface-like), both running the latest W11, will randomly not show the onscreen keypad when waking up from sleep mode in tablet mode. I've been using Windows tablets since Surface Pro 2. Never ever had this happen on W8 and W10. Yet, it's been a chronic issue for 2 years since I was forced to upgrade to W11.
VS 2022 and 2026 have random crashes on the 8cx building or running .NET projects. They don't even have to be large. On x86, I keep the same instance of VS open for weeks at a time.
Call me ignorant all you want, but these are real issues in W11 and WoA. I've searched every one of them and I'm far from alone in having this experience.
Charming-Author4877@reddit
The biggest insult is to compare that with an RTX 5070.
It's below a 3060 ti
Serprotease@reddit
Why? Because of the bandwidth?
Charming-Author4877@reddit
Yes, significantly slower bandwidth than a budget card from 2011, about the same number of CUDA cores and likely less compute overall.
3 times slower bandwidth than a $900 enthusiast gaming card from 2011.
The actual truth is that Nvidia is moving hard against consumers and professionals continuing to buy GPUs. The problem is that 11-year-old hardware is quite competitive with latest-gen cloud services by now, and in a year it might be on par.
The stockpile of datacenter GPUs Nvidia is sitting on is in the high millions; they never had an inventory as large as in January 2026.
They try to force the market into those crappy computers, as those are not strong enough to risk their cloud sales.
SpaceTraveler2084@reddit
2011??
KalonLabs@reddit
Still faster bandwidth than a 4060 🤷‍♂️
ShengrenR@reddit
*mobile
LatentSpacer@reddit
They need competition and we need to move away from CUDA towards something hardware-agnostic. That’s the only way we’ll ever get normal prices again.
I can’t wait for the day NVIDIA starts dying by its own hands when you can just ask an LLM to port the entire CUDA stack to anything else.
DataPhreak@reddit
Here's the problem: CUDA is good.
CUDA is why we can do FP4 processing. ROCm probably won't ever get it. CUDA has been around for a long time and is mature. It's super flexible. Everything is built on it.
The only way we get hardware agnostic CUDA like behavior is if A.) We build it from scratch. (there are projects) or B.) Nvidia makes the tech open license. (not going to happen)
But really, CUDA isn't the bottleneck in AI. It's memory bandwidth. Q4 FP16 processing isn't really that much slower, and Q5 or Q6 is really the sweet spot. MXFP4 and Q4 are roughly even in the benchmark losses. However...
ROCm does do FP8, and I think that's where we end up once memory bandwidth and volume are no longer the issue. Right now, we squash models down too far because we're trying to fit them onto tiny consumer cards and want to go as fast as possible with the "Biggest" model possible. But really, if you had the memory and bandwidth, you'd likely want to run FP8 to get the smartest model possible. But again, that's not going to happen on consumer AMD cards, since they are RDNA architecture and FP8 requires CDNA architecture.
Basically, wait a few years and CUDA isn't going to be nearly as relevant for AI as it is right now.
barnett9@reddit
You forgot about C.) Nvidia is considered a monopoly and is forced to sell off CUDA.
LOL
DataPhreak@reddit
I like the way you think.
Even better, seize the means of compute.
jazir55@reddit
Option D is: Open Source Drop in replacement that has full compatibility, basically Proton but for CUDA.
DataPhreak@reddit
jazir55@reddit
Normally I would agree with you, but this isn't creating another standard, this would be emulating compatibility for an existing one. Proton doesn't add a new standard, it just converts Windows calls to native Linux calls via WINE.
SPACEXDG@reddit
Nope
traderjay_toronto@reddit
Mediatek and performance don’t belong in the same sentence lol
BoogerheadCult@reddit
Forgot the $3000 price tags.
They think everybody gonna go rush out and buy this shit. LOL.
More-Curious816@reddit
if it is 3k or higher just buy a mac, like wtf, it has better quality and higher bandwidth than whatever this crab from the worst companies for end users, NoVIDIA and Microslop.
BoogerheadCult@reddit
Right ? Better resale value too.
All these Nvidia shills and people buying this overpriced garbage have lost their god darn minds.
andy_potato@reddit
The numbers are indeed misleading. Also the moment I heard “Windows on ARM” I noped out.
More-Curious816@reddit
you are gonna make Satya sad, please don't say that.
andy_potato@reddit
It's sad to see otherwise halfway decent hardware being crippled by such a poor OS choice.
PopularKnowledge69@reddit
So the target users are the same ones using DGX Spark but want mobility. I bet the performance is capped when the laptop is unplugged.
BringTea_666@reddit
I mean they already sold out DGX spark to suckers so this will sell too.
andy_potato@reddit
The DGX had legitimate use cases but poor marketing. People believed it was a kind of high memory local inference machine which it absolutely wasn’t.
More-Curious816@reddit
they should say that it was a crippled sandbox for CUDA testing before deployment to the full chip. they didn't say that, and that's why people believed it was the dream box for local inference from Nvidia, especially AI users with low to no technical knowledge. I wish a law for misleading advertising extended to these tech PowerPoint hype slides and was actually enforced.
Sufficient_Phone_242@reddit
The only unified device I'm eager for is the Apple M5 Ultra, if it ever comes out … under 10k CAD would be a « steal » today
Super_Sierra@reddit
Nvidia did something cool asf, 300 GB/s at 80W, that's pretty good. Wish the gputards would understand if they hate something it probably isn't for their market tho.
mrgulabull@reddit
Eh, the M5 Max has ~600GB/s at ~60-80w. It’s not CUDA, but this isn’t groundbreaking performance per watt.
SkyFeistyLlama8@reddit
But then 600 GB/s gets you higher token generation numbers, not necessarily higher prompt processing performance. The N1X like the GB10 has much stronger prompt processing.
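The usual rule of thumb behind this: token generation on a dense model is memory-bound, since each generated token has to read roughly all the weights once, so bandwidth divided by model size gives an upper bound on tokens per second. A minimal sketch using the 300 and 600 GB/s figures from this thread; the 70B-parameter model and 4.5 bits per weight are hypothetical values picked for illustration, and real throughput will be lower once KV cache reads and other overheads are counted.

```python
def max_tokens_per_second(bandwidth_gbs: float,
                          params_billions: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed for a dense model that reads
    every weight once per generated token (memory-bound regime)."""
    model_gb = params_billions * bits_per_weight / 8
    return bandwidth_gbs / model_gb


# Hypothetical model: 70B dense parameters at ~4.5 bits per weight.
PARAMS_B = 70
BITS = 4.5

for bw in (300, 600):  # GB/s figures discussed in the thread
    tps = max_tokens_per_second(bw, PARAMS_B, BITS)
    print(f"{bw} GB/s -> at most {tps:.1f} tokens/s")
```

This bound says nothing about prompt processing, which is limited by compute rather than bandwidth.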
michaelsoft__binbows@reddit
limit 5090 to 400W, 4.25GB/s/W. 300/80 is 3.75GB/s/W. Kind of trash, no?
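Spelling out the GB/s-per-watt arithmetic from the last few comments, as a rough sketch. The N1X and M5 Max figures are the ones quoted in the thread. The 1792 GB/s for the RTX 5090 is its spec-sheet bandwidth and is an assumption here, since the comment does not say which bandwidth it used; with that figure the 400W-limited card comes out nearer 4.5 GB/s/W than 4.25.

```python
def gbs_per_watt(bandwidth_gbs: float, watts: float) -> float:
    """Memory bandwidth per watt of power budget."""
    return bandwidth_gbs / watts


systems = {
    "N1X (300 GB/s @ 80 W)": (300, 80),
    "M5 Max (600 GB/s @ 80 W)": (600, 80),
    "M5 Max (600 GB/s @ 60 W)": (600, 60),
    # Assumption: spec-sheet bandwidth, power limited to 400 W.
    "RTX 5090 (1792 GB/s @ 400 W)": (1792, 400),
}

for name, (bandwidth, watts) in systems.items():
    print(f"{name}: {gbs_per_watt(bandwidth, watts):.2f} GB/s/W")
```

These are package or board power figures against peak bandwidth, so treat the comparison as loose.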
nomorebuttsplz@reddit
if you connect two of them and spread model across both, does that mean you can effectively get 600?
FullstackSensei@reddit
If you get two women pregnant at the same time, does that mean you'll have a child every 4.5 months? /s
Slightly more seriously, you won't get 600GB/s. For one, unlike DGX Spark, this lacks the 100Gb NIC. For another, even if you had a 100Gb NIC, you'd need highly optimized software, which as of now doesn't exist. But even if both were true, you can't expect linear scaling due to all the latencies involved. I'd say 1.7x is a best case, but that would require tremendous effort for something that's very much a niche of a niche.
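To put a rough number on why latency eats the scaling, here is a toy model of tensor parallelism across machines. Each machine reads its share of the weights per token, then pays a fixed synchronization cost at every layer. Everything in it is an assumption for illustration: the 40 GB model, 80 layers, two syncs per layer, and the latency values are made up, and the model ignores the time to actually move activations over the wire.

```python
def tp_speedup(model_gb: float,
               bandwidth_gbs: float,
               n_machines: int,
               n_layers: int,
               sync_latency_s: float,
               syncs_per_layer: int = 2) -> float:
    """Speedup of per-token decode time versus a single machine,
    under a memory-bound model with a fixed cost per synchronization."""
    single = model_gb / bandwidth_gbs
    parallel = (model_gb / (n_machines * bandwidth_gbs)
                + n_layers * syncs_per_layer * sync_latency_s)
    return single / parallel


# Hypothetical: 40 GB of weights, 300 GB/s per machine, 80 layers.
for latency_us in (0, 10, 50, 200):
    speedup = tp_speedup(40, 300, 2, 80, latency_us * 1e-6)
    print(f"{latency_us:>3} us per sync -> {speedup:.2f}x")
```

With these made-up numbers, two machines only reach 2x at zero latency, and the speedup falls off quickly as per-sync latency grows, which is the point being made about networking two laptops together.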
anitamaxwynnn69@reddit
The level of creativity you need to have to come up with that analogy XD
Twirrim@reddit
It's a famous one from The Mythical Man Month, about the fallacy of expecting linear speed ups from allocating more people to a task. 2 people can't make a baby in 4.5 months, 9 people can't make one in a single month.
The wheels fall off the analogy a little bit, but it's a very effective way to get people to start to understand reality.
LightBroom@reddit
You should read The Mythical Man Month
FullstackSensei@reddit
It's a slight modification of a Portuguese proverb widely used when someone asks for something to be rushed.
ArtyfacialIntelagent@reddit
Lol, good one. But on average, actually it does.
gh0stwriter1234@reddit
Sorry to rain on your parade but llama.cpp has had tensor parallel for a while now and it does scale by about 150% on my ancient MI50s.
FullstackSensei@reddit
Sorry to rain on your own parade but multi-GPU in the same machine is not the same as two machines. It doesn't matter that tensor parallelism can work on RPC, latency alone will kill performance. The TCP/IP stack alone would kill scaling even if you had a 200Gb NIC, except those laptops will be lucky to get 40Gb USB4.
BTW, MI50 is nothing to sneeze at, especially if you got them cheap last year. I've been running this since late September and can confirm it scales great with x16 per GPU. But that's mainly because of PCIe latency.
Glittering-Call8746@reddit
Scaling between two rtx spark ? I think that's not a use case..
Double_Cause4609@reddit
I think that's actually what the slide is claiming, yeah. The way it's laid out it's a bit ambiguous and the 600GB/s could be the interconnect bandwidth or the combined memory bandwidth with two chips.
If you have any way of calculating the same linear op in parallel on two chips (like tensor parallelism) you should effectively get close to 600GB/s I think. I'm pretty sure the arch of this chip should work with async tensor parallel kernels for the Nvidia datacenter GPUs if I had to guess (that's the point of the arch).
Failing support for that, there are still ways to compose the performance from two of these, but it's way harder than just having it all on one chip and makes everything super complicated code-wise. In principle though, yeah, it's possible.
fallingdowndizzyvr@reddit
That slide is clear. This one is not. This is why it was reported incorrectly.
https://www.notebookcheck.net/fileadmin/processed/a/2/csm_RTX-Spark-specs_dd5b710e5c.jpg
FullstackSensei@reddit
The slide is not ambiguous at all if you have followed anything about GB10.
GB10/N1/N1X consists of two dies, a GPU die made by Nvidia and a CPU-IO die designed by Mediatek. The GPU die only has nvlink to connect it to the CPU die. This is basically the first product using nvlink licensed to 3rd parties. You can see it as a blueprint for how future collaborations between Nvidia and others like Intel, etc. will look.
fallingdowndizzyvr@reddit
No. For no other reason than 273 + 273 != 600.
Thin_Pollution8843@reddit
Of course not. 600GB/s is only useful if you connect more than two. Memory inside each machine is still processed within the chip at only 300GB/s.
nomorebuttsplz@reddit
oh wow is the nvlink not even i/o?
FullstackSensei@reddit
Nvlink is to connect the GPU die with the CPU die. All the IO and memory controllers are on the CPU die, which was designed by Mediatek, BTW.
Thin_Pollution8843@reddit
Tbh me too. They're positioning this chip as a laptop AMD killer. I doubt those laptops will have exposed specific nvlink connectors. Most likely this number is here just to impress plebs
sn2006gy@reddit
No...
The weird thing is, they're expensive because of ConnectX 200gbit ports, but the 200gbit ports are still too slow to do split models for performance gains, just split models for memory pressure.
Now if these were NVLink and stackable i'd be singing a different tune.
superSmitty9999@reddit
What are they trying to sell this for?
No-Refrigerator-1672@reddit
Nvidia claimed it'll be the new era of PC, and they are right. It really looks like the future of computing is overpriced offcut chip, paired with lackluster IO and marketed like it's the best thing since sliced bread.
Dany0@reddit
AGI will solve this, everyone will have so much Jensen's AGI in their lives that they will be able to use it to print their own offcut chips with lackluster IO at home 🤣
looselyhuman@reddit
Microsoft must have a Gemma-style Copilot local model in the works that will be optimized for the hardware, and they'll have achieved their AI PC. That's all this is.
mindwip@reddit
Jensen is a master of marketing. AMD getting hammered cause of this lol.
Gwolf4@reddit
You haven't seen his presentations if you think this, he is so cringe it sucks.
Mgladiethor@reddit
F nvidia, f apple.
gh0stwriter1234@reddit
Hammered? you haven't seen AMD stock recently have you...
FortheredditLOLz@reddit
Understatement. Jensen is probably easy top three salesman outside of musk and Cheetos
fallingdowndizzyvr@reddit
That was not the only slide. That one is clear. This one is not. This is why it was erroneously reported to be 600GB/s.
https://www.notebookcheck.net/fileadmin/processed/a/2/csm_RTX-Spark-specs_dd5b710e5c.jpg
CatalyticDragon@reddit
They managed to clock the LPDDR5 up to reach 300GB/s, making it 10-17% faster than a Strix Halo which is nice. Although Strix is cheaper and has been out for 18 months already plus having better software support.
I do much prefer the network capabilities on GB10 though.
mjsxi__@reddit
laughing at the dummy who thought it was 600gb/s in the last thread despite everyone correcting them and they wouldn't back down that they just misread it