RTX Spark's 128GB Unified Memory Sounds Great Until You Realize It Kills Upgradability. We Need Dedicated AI Accelerator Cards Instead

Posted by Renoktation@reddit | hardware | 24 comments

I just saw NVIDIA's event at Computex 2026 where they unveiled their RTX Spark computers, which house a CPU-GPU combo on a single chip with up to 128 GB of unified memory, primarily aimed at running local AI inference and maybe some training as well.

Personally, I would like to run local AI models on my PC without compromising on speed or accuracy. Unfortunately, the biggest limiting factor in my case is the VRAM on my RTX 5070 GPU. I understand that these GPUs are optimized for running video games, so it is not logical to expect these cards to have 48 GB of VRAM. APUs with unified memory can be a solution, but that would also mean losing the flexibility to upgrade the RAM and GPU in our systems, besides limiting performance due to thermal throttling.

Hence, I feel the only practical solution for desktop PC users who want to run local AI in addition to gaming is to add an AI accelerator card with some 32 to 48 GB of VRAM to the motherboard via a PCIe slot. But for this, we will first need:

(a) Motherboards that would support 2 PCIe x16 slots and

(b) Affordable AI accelerator cards with sufficient VRAM
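As a rough illustration of why a 32 to 48 GB card matters, here is a minimal sketch that estimates the memory needed just to hold a model's weights. The function name `weights_gib` and the example model sizes are assumptions chosen for illustration; the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Ignores KV cache, activations, and framework overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model sizes (billions of parameters) and precisions.
for params in (8, 32, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: {weights_gib(params, bits):.1f} GiB")
```

Under these assumptions, even a heavily quantized large model can exceed the 12 GB on a card like the RTX 5070 by a wide margin.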

I would love to know how you people feel about it.


rowdy_1c@reddit

So you want affordable, high bandwidth, upgradeable ram? Pick two


Capable_Site_2891@reddit

Pick one?


rowdy_1c@reddit

Unified memory is non-upgradeable, DIMMs are low-bandwidth, CUDIMM/LPCAMM2 are expensive


Capable_Site_2891@reddit

Can I have an example of some systems that are two of the three, though?

DDR5 system RAM is slow, expensive, and upgradable.

LPDDR5 is slow, expensive, and non-upgradable.

Mac memory is medium speed, expensive, and non-upgradable.

The price difference on the RTX 5090 to 6000: fast, expensive, and non-upgradable.


siazdghw@reddit

That's just where the industry is headed even if it upsets DIY/hobbyists like us. Unified memory is just the tip of the iceberg.

Look at Mac devices, complete control over everything has made them some of the best computing devices you can buy.

I don't see Nvidia and Qualcomm embracing ATX standards for their desktop models. ATX is antiquated and inefficient design.


Noble00_@reddit

Although the matter under discussion is nuanced, it seems people in this post have forgotten about other/future module form factors like LPCAMM2, which is already featured on the Framework 13 Pro:

https://frame.work/ca/en/laptop13pro

Their modules run at 7467 MT/s, but AFAIK we can hit higher speeds like 9600 MT/s:

https://assets.micron.com/adobe/assets/urn:aaid:aem:4e076108-df95-4c2d-8785-06c30049afb5/original/as/lpddr5x-camm2-technical-brief.pdf

Same speeds are already on M5, X2 Elite, and PTL, and faster than GB10/N1X (8533 MT/s) and STX-H (8000 MT/s).

As for desktop, it's even more complicated with more form factors but hints point towards CAMM2 and DDR6 for future gen.
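To put the transfer rates quoted above in bandwidth terms, here is a small sketch that converts MT/s to theoretical peak GB/s. The 128-bit bus width is an assumption for illustration (it is not stated in this thread and varies by platform), and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(mt_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    Each transfer moves bus_width_bits / 8 bytes.
    """
    return mt_per_s * 1e6 * bus_width_bits / 8 / 1e9


# ASSUMPTION: a 128-bit memory bus; check the actual platform's bus width.
BUS_WIDTH_BITS = 128

for rate in (7467, 8000, 8533, 9600):
    print(f"{rate} MT/s x {BUS_WIDTH_BITS}-bit: "
          f"{peak_bandwidth_gbs(rate, BUS_WIDTH_BITS):.1f} GB/s")
```

Because bandwidth scales linearly with both transfer rate and bus width, a platform with a wider bus can end up with more bandwidth despite a lower MT/s figure.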


randomkidlol@reddit

its all about planned obsolescence. theyll never make it upgradeable because they want you to throw it out in ~7 years time.


alexforencich@reddit

Unfortunately, high bandwidth and low latency aren't really compatible with sockets, at least not cheap and compact sockets. You need a lot of pins, and very good signal integrity. Normal DDRx is only a handful of pins and it runs somewhat slow, so it can be socketed without too much issue. On the other extreme, something like HBM is practically impossible to put in a socket.


randomkidlol@reddit

we're talking about a laptop that can very easily have socketed SODIMM or SOCAMM modules in addition to dedicated soldered memory for a GPU. unified memory is great to reduce circuit board sizes and lower the BOM for manufacturers. laptops should have options and users should decide for themselves what is worth it to them.


alexforencich@reddit

Ah yeah I see what you mean, supplement the high bandwidth stuff instead of upgrading the high bandwidth stuff. I guess the question then is one of physical space - bigger battery and/or smaller laptop vs. potential for adding more RAM later via something like CAMM. And also I don't know if there are any potential architectural complications - it certainly makes things easy if the GPU can access all the RAM. But it's probably not too hard to have a mechanism for determining where things are allocated.


randomkidlol@reddit

having separate memory pools is how GPUs have worked for 30 years running now, and its still the preferred model for server workloads and gaming workloads. the unified memory model doesnt offer many performance benefits because GPU work is usually batched anyways, and GPUs are more sensitive to memory bandwidth than latency.

not sure how well the radeon instinct MI300A sells compared to the non APU parts, but the fact that theres not many APU skus for datacenter says a lot about what customers really want.


Renoktation@reddit (OP)

It certainly feels like this. These companies have invested so much in AI that they would like to milk consumers to pay off their debts. NVIDIA knew that AI was becoming mainstream. Still, it launched the RTX 5070 with just 12 GB of VRAM when it could theoretically have had 24 GB.


randomkidlol@reddit

yep nvidia does not want a 10 series blunder again. that generation was such a huge leap in cost effectiveness, performance, and VRAM that it compromised the sales of 2 generations of future products. unless AMD is suddenly competitive again and they start panic pricing/upgrading their consumer product stack, i dont see them ever slimming down their margins to that degree again.


d4ybrake@reddit

SoCs are the future. Your best bet is a Framework or similar that lets you swap out the mainboard while keeping the same shell.


Sopel97@reddit

what you want has existed for like a decade


Flynn58@reddit

Something I wonder is whether you could have an AI accelerator card with RAM sockets on the card itself? That way you can upgrade the dedicated memory of the card.


darksamus8@reddit

I was actually thinking accelerator cards with on-board upgradeable RAM. so you have your system ram, and then the ram meant for your AI models.


alexforencich@reddit

Well, you also need a CPU with more than 24 lanes of PCIe, which means you're in HEDT or server territory.


Renoktation@reddit (OP)

Ya, that's true as well. But most modern CPUs support 24 lanes. If another 4 lanes are added, we can connect accelerator cards in x8 slots as well. PCIe 5.0 would provide sufficient speed. So we may not need to enter server territory.
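For a sense of what an x8 link offers, here is a minimal sketch of theoretical per-direction PCIe bandwidth. It uses the per-lane signaling rates for PCIe 3.0 through 5.0 (8, 16, and 32 GT/s) with 128b/130b encoding; `pcie_gbs` is a hypothetical helper name, and the figures ignore packet and protocol overhead, so achievable throughput is lower.

```python
# Per-lane raw signaling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbs(gen: int, lanes: int) -> float:
    """Theoretical per-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


for gen in (4, 5):
    for lanes in (4, 8, 16):
        print(f"PCIe {gen}.0 x{lanes}: {pcie_gbs(gen, lanes):.1f} GB/s")
```

By this estimate, a PCIe 5.0 x8 link matches a PCIe 4.0 x16 link at roughly 31.5 GB/s per direction.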


Jumpy-Dinner-5001@reddit

Doesn't work.

Same issue with Apple silicon and Strix Halo.
