Comparison of upcoming x86 unified memory systems
Posted by Terminator857@reddit | LocalLLaMA | View on Reddit | 106 comments
AMD Gorgon Halo arrives this summer: ~15% faster memory clocks/bandwidth than Strix Halo.
Intel Nova Lake AX is expected early next year.
Summer 2027: AMD Medusa Halo, ~50% performance improvement from 6 memory channels, up from 4.
| Component | Architecture | Memory Type | Bandwidth (approx.) |
|---|---|---|---|
| Medusa Halo | Zen 6/RDNA5 | LPDDR6 | ~460 - 690 GB/s |
| Intel Nova Lake AX | - / Xe3 | LPDDR5X/6? | ~341 GB/s (10667 MT/s) |
| Gorgon Halo (Refresh) | Zen 5/RDNA3.5 | LPDDR5X-8533 | \~273 GB/s |
| Strix Halo | Zen 5/RDNA3.5 | LPDDR5X-8000 | \~256 GB/s |
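The table's LPDDR5X figures follow directly from bus width × transfer rate. A quick sanity check (the 256-bit bus widths are the commonly reported figures for these parts; treat them as assumptions):

```python
def peak_bw_gbps(bus_width_bits: int, mtps: int) -> float:
    """Peak theoretical bandwidth in GB/s (decimal) for a given bus width and MT/s."""
    return bus_width_bits / 8 * mtps / 1000

# Strix Halo: 256-bit LPDDR5X-8000
print(peak_bw_gbps(256, 8000))   # 256.0
# Gorgon Halo: 256-bit LPDDR5X-8533
print(peak_bw_gbps(256, 8533))   # 273.056
# Nova Lake AX rumor: 256-bit at 10667 MT/s
print(peak_bw_gbps(256, 10667))  # 341.344
```

Real-world numbers land below these peaks, but the table's approximations match the arithmetic.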
pmttyji@reddit
AMD should've released 256/512GB variants last year. The DGX should've had a 256GB variant too.
Current versions aren't great for dense models.
Terminator857@reddit (OP)
Agree, so should have Intel. Why are they not seeing the huge market? They can charge an extra $1K and they will still sell well.
RoomyRoots@reddit
Intel is on a tightrope trying not to go bankrupt and has lost a shitload of engineers.
fallingdowndizzyvr@reddit
It's nationalized now. That's not going to happen unless the US government goes BK.
RoomyRoots@reddit
Still it's a shell of what it used to be and they turned off lots of projects in the past few months.
fallingdowndizzyvr@reddit
Ah.. what? Intel stock is at an all time high.
Terminator857@reddit (OP)
They used to have $200 billion in the bank. How can you squander so much money?
RoomyRoots@reddit
The 14nm+++++++ meme comes from reality. They got stuck on that node for years and years. Then they mismanaged a lot, overpaid the CEO, shifted a lot of the R&D to Israel, etc. They pretty much made every mistake they could.
fallingdowndizzyvr@reddit
You know what other company made mistakes like that and was a couple of weeks away from going BK? Apple. How's it doing now?
RoomyRoots@reddit
Literally Apples and not-apples comparison.
fallingdowndizzyvr@reddit
LOL. So literally you avoided answering the question.
pmttyji@reddit
Their (both AMD & Intel) strategy teams need more coffee, probably.
Simple math for AMD. Below are prices at release.
Strix Halo - $2000 - 128GB
DGX Spark- $4000- 128GB
Why didn't AMD release 256GB variant for $4000?
Even on GPUs, they (AMD & Intel) keep releasing the same 24 or 32 GB cards. Why not 48 or 64 or 72 or 96 GB cards? I think Intel recently released a 32GB card for $1000. They should've released a 64GB card for $2000.
Don't know why both AMD & Intel are struggling to capture this market better.
Formal-Exam-8767@reddit
They are afraid of cannibalizing their other markets.
pmttyji@reddit
Still they're losing this market to NVIDIA
fallingdowndizzyvr@reddit
How so? I will venture to say that there have been more Strix Halo machines sold than Sparks. That's not losing the market. Remember, for a while it was hard to get a Strix Halo due to demand. It's never been hard to get a Spark.
UnbeliebteMeinung@reddit
Isn't the whole market full of Chinese AI PCs which are sold like warm baozi?
Mochila-Mochila@reddit
The most powerful of these use the regular Strix Halo platform.
This will change once the PRC releases a viable indigenous APU for the prosumer market (5+ years from now is my guess). Once that happens, AMD and Intel/nVidia will really have a thorn in their side and won't be able to rest on their laurels anymore.
UnbeliebteMeinung@reddit
I can't wait 5 years. I want a GDDR7 AI Max now.
amethyst_mine@reddit
AMD handles unified memory so awfully though. It's just a static split. Meanwhile Intel and Apple actually have "unified" memory where both pull from the same pool.
fallingdowndizzyvr@reddit
No. It's not. That's simply user error. I only run 512MB dedicated to the GPU. I run the other 125.5GB dynamically allocated between the CPU, GPU and NPU. I reserve 2GB for the CPU just because.
amethyst_mine@reddit
Yes, but when you do that there's a translation penalty, and ROCm can't run on the "shared" memory, only the "dedicated" memory, according to docs I read around 2 months ago.
fallingdowndizzyvr@reddit
Hm.... I guess I've been doing it all wrong since ROCm runs just fine in "shared memory". I've been doing that for longer than a couple of months.
ROCm0: AMD Radeon Graphics (128000 MiB, 125608 MiB free)
There really is no translation penalty. Well, not anymore. A year or so ago, I found it to be about 5%. Now I can't detect it at all.
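For context on the allocation being described: on Linux the usual recipe is to set the BIOS "dedicated" VRAM to the minimum and let amdgpu borrow system RAM through GTT by raising the TTM limits. A hedged sketch of the kernel parameters involved (the page counts below are assumed values for a 128GB machine, not a recommendation; check your distro's docs):

```shell
# /etc/default/grub -- let amdgpu use ~120 GiB of system RAM via GTT.
# ttm.pages_limit / ttm.page_pool_size are counted in 4 KiB pages:
# 31457280 pages * 4 KiB = 120 GiB (assumed value, tune to taste).
GRUB_CMDLINE_LINUX_DEFAULT="ttm.pages_limit=31457280 ttm.page_pool_size=31457280"

# Then regenerate the config and reboot:
#   sudo update-grub && sudo reboot
# And check what the driver reports:
#   sudo dmesg | grep -i "amdgpu.*memory"
```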
tecneeq@reddit
Does anyone know a list where you see bandwidth per $? Because that is what it boils down to. I can live with 64GB, but it needs to be fast.
Right now I use a Strix Halo and a 5090.
Terminator857@reddit (OP)
Which do you use more: strix or 5090?
tecneeq@reddit
I use a PC with Debian 13 and a 5090 as my daily driver (except for gaming; I dual-boot into Windows for that). In Debian, the 5090 is otherwise unused and I run llama-server:
The Strix Halo is used as a Proxmox server, but runs the same model with slightly different options:
I then have haproxy switch between them:
So, noninteractive stuff uses the Strix Halo if I'm not home or I'm roaming the world of Cyberpunk. If I'm on my PC, the agents use the faster 5090.
I get 4000 t/s pp and 170 t/s generation with the 5090.
500 t/s pp and 50 t/s generation on the Strix Halo.
I would say overall the 5090 serves more tokens.
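The llama-server and haproxy configs referenced above weren't included in the post; a hypothetical sketch of that kind of failover setup (model name, hostnames, ports, and flags are all assumptions, not tecneeq's actual settings):

```shell
# On each box, roughly the same server with different tuning:
#   5090 box:       llama-server -m model.gguf -ngl 99 --port 8080
#   Strix Halo box: llama-server -m model.gguf -ngl 99 --port 8080 --no-mmap

# haproxy.cfg fragment: prefer the 5090 box, fall back to the
# Strix Halo when the 5090 box is off or booted into Windows.
# backend llama
#     option httpchk GET /health
#     server rtx5090    pc.lan:8080    check
#     server strixhalo  halo.lan:8080  check backup
```

llama-server exposes a `/health` endpoint, so haproxy's HTTP check can route around a box that's down without clients noticing.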
Terminator857@reddit (OP)
Very interesting. You might want to make that a top post on LocalLLaMA.
FunkyMuse@reddit
I also want to know.
robertotomas@reddit
All of those suck. You're telling me I've got to wait until 2028 to find a Mac M4 competitor?
FoxiPanda@reddit
Lol I said this in nicer words and got downvoted in this thread so I just deleted my analysis. You're right though. They all suck ass compared to stuff that was released in 2025.
tecneeq@reddit
The longer your reply, the higher the chance for downvotes. They don't downvote all of it, just one tiny part.
FoxiPanda@reddit
Got, will caveman.
tecneeq@reddit
pls alwys delet long rply 2
Caffdy@reddit
redditors be like. The mob mentality is strong ngl
ImportancePitiful795@reddit
Define "M3 competitor". You mean the M3U 512GB? Because that's now $25,000 on the used market, which is the price of a GH200 server, which is faster.
robertotomas@reddit
Hey man, we already had this discussion in this same thread. You can read my response there again if you like 😄
MeganDryer@reddit
You mean an M3 Ultra? The $6000 computer?
Medusa Halo would be at M4 Max prices, and that's still a $3800 computer.
tecneeq@reddit
Once it's in production, it'll cost $6000.
robertotomas@reddit
Yes, that's kinda a fair point. Except you'll be able to find them for like $3k second hand by then. Or you would be, if hardware kept getting better, anyway.
ImportancePitiful795@reddit
Depends, at the moment. M3U 512GB units are heading for the $25,000 mark on the second-hand market right now.
At that point it makes no sense to get one of these instead of a GH200 server for $10K more.
robertotomas@reddit
That is neither here nor there. The 512GB machine is no longer even available, and you get full bandwidth with the lowest-RAM option.
ImportancePitiful795@reddit
Again, the only benefit to pick M3U 256GB over M5Max/M4Max/Strix Halo/DGX Spark 128GB is the extra 128GB RAM, not that the chip is better to justify the price.
robertotomas@reddit
The 395 and the Spark are consistently slower in practice than the M4 Max, despite the beefier GPUs, precisely because of the bandwidth limitations this thread is about.
ImportancePitiful795@reddit
The 395 is trading blows with the M4 Max, and the Spark needs vLLM to stretch its legs.
Also, we'll see what's better like for like when AMD releases MLX support for the 395 (in closed beta right now) with the Lemonade Server wrapper.
Storge2@reddit
Now I'm sad for buying a DGX Spark
Grouchy-Bed-7942@reddit
It's literally the most affordable option? Why regret it? Moreover, AMD driver support is still below what NVIDIA offers in the ecosystem; in my opinion, this won't be resolved before 2027/2028.
Storge2@reddit
Yeah almost true, Strix Halo is cheaper and apparently has solid support nowadays.
rpkarma@reddit
I wish. It has tonnes of rough edges in comparison :(
fallingdowndizzyvr@reddit
Comparison to what? Check out any one of the "Spark software sucks" threads.
mindwip@reddit
Had zero issues with amd drivers.
Grouchy-Bed-7942@reddit
But ROCm is not on the same level as NVIDIA and CUDA in terms of performance; just look at the performance difference between a GB10 and a Strix Halo (I have both).
mindwip@reddit
I am fine and happy with mine; it's fast and works, and that's all I need. By fast I mean fast for the given known memory bandwidth. Not GPU-fast, of course.
Since it sounds like you can do more direct comparisons, I'm curious if you've tried Lemonade? I've heard good things about adding the NPU into the mix. I have it installed, but LM Studio has worked well enough that I've just been using it.
fallingdowndizzyvr@reddit
What good things have you heard? Since my experience has been meh. Sure, using the NPU is good to save power, but it doesn't help with performance. But since it also doesn't really get in the way of the GPU, it allows you to run another model at the same time.
fallingdowndizzyvr@reddit
How are you judging that performance? What program are you using?
Terminator857@reddit (OP)
Same, I'm using Debian testing.
sn2006gy@reddit
Don't be. These things are at least 2 years away and will cost more, as I don't think the market will settle down in price anytime soon.
rpkarma@reddit
The only thing I’m sad about with mine is that NVFP4 is a lie.
Own_Mix_3755@reddit
Don't be, it's a beast. If you were sad about every tech advancement, you would cry almost every day. Even if Intel AX or Medusa Halo releases in 2027, the question is when it will be available with a good enough amount of RAM. Realistically speaking, I don't see them out in the wild within a year; rather in the second half of 2027, and then it will take time before things get optimized for it. So I wouldn't worry.
And if they release a Mac Studio with an M5 Ultra, I'm afraid that even the Medusa will be like 2x slower than that Mac.
axiomatix@reddit
Apple had laptops with 400GB/s of memory bandwidth and a unified memory architecture in 2021. Somehow we're here with these options going into 2027.
ImportancePitiful795@reddit
Apple's problem is that the chips are slow even with MLX.
Bandwidth alone means shit if the chip cannot do the number crunching.
That's why the AMD 395 trades blows with the M4 Max even though the latter has several times more bandwidth.
fallingdowndizzyvr@reddit
FIFY. The M5 changes all that.
ImportancePitiful795@reddit
For the M5, yes, but not up through the M4.
And let's see the pricing first. Because if an M5 Max 128GB goes for $7000, that's bordering on dual DGX Sparks.
Let alone an M5 Ultra 512GB, where it might be cheaper to buy a GH200 server 🤣 (they're literally not that expensive in the grand scale), since we already see M3U 512GB at the $25,000 range on the second-hand market.
zeth0s@reddit
AMD has had unified memory for a few years, also for data centers.
fallingdowndizzyvr@reddit
Any IGP has "unified memory", AKA "shared memory". That's been happening for decades. But that's not what's being talked about here. It's fast unified memory, which the Steam Deck definitely is not.
RoomyRoots@reddit
The problem is having to use Apple's ecosystem. I would rather wait and be able to run whatever I want with it.
rorowhat@reddit
Apple is great to play around with, that's about it.
YRUTROLLINGURSELF@reddit
thats some nice brain cancer you have there
rorowhat@reddit
Go train a model on apple hardware, I'll wait.
axiomatix@reddit
I don't understand this comment. I can do valuable, time-saving things on an Apple device.
FastHotEmu@reddit
They simply don't want to introduce very fast memory sockets, they want us to pay through the nose for extra soldered-on or SoC RAM, following Apple's lead.
FoxiPanda@reddit
Being in the industry, there are legitimate signal integrity / latency / atomicity / coherency issues with standard DDR style memory slots. Each method has its own radar graph of strengths and weaknesses.
Soldered down memory and on-package memory helps solve a lot of those real technical issues at the cost of serviceability and expandability.
SOCAMM modules also claim to solve some of those issues, but they have their own tradeoffs too.
ElementNumber6@reddit
Being in the industry there are also product and business level discussions surrounding ecosystem lock-in and guaranteed time to upgrade.
FastHotEmu@reddit
I am also in the industry and I know the signal integrity issues are real. Soldered makes it easier, for sure. There are several standards that could be used, if the companies wanted to.
They don't need you to defend them, you are a consumer, right? Then you should be pushing for more consumer choices.
FoxiPanda@reddit
Sure, I'd love to have infinitely fast ram of infinite capacity for $0. Let's go.
Physics problems are real though /shrug
FastHotEmu@reddit
"The Oreo CEO said that more nourishing food is simply impossible! Why do you go against what the Oreo CEO said?!?!"
Caffdy@reddit
I'll gladly do it if there only was an option (where are the 140W, 1L in volume, 600GB/s memory bandwidth machine alternatives to the M5 Max?)
FastHotEmu@reddit
Bigger and more power hungry: my Epyc workstation with 400GB/s (8 channels) and 256GB. But it was way cheaper.
YoussofAl@reddit
Man how is Apple of all people mogging so hard with unified memory bandwidth.
Terminator857@reddit (OP)
Yes, Intel and AMD should stop with the small-fry stuff, double their prices and bandwidth, and look like Apple.
RoomyRoots@reddit
Just see nvidia's comments. They don't care for consumers, and this is a consumer line. The real money is in the DC cards, where the cheapest one costs as much as a cluster of these.
AMD could devour a major sector of the market, but they can't displease the CEO's cousin, I guess.
toptier4093@reddit
Feels good how I can now shit on my friends for hating on me having a Mac Studio. Oh wait, they're still in the "AI is dumb" phase based entirely on a couple of embedded Gemini answers shown in their Google results. Yeah some people..
Best_Control_2573@reddit
Doesn't mean much if you can't actually buy one.
ElementNumber6@reddit
Yeah. The pro line basically doesn't exist anymore. We're in a "wait and see" holding pattern, currently.
crantob@reddit
It all seems so simple: why not just add more parallel channels to your memory controller? Why has PC hardware been stuck with 2-channel memory for decades?
My LLMs tell me it's mainly PCB cost ("many layers"). I don't trust LLMs though.
It does look like Apple threw a lot of infinite-iPhone money at the problem and decided to fund the significant advancement.
For the rest of us, hey there's those AMD servers. You can eat beans for a while right?
Terminator857@reddit (OP)
I worked at Intel, so I can confirm there is a significant cost increase. More bumps on the die is perhaps the biggest one; easier to do with larger dies but difficult on smaller ones. I also feel like most of us here at LocalLLaMA would be glad to pay the extra cost. Intel and AMD just have to realize the market potential.
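The channel question above is mostly arithmetic: peak bandwidth scales linearly with channel count and per-channel width, which is why 2-channel desktops stall near 90 GB/s while 256-bit and 512-bit unified-memory parts pull ahead. A rough sketch (treating each channel as 64 bits wide, which is a simplification of how DDR5/LPDDR5X subchannels are actually counted):

```python
def peak_bw_gbps(channels: int, bits_per_channel: int, mtps: int) -> float:
    """Peak theoretical bandwidth in GB/s for a multi-channel memory setup."""
    return channels * bits_per_channel / 8 * mtps / 1000

# Typical desktop: 2 x 64-bit DDR5-5600
print(peak_bw_gbps(2, 64, 5600))   # 89.6
# Strix Halo: 256-bit LPDDR5X-8000 (4 x 64-bit)
print(peak_bw_gbps(4, 64, 8000))   # 256.0
# M4 Max class: 512-bit LPDDR5X-8533 (8 x 64-bit)
print(peak_bw_gbps(8, 64, 8533))   # 546.112
```

The cost discussion above is about everything this arithmetic hides: every extra channel means more die bumps, more package pins, and more PCB routing layers.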
AnomalyNexus@reddit
Problem is pricing. Especially if you also want a gaming desktop; that basically means two pricey builds, or some sort of eGPU hybrid abomination.
UnbeliebteMeinung@reddit
The main market for these PCs is not gaming. You don't need 128GB of UMA to play Fortnite.
They just have to accept that these devices are purely used for LLM inference.
Terminator857@reddit (OP)
My son is playing a variety of games on his computer, such as Fortnite. I've been playing Divinity 2 and others. I don't have an issue with games, but I suppose there is the super-competitive league crowd or something.
AnomalyNexus@reddit
Problem is my next upgrade will be a 4K high-refresh-rate one. The gap there between an APU and a dedicated GPU is likely to still be noticeable.
UnbeliebteMeinung@reddit
Why do they even care about low power? Just put GDDR7 in...
Mochila-Mochila@reddit
There is zero info about NVL-AX's bandwidth that I know of.
Also, given the latest news, it's doubtful whether Intel will release a product with Xe3 outside of server chips. So it's increasingly possible that NVL-AX as we know it will be scrapped altogether.
I'm personally looking forward to the future Intel APU with nVidia graphics. The release date was rumoured for around 2029, IIRC. Now, the plot twist is that given how nVidia's N1X is apparently doing so badly in terms of stability... perhaps Leather Jacket Man will decide to pull forward the release of an actually decent APU, this time based on an x86 architecture, i.e. Intel. This might mean that, fingers crossed, a release might be on the cards in late 2028?
My personal hope and goal is to get an x86, CUDA-compatible, 1TB/1TB APU at an affordable price (~4000€) by 2030.
Caffdy@reddit
I very much doubt we would get that by 2030
Mochila-Mochila@reddit
I'm hoping that new RAM factories coming online around 2028, increased competition in the APU space, and a potential cooldown of the AI craze could help bring forth such offerings 😅
Tr4sHCr4fT@reddit
Narrator in 2030: The USD is now backed by RAM not gold.
Asspieburgers@reddit
Medusa Halo or Gorgon could be alright if it has a 256 GB RAM option; otherwise there's no point in upgrading from the Strix (I get that the bandwidth of the Medusa is way better, but I reckon it will be like $4k minimum lol).
Mochila-Mochila@reddit
My answer to FoxiPanda's deleted comment :
I'd like to see someone mate an APU with GDDR7 memory.
I'm guessing that if the machine were primarily aimed at AI workloads (LLMs and image/video generation), the increased latency wouldn't be too bothersome.
rhythmdev@reddit
2030: 3TB/s + 256GB VRAM, price = $10k. I can take that deal.
Till then, enjoy 5090s and 6000s.
FastHotEmu@reddit
Unfortunately, some Reddit users are unhinged.
FoxiPanda@reddit
Oh nice, you saved that. I actually agree with you; I would also like to see this with a pretty big memory controller, so we could get enough aggregate memory bandwidth to make it worth it.
Independent-Date393@reddit
Medusa Halo at 690 GB/s peak would actually lap M4 Max if those numbers land. Apple has had a 4-year head start and the x86 ecosystem is just now converging on the same architecture.
sn2006gy@reddit
PCs were built on building how you like it and upgrading how you like it; there's a lot of inertia in that. It took a long while for people to get used to SoCs, and I'm still not sure SoCs are the best answer. I hope accelerators come down in price versus more vertical integration as the only option.
arousedsquirel@reddit
Let them live in their Apple ecosystem. 4 years ahead, lol; not when I was buying my GPUs. More like 10 years behind...
FastHotEmu@reddit
My concern is that these are all non-upgradable systems. RAM goes bad from time to time and upgrades are a positive for consumers.
Companies like Apple will say an SoC is a requirement, but that's simply not true. There are socket approaches that could work for very high bandwidth (e.g. multiple SOCAMM2 modules, mezzanine connectors, HBM, optical, etc.), but the makers are licking their lips knowing that consumers won't be able to upgrade RAM as long as they make it an SoC and cite performance reasons. They also don't want to cannibalise their server offerings.
I love being able to run models locally, but I want the systems to be upgradeable and repairable.
Awwtifishal@reddit
RAM doesn't go all bad at once. I test for bad ram from time to time and patch up the little bits that fail after a few years, by telling linux those regions of memory are reserved.
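The trick described above uses the kernel's `memmap` boot parameter to carve the failing physical addresses out of the usable memory map. A hedged example (the address and size are made up; use what memtest86+ or memtester actually reports for your machine):

```shell
# /etc/default/grub -- reserve one bad 4 KiB page at a
# hypothetical physical address 0x36db29000.
# The $ in memmap=nn$ss must be escaped so it survives both the
# shell quoting and grub-mkconfig; triple-backslash is the usual form.
GRUB_CMDLINE_LINUX_DEFAULT="memmap=4K\\\$0x36db29000"

# Apply and verify the region shows up as reserved:
#   sudo update-grub && sudo reboot
#   sudo grep -i reserved /proc/iomem
```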
Reactor-Licker@reddit
Nova Lake AX is canceled.
Terminator857@reddit (OP)
Replaced by Nova Lake AX+ ?
FoxiPanda@reddit
All of these memory bandwidth numbers are depressing to me. An RTX Pro 6000 or a 5090 has 1.8TB/s, and the Mac Studio M3 Ultra is already at 819GB/s... so these x86 systems will probably kill the Macs at prompt processing (PP) but will lag behind on token generation (TG), and they're woefully behind, even as 2027 releases, what NVIDIA launched as a discrete card in... 2025.
I'm kind of sad about the current state of things, because the options are: sacrifice PP and get big semi-fast memory, or get fast PP but a small 32GB of VRAM, or pay 3x to bump that up to 96GB.
Where's the 2027 2-3TB/s 128-256GB unified option with decent PP?
The answer seems to be that it doesn't exist and isn't even on the public roadmap... unless you're willing to pay NVIDIA a whole lot of money for a DGX Station (around ~$100K). The M5 Ultra might get close, but TBD on that.