Comparison of upcoming x86 unified memory systems
Posted by Terminator857@reddit | LocalLLaMA | View on Reddit | 106 comments
AMD Gorgon Halo arrives this summer: ~15% faster memory clocks/bandwidth than Strix Halo.
Intel Nova Lake AX is expected early next year.
Summer 2027: AMD Medusa Halo, ~50% performance improvement from 6 memory channels, up from 4.
| Component | Architecture | Memory Type | Bandwidth (approx.) |
|---|---|---|---|
| Medusa Halo | Zen 6/RDNA5 | LPDDR6 | ~460 - 690 GB/s |
| Intel Nova Lake AX | - / Xe3 | LPDDR5X/6? | ~341 GB/s (10667 MT/s) |
| Gorgon Halo (Refresh) | Zen 5/RDNA3.5 | LPDDR5X-8533 | \~273 GB/s |
| Strix Halo | Zen 5/RDNA3.5 | LPDDR5X-8000 | \~256 GB/s |
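The table's LPDDR5X figures follow directly from bus width × transfer rate. A quick sanity check (the 256-bit bus widths are the commonly reported figures for these parts; treat them as assumptions):

```python
def peak_bw_gbps(bus_width_bits: int, mtps: int) -> float:
    """Peak theoretical bandwidth in GB/s (decimal) for a given bus width and MT/s."""
    return bus_width_bits / 8 * mtps / 1000

# Strix Halo: 256-bit LPDDR5X-8000
print(peak_bw_gbps(256, 8000))   # 256.0
# Gorgon Halo: 256-bit LPDDR5X-8533
print(peak_bw_gbps(256, 8533))   # 273.056
# Nova Lake AX rumor: 256-bit at 10667 MT/s
print(peak_bw_gbps(256, 10667))  # 341.344
```

Real-world numbers land below these peaks, but the table's approximations match the arithmetic.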
pmttyji@reddit
AMD should've released 256/512GB variants last year. The DGX should've had a 256GB variant too.
Current versions aren't great for dense models.
Terminator857@reddit (OP)
Agree, so should have Intel. Why are they not seeing the huge market? They can charge an extra $1K and they will still sell well.
RoomyRoots@reddit
Intel is on a tightrope trying not to go bankrupt and has lost a shitload of engineers.
fallingdowndizzyvr@reddit
It's nationalized now. That's not going to happen unless the US government goes BK.
RoomyRoots@reddit
Still it's a shell of what it used to be and they turned off lots of projects in the past few months.
fallingdowndizzyvr@reddit
Ah.. what? Intel stock is at an all time high.
Terminator857@reddit (OP)
They used to have $200 billion in the bank. How can you squander so much money?
RoomyRoots@reddit
The 14nm+++++++ meme comes from reality. They got stuck on that node for years and years. Then they mismanaged a lot, overpaid the CEO, shifted a lot of the R&D to Israel, etc. They pretty much made every mistake they could.
fallingdowndizzyvr@reddit
You know what other company made mistakes like that and was a couple of weeks away from going BK? Apple. How's it doing now?
RoomyRoots@reddit
Literally Apples and not-apples comparison.
fallingdowndizzyvr@reddit
LOL. So literally you avoided answering the question.
pmttyji@reddit
Their (both AMD & Intel) strategy teams need more coffee, probably.
Simple math for AMD. Below are prices at release.
Strix Halo - $2000 - 128GB
DGX Spark- $4000- 128GB
Why didn't AMD release 256GB variant for $4000?
Even on GPUs, they (AMD & Intel) keep releasing the same 24 or 32 GB cards. Why not 48 or 64 or 72 or 96 GB cards? I think Intel recently released a 32GB card for $1000. They should've released a 64GB card for $2000.
Don't know why both AMD & Intel are struggling to capture this market better.
Formal-Exam-8767@reddit
They are afraid of cannibalizing their other markets.
pmttyji@reddit
Still they're losing this market to NVIDIA
fallingdowndizzyvr@reddit
How so? I will venture to say that there have been more Strix Halo machines sold than Sparks. That's not losing the market. Remember, for a while it was hard to get a Strix Halo due to demand. It's never been hard to get a Spark.
UnbeliebteMeinung@reddit
Isn't the whole market full of Chinese AI PCs which are sold like warm baozi?
Mochila-Mochila@reddit
The most powerful of these use the regular Strix Halo platform.
This will change once the PRC releases a viable indigenous APU for the prosumer market (5+ years from now is my guess). Once that happens, AMD and Intel/nVidia will really have a thorn in their side and won't be able to rest on their laurels anymore.
UnbeliebteMeinung@reddit
I can't wait 5 years. I want a GDDR7 AI Max now.
amethyst_mine@reddit
AMD handles unified memory so awfully though. It's just a static split. Meanwhile Intel and Apple actually have "unified" memory where both pull from the same pool.
fallingdowndizzyvr@reddit
No. It's not. That's simply user error. I only run 512MB dedicated to the GPU. I run the other 125.5GB dynamically allocated between the CPU, GPU and NPU. I reserve 2GB for the CPU just because.
amethyst_mine@reddit
Yes, but when you do that there's a translation penalty, and ROCm can't run on the "shared" memory, only the "dedicated" memory, according to docs I read around 2 months ago.
fallingdowndizzyvr@reddit
Hm.... I guess I've been doing it all wrong since ROCm runs just fine in "shared memory". I've been doing that for longer than a couple of months.
ROCm0: AMD Radeon Graphics (128000 MiB, 125608 MiB free)
There really is no translation penalty. Well, not anymore. A year or so ago, I found it to be about 5%. Now I can't detect it at all.
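For context on the allocation being described: on Linux the usual recipe is to set the BIOS "dedicated" VRAM to the minimum and let amdgpu borrow system RAM through GTT by raising the TTM limits. A hedged sketch of the kernel parameters involved (the page counts below are assumed values for a 128GB machine, not a recommendation; check your distro's docs):

```shell
# /etc/default/grub -- let amdgpu use ~120 GiB of system RAM via GTT.
# ttm.pages_limit / ttm.page_pool_size are counted in 4 KiB pages:
# 31457280 pages * 4 KiB = 120 GiB (assumed value, tune to taste).
GRUB_CMDLINE_LINUX_DEFAULT="ttm.pages_limit=31457280 ttm.page_pool_size=31457280"

# Then regenerate the config and reboot:
#   sudo update-grub && sudo reboot
# And check what the driver reports:
#   sudo dmesg | grep -i "amdgpu.*memory"
```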
tecneeq@reddit
Does anyone know a list where you see bandwidth per $? Because that is what it boils down to. I can live with 64GB, but it needs to be fast.
Right now I use a Strix Halo and a 5090.
Terminator857@reddit (OP)
Which do you use more: strix or 5090?
tecneeq@reddit
I use a PC with Debian 13 and a 5090 as my daily driver (except for gaming; I dual-boot into Windows for that). In Debian, the 5090 is otherwise unused and I run llama-server:
The Strix Halo is used as a Proxmox server, but runs the same model with slightly different options:
I then have haproxy switch between them:
So, noninteractive stuff uses the Strix Halo if I'm not home or I'm roaming the world of Cyberpunk. If I'm on my PC, the agents use the faster 5090.
I get 4000 t/s pp and 170 t/s generation with the 5090.
500 t/s pp and 50 t/s generation on the Strix Halo.
I would say overall the 5090 serves more tokens.
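The llama-server and haproxy configs referenced above weren't included in the post; a hypothetical sketch of that kind of failover setup (model name, hostnames, ports, and flags are all assumptions, not tecneeq's actual settings):

```shell
# On each box, roughly the same server with different tuning:
#   5090 box:       llama-server -m model.gguf -ngl 99 --port 8080
#   Strix Halo box: llama-server -m model.gguf -ngl 99 --port 8080 --no-mmap

# haproxy.cfg fragment: prefer the 5090 box, fall back to the
# Strix Halo when the 5090 box is off or booted into Windows.
# backend llama
#     option httpchk GET /health
#     server rtx5090    pc.lan:8080    check
#     server strixhalo  halo.lan:8080  check backup
```

llama-server exposes a `/health` endpoint, so haproxy's HTTP check can route around a box that's down without clients noticing.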
Terminator857@reddit (OP)
Very interesting. You might want to make that a top post on LocalLLaMA.
FunkyMuse@reddit
I also want to know.
robertotomas@reddit
All of those suck. You're telling me I've got to wait until 2028 to find a Mac M4 competitor?
FoxiPanda@reddit
Lol I said this in nicer words and got downvoted in this thread so I just deleted my analysis. You're right though. They all suck ass compared to stuff that was released in 2025.
tecneeq@reddit
The longer your reply, the higher the chance for downvotes. They don't downvote all of it, just one tiny part.
FoxiPanda@reddit
Got, will caveman.
tecneeq@reddit
pls alwys delet long rply 2
Caffdy@reddit
redditors be like. The mob mentality is strong ngl
ImportancePitiful795@reddit
Define "M3 competitor". You mean the M3U 512GB? Because that's now $25,000 on the used market, which is the price of a GH200 server, which is faster.
robertotomas@reddit
Hey man, we already had this discussion in this same thread. You can read my response there again if you like 😄
MeganDryer@reddit
You mean an M3 Ultra? The $6000 computer?
Medusa Halo would be at M4 Max prices, and that's still a $3800 computer.
tecneeq@reddit
Once it's in production, it'll cost $6000.
robertotomas@reddit
Yes, that's kinda a fair point. Except you'll be able to find them for like $3k second hand by then. Or you would be, if hardware kept getting better, anyway.
ImportancePitiful795@reddit
Depends, at the moment. M3U 512GB units are heading for the $25,000 mark on the second-hand market right now.
At that point it makes no sense to get one of these instead of a GH200 server for $10K more.
robertotomas@reddit
That is neither here nor there. The 512GB machine is no longer even available, and you get full bandwidth with the lowest-RAM option.
ImportancePitiful795@reddit
Again, the only benefit to pick M3U 256GB over M5Max/M4Max/Strix Halo/DGX Spark 128GB is the extra 128GB RAM, not that the chip is better to justify the price.
robertotomas@reddit
The 395 and the Spark are consistently slower in practice than the M4 Max, despite the beefier GPUs, precisely because of the bandwidth limitations this thread is about.
ImportancePitiful795@reddit
The 395 is trading blows with the M4 Max, and the Spark needs vLLM to stretch its legs.
Also, we'll see what's better like for like when AMD releases MLX support for the 395 (in closed beta right now) with the Lemonade Server wrapper.
Storge2@reddit
Now I'm sad for buying a DGX Spark
Grouchy-Bed-7942@reddit
It's literally the most affordable option? Why regret it? Moreover, AMD driver support is still below what NVIDIA offers in the ecosystem; in my opinion, this won't be resolved before 2027/2028.
Storge2@reddit
Yeah almost true, Strix Halo is cheaper and apparently has solid support nowadays.
rpkarma@reddit
I wish. It has tonnes of rough edges in comparison :(
fallingdowndizzyvr@reddit
Comparison to what? Check out any one of the "Spark software sucks" threads.
mindwip@reddit
Had zero issues with amd drivers.
Grouchy-Bed-7942@reddit
But ROCm is not on the same level as NVIDIA and CUDA in terms of performance; just look at the performance difference between a GB10 and a Strix Halo (I have both).
mindwip@reddit
I am fine and happy with mine; it's fast and works, and that's all I need. By fast I mean fast for the given known memory bandwidth. Not GPU-fast, of course.
Since it sounds like you can do more direct comparisons, I'm curious if you've tried Lemonade? I've heard good things about adding the NPU into the mix. I have it installed, but LM Studio has worked well enough that I've just been using it.
fallingdowndizzyvr@reddit
What good things have you heard? Since my experience has been meh. Sure, using the NPU is good to save power, but it doesn't help with performance. But since it also doesn't really get in the way of the GPU, it allows you to run another model at the same time.
fallingdowndizzyvr@reddit
How are you judging that performance? What program are you using?
Terminator857@reddit (OP)
Same, I'm using Debian testing.
sn2006gy@reddit
Don't be. These things are at least 2 years away and will cost more, as I don't think the market will settle down in price anytime soon.
rpkarma@reddit
The only thing I’m sad about with mine is that NVFP4 is a lie.
Own_Mix_3755@reddit
Don't be, it's a beast. If you were sad about every tech advancement, you would cry almost every day. Even if Intel AX or Medusa Halo releases in 2027, the question is when it will be available with a good enough amount of RAM. Realistically speaking, I don't see them out in the wild within a year; rather in the second half of 2027, and then it will take time before things get optimized for it. So I wouldn't worry.
And if they release a Mac Studio with an M5 Ultra, I'm afraid that even the Medusa will be like 2x slower than that Mac.
axiomatix@reddit
Apple had laptops with 400GB/s of memory bandwidth and a unified memory architecture in 2021. Somehow we're here with these options going into 2027.
ImportancePitiful795@reddit
Apple's problem is that the chips are slow even with MLX.
Bandwidth alone means shit if the chip cannot do the number crunching.
That's why the AMD 395 trades blows with the M4 Max even though the latter has several times more bandwidth.
fallingdowndizzyvr@reddit
FIFY. The M5 changes all that.
ImportancePitiful795@reddit
For the M5, yes, but not up through the M4.
And let's see the pricing first. Because if an M5 Max 128GB goes for $7000, that's bordering on dual DGX Sparks.
Let alone an M5 Ultra 512GB, where it might be cheaper to buy a GH200 server 🤣 (they're literally not that expensive in the grand scale), since we already see M3U 512GB at the $25,000 range on the second-hand market.
zeth0s@reddit
AMD has had unified memory for a few years, also for data centers.
fallingdowndizzyvr@reddit
Any IGP has "unified memory", AKA "shared memory". That's been happening for decades. But that's not what's being talked about here. It's fast unified memory, which the Steam Deck definitely is not.
RoomyRoots@reddit
The problem is having to use Apple's ecosystem. I would rather wait and be able to run whatever I want with it.
rorowhat@reddit
Apple is great to play around with, that's about it.
YRUTROLLINGURSELF@reddit
thats some nice brain cancer you have there
rorowhat@reddit
Go train a model on apple hardware, I'll wait.
axiomatix@reddit
I don't understand this comment. I can do valuable, time-saving things on an Apple device.
FastHotEmu@reddit
They simply don't want to introduce very fast memory sockets, they want us to pay through the nose for extra soldered-on or SoC RAM, following Apple's lead.
FoxiPanda@reddit
Being in the industry, there are legitimate signal integrity / latency / atomicity / coherency issues with standard DDR style memory slots. Each method has its own radar graph of strengths and weaknesses.
Soldered down memory and on-package memory helps solve a lot of those real technical issues at the cost of serviceability and expandability.
SOCAMM modules also claim to solve some of those issues, but they have their own tradeoffs too.
ElementNumber6@reddit
Being in the industry there are also product and business level discussions surrounding ecosystem lock-in and guaranteed time to upgrade.
FastHotEmu@reddit
I am also in the industry and I know the signal integrity issues are real. Soldered makes it easier, for sure. There are several standards that could be used, if the companies wanted to.
They don't need you to defend them, you are a consumer, right? Then you should be pushing for more consumer choices.
FoxiPanda@reddit
Sure, I'd love to have infinitely fast ram of infinite capacity for $0. Let's go.
Physics problems are real though /shrug
FastHotEmu@reddit
"The Oreo CEO said that more nourishing food is simply impossible! Why do you go against what the Oreo CEO said?!?!"
Caffdy@reddit
I'll gladly do it if there only was an option (where are the 140W, 1L in volume, 600GB/s memory bandwidth machine alternatives to the M5 Max?)
FastHotEmu@reddit
Bigger and more power hungry: my Epyc workstation with 400GB/s (8 channels) and 256GB. But it was way cheaper.
YoussofAl@reddit
Man how is Apple of all people mogging so hard with unified memory bandwidth.
Terminator857@reddit (OP)
Yes, Intel and AMD should stop with the small-fry stuff, double their prices and bandwidth, and look like Apple.
RoomyRoots@reddit
Just see nvidia's comments. They don't care for consumers, and this is a consumer line. The real money is in the DC cards, where the cheapest one costs as much as a cluster of these.
AMD could devour a major sector of the market, but they can't displease the CEO's cousin, I guess.
toptier4093@reddit
Feels good how I can now shit on my friends for hating on me having a Mac Studio. Oh wait, they're still in the "AI is dumb" phase based entirely on a couple of embedded Gemini answers shown in their Google results. Yeah some people..
Best_Control_2573@reddit
Doesn't mean much if you can't actually buy one.
ElementNumber6@reddit
Yeah. The pro line basically doesn't exist anymore. We're in a "wait and see" holding pattern, currently.
crantob@reddit
It all seems so simple: why not just add more parallel channels to your memory controller? Why has PC hardware been stuck with 2-channel memory for decades?
My LLMs tell me it's mainly PCB cost ("many layers"). I don't trust LLMs though.
It does look like Apple threw a lot of infinite-iPhone money at the problem and decided to fund the significant advancement.
For the rest of us, hey there's those AMD servers. You can eat beans for a while right?
Terminator857@reddit (OP)
I worked at Intel, so I can confirm there is a significant cost increase. More bumps on the die is perhaps the biggest one; easier to do with larger dies but difficult on smaller ones. I also feel like most of us here at LocalLLaMA would be glad to pay the extra cost. Intel and AMD just have to realize the market potential.
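The channel question above is mostly arithmetic: peak bandwidth scales linearly with channel count and per-channel width, which is why 2-channel desktops stall near 90 GB/s while 256-bit and 512-bit unified-memory parts pull ahead. A rough sketch (treating each channel as 64 bits wide, which is a simplification of how DDR5/LPDDR5X subchannels are actually counted):

```python
def peak_bw_gbps(channels: int, bits_per_channel: int, mtps: int) -> float:
    """Peak theoretical bandwidth in GB/s for a multi-channel memory setup."""
    return channels * bits_per_channel / 8 * mtps / 1000

# Typical desktop: 2 x 64-bit DDR5-5600
print(peak_bw_gbps(2, 64, 5600))   # 89.6
# Strix Halo: 256-bit LPDDR5X-8000 (4 x 64-bit)
print(peak_bw_gbps(4, 64, 8000))   # 256.0
# M4 Max class: 512-bit LPDDR5X-8533 (8 x 64-bit)
print(peak_bw_gbps(8, 64, 8533))   # 546.112
```

The cost discussion above is about everything this arithmetic hides: every extra channel means more die bumps, more package pins, and more PCB routing layers.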
AnomalyNexus@reddit
Problem is pricing. Especially if you also want a gaming desktop; that basically means two pricey builds, or some sort of eGPU hybrid abomination.
UnbeliebteMeinung@reddit
The main market for these PCs is not gaming. You don't need 128GB of UMA to play Fortnite.
They just have to accept that these devices are purely used for LLM inference.
Terminator857@reddit (OP)
My son is playing a variety of games on his computer, such as Fortnite. I've been playing Divinity 2 and others. I don't have an issue with games, but I suppose there is the super-competitive league crowd or something.
AnomalyNexus@reddit
Problem is my next upgrade will be a 4K high-refresh-rate one. The gap there between an APU and a dedicated GPU is likely to still be noticeable.
UnbeliebteMeinung@reddit
Why do they even care about low power? Just put GDDR7 in...
Mochila-Mochila@reddit
There is zero info about NVL-AX's bandwidth that I know of.
Also, given the latest news, it's doubtful whether Intel will release a product with Xe3 outside of server chips. So it's increasingly possible that NVL-AX as we know it will be scrapped altogether.
I'm personally looking forward to the future Intel APU with nVidia graphics. The release date was rumoured for around 2029, IIRC. Now, the plot twist is that given how nVidia's N1X is apparently doing so badly in terms of stability... perhaps Leather Jacket Man will decide to pull forward the release of an actually decent APU, this time based on an x86 architecture, i.e. Intel. This might mean that, fingers crossed, a release might be on the cards in late 2028?
My personal hope and goal is to get an x86, CUDA-compatible, 1TB/1TB APU at an affordable price (~4000€) by 2030.
Caffdy@reddit
I very much doubt we would get that by 2030
Mochila-Mochila@reddit
I'm hoping that new RAM factories coming online around 2028, increased competition in the APU space, and a potential cooldown of the AI craze could help bring forth such offerings 😅
Tr4sHCr4fT@reddit
Narrator in 2030: The USD is now backed by RAM not gold.
Asspieburgers@reddit
Medusa Halo or Gorgon could be alright if it has a 256 GB RAM option; otherwise there's no point in upgrading from the Strix (I get that the bandwidth of the Medusa is way better, but I reckon it will be like $4k minimum lol).
Mochila-Mochila@reddit
My answer to FoxiPanda's deleted comment :
I'd like to see someone mate an APU with GDDR7 memory.
I'm guessing that if the machine were primarily aimed at AI workloads (LLMs and image/video generation), the increased latency wouldn't be too bothersome.
rhythmdev@reddit
2030: 3TB/s + 256GB VRAM, price = $10k. I can take that deal.
Till then, enjoy 5090s and 6000s.
FastHotEmu@reddit
Unfortunately, some Reddit users are unhinged.
FoxiPanda@reddit
Oh nice, you saved that. I actually agree with you; I would also like to see this with a pretty big memory controller, so we could get enough aggregate memory bandwidth to make it worth it.
Independent-Date393@reddit
Medusa Halo at 690 GB/s peak would actually lap M4 Max if those numbers land. Apple has had a 4-year head start and the x86 ecosystem is just now converging on the same architecture.
sn2006gy@reddit
PCs were built on building how you like it and upgrading how you like it; there's a lot of inertia in that. It took a long while for people to get used to SoCs, and I'm still not sure SoCs are the best answer. I hope accelerators come down in price versus more vertical integration as the only option.
arousedsquirel@reddit
Let them live in their Apple ecosystem. 4 years ahead, lol; not when I was buying my GPUs. More like 10 years behind...
FastHotEmu@reddit
My concern is that these are all non-upgradable systems. RAM goes bad from time to time and upgrades are a positive for consumers.
Companies like Apple will say an SoC is a requirement, but that's simply not true. There are socket approaches that could work for very high bandwidth (e.g. multiple SOCAMM2 modules, mezzanine connectors, HBM, optical, etc.), but the makers are licking their lips knowing that consumers won't be able to upgrade RAM as long as they make it an SoC and cite performance reasons. They also don't want to cannibalise their server offerings.
I love being able to run models locally, but I want the systems to be upgradeable and repairable.
Awwtifishal@reddit
RAM doesn't go all bad at once. I test for bad ram from time to time and patch up the little bits that fail after a few years, by telling linux those regions of memory are reserved.
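The trick described above uses the kernel's `memmap` boot parameter to carve the failing physical addresses out of the usable memory map. A hedged example (the address and size are made up; use what memtest86+ or memtester actually reports for your machine):

```shell
# /etc/default/grub -- reserve one bad 4 KiB page at a
# hypothetical physical address 0x36db29000.
# The $ in memmap=nn$ss must be escaped so it survives both the
# shell quoting and grub-mkconfig; triple-backslash is the usual form.
GRUB_CMDLINE_LINUX_DEFAULT="memmap=4K\\\$0x36db29000"

# Apply and verify the region shows up as reserved:
#   sudo update-grub && sudo reboot
#   sudo grep -i reserved /proc/iomem
```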
Reactor-Licker@reddit
Nova Lake AX is canceled.
Terminator857@reddit (OP)
Replaced by Nova Lake AX+ ?
FoxiPanda@reddit
All of these memory bandwidth numbers are depressing to me. An RTX Pro 6000 or a 5090 has 1.8TB/s, and the Mac Studio M3 Ultra is already at 819GB/s... so these x86 systems will probably kill the Macs at prompt processing (PP) but will lag behind on token generation (TG), and they're woefully behind, even as 2027 releases, what NVIDIA launched as a discrete card in... 2025.
I'm kind of sad about the current state of things, because the options are: sacrifice PP and get big semi-fast memory, or get fast PP but a small 32GB of VRAM, or pay 3x to bump that up to 96GB.
Where's the 2027 2-3TB/s 128-256GB unified option with decent PP?
The answer seems to be that it doesn't exist and isn't even on the public roadmap... unless you're willing to pay NVIDIA a whole lot of money for a DGX Station (around ~$100K). The M5 Ultra might get close, but TBD on that.