Strix Halo 128GB vs M5 pro 64GB
Posted by DigitalguyCH@reddit | LocalLLaMA | 50 comments
What would you pick if they were at the same or a similar price, say around $3000 (MacBook Pro 16" vs a laptop at a little more, or even a mini PC at a little less, like $2500)? Has anyone tried both in terms of speed? I use LM Studio. I tend to prefer macOS because of DrawThings, which is much more user-friendly than ComfyUI (at least to me), but I believe it's 48 vs 96 GB of GPU-available RAM. Currently I am using a 24GB MacBook Air and a 20GB AMD GPU in an eGPU dock with a 32GB RAM laptop, but I also have a 64GB RAM mini PC. Would the 20GB GPU make sense in an eGPU setup with Strix Halo?
Mysterious-Panic-325@reddit
If you want to extend your possibilities, go with the Strix Halo. None of your devices can run Nemotron Super, Qwen3.5 122b, MiniMax M2.7 or other bigger LLMs. DrawThings is nice, but you can’t run as many models in it as you can in ComfyUI on the Strix Halo. Also, running ComfyUI on the Strix Halo requires zero knowledge in 90% of cases. ComfyUI works out of the box on the preinstalled Windows 11. For all popular models, preinstalled workflows are directly selectable from the templates tab. You just need to press the download models button and run it. Compared to a MacBook Air M4, the Strix Halo is night and day in terms of speed. LTX2.3 5-second videos at 640x640 are generated in about 190 seconds.
DigitalguyCH@reddit (OP)
Thanks a lot. I am not sure I understand what you mean by "out of the box". Do you mean that it's preinstalled on Strix Halo? Because I had to mess with Python and other stuff I barely understood just to install it on my laptop...
As for gaming, I currently have no time for it, but maybe at some point, why not. I also have a 7900 XT, which I guess is more capable than Strix Halo if used as an eGPU (I also have an old desktop with a 2070 Super, which I guess is on par with Strix Halo).
fallingdowndizzyvr@reddit
I have an old M1 Mac Studio and a Strix Halo. The Mac Studio doesn't hold a candle to Strix Halo for AI, both LLM and image/video gen.
Anything before an M5 Mac isn't going to be competitive unless you are talking about an M3 Ultra. But that will be cost prohibitive.
Yes. I run a 7900xtx with my Strix Halo.
DigitalguyCH@reddit (OP)
I have a 7900 XT, so could I for instance run a model in an eGPU setup and offload part of it to the Strix Halo? Does it slow down a lot? Currently I have a laptop with an 8840u and 790m, and it slows down quite a bit when the model does not fit in the 20GB of VRAM.
fallingdowndizzyvr@reddit
Yes.
The 790m doesn't have VRAM. It has system RAM that's dedicated to it. So when you offload there, you will be running the offloaded layers on the CPU. That's why you get a slowdown. On Strixy, the GPU can access all the RAM. So you don't offload on Strix Halo, you have a multi-gpu setup. The 7900xt and the 8060s.
DigitalguyCH@reddit (OP)
Great, now I understand. I can assign something like 8 or 16GB of RAM to the 790m in the BIOS, and it's removed from system RAM and given to the GPU. I guess it's like Strix Halo but to a much more modest extent, unless I am wrong. But since it's not much, it still slows down quite a bit.
fallingdowndizzyvr@reddit
It's exactly like Strix Halo. But the problem with the 790m is twofold. 1) It's using slow 2-channel RAM. So it's just slow RAM. 2) When you offload layers, those are going to be run by the CPU. Which is just slow.
So that's why it slows down so much.
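To put rough numbers on the slowdown described above, here is a minimal sketch, assuming a dense model where every weight is read once per generated token and memory bandwidth is the only limit. The bandwidth figures and the 18/6 GB split are illustrative assumptions, not measurements from anyone in this thread.

```python
def tokens_per_second(split_gb, bandwidth_gb_s):
    """Rough decode-speed ceiling when weights are split across devices.

    split_gb:       GB of weights held by each device
    bandwidth_gb_s: memory bandwidth (GB/s) of each device, same order

    Layers run one after another, so the time per token is the sum of
    (GB on device / bandwidth of device) over all devices.
    """
    if len(split_gb) != len(bandwidth_gb_s):
        raise ValueError("need exactly one bandwidth per device")
    seconds_per_token = sum(gb / bw for gb, bw in zip(split_gb, bandwidth_gb_s))
    return 1.0 / seconds_per_token


# Assumed figures: ~800 GB/s for the discrete GPU, ~60 GB/s for the
# dual-channel system RAM that the CPU-offloaded layers read from.
all_on_gpu = tokens_per_second([18], [800])
spilled = tokens_per_second([18, 6], [800, 60])
print(f"fits in VRAM: {all_on_gpu:.1f} t/s ceiling")
print(f"6 GB spilled: {spilled:.1f} t/s ceiling")
```

Under these assumptions, even a small spill onto the slow side dominates the per-token time, which matches the behaviour described in the comment.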
Captain-Pie-62@reddit
I'm quite happy with my 128 GB Strix Halo. Runs 120b models just fine. Fast enough for me. Except for the 400b+ models, I can run anything on this machine. 64GB RAM doesn't cut the mustard for me.
peanutbuttergoodness@reddit
What’s your t/s on 120B models? Also what’s your llama command to run those models? I get 11ish and I can only do like half the layers offloaded. If I offload more than 68 layers I get 0% GPU usage and it becomes unusable.
Captain-Pie-62@reddit
I don't do benchmarking. I'm happy when the system answers faster than I can read (and I'm a fast reader!). I admit that I initially had issues running gpt-oss-120b under ollama. Then I tried LM Studio, et voilà! Also running Nemotron-3-Super. Love it! And some of the smaller ones, like Qwen-coder-next with 80b.
DigitalguyCH@reddit (OP)
I am OK even with 30-35b models, but sometimes I need long context, like 256k or more, and I am afraid it's not going to fit in 64GB, especially with a browser open.
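For a sense of scale on the long-context worry, here is a rough sketch of the usual KV-cache size estimate for a plain transformer with grouped-query attention, reading the "256" above as 256k tokens. The layer count, KV-head count and head size are hypothetical placeholders, not the specs of any particular 30-35b model; check the model card for real values.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV-cache size in GiB for full (non-sliding) attention.

    The leading 2 accounts for storing both keys and values.
    bytes_per_value=2 means an fp16/bf16 cache; 1 would model an 8-bit cache.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1024**3


# Hypothetical model shape: 48 layers, 8 KV heads, head_dim 128.
ctx = 256 * 1024  # 256k tokens
print(kv_cache_gib(48, 8, 128, ctx))                      # 48.0 (fp16 cache)
print(kv_cache_gib(48, 8, 128, ctx, bytes_per_value=1))   # 24.0 (8-bit cache)
```

With these assumed dimensions the cache alone would be on the order of the whole GPU-usable budget of a 64GB machine, before counting the weights. Models with sliding-window or hybrid attention, or a quantized KV cache, can need far less.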
ImportancePitiful795@reddit
Do you want a laptop or miniPC?
DigitalguyCH@reddit (OP)
If I can find a miniPC for at least $500 cheaper I could be OK with that; otherwise, at a similar price, a laptop.
ImportancePitiful795@reddit
Well, AMD 395 128GB miniPCs used to be as low as $1600 just 6 months ago. Right now they are in the $2400+ range for the same machines.
Just FYI, MLX is currently in Beta for the AMD 395 via the Lemonade server, if interested for MLX.
fallingdowndizzyvr@reddit
$2700. $2400 was a couple of weeks ago.
ImportancePitiful795@reddit
Actually last week, as we had opened a discussion on the discord channel about it 🤣
Due_Duck_8472@reddit
You'll be able to do pervert roleplay just fine on both of them
What is your use case really?
DigitalguyCH@reddit (OP)
Image generation for my business (for posts, ads etc.) and text analysis (summarizing some long texts, help preparing presentations etc.). No coding (or rarely; since I can't code, for the occasional scripts I have needed help with so far, I have used Claude or Gemini). Maybe other things in the future. I have only discovered local LLMs recently, so I am trying to understand how they can help. So far image generation and editing have been very useful, but they are very slow with models like qwen image and qwen image edit.
Due_Duck_8472@reddit
You would be happy with chatgpt or gemini
DigitalguyCH@reddit (OP)
you mean the paid version?
Due_Duck_8472@reddit
yeah sure, is that a problem? I take it no, since you plan to splurge 3k on a pc
DigitalguyCH@reddit (OP)
Yeah, I am considering Gemini AI Pro for $20/month
Due_Duck_8472@reddit
Try it for a month and see if it floats your barge
PromptInjection_@reddit
64 GB is too little and the M5 is not blazing fast. I would prefer the 128GB.
putrasherni@reddit
M5 pro
brakx@reddit
Used M4 max might also work if you can get it for around $2500
DigitalguyCH@reddit (OP)
I haven't seen any, but I'll keep that in mind too
fallingdowndizzyvr@reddit
I wouldn't. See, pre-M5 Macs suck at compute.
asfbrz96@reddit
A gf is cheaper
Creepy-Bell-4527@reddit
That tells me nothing except you’ve never had a girlfriend, lol.
asfbrz96@reddit
My girl costs more than opus 4.7 at API price lmao
Creepy-Bell-4527@reddit
All the OpenClaw automations in the world couldn’t make my Claude API bill cost as much as this animal hoarding beauty costs but I digress…
_realpaul@reddit
I'm convinced ComfyUI taught hundreds of thousands how to program a node-based workflow and at least a couple hundred how to code their own custom nodes in Python.
All in the name of waifus 😂
DigitalguyCH@reddit (OP)
I already have one... and.. it's not 😅
Bulky-Priority6824@reddit
Nah
mjTheThird@reddit
I would most def buy the M5 Pro for the memory bandwidth; in your experience, local models will load almost instantly.
Captain-Pie-62@reddit
Wrong assumption. Strix Halo has unified memory, meaning that the CPU and GPU use the same RAM; you are just free to decide how much RAM is reserved for the GPU. And, IIRC, this can be changed at runtime as well.
mjTheThird@reddit
It's MBP with M5 Pro or M5 Max, needs to be double pro'ed!!!
DigitalguyCH@reddit (OP)
M5 pro, M5 max is too expensive
mjTheThird@reddit
You will get better performance with Strix Halo. The baseline M5 MBP halves the memory speed compared to the M5 Pro MBP :/
DigitalguyCH@reddit (OP)
Baseline has only 32GB, I am talking about the M5 pro Macbook pro
mjTheThird@reddit
That's a good machine with the higher memory bandwidth. MBP M5 Pro
But the MBP M5 will only have half of the M5 Pro memory bandwidth. You can look up reviewer memory bandwidth tests.
tonyboi76@reddit
depends which workload youd be more sad to lose. for pure LLM size the strix 128gb wins clearly, that ~96gb usable runs 120b-class models (gpt-oss-120b etc) a 48gb mac just cant fit, and the commenter running 120b on his confirms it. the mac tops out around 30b comfortably, 70b at tight quant.
but you mentioned drawthings, and thats the catch, its apple silicon only. on the strix youre on windows/linux and back to comfyui which you already said you dont vibe with. so if image gen is part of your daily flow the mac kind of decides itself, and 48gb still runs solid 30b LLMs in LM studio.
speed-wise the mac is snappier on models that fit both, strix trades speed for raw capacity. so: want biggest-model bragging rights and dont care about drawthings, strix. want to keep the nice image-gen workflow and still run mid LLMs fine, mac.
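The capacity argument above can be checked with back-of-the-envelope arithmetic. This is a minimal sketch, assuming weights take roughly params × bits / 8 bytes and reserving a hypothetical 4 GB for KV cache and runtime overhead; the 48 and 96 GB budgets are the GPU-usable figures mentioned in this thread, and real quant files vary in size.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: billions of params * bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, gpu_budget_gb, overhead_gb=4.0):
    """True if weights plus an assumed overhead fit in the GPU-usable budget."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= gpu_budget_gb


# Budgets from the thread; overhead_gb=4.0 is an illustrative assumption.
for name, params in [("30b", 30), ("70b", 70), ("120b", 120)]:
    size = weights_gb(params, 4)
    print(f"{name} @ 4-bit ~ {size:.0f} GB | "
          f"48 GB: {fits(params, 4, 48)} | 96 GB: {fits(params, 4, 96)}")
```

On these assumptions a 120b-class model at 4-bit needs about 60 GB for weights alone, which is why it lands on the 96 GB side only, while 70b at 4-bit squeezes into 48 GB with little room left for context.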
pArbo@reddit
strix halo setups take advantage of unified memory - heavyweight model, large quant, big context - but will be slower in tokens/s on the more moderate quant/context that you'd likely end up running on the m5. so it's accuracy or performance. the egpu won't offer much help to either platform.
PreparationTrue9138@reddit
Hi, I don't have a Strix Halo or an M5 Max, but allow me to share what I know. I own a laptop with two RTX 3090 eGPUs, and an M1 Pro.
So you have now:
- eGPU, probably a 7900 XT with 800 GB/s bandwidth, 103 TFLOPS int8
- mini PC, laptop, MacBook Air
The important one here is the eGPU.
For reference, from Google search AI:
- M5 Pro: bandwidth 307 GB/s, 16 TFLOPS int8
- M5 Max: bandwidth 600 GB/s, 33 TFLOPS int8
- Strix Halo: 250 GB/s, 50 TFLOPS int8
If I guessed your GPU right, then I would go with Strix Halo with OCuLink. It's AMD + AMD, so I guess it will be compatible with ROCm. The GPU will give you the speed you need for the active parameters of your MoE models. The OCuLink bottleneck might affect your speed a little, but I think it's better than just slow RAM.
The Mac is only better if you get the M5 Max version with 600 GB/s bandwidth, plus they promise prompt processing to be faster. But you won't be able to use your eGPU. And maximum speeds might only be accessible via the MLX engine.
So if you want to put your GPU to good use and run bigger models, I would go with Strix Halo. But the M5 Max might be faster due to its fast unified memory.
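The bandwidth figures in this comment can be turned into a rough generation-speed ceiling for MoE models, where only the active parameters are read per token. This is a sketch under stated assumptions: the bandwidth numbers are the ones quoted in the comment (themselves from a search summary, so unverified), the 5b-active 4-bit model is hypothetical, and real throughput will be lower once compute, KV-cache reads and interconnect overhead are included.

```python
def decode_ceiling_tps(bandwidth_gb_s, active_params_billion, bits_per_weight):
    """Upper bound on tokens/s if reading active weights is the only cost."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Bandwidths as quoted in the comment above (GB/s), not independently verified.
quoted_bandwidth = {
    "Strix Halo": 250,
    "M5 Pro": 307,
    "M5 Max": 600,
    "7900 XT (eGPU)": 800,
}

# Hypothetical MoE model: 5b active parameters at 4-bit quantization.
for device, bw in quoted_bandwidth.items():
    print(f"{device}: ~{decode_ceiling_tps(bw, 5, 4):.0f} t/s ceiling")
```

The ordering follows the bandwidth numbers directly, which is the point being made: the eGPU helps most when the active parameters actually sit in its VRAM.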
tecneeq@reddit
Why not both? The Strix Halo Bosgame M5!
DigitalguyCH@reddit (OP)
mmh, didn't even know this was a thing....😅
Terminator857@reddit
Your comparison table is missing pricing.
DigitalguyCH@reddit (OP)
I didn't make a comparison table but budget is around $3000 max, less if possible
vaporcube7@reddit
If local inference speed is the main goal, memory bandwidth usually matters more once your model fits, so the M5 Pro will likely feel faster in generation. Strix Halo still has a real advantage for larger quantizations and multitasking headroom. Testing your exact LM Studio models is probably the cleanest way to decide.