Is 64GB on an M5 Pro overkill?
Posted by AdEnvironmental4189@reddit | LocalLLaMA | 43 comments
I'm deciding between 48GB and 64GB; of course, the more RAM the better. But I'm not sure whether 64GB would actually improve 30B model performance (it might let me run 70B, though at a slow token/s rate).
The M5 Pro is already at my budget limit, and I'm an LLM rookie, so I'd appreciate it if anyone can explain.
anzzax@reddit
I think in the current economy the M5 Pro 64GB is a sweet offer. I have an M4 Pro 48GB and am happy with it, but $200 for 16GB of extra unified RAM is a steal and will enable more creative options for running interesting workflows and automations. It's not just for LLMs; Docker, extra services, and sandboxes also require RAM.
AdEnvironmental4189@reddit (OP)
Thanks, I eventually ordered the 64GB + 1TB.
CatPuzzled5725@reddit
How is your experience? I am thinking of ordering the same.
AdEnvironmental4189@reddit (OP)
I got qwen3.5 35B A3B 8-bit for local use. The model itself takes about 40GB; with 16k context it outputs 50-60 tokens/s, with total RAM usage usually around 52GB/64GB and occasionally 60GB/64GB.
I also tested other models, e.g. qwen3.5 27b and its Claude variant; it takes only 20GB, but tokens/s is much slower (15 tokens/s).
I think 48GB is OK, but 64GB is the sweet spot.
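For context, the RAM numbers above line up with a back-of-the-envelope estimate. Here's a rough sketch; the formula and the architecture constants below (layer count, KV heads, head dim) are illustrative assumptions, not measured values from the actual model:

```python
def model_ram_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters x bits / 8."""
    return params_b * bits_per_weight / 8

def kv_cache_gb(ctx_tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size: 2 (K and V) x layers x heads x dim x ctx."""
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# A 35B model at 8-bit is roughly 35GB of weights; runtime overhead and the
# KV cache account for the gap up to the ~40GB+ observed in practice.
weights = model_ram_gb(35, 8)
# Hypothetical architecture: 48 layers, 8 KV heads, head dim 128, fp16 cache.
cache = kv_cache_gb(16_384, 48, 8, 128)
print(f"weights ~{weights:.0f} GB, KV cache at 16k ctx ~{cache:.1f} GB")
```

The same arithmetic explains why lower-bit quants of much larger models still fit in 64GB.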
CatPuzzled5725@reddit
Thanks mate, I'm also thinking of going with 2TB just to allow for more space.
AdEnvironmental4189@reddit (OP)
Storage is totally up to you; if you're only running LLMs, you don't need that much. At the end of the day, you'll stick with the models you favor. For now I keep qwen3 coder, qwen 3.5 35B A3B, Gemma 4 26B A4B, and qwen 3.5 27B Claude distilled, and they take approx. 150GB of storage in total.
Additionally, you can check the omlx.ai leaderboard to see AI performance on the M5 Pro. The 27B dense model feels too slow for me, so I stick with the 35B. Also, I just learned that LM Studio has a mobile app, so you can set your laptop up as a local server; I've been playing with it for days.
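The local-server setup mentioned above exposes an OpenAI-compatible API. A usage sketch, assuming LM Studio's defaults (port 1234, a model already loaded; the model identifier is a placeholder):

```shell
# Start the server from LM Studio's Developer tab, or via its CLI:
lms server start

# Any client on the LAN can then make a standard OpenAI-style chat call:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-id", "messages": [{"role": "user", "content": "Hello"}]}'
```

From another device (like the mobile app), replace `localhost` with the laptop's LAN IP.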
AdEnvironmental4189@reddit (OP)
I tested it, and to my surprise, it turned out excellent. The 122B IQ2-M runs pretty well on my M5 Pro, and I found IQ3-XXS is its limit.
IQ3-XXS uses 50.83GB fully loaded (the model itself is 44.76GB) at 32k context, with 59.07GB/64GB total system RAM usage and 48/48 GPU offload (I also tested 46/48, no big difference; there isn't much RAM left for other demanding apps, though Chrome is still fine). Output was 36 tokens/s at the start and 28 tokens/s when reaching 32k, with 6-7s to first token throughout the test.
Excellent.
GloomyPop5387@reddit
No. I have an M4 Max with 128GB. I'd give a lot to have 256.
ImpressiveHair3798@reddit
M4 Pro with 128 is impossible lol
AdEnvironmental4189@reddit (OP)
Thanks buds. Of course the more RAM the better, but the Pro chip is nowhere near your Max in terms of bandwidth/GPU, so I'm not sure 64GB on a Pro chip is the way to go for me.
GloomyPop5387@reddit
Qwen's smaller MoEs will be your friend with 64GB. They are quite good.
The 70B LLMs like Llama 3.3 are painfully slow for me.
superSmitty9999@reddit
Yeah, dense models are slow, right?
AdEnvironmental4189@reddit (OP)
Thanks buds
sleekstrike@reddit
I ordered an M5 Max with 128GB RAM. I'd have gone for 256GB if that were an option tbh, because I want to run MiniMax 2.1. I think the ~300B parameter mark is a sweet spot; maybe I'll change my mind after running qwen3.5 122B A10B.
wyudtix@reddit
Did you get the 14 or 16 inch?
sleekstrike@reddit
16 inch.
AdEnvironmental4189@reddit (OP)
I'm jealous haha
superSmitty9999@reddit
If you're looking to run local models, no amount of RAM is overkill; you could use a TB of RAM if you had it.
wewerecreaturres@reddit
Ah yes, the $4700 Spark, when the dude is capping his budget on an M5 Pro MacBook.
aeonbringer@reddit
The Asus version with a 1TB SSD instead of 4TB is $3300 (it was $3000 a few weeks ago).
superSmitty9999@reddit
You mean $3000?
wewerecreaturres@reddit
I just looked at Nvidia's. Either way, it's more than OP can afford on top of still needing a MacBook.
superSmitty9999@reddit
Yeah maybe my math was wrong there lol I’m like isn’t Apple ram thousands of dollars haha
wewerecreaturres@reddit
For real. Their ram prices have always been out of control
superSmitty9999@reddit
Yeah, so I looked it up: if you went crazy and got a Neo + Spark, it's only a $900 difference for double the VRAM lol
Koalateka@reddit
Go for the max you can buy within a reasonable price for you
chibop1@reddit
Remember, you have to share that 64GB with macOS as well as whatever you're running in the background and foreground.
You can allocate roughly up to 54-56GB to the GPU and leave ~8GB for everything else.
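On recent macOS releases the default GPU budget (about 75% of unified memory) can reportedly be raised via sysctl. A config sketch, assuming macOS Sonoma or later (older releases used `debug.iogpu.wired_limit` instead, so treat the key name as an assumption):

```shell
# Check the current GPU wired-memory limit (0 means the macOS default, ~75% of RAM)
sysctl iogpu.wired_limit_mb

# Temporarily let the GPU wire up to ~56GB of a 64GB machine (resets on reboot)
sudo sysctl iogpu.wired_limit_mb=57344
```

Leave several GB of headroom for macOS itself, or the system can become unstable.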
BitXorBit@reddit
I wouldn’t go for 64gb, it’s too low to run anything good
Cergorach@reddit
I have an M4 Pro (20-core GPU) with 64GB of memory. This isn't just an advantage for LLMs but also for other things like multiple VMs and large files for image/video editing. It also means a maxed-out model will often be in more demand while in shorter supply, so if you ever decide to sell the device, it will keep its value better (at least for Apple devices).
For LLMs it's the difference between being able to run something big (slowly) or not at all. My Mac mini can run a 70B MLX model at ~6 t/s. The new machine will probably be a bit faster; the previous issue was context processing being pretty darned slow, and there has been a big improvement on this front with the M5 (Pro) generation. More memory also means larger context windows (if the model supports them).
Is that M5 Pro going to be fast with such big models? No! But when you need them, you can run them...
If this is reaching or exceeding your budget limit, wait a while and save a bit more before pulling the trigger. Folks are currently going nuts over the new stuff, and in-depth reviews and comparisons are sparse to say the least. Wait a bit, watch some reviews, and then pull the trigger. Just keep in mind that whatever you buy, you're stuck with it for years unless you replace the whole machine.
nborwankar@reddit
Maximize RAM but drop to M4
YourVelourFog@reddit
I was looking at the same thing, but the difference between 48GB and 64GB of RAM was $400, which IMO is worth it for the extra 16GB. Looking at the overall price of the system, I think it was an extra 11% of the cost?
c64z86@reddit
No, I would say 64GB is a very good starting point if you want to use the 27B/35B models at higher quants, or maybe even the 122B at lower quants. On my 64GB laptop with a 12GB GPU I can just about run the UD 3Q quant of the 122B at 15 tokens a second.
BumblebeeParty6389@reddit
Yes. 64GB is overkill for 20-30B models and not enough for 100B models. Unless we get 60-70B models again, with 64GB of RAM you'd be running the same 30B models as people with 32-48GB. So either get 48GB, or change your plans and go for a 96GB Mac Studio.
wewerecreaturres@reddit
I always go for the most RAM available for a given chip choice. If nothing else, it stays fast for longer.
bad_detectiv3@reddit
I should have done this with my ddr5 purchase
AdEnvironmental4189@reddit (OP)
Thank you for your advice
DoodT@reddit
I went for 2x48GB for my setup without hesitation, as that stuff ain't gonna get cheaper.
There's no overkill; get your 64GB.
Tomatillo_Impressive@reddit
Buy it
Icaruszin@reddit
I would go for the 64GB. Local models are like a drug; I have an M1 with 64GB, thought it was enough, and now I would love to get 128GB...
Conscious_Cut_6144@reddit
Depends what you're doing, but big MoEs are kind of the ideal models for M-series silicon,
and to run big MoEs you need lots of RAM.
Due_Net_3342@reddit
That's not even overkill for local LLMs. You need VRAM for the KV cache too, unless you plan to start a new conversation each time. It also depends on what type of model you want to use, MoE vs dense (30B MoE? Yes, maybe. 30B dense? Probably not, and in any case the tokens/s would be very bad at any 20-30k context). Also, don't believe people here saying that 4-bit quants are almost lossless, because they're not; if you plan to do professional work, I recommend Q5 at a minimum. So in short, it depends. You just want to play around? 48GB is enough. You want to do something professional? Not even 64GB is enough.
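To put the quant trade-off into numbers, a rough sketch; the bits-per-weight figures are approximations of common llama.cpp quant levels (real GGUF files vary by a few percent):

```python
# Approximate effective bits per weight for common llama.cpp quants (assumed)
QUANT_BITS = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

def gguf_size_gb(params_b: float, quant: str) -> float:
    """Estimate model file size in GB for a parameter count and quant level."""
    return params_b * QUANT_BITS[quant] / 8

for q in QUANT_BITS:
    print(f"30B at {q}: ~{gguf_size_gb(30, q):.1f} GB")
```

The jump from Q4 to Q5 on a 30B model costs only a few GB, which is why the extra RAM buys quality headroom, not just bigger models.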
AdEnvironmental4189@reddit (OP)
Thanks buds. So even if the M5 Pro is slower than the Max model, I can still benefit from 64GB, right?
ttkciar@reddit
Yes, it would enable you to use 49B Nemotron models at decent context, 72B at restricted context, and 27B/30B/32B models at full context.