MacBook M3 Max 64 vs M5 Pro 48, or wait for Spark/Studio
Posted by Holiday_Leg8427@reddit | LocalLLaMA | View on Reddit | 18 comments
I’m choosing between two refurbished MacBooks, both around $3,100.
Option 1: 14” M3 Max, 16-core CPU / 40-core GPU, 64GB RAM, 1TB SSD.
Option 2: 16” M5 Pro, 18-core CPU / 20-core GPU, 48GB RAM, 1TB SSD.
Main use is work/dev, lots of tabs, multitasking, maybe Docker. But I’m making this post mostly because I want to know which one is better for local AI/LLMs.
I don’t plan to train models or do anything too crazy. I just want to run local models for coding help, writing/debugging scripts, and maybe working with sensitive data that I don’t really want to send to cloud AI tools. I work in the EU, so I also need to be careful with GDPR.
Longer term, I want to build some kind of local personal brain / RAG system that can index my files, notes, docs and code, then let me ask questions about them. Maybe later I would try some local agent that can go through folders and help me find/summarize things, probably read-only at first.
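To make it concrete, this is roughly the pipeline I have in mind, as a minimal sketch (I'm assuming sentence-transformers for the embeddings; the folder path, model name, and query are just placeholders):

```python
# Minimal local RAG sketch: embed text chunks, retrieve by cosine similarity.
# Assumes `pip install sentence-transformers numpy`; names are placeholders.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fully on-device

# Index: split each file into rough fixed-size chunks and embed them.
chunks = []
for path in Path("~/notes").expanduser().rglob("*.md"):
    text = path.read_text(errors="ignore")
    chunks += [text[i:i + 1000] for i in range(0, len(text), 1000)]
vectors = embedder.encode(chunks, normalize_embeddings=True)

# Query: embed the question, take the top-3 most similar chunks.
query = embedder.encode(["where did I write down the backup steps?"],
                        normalize_embeddings=True)
for i in np.argsort(vectors @ query[0])[-3:][::-1]:
    print(chunks[i][:200])  # these chunks become context for a local LLM
```

Everything stays on the machine, which is the whole point for the GDPR side.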
I’m completely new to this, so any tips about system requirements, setup, or good-to-know things before buying would be really helpful.
Currently I have a MacBook Air 16GB and a Mac mini 16GB, both base M4 models. I’m thinking about selling them, or at least selling the MacBook Air if I buy one of the MacBooks above.
Or do you think it makes more sense to keep the MacBook Air, sell the Mac mini, and put more money later toward something more AI-focused, like Nvidia Spark / Mac Studio when it releases?
Basically I’m trying to decide if I should get one strong laptop for everything (if you guys think that’s a good starting place), or just get a stronger desktop machine later for the local LLM/RAG stuff.
AXYZE8@reddit
64GB RAM doesn't give you more options than 48GB RAM in terms of LLMs. 120B models won't fit on either, 30B models fit on both, and there has been nothing worth using in between since Llama 3.3 70B.
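Rough napkin math on why, assuming ~0.5 bytes per parameter at a 4-bit quant plus ~15% overhead (ballpark figures, not measurements):

```python
# Approximate weight memory at a ~4-bit quant: ~0.5 bytes/param + ~15% overhead.
def q4_weight_gb(params_billion: float) -> float:
    return params_billion * 0.5 * 1.15

for size in (30, 70, 120):
    print(f"{size}B @ ~4-bit: ~{q4_weight_gb(size):.0f} GB of weights")
# 30B  -> ~17 GB: fits on both machines with room left for context
# 70B  -> ~40 GB: borderline even on 64GB once you add KV cache and the OS
# 120B -> ~69 GB: fits on neither
```

And IIRC macOS by default only lets the GPU wire up roughly 70-75% of total RAM, which tightens these numbers further.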
The M3 Max has more memory bandwidth, but the M5 Pro completely destroys it in prompt processing, like by a factor of 3x, even though it has far fewer GPU cores. The Pro chip will also use WAY less power. A Max chip in a 14" MacBook runs toasty and can get loud, especially during long prompt processing.
I would say the M5 Pro is the way to go, and I would put Qwen 3.6 27B / Gemma 4 31B on oMLX. Both have DFlash models released on HF, which will give you a nice TG boost.
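If you've never touched MLX, the plain mlx-lm package shows the general shape of local inference (a minimal sketch; the model repo name is a placeholder, and oMLX-specific features like DFlash/TurboQuant aren't shown here):

```python
# Minimal local inference with Apple's mlx-lm (pip install mlx-lm).
# The model path is a placeholder; swap in whatever 4-bit MLX repo you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SOME-MODEL-4bit")
reply = generate(model, tokenizer,
                 prompt="Write a bash one-liner to find large files.",
                 max_tokens=256)
print(reply)
```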
As for the Mac mini/Air/Studio question, it depends entirely on your workflow; can't recommend anything there.
VectorD@reddit
64GB gives you a lot more options for sure; you need space for context, not just space for fitting the model alone.
AXYZE8@reddit
Gemma 4 uses 11GB for the full 256K context at fp16 because of SWA. Qwen has DeltaNet so it's efficient too; IIRC it takes like 16GB.
11GB KV + 19GB for Gemma on oQ4 quant = 30GB
30GB obviously fits within 48GB. Then in oMLX you also have SSD cold KV caching and TurboQuant.
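For reference, the generic KV cache math behind numbers like these (a sketch with illustrative hyperparameters, not the actual configs of these models):

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes/elem.
# The hyperparameters below are illustrative, not any specific model's config.
def kv_gb(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):  # fp16 = 2 bytes
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9

print(kv_gb(48, 8, 128, 256_000))  # full attention at 256K: ~50 GB
print(kv_gb(48, 8, 128, 4_096))    # a 4K sliding-window layer: ~0.8 GB
```

That gap is the whole point of SWA: most layers only keep the last few thousand tokens, so long context stops being the memory killer.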
I mean, if he has money to get an M5 series with 64GB RAM then yeah, sure, more RAM is always better, but this is an M3 Max vs M5 Pro comparison.
The M5 Pro will do it 3x faster while using like 60% of the power. That difference is absolutely massive, because Apple added matmul acceleration in M5. On one MacBook you have quiet and fast inference even on the go; on the other, the opposite. The cherry on top is the 14" chassis for the M3 Max: enjoy 7000 RPM fans for 8 minutes while it processes that long context.
VectorD@reddit
4bit quants are not really desirable imo. At least 8bit.
AXYZE8@reddit
Idk, I'm using oQ4 on Qwen/Gemma models and they work very well for coding.
What's important here: I'm still talking about dense models (with DFlash for the speed boost), not MoE with 3B active, and about oQ quants, not typical MLX static quants. Are we on the same page?
VectorD@reddit
You don't see any value in 8bit at all for a dense model? You think extra RAM is useless? I'm very confused by this position.
AXYZE8@reddit
OP compares 64GB M3 Max to 48GB M5 Pro.
The M5 Pro has 3x faster processing speeds (because of matmul acceleration) while using 60% of the power (because it's the smaller Pro chip).
If he goes with the 64GB M3 Max, he is about to witness prompt processing that takes ages while his 14" chassis gets toasty to the touch and the fans get very loud. Battery? Dead in an hour.
If he goes with the 48GB M5 Pro, it won't get toasty and it will be very quiet.
He will lose 12GB of RAM, sure, but there is no model between 30B (fits on both laptops) and 120B (won't fit on either) that would make good use of it.
Whatever fits in that remaining 12GB doesn't matter much, because:
a) there are no MoE models in the 50-80B range that could make use of it at 4bit
b) 120B models require a 2bit quant, at which point you can ignore them
c) 30B models fit on both at oQ4
d) the M3 Max has much slower processing; even if a new Qwen 4 ships with 512K context and you could make use of that extra 12GB of RAM, will you wait like 20 minutes for the first token?
VectorD@reddit
Hey man, we are talking about RAM here, not CPU.
AXYZE8@reddit
VectorD@reddit
You are just stepping into the territory of autism now
Due_Duck_8472@reddit
The M5 will die if it runs its fans at max for that long. The lifetime of the computer will be measured in months, not years.
AXYZE8@reddit
MacBook Pro 16" with M5 Pro won't run fans at max speed ever, that cooler has the capacity to run M5 Max that eats twice as much power.
MacBook Pro 14" on the other hand has way less cooling capacity than 16" that doesn't have capacity to cool Max and it has that Max chip.
What are you trying to write? That M5 Pro has the most powerful cooling Apple makes and that chip uses half as much power as Max.
Did you make a typo and meant M3?
itsmunzir@reddit
M3 Max 64GB wins for local LLMs because VRAM is the bottleneck. 64GB runs a 40B Q4 model comfortably; 48GB forces compromises.
Long_comment_san@reddit
Either you run smaller dense models slowly at 48GB, or you run large MoE models at well above 64GB. 64GB is a dead zone to be honest: zero change from 48GB on the dense front, and not enough to run quants of the larger MoE models.
I say get the 48GB and run local dense models on the fly, and maybe build yourself a proper server so you can slap 256GB RAM + 48GB VRAM in there and use more or less all modern models.
JLeonsarmiento@reddit
M5 Pro.
Famous_Lime6643@reddit
I’d say if you’re going to use local models for coding, the RAM on either of those may not be enough. I currently have a Mac Studio M3 Ultra with 96GB and it works pretty well, but I’m always a little afraid of what would happen if a bunch of subagents got spun up. My MacBook M4 Pro with 48GB drags a bit on the same models.
TBH I’ve been curious about local coding agents and how the steering works, so that’s why I’ve been playing around with them, but for anything serious I still use a cloud model. Curious what others are doing? Particularly hardware/LLM/harness combos that work well for you?
SexyAlienHotTubWater@reddit
Just buy an M1. They're much cheaper and have similar bandwidth.
a-babaka@reddit
The M5 Pro is great, but 48GB isn't enough if you plan to run anything else besides LLMs at the same time.