M1 Max vs M4 Max vs M5 Max

Posted by br_web@reddit | LocalLLaMA | 4 comments

I have an M1 Max with 64GB, and I am planning to buy something newer with more memory that will let me run LLMs faster, and maybe bigger models, not MoE. The M1 Max gives me the following results:

LLM: Gemma 4 26B A4B MoE GGUF

Question: What is an LLM?
Thought: 13.89
39.30 tok/sec
1399 tokens
0.39s

Maybe a future MLX version of Gemma 4 will be even better. Is it worth spending $6K+ on a new MacBook Pro 16 M5 Max? Will I get 3x or 4x better performance? Thoughts? Thanks


shbong@reddit

You can aim for 50 to 80 tokens/sec at most, not more, so 3x or 4x is not on the table.

TheShawndown@reddit

I think the memory bandwidth of the M5 Max is around 50% higher than that of the M1 Max and around 10% higher than that of the M4 Max.

As far as I know, the token generation speed increase is linear with memory bandwidth.
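To put rough numbers on that, here is a minimal sketch. It assumes generation speed scales in direct proportion to memory bandwidth, and it takes the approximate bandwidth ratios above at face value (they are estimates, not measured specs). The baseline is the 39.30 tok/sec from the original post; the function and constant names are just illustrative.

```python
def projected_decode_tps(baseline_tps: float, bandwidth_ratio: float) -> float:
    """Scale a measured generation speed by a memory-bandwidth ratio.

    Assumes token generation is purely bandwidth-bound (linear scaling).
    """
    return baseline_tps * bandwidth_ratio


M1_MAX_TPS = 39.30         # measured on the M1 Max in the original post
M5_OVER_M1_RATIO = 1.50    # "around 50% higher", rough estimate from this comment
M5_OVER_M4_RATIO = 1.10    # "around 10% higher", rough estimate from this comment

# Projected M5 Max speed under the linear assumption.
m5_estimate = projected_decode_tps(M1_MAX_TPS, M5_OVER_M1_RATIO)

# Implied M4 Max speed: the M5 Max estimate divided by its edge over the M4 Max.
m4_estimate = m5_estimate / M5_OVER_M4_RATIO

print(f"M5 Max estimate: ~{m5_estimate:.1f} tok/sec")
print(f"M4 Max estimate: ~{m4_estimate:.1f} tok/sec")
print(f"Speedup over M1 Max: {m5_estimate / M1_MAX_TPS:.2f}x")
```

Under those assumptions the M5 Max lands somewhere near 59 tok/sec, so roughly 1.5x and nowhere near 3x or 4x.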

I have an M1 Max and just ordered an M3 Max: same speed, but more RAM.

The irony here is that most commercial models run at around 50 tokens per second at most. However, the weakness of Macs is prompt processing. The M5 improves it a lot, but it is still much slower than real GPUs (which is why I am torn between buying an M5 Ultra, if it shows up, and a 5090/6000 Pro).

Mr_Moonsilver@reddit

Make sure to also focus on prompt processing / prefill speed, especially with long-context prompts.
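A quick sketch of why prefill matters: total response time is roughly the prompt length divided by prefill speed, plus the output length divided by generation speed. The generation speed and output length below come from the original post; the prefill speed is a made-up placeholder (not a benchmark of any Mac), so plug in your own measurement.

```python
def response_time_s(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Rough end-to-end time: prefill phase plus generation phase."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


DECODE_TPS = 39.30                 # generation speed from the original post
OUTPUT_TOKENS = 1399               # output length from the original post
HYPOTHETICAL_PREFILL_TPS = 500.0   # placeholder value, NOT a measurement

for prompt_tokens in (100, 8_000, 64_000):
    prefill_s = prompt_tokens / HYPOTHETICAL_PREFILL_TPS
    total_s = response_time_s(prompt_tokens, OUTPUT_TOKENS,
                              HYPOTHETICAL_PREFILL_TPS, DECODE_TPS)
    share = 100 * prefill_s / total_s
    print(f"{prompt_tokens:>6} prompt tokens: {total_s:6.1f}s total, "
          f"{share:4.1f}% spent on prefill")
```

With short prompts the generation phase dominates, but as the prompt grows the prefill term takes over, which is why tok/sec alone can be misleading for long-context use.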