Another how much ram should i get on an M4 Max MBP thread
Posted by flying_unicorn@reddit | LocalLLaMA | 42 comments
I'm gonna get an M4 Max MBP and I'm just debating how much RAM: 64GB or 128GB, leaning toward 64GB. My thought process is that any models I could only run with 128GB will just be too slow, and thus pointless, so 64GB is probably the sweet spot. Would you all agree?
As far as my use case: I play around with local LLMs a bit as a fun hobby, and I use them a bit to help me with my small business on the RTX 3080 Ti in my gaming rig. Cloud LLMs are an issue due to the sensitive nature of my work, but it also doesn't make sense for me to spend 10k on a dedicated LLM rig... yet.
I'm not interested in waiting for the m4 ultra mac studio.
NEEDMOREVRAM@reddit
+1
koalfied-coder@reddit
48gb to me is the best play. I purchased the 128 but I'ma return it. Either I'm getting a 48gb model or an air.
NEEDMOREVRAM@reddit
Why not the 64GB? Can you run Qwen 2.5 32b coder?
koalfied-coder@reddit
48GB has enough headroom for that. For me 48 is the sweet spot so I can code in JetBrains and run a smallish LLM. By the time I pay to upgrade to 64 it's just not worth it to me; I would sooner get a MacBook Air and a 3090 machine to offload inference and maybe training. Currently on my 128GB I only use about 24-36GB on the regular for work, so for me it's not worth it. If I was on an island with only one machine and no cloud options, then I would pay extra for RAM and all that. Btw the MacBook Air is pretty great for most everything. However if going M4, the 14" Pro is more compelling to me.
NEEDMOREVRAM@reddit
Well, if I can afford it I will probably get the 64GB M4. But will most likely get the 48GB.
I see that you're a coder...I'm considering starting to learn Python. Do you think it's too late now that AI can code?
MaxDPS@reddit
It’s not too late. At the very least, knowing how to code will help you write better prompts.
NEEDMOREVRAM@reddit
Why not the 64GB? Isn't the Pro less powerful for running LLMs?
Vegetable_Sun_9225@reddit
128, no question. My work MacBook is 64, my personal is 128, and I get annoyed by what I can't do with the work machine. Remember long context takes a lot of RAM, so even if you're using smaller models you can eat up the memory real fast with long context.
VibrantOcean@reddit
Would you say 128gb is worth 50% extra cost vs 96gb? (Mac Studio)
Vegetable_Sun_9225@reddit
Can you provide links to the two things you're comparing? Are you comparing the Max with the Ultra? 'Cause the M2 Ultra is crazy fast. The Ultra's memory bandwidth is like double.
VibrantOcean@reddit
It was Studio Max 64gb vs Studio Ultra 128gb on the Apple Configurator
Vegetable_Sun_9225@reddit
Yeah, you're getting a lot more than extra memory with the Ultra. The Max has 400 GB/s, the Ultra has 800 GB/s. LLMs are memory bandwidth bound, so your inference speed will be double. Definitely worth it to pay the extra for the Ultra.
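A rough sketch of that arithmetic: for single-stream decoding, (roughly) the whole set of active weights is streamed from memory for every generated token, so bandwidth puts a ceiling on tokens per second. The bandwidth figures are the ones quoted above; the model size is an assumption for illustration.

```python
# Rough ceiling on single-stream decode speed for a memory-bandwidth-bound LLM:
# tokens/s <= bandwidth / bytes_of_weights_read_per_token.
# Figures below are illustrative, not measurements.

def decode_ceiling_tps(model_size_gb: float, bandwidth_gbs: float) -> float:
    """Upper bound on tokens/second, ignoring compute and KV-cache traffic."""
    return bandwidth_gbs / model_size_gb

model_size_gb = 40.0  # e.g. a ~70B model at a ~4-5 bit quant (assumption)

for name, bw in [("Max (~400 GB/s)", 400.0), ("Ultra (~800 GB/s)", 800.0)]:
    print(f"{name}: <= {decode_ceiling_tps(model_size_gb, bw):.1f} tok/s")
```

Real-world numbers come in well below this ceiling, but the 2x ratio between the two chips carries over.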
VibrantOcean@reddit
Didn’t think about that. Good point. And I assume at these speeds every token per second will be a big QOL improvement.
koalfied-coder@reddit
Not for LLMs; a GPU cluster is a much better value. You can get a MacBook Air for the rest and be straight.
Strange-History7511@reddit
Yes
zra184@reddit
I got the 128GB, no regrets. However I forgot to increase the SSD size, wish I would’ve gone up to 2TB so I could fit more than a few 70b+ models.
flying_unicorn@reddit (OP)
Yeah, I'm thinking 2TB is the sweet spot for me. For anything else I'll use a cloud drive or an external SSD over Thunderbolt.
rythmyouth@reddit
At 2TB wishing I got 4
koalfied-coder@reddit
Umm, for your use case a Mac isn't great, coming from an M4 Max owner. 2-4 3090s/A5000s are about 5-10x faster when accounting for processing. While my Max is "usable," I wouldn't say it's pleasant. I would sooner get a GPU machine and a MacBook Air than a Max for LLMs.
flying_unicorn@reddit (OP)
Due to Apple limitations and the way I want to use an MBP, I'm stuck getting a Max anyway. I figure I'll use it for LLMs until I get to a point where building a dedicated LLM rig makes sense. I have a dual 4K display which is basically treated as 2x 4K displays, and I have 4 external displays at my desk off my docking station.
koalfied-coder@reddit
However, if buying a Mac, 64GB seems like the smart play, if not 48GB. That would allow Llama 3.1 8B to run quite nicely.
CompetitiveYak5863@reddit
I purchased the 128GB model. The main advantage I've found is it can run Wizard 8x22B from VRAM with good performance. MoE models are uniquely suited to the mix of high VRAM vs more limited compute you get.
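A rough sketch of why that works out, using approximate public parameter counts for an 8x22B-class MoE and an assumed quantization level:

```python
# Why MoE fits the "lots of RAM, modest compute" profile: all experts must be
# resident in memory, but only the routed experts run per token. Counts are
# approximate public figures for an 8x22B-class MoE; the quant is an assumption.

total_params_b  = 141   # ~141B parameters resident in memory
active_params_b = 39    # ~39B parameters actually used per token (2 of 8 experts)
bytes_per_param = 0.55  # ~4.4 bits/weight, a mid-size GGUF quant (assumption)

resident_gb  = total_params_b * bytes_per_param
per_token_gb = active_params_b * bytes_per_param

print(f"memory needed to hold the model: ~{resident_gb:.0f} GB")
print(f"weights touched per token:       ~{per_token_gb:.0f} GB")
# On a 128 GB Mac the ~78 GB of resident weights fit comfortably, while decode
# speed is governed by the ~21 GB of active weights streamed per token.
```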
koalfied-coder@reddit
Please, what are your tokens a second with a large context window and prompt processing?
CompetitiveYak5863@reddit
I asked it to summarize the Declaration of Independence:
Processing Prompt [BLAS] (1906 / 1906 tokens)
Generating (281 / 512 tokens)
CtxLimit:2187/4096, Amt:281/512, Init:0.01s, Process:20.61s (10.8ms/T = 92.47T/s), Generate:21.38s (76.1ms/T = 13.15T/s), Total:41.99s (6.69T/s)
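For what it's worth, those rates follow directly from the token counts and timings in the log line; a quick recomputation using the numbers as printed:

```python
# Recomputing the rates from the log line above.
prompt_tokens, process_s = 1906, 20.61
gen_tokens, generate_s   = 281, 21.38
total_s                  = 41.99

print(f"prompt processing: {prompt_tokens / process_s:.2f} tok/s")  # ~92.5
print(f"generation:        {gen_tokens / generate_s:.2f} tok/s")    # ~13.1
print(f"overall:           {gen_tokens / total_s:.2f} tok/s")       # ~6.7
```

The headline 6.69 T/s is the overall figure including prompt processing; pure generation is closer to 13 T/s.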
koalfied-coder@reddit
I mean 6.6 tokens a second isn't that bad. Assuming your machine is multi use it's a good value proposition.
GimmePanties@reddit
Declaration of Independence is like 1776 tokens so hardly a stretch
koalfied-coder@reddit
Exactly that's terrible now that you mention the context size
GimmePanties@reddit
Yeah, I have the 64GB, and I don't find myself using the larger models because life is too short for 6 t/s.
Life_Tea_511@reddit
how much did you pay for that?
davewolfs@reddit
Save your money seriously. Just get 48gb and skip the max if you want it purely for LLM.
TooCasToo@reddit
Same here.. m4max(40) 128G 4TB....
Kinda wish it was 256G....
More memory is Gooder and bestest. :)
fallingdowndizzyvr@reddit
Neither. Spend a little more and get a 192GB M2 Ultra.
koalfied-coder@reddit
https://github.com/ggerganov/llama.cpp/discussions/4167 This is a speed comparison.
koalfied-coder@reddit
TL;DR: Mac is slow AF. I'm selling my 128GB Mac to get a 48GB for general development work and smaller models.
SniperDuty@reddit
128GB - you won't regret it.
I'm running Llama-3.1-Nemotron-70B-Instruct-HF-abliterated-i1-GGUF at 6 tokens per second at around 100GB memory usage. On paper it said it should be much less than this. Also, MPS is kicking CUDA's ass on image gen and audio gen.
32b runs at about 18-20 tokens per second on full power mode.
koalfied-coder@reddit
This is not a true comparison. If the Mac has 5x the available VRAM, of course it will do better. Compare it VRAM to VRAM and I would bet NVIDIA claps it. My 4x A5000s are easily 20x as fast as my M4 Max with 128GB RAM.
chibop1@reddit
You can increase the max GPU memory limit to 56GB and run a 72B model, but it's pretty tight. If you want to keep other apps running while also running a 72B+ model with long context, definitely get 128GB.
datbackup@reddit
Definitely 128GB. It’s context size rather than number of parameters that will increasingly become more valuable. And more context requires more RAM.
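A rough sketch of that point: the KV cache grows linearly with context, on top of the weights. The dimensions below are assumptions for a Llama-70B-class model with grouped-query attention and an fp16 cache.

```python
# KV cache grows linearly with context; rough numbers for a Llama-70B-class
# model with grouped-query attention (dimensions below are assumptions).

layers, kv_heads, head_dim = 80, 8, 128
bytes_per_value = 2  # fp16 cache, no cache quantization assumed

per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V

for ctx in (4_096, 32_768, 131_072):
    gb = per_token_bytes * ctx / 1024**3
    print(f"{ctx:>7} tokens of context -> ~{gb:.1f} GB of KV cache")
# ~1.3 GB at 4k, ~10 GB at 32k, ~40 GB at 128k -- all on top of the weights,
# which is why long context pushes you toward the 128GB configuration.
```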
JacketHistorical2321@reddit
The most you can afford
ttkciar@reddit
If you can afford 128GB, get 128GB.
If you don't, you will eventually run up against a task which needs it, and will feel regretful and sad.
My T7910 has 256GB, and I have on occasion wished for a little more.
Dry_Parfait2606@reddit
Running an LLM at 1-2 t/s is still a real thing... If you get more precision out of it, I would say it's worth having the spare RAM...
If it doesn't rip a hole in your pocket, get more performance... If you know you're happy with models that fit in 64GB and you don't plan to test and play around with different models, then go for less RAM.
It all depends on the use case... My mindset was: give me the quality of ChatGPT-3 and I'm happy... I'm now more eager to stay open to what's coming in the future...
vert1s@reddit
I have a 96GB M2 Max and I don't find the large models too bad. You only get about 80% of that for the GPU by default, so you can run a 70B model at q8/q6 at about 2 tokens a second, or a 100-120B model at a lower quant at about the same speed.
64GB is going to limit you to the ~30B models, which might be fine for your needs.
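A quick sanity check on those fits; the ~80% usable fraction is the rough default mentioned above (it can be raised, as noted earlier in the thread), and the sizes are approximate, ignoring GGUF overhead and KV cache.

```python
# Rough check of what fits: weight size ~= parameters * bits / 8, and by default
# macOS only lets the GPU wire roughly 75-80% of unified memory (assumption;
# the exact fraction varies by machine and can be raised).

def model_gb(params_b: float, bits: float) -> float:
    """Approximate weight size in GB for params_b billion parameters."""
    return params_b * bits / 8

ram_gb, usable_frac = 96, 0.80
usable = ram_gb * usable_frac  # ~77 GB available to the GPU

print(f"usable GPU memory: ~{usable:.0f} GB")
print(f"70B at q8: ~{model_gb(70, 8):.0f} GB  (fits, little room for context)")
print(f"70B at q6: ~{model_gb(70, 6):.0f} GB")
print(f"30B at q8 on a 64 GB machine: ~{model_gb(30, 8):.0f} GB")
```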