Thinking of buying a Mac to get into local LLMs
Posted by BestSeaworthiness283@reddit | LocalLLaMA | 41 comments
I want to buy a MacBook Pro M5 with 32 GB of RAM, that being the max RAM for the Pro with the base M5 chip.
Currently I have a gaming laptop with an RTX 4060, and the VRAM isn't enough for me.
Do you guys think this is the way to go if I want to get into LLMs or AI? If so, is this laptop a good choice?
Pither404@reddit
Why do you people love these crap Apple devices??? Jesus, look for a V100 32GB, use two of them, pay half the price of this Apple garbage and you're good to go. Remember, Apple is a CPU, NOT a GPU.
Only-An-Egg@reddit
V100s are EOL and no longer supported in the latest CUDA drivers.
-dysangel-@reddit
That's not technically correct. There are dedicated GPU cores alongside the CPU cores. The two have very different architectures and properties.
I think you're trying to make the point that those GPU cores are not as fast as dedicated GPUs, which is true, especially for non-Ultra chips.
The M5 Ultra will be getting pretty close to dedicated cards in terms of speed and bandwidth, though, and it will have more RAM and lower energy usage, so it will be a very serious contender in the space.
Pither404@reddit
Yeah, ARM = 8 GPU cores versus 5000 CUDA cores 😂
-dysangel-@reddit
My M3 Ultra has 80 GPU cores.
The core comparison is not 1:1. A single Apple Silicon GPU core does more work than a single CUDA core.
Your point that a dedicated GPU is faster is correct - but you clearly have not researched the details.
synn89@reddit
Not really, for that amount of RAM. Macs have downsides compared to your Nvidia card, but the upside is that they have a lot more RAM available. Being able to access 48/64/128GB of space for models makes the Mac pretty attractive. For 32GB of RAM, you'd probably be better off getting an Nvidia 3090/4090.
redpandafire@reddit
I was planning to go the macOS route, but I would say 64GB is the minimum nowadays. The MacBook Pro is expensive for that; I would do the Mac mini M4 Pro, which is about 25% less. But where I'm at, the M4 Pro mini is like 2/3 of the way to a DGX Spark, which has native CUDA, 128GB total memory, and much more memory bandwidth.
-dysangel-@reddit
Plus there should be M5 or M6 minis coming out eventually, which will be way more performant and in a better price range, since they don't need batteries or displays.
JLeonsarmiento@reddit
An M5 Pro mini with 64GB RAM is basically a local Hermes/OpenClaw machine.
JLeonsarmiento@reddit
The minimum for mental sanity is a Pro or Max chip and 48GB of RAM or more.
Equal_Television_894@reddit
I have a 48GB M4 Pro, and that VRAM is just barely enough to run a good model like qwen3.6-35B-A3B MLX 4-bit with 128k context. As I start working in the browser, opencode, the IDE, and a few other applications, it just crashes sometimes. I'd say at least go for 64 or 128 and consider the Max. If you're budget constrained, then you're stuck with it anyway, like me.
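For reference, running one of these MLX quants is only a few lines with mlx-lm; a minimal sketch, where the repo name is just an illustrative placeholder for whichever 4-bit community quant you actually use:

```python
# Minimal sketch using mlx-lm (pip install mlx-lm).
# The repo name below is an illustrative placeholder, not a recommendation.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")  # ~15 GB of weights at 4-bit
reply = generate(model, tokenizer, prompt="Explain why MoE models decode faster.", max_tokens=200)
print(reply)
```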
Relative_Rope4234@reddit
How is the prompt processing speed on the M4 Pro?
Equal_Television_894@reddit
Slow
lolwutdo@reddit
Seems about on par with a CUDA 5070 Ti with CPU offloading.
DehydratedDuckie@reddit
I've recently made the transition OP is considering. I now have an M5 Pro with 48GB. I run Qwen3 32B 4-bit and it runs fine so far.
BestSeaworthiness283@reddit (OP)
Thank you a lot
hurdurdur7@reddit
I agree. I wouldn't look at 32GB devices at all. For a good experience, 64GB (to fit your model, context cache, and everything else you do) will probably be the minimum.
jacek2023@reddit
Consider a desktop PC; then you can just add more GPUs. And it will be cheaper.
BestSeaworthiness283@reddit (OP)
How can it be cheaper?
jacek2023@reddit
MacBook is cheap?
BestSeaworthiness283@reddit (OP)
No, but I think the VRAM equivalent would cost more, wouldn't it?
jacek2023@reddit
A 24GB GPU will be much better than a 32GB MacBook; check the prices. Also, you can buy multiple GPUs.
BestSeaworthiness283@reddit (OP)
Thanks!
jacek2023@reddit
look at the downvotes :)
BLOCK__HEAD4243@reddit
Waiting for the M5 Studio 256 to drop. 512 if I'm lucky!
blackjacketw@reddit
Try out the models with openrouter.ai first. If those models fit your use case and the machine, then you can make an informed decision.
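Trying a candidate model over OpenRouter's OpenAI-compatible API takes a few lines; a minimal sketch, assuming the openai Python package and an OPENROUTER_API_KEY env var (the model slug is just an example):

```python
# Probe a candidate model on OpenRouter before committing to hardware.
# Assumptions: `pip install openai` and an OPENROUTER_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b",  # example slug; swap in the model you plan to run locally
    messages=[{"role": "user", "content": "What are the tradeoffs of 4-bit quantization?"}],
)
print(resp.choices[0].message.content)
```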
BestSeaworthiness283@reddit (OP)
Thank you! Good idea.
slavetothesound@reddit
Also consider the quant of those models. Sometimes a q8 feels a lot better than a q4.
slavetothesound@reddit
I don't know anything about training, but regarding inference:
I just upgraded from a 32GB M1 that I couldn't run any worthwhile models on. Don't consider anything under the M5 series, because the prompt processing is far slower and you will wait substantially longer between each response.
My new M5 Pro 64 fits many of the popular models and can do 30B models in q8 with lots of context. Dense models still feel slow, and I can't fit the 120B MoE models that I hear such good things about, even at q4. When I want to talk to a dense model on the M5 Pro, it means waiting about 5 minutes between responses (~10 tps at low context).
An M5 Max allows 128GB RAM, but the price difference is pretty substantial. I wish I could have afforded it: the 128GB means running larger models at all, and double the memory bandwidth of the Pro means the big MoEs will still feel fast even though they'll be quantized. 27B-class models can be run unquantized or at q8 and feel actually usable for discussion.
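The rough arithmetic behind "what fits" is worth spelling out; a back-of-envelope sketch (weights only; the KV cache, OS, and apps eat more on top, and real GGUF/MLX file sizes vary a bit):

```python
# Back-of-envelope: weight memory ~= parameter count * bits per weight / 8.
# Ignores KV cache, runtime overhead, and the OS, which all add on top.
def weight_gb(params_billions: float, bits: float) -> float:
    return params_billions * bits / 8

for bits in (16, 8, 4):
    print(f"30B dense @ {bits}-bit: ~{weight_gb(30, bits):.0f} GB of weights")

# 16-bit: ~60 GB -> hopeless on a 32GB machine
#  8-bit: ~30 GB -> wants 48-64GB once everything else is counted
#  4-bit: ~15 GB -> fits in 32GB, but gets tight as context grows
```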
gutard@reddit
I got 48 gig; wish I'd got 64 gig.
-dysangel-@reddit
I got 512GB, and I'm still waiting for the M5 Ultra to 4x prompt processing.
BestSeaworthiness283@reddit (OP)
Dang.
AdLumpy2758@reddit
Look at PCs with the Ryzen AI Max+ 395; they have 128GB of unified memory.
hurdurdur7@reddit
But those are dog slow compared to the M5...
AdLumpy2758@reddit
For me it's better to run Qwen without quantization locally (I mean, I start the agent and walk away) than not to run it at all. But I agree it's slower.
shokuninstudio@reddit
You want memory not just for the model but also for the system and the apps you're running at the same time. All of these consume more memory every year or two.
Sparescrewdriver@reddit
I’d say 48GB minimum.
M4 Pro Mini 48GB
I can load the 'popular' ~30B-range models and get pretty good speeds on MoE models like Gemma 26B A4B and Qwen 35B A3B. Forget about dense models in that range; they will load, but the speed will be agonizingly slow. That's more about the combination of the slower bandwidth and the base M4 CPU.
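That MoE-fast/dense-slow split falls straight out of memory bandwidth: every generated token has to stream all active weights through memory once. A rough sketch of the ceiling, where the 273 GB/s M4 Pro figure is approximate and real speeds land below it:

```python
# Decode-speed ceiling for bandwidth-bound generation:
#   tokens/sec ~= memory bandwidth / bytes of active weights per token.
# The 273 GB/s M4 Pro figure is approximate; real throughput is lower.
def tps_ceiling(bandwidth_gbs: float, active_params_billions: float, bits: float) -> float:
    bytes_per_token = active_params_billions * 1e9 * bits / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

bw = 273  # GB/s (M4 Pro, approximate)
print(f"30B dense @ 4-bit:        ~{tps_ceiling(bw, 30, 4):.0f} tok/s max")  # ~18
print(f"MoE w/ 3B active @ 4-bit: ~{tps_ceiling(bw, 3, 4):.0f} tok/s max")   # ~182
```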
BestSeaworthiness283@reddit (OP)
Thank you!
-dysangel-@reddit
Only 32GB of RAM means you get the worst of both worlds, because Macs are pretty compute-poor compared to standalone GPUs. IMO either get a fast GPU with 24GB or more of VRAM, or get a Mac with at least 96GB of RAM for it to be worth it.
If you can't afford either of those yet, just wait a couple of years: either the models will be more efficient, or there will be spare hardware floating around as datacenters and businesses upgrade. Though there's a chip shortage at the moment, so maybe prices won't go down as much as they usually do.
BestSeaworthiness283@reddit (OP)
Thanks!
Equal_Television_894@reddit
Poor after the initial prompt, and only MoE models are good.