Would a MacBook M5 16/24/32GB be an upgrade, complement, or waste next to my RTX 4060 laptop?

Posted by heitortp0@reddit | LocalLLaMA | 38 comments

Hi everyone,

I’m trying to understand whether buying a future/possible MacBook M5 with 16GB, 24GB, or 32GB unified memory would make sense for my local AI workflow, or whether it would mostly be a waste given my current setup.

My main machine is:

Acer Nitro laptop

RTX 4060 Laptop GPU, 8GB VRAM

Intel i7-13620H

32GB RAM

Around 1.5TB SSD

Windows 11, with WSL2/Linux available

My current/desired local AI use cases are:

Running local LLMs through LM Studio, Ollama, llama.cpp, etc.

RAG over legal/jurisprudence documents

Transcription with faster-whisper

Document processing and summarization

Possible local agents / automation

Maybe voice assistant experiments

General AI tinkering without relying entirely on cloud APIs

I understand that the RTX 4060’s 8GB VRAM is the main limitation for larger models, but it is still a real NVIDIA GPU and works well with many local AI tools. On the other hand, Apple Silicon has unified memory, great efficiency, battery life, and seems attractive for running larger quantized models that do not fit in 8GB VRAM.

My question is: would an M5 MacBook with 16GB, 24GB, or 32GB unified memory actually improve my local LLM experience in a meaningful way?

More specifically:

Would a 16GB M5 be pointless for local LLMs compared to my RTX 4060 laptop?
Is 24GB unified memory enough to make the MacBook a useful complement?
Is 32GB the minimum where Apple Silicon starts to make real sense for local LLMs?
Would the MacBook be better as a secondary portable/efficient machine rather than a replacement?
For my use case, would I be better off spending the money on a desktop GPU with more VRAM instead?
Are there workflows where the MacBook + RTX 4060 laptop combination makes sense, or would I just be duplicating capabilities?

I’m not trying to train large models. I mostly care about inference, RAG, document workflows, transcription, and experimentation.

I’d especially appreciate opinions from people who have both an NVIDIA 8GB VRAM laptop and an Apple Silicon Mac with 16–32GB unified memory.

Is the MacBook a real improvement, a nice complement, or just not worth it for this setup?

[-]

SeoFood@reddit

I’d think of the Mac as a complement, not a replacement for the 4060 laptop.

Your NVIDIA machine is still the better “I want CUDA support and maximum compatibility” box. For a lot of local AI tooling, that matters. The Mac starts making sense if you specifically value portability, battery life, quiet operation, unified memory for larger quantized models, and a smoother daily-driver experience.

For your listed use cases:

Transcription / faster-whisper: either machine can be useful, but the Mac is nice if you want quiet portable transcription.
Local LLMs: 16GB feels limiting pretty quickly. 24GB is more comfortable. 32GB is where it starts to feel like a serious local-AI secondary machine.
RAG / document workflows: RAM and SSD matter more than raw peak GPU speed once the pipeline is set up.
Tinkering: keep the NVIDIA laptop if you care about broad compatibility.

If money is tight, I wouldn’t buy a 16GB Mac mainly for local LLMs. If you want a portable complement and can stretch to 24/32GB, it becomes a lot easier to justify.

[-]

ProfessionalSpend589@reddit

Laptops are for travelling.

Do you travel a lot and have many hours without stable/fast internet, but have electricity nearby to keep the laptop battery from draining? - A more powerful laptop may suit you.

Everything else - probably not. Actually I would say a strong no. Why would you want 2 batteries, 2 displays, 2 sets of keyboard and touchpad and speakers for running whatever type of workload you’re running now? None of those contribute tokens, yet they are an expense.

[-]

heitortp0@reddit (OP)

I have a motor disability, which makes laptops easier to use for me. Also, my idea was not to carry two laptops. The MBA would be my main machine daily, and the Nitro would work as a more "brute force" workstation.

[-]

ProfessionalSpend589@reddit

I’m not sure it’ll be good for the batteries to be plugged in 24/7. Does the Nitro laptop allow you to limit battery charge to 60% or 80%?

A mini pc + eGPU and 32GB GPU would have been an obvious suggestion last year.

[-]

heitortp0@reddit (OP)

Yeah, I limited the Nitro battery to 80%. You're absolutely right about the eGPU suggestion; it's a shame those prices have skyrocketed.

[-]

neuromacmd@reddit

For your stated mix, a base M5 is a nice complement, not an upgrade, and only at 24GB minimum, ideally 32GB. If portability and silence matter, get the 32GB and keep the 4060 for whisper and fast small-model work. If they don’t, a used 24GB desktop GPU is the better spend by a wide margin. The one config I’d actively talk you out of is the 16GB; it’s the worst of both worlds for this workload. The Mac does not necessarily give you much faster inference and, more importantly, does not buy you faster prompt processing, which is underappreciated everywhere. The only way to get that is a fast video card (ideally Nvidia), which is why the 3090 is still so popular. What the Mac does get you is a very efficient computer with great battery life that can do some LLM inference on the side. I have tried all sorts of hardware for local inference, and if I had to start again I would go the Nvidia route from the get-go if that is your primary intended use. This is coming from someone whose laptop is a MacBook Pro.

[-]

heitortp0@reddit (OP)

Maybe replacing my Nitro with a PC with a better GPU would be the best alternative, but I'd still like to test the MBA in the workflow.

[-]

neuromacmd@reddit

Not sure where you are geographically. Some stores allow you to try laptops for a while. Local AI on a laptop is useful for simple chat and things like whisper and autocompletion, but real agentic and multistep workflows are limited by memory bandwidth. Take a look at the Strix Halo laptops. They have higher bandwidth than the non-Pro MacBook Air, and the 64GB is around the same price. I am playing with an Asus ProArt PX13 and I am loving the power and form factor. The advantage is the x86 compatibility, including Linux, but you are giving up battery life and efficiency.

[-]

libregrape@reddit

Yes, it would be pointless. Remember that in addition to the model itself you also need to run your software (OS, etc.) and store things that aren't the model itself or its context (e.g. checkpoints, cache). That would end up taking quite a bit, possibly enough to make the difference negligible.
It would be nice I suppose, but you have to keep in mind that the MacBook M5 chip only has 153GB/s bandwidth (assuming base M5). This would make LLMs abominably slow. But it would probably be a bit useful for whisper and TTS.
It starts making sense when you up the bandwidth. If you get M5 Max, it will give you 460GB/s theoretical, which is basically like RTX 5060 Ti 16GB but with more VRAM. But keep in mind that you will be missing on the CUDA advantage. You could probably run Qwen 3.6 35B-A3B in Q5 on the 32GB M5 Max at ~60 T/s.
Very personal. IMO I would absolutely hate to carry 2 laptops. Also keep in mind that LLMs would likely still be eating the battery like crazy even on M5.
Absolutely. For 1.7k (base MacBook Pro M5 price), you can get a PC with 32GB DDR5, RTX 5070 Ti 16GB, which will give you 670GB/s bandwidth, which will be a noticeable advantage. In addition you will have more total memory (16+32=48GB), which means you can run larger MoE models, and other demanding apps in the background if you want to. Or you can go the 3090 route and get 24GB VRAM + 32GB DDR4, which may not be as good at things other than LLM inference, but will give you almost 1TB/s bandwidth and 24GB VRAM. Or, if you are feeling completely adventurous, you may look into Mi50 rigs, which at this price might get you 2x Mi50, which means 64GB VRAM, though you will have to fiddle with cooling quite a bit. But if we consider the M5 Max laptop, then we are looking at 3.6k minimum, which gets you into 4090 setup territory, or you can try constructing the 2x 3090 (2x24GB VRAM) space heater like many others do on this sub. For 2.5k you may also be looking at alternative unified memory setups, like Strix Halo, but their bandwidth (256GB/s) is waaay too low to run large models at reasonable speeds, and you would be losing CUDA.
Maybe, but remember that in addition to the mentioned issues you will also be juggling two completely different tech stacks: Apple and Windows or Linux. That will get annoying quickly. That said, you can probably concoct some configuration of LLMs on one device, and TTS/STT on the other, etc.
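
A rough way to sanity-check the bandwidth argument above: token generation is mostly memory-bandwidth bound, so a ceiling on decode speed is the bandwidth divided by the amount of weight data read per token. The Python sketch below is only that back-of-envelope estimate. The function name and the `efficiency` factor are made up for illustration, the bits-per-weight values are assumptions, and it ignores KV-cache reads, prompt processing, and compute limits, so real speeds will land below these ceilings.

```python
def decode_ceiling_tps(bandwidth_gb_s, active_params_b, bits_per_weight, efficiency=1.0):
    """Rough upper bound on generated tokens/s for a bandwidth-bound decoder.

    bandwidth_gb_s : memory bandwidth in GB/s
    active_params_b: billions of parameters read per token
                     (all of them for a dense model, only the active ones for MoE)
    bits_per_weight: average quantization width (assumed, e.g. about 5 for a Q5 quant)
    efficiency     : hypothetical fudge factor in (0, 1] for real-world losses
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return efficiency * bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Bandwidth figures as quoted in the thread
    for name, bandwidth in [("base M5", 153), ("M5 Max", 460)]:
        moe = decode_ceiling_tps(bandwidth, active_params_b=3, bits_per_weight=5)
        dense = decode_ceiling_tps(bandwidth, active_params_b=27, bits_per_weight=4)
        print(f"{name}: ~{moe:.0f} t/s ceiling (3B active, 5-bit), "
              f"~{dense:.0f} t/s ceiling (27B dense, 4-bit)")
```

The point of the comparison is the ratio, not the absolute numbers: a MoE model with about 3B active parameters has a far higher ceiling than a dense 27B model on the same machine, which is why small-active-parameter MoE models are the usual suggestion for lower-bandwidth hardware.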

[-]

heitortp0@reddit (OP)

First, thank you for such a complete answer. I was very excited about the idea of the MBA because (i) I was always a Windows user and (ii) I thought that unified memory would give me a gain over my VRAM. But I'm starting to think it's not such a good idea.

[-]

MrPecunius@reddit

Get 32GB for a regular M5. According to oMLX data, you should be able to get ~50 t/s generation and >1,000 t/s prefill with a 4-bit MLX quant:

https://omlx.ai/c/aa2ktmf

With 32GB you should have all the RAM you need to run the OS+apps+LLMs appropriate for the processor.

[-]

heitortp0@reddit (OP)

It looks useful to me thx

[-]

xraybies@reddit

Forget anything < 32GB. Even then the biggest problem is macOS. Out of the box it will consume 6GB, and as soon as you load any apps (Chrome, OpenCode) you're hitting 12GB. So you have ~24GB usable @ <400GB/s, which is 3090 at best. You can claw back another 2-3GB by disabling everything you can in macOS... like https://github.com/rayone/machete/blob/main/disable.sh

So from your perspective it's like you have an RTX 4060 laptop where you can choose between 8-24GB VRAM. I would say totally not worth it.

My M5 Max w/ 128GB is usually at ~30GB of memory used without even loading an LLM, just Chrome, Edge, VS Code, OpenCode + skills. As soon as I load oMLX + Qwen 3.6 mxfp8 it's hot, loud, using ~90GB and much slower than my i9 1300k + 4090, except the SSD which is fast <16GB/s.

So M5 with >64GB only starts to make sense from a usage perspective... cost is subjective.

The only aspects of the M5 which impress me are the SSD and battery life, when not running an LLM, everything else is avg, and the audio and macOS are a joke.
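
The memory overhead point can be put into numbers with a small Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, the 2GB context allowance is a placeholder, and the overhead values are just the 6GB and 12GB figures mentioned above, not measurements.

```python
def weights_gb(params_b, bits_per_weight):
    """Approximate size of quantized weights in GB (billions of params * bits / 8)."""
    return params_b * bits_per_weight / 8


def fits_in_memory(total_gb, os_and_apps_gb, params_b, bits_per_weight, context_gb=2.0):
    """Return (fits, usable_gb, needed_gb) for a model on a unified-memory machine.

    context_gb is a placeholder allowance for KV cache and runtime buffers;
    the real value depends on the model and the context length.
    """
    usable_gb = total_gb - os_and_apps_gb
    needed_gb = weights_gb(params_b, bits_per_weight) + context_gb
    return needed_gb <= usable_gb, usable_gb, needed_gb


if __name__ == "__main__":
    # A 35B model at an assumed 5 bits per weight, against different machines
    for total, overhead in [(16, 6), (32, 6), (32, 12)]:
        ok, usable, needed = fits_in_memory(total, overhead, params_b=35, bits_per_weight=5)
        verdict = "fits" if ok else "does not fit"
        print(f"{total}GB with {overhead}GB in use: {usable:.1f}GB usable, "
              f"{needed:.1f}GB needed, {verdict}")
```

With these assumed inputs the same model fits on a 32GB machine when only the OS is running but not once a browser and coding tools take the overhead to 12GB, which is the practical difference between the memory on the spec sheet and the memory a model can actually use.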

[-]

heitortp0@reddit (OP)

I'm getting a little disappointed with the idea I had about the MBA if that's the case :/

[-]

jcdoe@reddit

Apple silicon 16 gb feels like 8, because you have to run your os and web server too. FYI

[-]

heitortp0@reddit (OP)

Yeah, I've read about it, but I'm looking for the practical knowledge of this sub

[-]

jcdoe@reddit

I own an m1 mbp with 16 gb ram. It feels like 8 gb because of the overhead. I’m not speaking hypothetically, this is real experience.

Get at least 48 gb if you go Mac, you won’t regret the extra ram.

[-]

boston101@reddit

I’m on m1 8gb and running ternary models - can’t wait to upgrade

[-]

jcdoe@reddit

I upgraded my M1 Pro 16 gb to an m5 pro 48 gb recently. I liked the idea of running llms on the old machine with a web interface for personal assistant type stuff.

But it sucks at 16 gb. I can safely run models up to 10 gb before i have to worry about it going into swap space (it still happens tho).

Oh, also, Apple limits memory bandwidth for their binned chips. So his 16 gb will run slow, and he’ll be pulling down 30/40 t/s even with the model fully loaded into ram.

But don’t listen to me, he wants “practical knowledge” lmao

[-]

boston101@reddit

Ugh, I wanted to use my old Macs as servers. I wish Apple would allow no limits and the ability to turn off the many processes I don’t need for a server.

TBH I’m going nvidia route instead of osx upgrade.

[-]

jcdoe@reddit

You should. Nvidia and Linux are the primary platforms for local LLMs, the best bang for buck, and use standard bash commands. (I spent too much time rewriting a bash script because Macs don’t use apt-get. No no. They use “brew”, which takes slightly different command line arguments than Linux.)

I like my Mac, but I bought it as a computer. LLMs are just a bonus.
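
For anyone hitting the same apt-get vs brew difference, one way to avoid maintaining two scripts is to pick the install command based on which package manager is on the PATH. This is a minimal Python sketch of that idea, not the script mentioned above: `install_command` is a hypothetical helper, and it assumes the package has the same name in both package managers, which is not always true.

```python
import shutil


def install_command(package, which=shutil.which):
    """Build an install command for whichever package manager is available.

    `which` is injectable so the lookup can be faked in tests;
    by default it searches the real PATH.
    """
    if which("apt-get"):
        # Debian/Ubuntu (including WSL2): needs root, -y skips the confirmation prompt
        return ["sudo", "apt-get", "install", "-y", package]
    if which("brew"):
        # Homebrew on macOS: runs as the normal user, no -y flag needed
        return ["brew", "install", package]
    raise RuntimeError("neither apt-get nor brew found on PATH")


if __name__ == "__main__":
    try:
        print(" ".join(install_command("ffmpeg")))
    except RuntimeError as err:
        print(err)
```

The helper only builds the command list, so it can be inspected or passed to `subprocess.run` as a separate step.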

[-]

boston101@reddit

lol I have access to cloud GPUs and am a developer, I just want to do it all in house now.

Fuck, I’m old and it’s come full circle, on prem is back

[-]

heitortp0@reddit (OP)

I didn't think you were the kind of person who wouldn't know how it works in practice. What I meant was I came here to know exactly what people like you, who happen to use this exact hardware, could say about it. No need to shade like this. Thank you.

[-]

MrPecunius@reddit

48GB isn't available for M5 and would be too much for it to use effectively in any case.

48GB isn't really enough for a M5 Pro, either (source: I had a M4 Pro/48GB and now have a M5 Pro/48GB).

[-]

FineClassroom2085@reddit

If you get a 32gb M5 pro, it will outperform your PC in everything but prompt processing. Honestly if you can swing it, go 64gb, that opens full weights/context Gemma 4 and Qwen 3.6 27b which are staggeringly good for their weight class.

You probably won’t use your PC any more except for maybe stable diffusion work.

[-]

heitortp0@reddit (OP)

You mean a macbook pro?

[-]

MrPecunius@reddit

Speaking as a Macbook Pro (M5 Pro/64GB/2TB) owner, I'd get a Macbook Air for a regular M5.

[-]

FineClassroom2085@reddit

Yes

[-]

heitortp0@reddit (OP)

I'm from Brazil. Not sure if I could afford it. But what do you think about the 24GB and 32GB RAM versions?

[-]

Barbaricliberal@reddit

I have the binned M5 Pro, 48 gb of ram, and it's been great.

The 48gb binned M5 Pro (14 in) MBP seems to be a good value if price is an issue.

[-]

heitortp0@reddit (OP)

It doubles the price I'm willing to spend unfortunately

[-]

itsappleseason@reddit

What does the secondhand M1 Max market look like in your area?

[-]

Barbaricliberal@reddit

Ooof, fair enough.

Have you considered getting a MacBook Air? You almost certainly can get the same specs as the M5 MBP (including the RAM) for cheaper.

For instance, in the US it's $1500 for the MBA vs $2100 for the MBP for the 10 core M5 and 32gb of ram.

[-]

heitortp0@reddit (OP)

Yeah, the 32 MBA is the one I'm more interested in. Good choice?

[-]

Barbaricliberal@reddit

I'd say so, it's the same specs-wise for the most part as the base M5 MBP, but cheaper.

The only performance difference is that the thermals are better on the MBP, since it has a fan. But the difference isn't a big deal.

[-]