MBP M5 Max 128GB Owners: is 2TB internal enough, or will I regret not going bigger?
Posted by _derpiii_@reddit | LocalLLaMA | 22 comments
I'm set on the 128GB M5 Max and deciding between the storage options (2TB or 4TB).
Question: What have your actual LLM-workflow storage requirements been? Any regrets about going with the baseline 2TB?
And yes, I know it's more economical to go with 2TB and add an external 2TB NVMe w/ TB5 enclosure - but there are downsides to that (bandwidth, thermals, bus/port usage).
This is a new domain for me, so I'm just looking for real user insights.
Due diligence check: yes, I did a Reddit search; yes, I asked Claude.
Some random thoughts and things I'm considering (let's call it the human-thinking section).
Storage bandwidth comparisons (sustained):
- 4TB internal (sustained): 13.6/17.8 GB/s read/write
- external storage with the fastest NVMe: ~6 GB/s (with heavy caveats below)
- 2TB internal: writes slower due to fewer NAND modules in parallel
- TB5 enclosures use PCIe Gen4 (not Gen5) => 6-7 GB/s
- Real world, best sustained, non-RAID, properly cooled: OWC Express 1M2 (80Gbps) + 4TB... but at that point it's $600+, so moot for now.
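To put those figures in perspective, here's a back-of-the-envelope load-time calculation. The bandwidth numbers come from the list above; the model sizes are purely illustrative:

```python
# Back-of-the-envelope load times at the sustained read speeds above.
# Bandwidth figures are from this thread; model sizes are illustrative.
bandwidths_gbps = {
    "4TB internal": 13.6,
    "TB5 external NVMe": 6.0,
}
for size_gb in (30, 90, 200):           # hypothetical model file sizes
    for label, bw in bandwidths_gbps.items():
        print(f"{size_gb:>3} GB model via {label}: ~{size_gb / bw:.0f} s")
```

Even at the slower external speed, a 90GB model loads in ~15 seconds, so the bandwidth gap mostly matters if you swap models constantly.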
Normally I would go for the base 1-2 TB because my heaviest need has been video editing. But that's a workflow where you don't need entire corpus in one spot. You just use internal disk as a local editing buffer while offloading old projects to external. And you can even edit directly off the external drive because the connection is fast enough. Having more internal storage is strictly a convenience. It does not block any workflows.
A not so obvious one (Claude couldn't even think of it): you use up a port + PCIe lane.
tmvr@reddit
The upgrade from 2TB to 4TB costs €750, so that's €750 for 2TB worth of extra storage. You can get a 4TB NVMe SSD with 14GB/s write speeds for €500 and a TB5 enclosure for under €200. For about €800-900 you can get an 8TB SSD with 7-8GB/s speeds, which will still be fine even in an 80Gbps TB5 enclosure because you can't get faster speeds through there anyway.
So you can have 8TB at 7-8GB/s externally, or an extra 2TB at 14GB/s internally, for roughly the same price.
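The per-TB math behind that comparison, using the approximate prices above:

```python
# Cost-per-TB math for the approximate prices quoted above.
options = {
    "Apple 2TB -> 4TB upgrade (+2TB internal)": (750, 2),
    "4TB NVMe + TB5 enclosure": (500 + 200, 4),
    "8TB NVMe + TB5 enclosure": (900, 8),
}
for label, (eur, tb) in options.items():
    print(f"{label}: {eur} EUR, ~{eur / tb:.0f} EUR/TB")
```

That works out to roughly 375 EUR/TB for the internal upgrade versus ~112 EUR/TB for the external 8TB option.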
tomByrer@reddit
Yes, I was thinking he could get more storage for the same upgrade price.
pmttyji@reddit
Go for 4TB if you can afford it. This year we'll be getting more models, so future-proofing is better.
SpicyLentils@reddit
Ignorant question: aren't model files, even MoE ones, accessed only when first loaded?
_derpiii_@reddit (OP)
I'm thinking along those lines as well. The models are going to get bigger because people are asking for larger open weight models.
So I expect 90GB models to be the meta at some point.
pantalooniedoon@reddit
Even if it's 90GB, you'll have 1.9TB remaining. You can always offload or download a new model. I wouldn't worry about storage space at 2TB.
pmttyji@reddit
Apart from models, we may need space for other things: coding stuff, data, documents, and image/audio/video files if we're using image/audio/video models. So a big buffer helps.
HopePupal@reddit
not an M5 Max owner but i went thru the same option shopping process with my Strix Halo. 2 TB has been more than enough, even during the phase when i was trying a dozen models a day.
my day to day requirements are probably under 300 GB. i now offload unused models to a storage server over GigE if there's a chance i might want to use them again. it's still faster than redownloading from HF.
i do have cheap magnetic storage attached over USB for datasets, but you don't need NVMe to do linear scans of (in this case) the entire history of Reddit.
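Rough transfer-time math behind the "offloading over GigE beats redownloading" point. The link speeds here are nominal, and the home-internet figure is an assumption:

```python
# Rough transfer times: GigE LAN offload vs. redownloading over the
# internet. Link speeds are nominal; the internet figure is an assumption.
model_gb = 60                                  # hypothetical model size
links_gbit = {"GigE LAN": 1.0, "typical home internet": 0.3}
for label, gbit in links_gbit.items():
    minutes = model_gb * 8 / gbit / 60
    print(f"{model_gb} GB model over {label}: ~{minutes:.0f} min")
```

That's roughly 8 minutes over the LAN versus closer to half an hour to pull the same file again from HF.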
_derpiii_@reddit (OP)
Awesome, that’s good to hear. How are you liking your strix? What kind of inference speeds are you getting?
HopePupal@reddit
it's fine for the price and a great little desktop and build server, but it's objectively not great at inference. i use it mostly to run medium-sized models slowly: MiniMax M2.x Q3_K_S processes 20 ktok prompts at about 100 tok/s average, and generates at around 20 tok/s.
i eventually upgraded an old gaming desktop with an AMD R9700 to run small dense models like Qwen 3.x 27B, which is a lot closer to interactive speed. compared at Q6_K, you can see a pretty big difference in prompt processing:
[benchmark comparison lost in extraction: prompt-processing speed, Strix Halo vs. R9700]
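For a sense of what those Strix Halo numbers mean end to end, here's the latency for a single long request (the response length is an assumption):

```python
# What the Strix Halo numbers above imply for one long request:
# a 20k-token prompt at ~100 tok/s prefill, then ~20 tok/s generation.
prompt_tokens, prefill_tps = 20_000, 100
gen_tokens, gen_tps = 1_000, 20        # response length is an assumption
prefill_s = prompt_tokens / prefill_tps    # 200 s before the first token
gen_s = gen_tokens / gen_tps               # 50 s of generation
print(f"total: ~{(prefill_s + gen_s) / 60:.1f} min per request")
```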
Varmez@reddit
I kept mine with 2TB. I've been playing with a few models, and my cache is set to grow up to 200GB, but I really don't see storage being a problem for me. Right now I have 1.4TB available, and World of Warcraft is responsible for 125GB of what's used.
Important note though, for like media / archiving I do have a 20TB NAS.
chisleu@reddit
I bought the 4TB version to maximize throughput.
_derpiii_@reddit (OP)
Oh interesting! Is that something where you'll see a benefit at any file size? For some reason I was thinking it only applies to large files.
smilodonis@reddit
At least go 4TB
StardockEngineer@reddit
I find I only end up using a few models at a time. No reason to keep around models that are not the best. So 1TB works for me.
ibhoot@reddit
2TB internal. 4TB external TB4 or TB5 NVMe. I prefer to keep IO-heavy stuff on an easily replaceable external SSD, and the internal SSD for the main OS/apps.
epicycle@reddit
I bought the 2TB configuration and have been doing a decent amount of coding with it since purchasing. Qwen 3.6 27B 8-bit and 35B 8-bit are my primary coding drivers, with Qwen 3.5 122B 6.5-bit being used for creating my PRDs and plans, which are then executed by the smaller models. I use oMLX and sometimes LM Studio to try new models.
Overall, as long as you're not a model hoarder, 2TB works fine. I have 15-20 different models to play with. If you want more, get yourself a Thunderbolt 5 cable and a drive that can use it, and you can move models around as needed. If you want to try lots of model varieties all the time, go 4TB.
Either way, the machine is amazing and the latest batch of model releases are stellar. Nothing but props to companies releasing them.
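A minimal sketch of that "move models around as needed" workflow: relocate a model onto the external drive and leave a symlink so tools that expect the old path still find it. Both paths here are hypothetical:

```python
# Minimal sketch: move a model to the external drive and symlink it
# back so tools that expect the old path still find it.
import shutil
from pathlib import Path

internal = Path.home() / "models" / "some-model-Q6_K.gguf"  # hypothetical
external = Path("/Volumes/TB5-SSD/models") / internal.name  # hypothetical

external.parent.mkdir(parents=True, exist_ok=True)
shutil.move(internal, external)   # moves the file onto the external drive
internal.symlink_to(external)     # old path now points at the external copy
```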
No-Juggernaut-9832@reddit
Exactly the same advice I would give, and we have the same choices of models. I sometimes use the Gemma4 31B dense model with the 2B as a SpecFill drafter (to speed it up) for conversational use & code reviews, in addition to Qwen 3.5 27B dense (connected to a DF draft model to speed it up).
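For readers unfamiliar with the drafter setup mentioned here, a toy illustration of the draft-then-verify idea behind speculative decoding: a small model proposes several tokens cheaply and the big model keeps the matching prefix. This is a simplified greedy-verification variant with stand-in functions, not the actual SpecFill/DF implementations:

```python
# Toy draft-then-verify loop. Both "models" are stand-in functions,
# not real LLMs; the control flow is the point.

def draft_next(ctx: list[str], k: int = 4) -> list[str]:
    # cheap drafter: proposes k tokens in one go (stand-in logic)
    return ["the"] * k

def target_next(ctx: list[str]) -> str:
    # expensive target model: one token per call (stand-in logic)
    return "the" if len(ctx) % 3 else "cat"

def speculative_step(ctx: list[str]) -> list[str]:
    accepted: list[str] = []
    for tok in draft_next(ctx):
        expected = target_next(ctx + accepted)
        if tok != expected:               # draft guessed wrong:
            accepted.append(expected)     # keep the target's token, stop
            break
        accepted.append(tok)              # draft guessed right: free token
    return ctx + accepted

ctx = ["once", "upon"]
for _ in range(3):
    ctx = speculative_step(ctx)
print(ctx)
```

When the drafter's guesses match, the target model effectively validates several tokens per pass instead of generating one at a time, which is where the speedup comes from.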
_derpiii_@reddit (OP)
> 2TB works fine. I have 15-20 different models to play with.
This was exactly the type of real usage insight I was looking for. Thank you :)
redboy33@reddit
I bought the 1TB M5 Pro 14” MBP and it's full already. When I ordered it I was already $770 over budget, so 2TB wasn't possible.
_derpiii_@reddit (OP)
oof. How much RAM did you get?
The good news is you can still use it for clustered inference in the future.
Zestyclose_Yak_3174@reddit
I started with 512GB, then 1TB. That's way too little for me. I would say 2TB is the right starting point.