Mac Studio 128/256GB for local LLM coding?
Posted by TechDude12@reddit | LocalLLaMA | 14 comments
Hello,
I'm a developer with side projects. Lately I've been thinking of buying a Mac Studio with 128 or 256GB of RAM to support them.
My idea is to define goals for a local LLM and let it do its job while I'm sleeping or working on other projects.
How feasible is that? Will this work? Is it worth the cost, or should I stick to subscriptions and skip the overnight autonomous coding sessions?
jhov94@reddit
If you're used to subscriptions, definitely go with 256GB. With that, you can run Minimax M2.5 and Stepfun 3.5 Flash, which are in line with Claude Sonnet 4.5 and GPT-5 mini. It's perfectly feasible to run them overnight if you have a good setup and clearly defined tasks, and if this is something you do regularly, running local will save you money in the long run while also giving you consistent results.
Glittering-Past2826@reddit
I have a Mac Studio 256GB too. Been playing with the Minimax 4-bit quant. I'm not sure how to set up overnight tasks though, new to LLM stuff. What is the best way to do this?
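One common pattern (a sketch, not something from this thread): run a local server that exposes an HTTP API (llama.cpp's `llama-server` and LM Studio both offer an OpenAI-compatible endpoint), then loop over a task list from a script. `call_model` below is a placeholder for whatever sends a prompt to that endpoint and returns the reply as a string:

```python
# Minimal overnight task runner. `call_model` is whatever sends a prompt
# to your local server (e.g. an OpenAI-compatible endpoint from llama.cpp
# or LM Studio) and returns the model's reply as a string.
import json
import time

def run_overnight(tasks, call_model, log_path="overnight_log.jsonl"):
    """Work through tasks sequentially, logging each result so you can
    review everything in the morning."""
    results = []
    with open(log_path, "a") as log:
        for i, task in enumerate(tasks, 1):
            started = time.time()
            reply = call_model(task)
            entry = {
                "task_num": i,
                "task": task,
                "reply": reply,
                "seconds": round(time.time() - started, 1),
            }
            log.write(json.dumps(entry) + "\n")
            results.append(entry)
    return results
```

In practice `call_model` would be an HTTP POST to something like `http://localhost:8080/v1/chat/completions` (the exact port depends on your server); the point is the loop and the log, so you can review failures the next day.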
nomorebuttsplz@reddit
I would be curious how you define a whole night's worth of tasks and hook up the agent to do it all without checking in. There is a reason that autonomous task length is a benchmark of model ability and current SOTA is about 15 hours. But that's 15 human hours. How long does that actually take Opus 4.6 to do? 20 minutes or something?
TechDude12@reddit (OP)
Interesting. My idea of a "whole night's worth of tasks" is to define e.g. 5 features that it needs to develop and let it develop/test/refine them sequentially. Like having 2 junior developers that you assign tasks and give a checklist they have to meet. What do you think? Is it possible, or will I waste my money?
Curious, since you have 512GB, what's your opinion on ram sizes? I can't afford 512 but I'm torn between 128 and 256.
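The develop/test/refine idea only works if "test" is automatable, so here's a sketch of the control loop. `generate_patch` and `run_checks` are hypothetical hooks (not from any real agent framework): the first asks the model for code, the second runs your test suite / checklist and returns a pass flag plus any failure output.

```python
# Sketch of a develop/test/refine loop over a feature list.
# `generate_patch(feature, feedback)` and `run_checks(feature)` are
# hypothetical hooks: the first asks the model for code, the second
# runs your checklist and returns (passed, failure_output).

def develop_features(features, generate_patch, run_checks, max_attempts=5):
    """Process features sequentially; refine each until its checks pass
    or the attempt budget runs out."""
    report = {}
    for feature in features:
        feedback = None
        for attempt in range(1, max_attempts + 1):
            generate_patch(feature, feedback)
            passed, failure_output = run_checks(feature)
            if passed:
                report[feature] = f"done in {attempt} attempt(s)"
                break
            feedback = failure_output  # feed failures back for refinement
        else:
            report[feature] = "gave up"
    return report
```

The attempt budget matters for overnight runs: without it, one un-passable feature can burn the whole night instead of moving on to the next task.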
nomorebuttsplz@reddit
I'm not sure if it's possible. It depends on how easy it is to really test fully and accurately. For me, when I'm making a game, it's impossible because LLMs are bad at games right now, so I need to test everything myself. But if you were sure that what you're making is easily verifiable, it might work. You could also try using Claude Code and then multiply however long it takes by 5 or so to get a sense of whether you can delegate that much at once.
I would go with 256. It seems like near SOTA performance for the last year requires 300b+ parameters.
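Rough arithmetic behind the 300B+ point (a back-of-envelope sketch, not exact file sizes):

```python
# Back-of-envelope weight-memory estimate for a quantized model.
# bits_per_weight ~4.5 approximates a typical 4-bit quant with its
# overhead (scales/zero points); real GGUF/MLX sizes vary by scheme.

def approx_weight_gb(params_billions, bits_per_weight=4.5):
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# A ~300B-parameter model at ~4.5 bits is roughly 170 GB of weights:
# over the limit on a 128GB Mac, comfortable on 256GB (leaving room
# for KV cache and the OS).
```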
TechDude12@reddit (OP)
Got it. Thank you so much, appreciate your help
success83@reddit
This guy is a hater. Nothing is impossible. Keep going
nomorebuttsplz@reddit
Remember that local inference is about privacy/security/flexibility rather than per-token cost savings
phoiboslykegenes@reddit
You might want to wait for the M5 refresh of the Mac Studio. The new base M5 released last fall has a much higher prefill performance.
Salty_Yam_6684@reddit
honestly that sounds like a pretty wild setup but i'm not sure you'll get the overnight autonomous coding thing you're dreaming of. even with 128gb+ you're still gonna hit walls with current llm capabilities - they're great at helping with code but full autonomous overnight sessions are still pretty sketchy
the mac studio with that much ram would absolutely crush at running big models locally though, and you'd save a ton on api costs if you're doing heavy llm work. but for the price of those configs you could run a lot of claude/gpt4 calls
maybe start smaller and see how much actual autonomous work you can get out of current models before dropping 8k+ on the dream machine?
TechDude12@reddit (OP)
Yes, I don't really know. I guess the big question is how much of my work I'll be able to automate so the machine pays for itself. Doing overnight sessions in the cloud isn't possible cost-wise, but I'm not sure how productive a local setup can be.
HopePupal@reddit
this is good advice. right now subscriptions are artificially cheap and you're not going to make the cost of a Mac Studio back by the time it becomes obsolete in a few years (or even a Strix Halo rig tbh), so from a pure cost standpoint local isn't the best option. definitely worth playing around a bit first to see what the cheapest, smallest model that meets your needs is, and then buying appropriate hardware if you still want to run local.
Easy-Unit2087@reddit
Pretty damn feasible. The more memory the better, of course, but 128GB will already let you run pretty capable coding models like qwen3-coder-next with large context.
Claude Code CLI can do a lot with these local models. Of course, you should let Opus 4.6 work during the night too or you're just wasting tokens.
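On the "large context" point: RAM goes to the KV cache on top of the weights. A rough sizing formula, with illustrative dimensions that are placeholders, not the real qwen3-coder-next config:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * bytes_per_value * context_tokens. The example dims below are
# illustrative placeholders, not any specific model's config.

def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    total = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens
    return total / 1e9  # decimal GB

# e.g. 48 layers, 8 KV heads (GQA), head_dim 128, 128k context, fp16:
# about 26 GB of cache on top of the weights -- this is why "large
# context" eats so much RAM.
```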
TechDude12@reddit (OP)