Mac Studio Performance Suggestions for MiniMax

Posted by DetailPrestigious511@reddit | LocalLLaMA | 15 comments

I need help. I want to self-host MiniMax 2.7 and Qwen 3.5 (122 billion parameters). I have checked, and these two models can handle 80-90% of the work I do. Right now, I am using an Ollama subscription to get the performance I need, and I am on the $100 plan.

I am planning to buy an M3 Ultra with 256 GB. Can anyone help me with the following?

Can that setup sustain one of these models running all the time?
If MiniMax can give 50 tokens per second on 256 GB, I guess I can easily run a Q6 quantization, which is enough for my use case.

Please advise, as this is a significant investment and I wanted to ask beforehand. The other option is buying an M4 Max with 128 GB, but I don't want that because MiniMax either will not fit or will leave no headroom, and I would need to compromise on quantization.
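For a rough sense of what fits in 128 GB versus 256 GB, weight size can be estimated as parameter count times bits per weight. This is a back-of-the-envelope sketch, not a measurement: the bits-per-weight values and the `reserve_gb` headroom (for KV cache, the OS, and other apps) are assumptions, and real quantized files mix precisions, so actual sizes will differ.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         ram_gb: float, reserve_gb: float) -> bool:
    """True if the weights plus an assumed headroom fit in unified memory."""
    return weights_gb(params_billion, bits_per_weight) + reserve_gb <= ram_gb


QWEN_PARAMS_B = 122   # parameter count quoted in the post
RESERVE_GB = 20       # assumption: KV cache + OS + other apps

# Assumed, rounded bits-per-weight for roughly 4-bit, 6-bit and 8-bit quants.
for bpw in (4.5, 6.5, 8.5):
    size = weights_gb(QWEN_PARAMS_B, bpw)
    print(f"{bpw} bpw: ~{size:.0f} GB weights, "
          f"fits 128 GB: {fits(QWEN_PARAMS_B, bpw, 128, RESERVE_GB)}, "
          f"fits 256 GB: {fits(QWEN_PARAMS_B, bpw, 256, RESERVE_GB)}")
```

Under these assumptions, the 122B model at roughly 6.5 bits per weight comes to about 99 GB of weights, which is tight on 128 GB and comfortable on 256 GB. Long contexts need a larger `reserve_gb`.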

There is also an M5 Ultra coming in two to three months. I can wait for that as well, but the main question is about heavy usage: imagine 10-15 hours of coding a day with two codebases running simultaneously.

Is there anyone who is using the same kind of setup who can give honest feedback?

-dysangel-@reddit

I have a 512GB. M2.7 is running great on it:

https://www.reddit.com/r/LocalLLaMA/comments/1sk70ph/local_minimax_m27_gta_benchmark/
https://www.reddit.com/r/LocalLLaMA/comments/1sjkovr/minimax_27_running_subagents_locally/

I'm running the IQ2_XXS quant of M2.7 and it's working well - that quant is 65GB, so a 128GB Mac can run it with a decent amount of context (I don't know the numbers, I don't ever have to care).
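To guess how big a higher quant of the same model would be, the 65GB figure above can be scaled by the ratio of bits per weight. This is only a sketch: the bits-per-weight values below (about 2.06 for IQ2_XXS and about 6.56 for Q6_K) are nominal llama.cpp figures used here as assumptions, and real files keep some tensors at other precisions, so treat the result as a ballpark.

```python
def scale_quant_size(known_size_gb: float, known_bpw: float,
                     target_bpw: float) -> float:
    """Estimate file size at target_bpw from a known size at known_bpw."""
    if known_bpw <= 0 or target_bpw <= 0:
        raise ValueError("bits per weight must be positive")
    return known_size_gb * target_bpw / known_bpw


IQ2_XXS_SIZE_GB = 65   # size reported in this comment
IQ2_XXS_BPW = 2.06     # assumption: nominal bits per weight
Q6_K_BPW = 6.56        # assumption: nominal bits per weight

q6_estimate = scale_quant_size(IQ2_XXS_SIZE_GB, IQ2_XXS_BPW, Q6_K_BPW)
print(f"Estimated Q6_K size: ~{q6_estimate:.0f} GB")
```

With these assumed values the estimate lands a little over 200 GB, which would leave limited room for context on a 256GB machine.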

Mac Studios have great built-in cooling, so you don't need to worry about running all day. I saw a video where they water-cooled one and it barely moved the needle on performance.

Since you said you're ok to wait, I'd definitely wait for the M5 Ultra. It's going to be 4x the performance.

If you're going the laptop route instead, make sure you get an M5 Max and not M4 Max, because of the 4x matmul performance. Effectively, M5 Max should already be 2x as fast as the current M3 Ultra for prompt processing.

cmndr_spanky@reddit

if you have 512gb, why on earth would you only run it at q2? You're going to get a lot more errors than q4, you'd probably be better off just choosing a smaller / different model at that point.

-dysangel-@reddit

> if you have 512gb, why on earth would you only run it at q2?

It's faster. And the quality is "good enough". If I were to switch to more complex tasks I'd maybe boot up a smarter model, but this one is a very good workhorse.

> you'd probably be better off just choosing a smaller / different model at that point.

Depends what you want out of it. This model is the sweet spot for me currently in terms of speed vs intelligence trade-off. I'm going to try some larger quants, mlx etc and compare side by side - but so far I really love this quant.

DetailPrestigious511@reddit (OP)

Thanks for the clarity. You are right; I should wait for the M5 Ultra.

I am not going the laptop route because I already have an M4 Pro. I am running Qwen3.5 (35 billion parameters), and it works fine initially, but as soon as I get into my coding tasks, thermal throttling kicks in and everything slows down. Laptops will always have that problem.

Mac Studio is meant for that kind of work, and I want a Mac Studio specifically for these tasks. For a laptop, I am okay with my MacBook Air.

Cergorach@reddit

My M4 Pro in my Mac Mini runs fine at 70W, but the cooling in laptops...

mrpena@reddit

an M3 Ultra with 256GB is $6k, or $500/month at 0%. IMO it's easier and cheaper to just pay the max plan unless your sole requirement is keeping everything local.

cmndr_spanky@reddit

yeah it'll take years of a Claude max plan to eventually have a $6k or $7k Mac pay for itself. If I'm just messing around learning / hobby stuff, there's no fk-ing way that Mac hardware is worth it.

However, if I'm running a business that will consistently use a powerful local LLM for my use case, then as a business expense $6k is nothing. It's totally worth it if you're actually going to use it (and it can be a tax write-off).

DetailPrestigious511@reddit (OP)

Yeah, I agree. On any day, paying a subscription amount is the cheaper option, but I'm trying to build something offline.

PracticlySpeaking@reddit

The other risk with subscriptions is the price can change at any time.

benevbright@reddit

I’m on the same path. I currently own a 64GB Mac and I'm running Qwen-3-Coder-Next. It’s very fast for small tasks with a coding agent, but it’s not quite smart enough for professional work. I've switched to using MiniMax 2.7 via OpenRouter instead, and I’m very happy with it. I’m also looking to upgrade to a 256GB Mac, but I’m waiting for the M5 Ultra.

br_web@reddit

the subscription will break even with the purchase at around 5-6 years, and at that point you will have to buy new hardware, so in the end the cost is the same. But with the subscription you get much better models; you lose the privacy, though.

DetailPrestigious511@reddit (OP)

I guess if you consider the resale value of the machine after three years, the break-even can happen even sooner. The initial investment is on the higher side, but otherwise, this is a great investment for someone doing coding or agentic tasks at least 10-12 hours a day.
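The break-even argument in this thread can be sketched as simple arithmetic. The $6k hardware price and the $100/month plan come from the posts above; the resale value is a hypothetical placeholder, and electricity, financing, and model-quality differences are deliberately ignored.

```python
def break_even_months(hardware_price: float, monthly_subscription: float,
                      resale_value: float = 0.0) -> float:
    """Months of subscription fees needed to equal the net hardware cost."""
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    net_cost = hardware_price - resale_value
    return max(net_cost, 0.0) / monthly_subscription


HARDWARE_PRICE = 6000      # M3 Ultra 256GB price quoted in the thread
MONTHLY_PLAN = 100         # subscription tier mentioned by OP
ASSUMED_RESALE = 3000      # hypothetical resale value, not a real quote

no_resale = break_even_months(HARDWARE_PRICE, MONTHLY_PLAN)
with_resale = break_even_months(HARDWARE_PRICE, MONTHLY_PLAN, ASSUMED_RESALE)
print(f"Without resale: {no_resale:.0f} months ({no_resale / 12:.1f} years)")
print(f"With resale:    {with_resale:.0f} months ({with_resale / 12:.1f} years)")
```

Without resale this gives 60 months, in line with the 5-6 year estimate above; any resale value shortens it proportionally.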

DetailPrestigious511@reddit (OP)

> the subscription will break even with purchase around 5-6 years, at that time you will have to buy new hardware, so at the end cost wise is the same, but, with the subscription you get much better models, you lose the privacy though

DetailPrestigious511@reddit (OP)

If you consider the resale price at which I will sell the hardware, the break-even point can come much sooner.

If every three years I can buy a better machine and every three years the cost is covered, I guess this is the better approach. However, the first-time investment is high.

DetailPrestigious511@reddit (OP)

On any day, paying for a cloud service will be much cheaper. I agree, but I am just trying to build something offline.