Mac Studio Performance Suggestion for MiniMax

Posted by DetailPrestigious511@reddit | LocalLLaMA | View on Reddit | 15 comments

I need help. I want to self-host MiniMax 2.7 and Qwen 3.5 (122 billion parameters). I have checked, and these two models can handle 80-90% of the work I do. Right now I am paying for an Ollama subscription on the $100 plan to get the performance I need.

I am considering buying an M3 Ultra with 256 GB, and I have two questions:

  1. Can that setup sustain one of these models running all the time?

  2. Can MiniMax reach around 50 tokens per second on 256 GB? If so, I assume I can comfortably run a 6-bit quantization (Q6), which is enough for my use case.
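
For the tokens-per-second question, a rough ceiling can be estimated from memory bandwidth: during decoding, each generated token has to read roughly all active weights once. The sketch below is only a back-of-the-envelope upper bound, not a benchmark. The 819 GB/s figure is Apple's published memory bandwidth for the M3 Ultra; the active-parameter count is a purely illustrative placeholder (the real value depends on the model's architecture), and real-world throughput will be lower, especially with long contexts.

```python
def decode_tokens_per_second_ceiling(active_params: float,
                                     bits_per_weight: float,
                                     bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if generation is purely bandwidth-bound.

    active_params: parameters read per token (for a mixture-of-experts model,
                   the active subset, not the total). Caller-supplied assumption.
    bits_per_weight: nominal quantization width, e.g. 6 for a 6-bit quant.
    bandwidth_gb_s: memory bandwidth in GB/s (1 GB = 1e9 bytes).
    """
    bytes_read_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_read_per_token


# ASSUMPTION: 10e9 active parameters is a made-up example value,
# not a confirmed figure for MiniMax or Qwen.
EXAMPLE_ACTIVE_PARAMS = 10e9
M3_ULTRA_BANDWIDTH_GB_S = 819  # Apple's published spec

ceiling = decode_tokens_per_second_ceiling(
    EXAMPLE_ACTIVE_PARAMS, 6, M3_ULTRA_BANDWIDTH_GB_S
)
print(f"Theoretical ceiling: {ceiling:.1f} tokens/s")
```

Prompt processing is compute-bound rather than bandwidth-bound, so this estimate says nothing about how long large coding contexts take to ingest.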

Please advise, as this is a significant investment and I want to ask before buying. The alternative is a 128 GB M4 Max, but I don't want that: MiniMax either will not fit or will leave no memory headroom, and I would have to compromise on quantization.
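
The 128 GB versus 256 GB question comes down to simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below is a minimal estimate under stated assumptions. The 122B figure is the Qwen size mentioned above; the second parameter count and the 32 GB reserve for the OS, KV cache, and other apps are hypothetical placeholders, and real quantization formats use slightly more than their nominal bit width.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float,
         total_ram_gb: float, reserved_gb: float) -> bool:
    """True if the weights plus a reserved margin fit in unified memory."""
    return weight_memory_gb(params, bits_per_weight) + reserved_gb <= total_ram_gb


QWEN_PARAMS = 122e9                 # from the post
HYPOTHETICAL_LARGE_PARAMS = 230e9   # ASSUMPTION: illustrative only, not a confirmed MiniMax size
RESERVED_GB = 32                    # ASSUMPTION: OS + KV cache + tooling headroom

for name, params in [("122B model", QWEN_PARAMS),
                     ("hypothetical 230B model", HYPOTHETICAL_LARGE_PARAMS)]:
    size = weight_memory_gb(params, 6)
    for ram in (128, 256):
        verdict = "fits" if fits(params, 6, ram, RESERVED_GB) else "does not fit"
        print(f"{name} at 6-bit: {size:.1f} GB of weights, {verdict} in {ram} GB")
```

Running two codebases at once with long contexts increases KV cache usage, so the reserve is the number to adjust first when trying different scenarios.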

There is also an M5 Ultra coming in two to three months, and I can wait for that. The main question, though, is about heavy usage: assume 10-15 hours of coding per day, with two codebases running simultaneously.

Is anyone running a similar setup who can give honest feedback?