Best models for M3 Max 48gb?
Posted by Good_Educator_3719@reddit | LocalLLaMA | 5 comments
I'm a hobbyist developer using opencode to build personal productivity tools and work on a basic SaaS platform idea.
I've tried using LM Studio with the various big models for building, but it's so slow that I only really use it as a planning and chat agent, then switch over to the hosted opencode zen models when I need the agent to build stuff.
I have an MBP with an unbinned M3 Max (16-core CPU / 40-core GPU) and 48 GB RAM, and in my head I'm convinced I should be getting better results with this hardware.
For example, Gemma 4 26b a4b (GGUF; I can't run the MLX versions on the latest LM Studio yet) runs incredibly fast (80-120 tk/s) for general chatting and planning work, but asking it to build anything through opencode grinds it to a halt, and the TTFT (time to first token) is like 5+ minutes.
I guess I'm asking what models people with the same or similar hardware are running, so I can benchmark my results. Thanks!
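For benchmarking apples to apples, it helps to measure TTFT and decode speed the same way across setups. Below is a minimal sketch against LM Studio's OpenAI-compatible server; the default port 1234 and the placeholder model name are assumptions, so adjust them to your setup.

```python
# Sketch: measure time-to-first-token (TTFT) and rough decode speed via
# LM Studio's OpenAI-compatible streaming endpoint. The URL and model
# name below are assumptions -- change them to match your server.
import json
import time
import urllib.request

def parse_sse_line(line: bytes):
    """Extract the text delta from one SSE 'data:' line, or None."""
    if not line.startswith(b"data: "):
        return None
    payload = line[len(b"data: "):].strip()
    if payload == b"[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

def stream_and_time(prompt: str, model: str = "your-model-name") -> None:
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    first = None   # seconds until the first content chunk arrived
    chunks = 0
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # HTTPResponse iterates line by line
            delta = parse_sse_line(line)
            if delta:
                if first is None:
                    first = time.monotonic() - start
                chunks += 1
    total = time.monotonic() - start
    if first is None:
        print("no tokens received")
        return
    # Chunk count is only a rough proxy for token count.
    print(f"TTFT: {first:.2f}s, ~{chunks / max(total - first, 1e-9):.1f} chunks/s")
```

Running the same prompt through this script and through an opencode agent turn makes the prefill overhead visible directly.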
El_Hobbito_Grande@reddit
Do you mean using it through the API is slow vs using the built-in chat?
Good_Educator_3719@reddit (OP)
Exactly. I'm talking about using LM Studio as the provider in opencode, then choosing models from the list I populate in opencode.json. I don't use the LM Studio chat tbh, but after a few quick tests this morning I can see it runs about the same as general chat inference through opencode.
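For anyone wiring this up: a provider entry for a local LM Studio server looks roughly like the sketch below. The exact schema here is my assumption (check the opencode config docs); the key detail is pointing the base URL at LM Studio's OpenAI-compatible endpoint, and `your-model-name` is a placeholder.

```json
{
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:1234/v1" },
      "models": {
        "your-model-name": {}
      }
    }
  }
}
```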
El_Hobbito_Grande@reddit
I’ve noticed the same thing. Part of the issue, at least, is that tools like opencode, openclaw, etc. can inject large system prompts that are not injected when you use the built-in chat. I’m still working on optimizing my workflow myself. I do prefer OMLX to LM Studio on Apple Silicon, as it’s just faster. It’s a fairly recent project (open source, of course), but they’re making good progress; for example, they recently fixed issues with Gemma 4. Definitely worth looking into. If I come up with some good optimization tips, I’ll be sure to share them.
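To make the injected-system-prompt point concrete: TTFT on an agent turn is dominated by prefill, which scales with prompt size. The numbers below are illustrative assumptions (a ~4 chars/token heuristic and a hypothetical prefill speed), not measurements.

```python
# Rough illustration of why agent tools feel so much slower than chat:
# prefill cost grows with prompt size. All numbers are assumptions.
def estimate_ttft(prompt_chars: int, prefill_tok_per_s: float = 200.0) -> float:
    """Very rough TTFT estimate: ~4 chars per token, prefill-bound."""
    tokens = prompt_chars / 4
    return tokens / prefill_tok_per_s

# A ~500-char chat message vs. an agent turn carrying a ~60k-char
# system prompt plus tool schemas:
print(f"chat:  {estimate_ttft(500):.1f}s")     # well under a second
print(f"agent: {estimate_ttft(60_000):.1f}s")  # over a minute
```

The same model at the same decode speed can feel instant in chat and glacial in an agent, purely from the prompt it's handed.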
FusionCow@reddit
There is a piece of software called Inferencer (or Inferencer Pro), which is basically LM Studio for MLX; you should give that a shot. I would try Gemma 4 26b and 31b, alongside Qwen 3.5 35b and 27b.
Excellent_Koala769@reddit
How many tokens/s do you get on Gemma 4 31b dense with thinking on?