Best local model for MBP 48GB UM
Posted by BABA_yaaGa@reddit | LocalLLaMA | 4 comments
I was toying with GLM 4.7 Flash MLX a while ago using LM Studio. I had integrated it successfully with OpenClaw, and it was fairly stable at tool calling, but when it came to browser use the model would crash after a few steps.
Anyway, what is the best and latest model I can use locally for a variety of tasks? Qwen 3.6 comes to mind, but I have been out of the loop for a while.
Throughput is also a consideration, so what are the best settings I can use in LM Studio for MLX models with the maximum possible context window?
The machine is an MBP M4 Max with 48 GB of unified memory.
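For reference, here is roughly how I sanity-check throughput outside LM Studio with the `mlx-lm` Python package. The repo name is just a placeholder for whatever MLX quant you actually run:

```python
# Minimal throughput check with mlx-lm (pip install mlx-lm).
# The repo name below is a placeholder; substitute the MLX quant
# you actually use (e.g. one from the mlx-community org on HF).
import time

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-32B-Instruct-4bit")

prompt = "Summarize the trade-offs of 4-bit vs 8-bit quantization."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```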
Toastti@reddit
Qwen 3.6 35B. That's basically it, as the 27B is too slow unless you have something like an RTX 5090.
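Rough math on why ~35B is about the ceiling on 48 GB (the bytes-per-weight and KV figures below are ballpark assumptions, not measurements):

```python
# Rough memory estimate for a quantized model on unified memory.
# All numbers are illustrative assumptions, not measurements.
params_b = 35            # model size in billions of parameters
bytes_per_weight = 0.56  # ~4.5 bits/weight for a typical 4-bit quant
weights_gb = params_b * bytes_per_weight        # ~19.6 GB

# KV cache grows linearly with context; assume ~160 KB per token
# (varies a lot with layer count, heads, and KV quantization).
ctx_tokens = 32_768
kv_gb = ctx_tokens * 160e3 / 1e9                # ~5.2 GB

print(f"weights ~{weights_gb:.1f} GB, KV ~{kv_gb:.1f} GB, "
      f"total ~{weights_gb + kv_gb:.1f} GB of 48 GB")
```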
the-username-is-here@reddit
That, running in omlx on an M3 Max: decent(ish) speed, quite usable once you figure out prompts and tools.
Stunning_Inside5182@reddit
Hi, what UI is this that shows these stats?
the-username-is-here@reddit
That is omlx. It does some neat stuff like a persistent prefill cache, too.
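Not omlx's actual API, but the idea behind a persistent prefill cache is easy to sketch: key the KV state by the token prefix so a repeated system prompt is only prefilled once. Toy sketch, all names hypothetical:

```python
# Toy sketch of a persistent prefill cache: reuse the KV state for a
# shared prompt prefix instead of recomputing it on every request.
# `prefill` is a stand-in for the real model call (hypothetical).
from typing import Any

_cache: dict[tuple[int, ...], Any] = {}

def prefill(tokens: list[int]) -> Any:
    """Stand-in for the expensive forward pass that builds KV state."""
    return {"kv_for": tuple(tokens)}  # placeholder KV state

def generate_with_cache(prefix: list[int], suffix: list[int]) -> Any:
    key = tuple(prefix)
    if key not in _cache:             # prefill the shared prefix once...
        _cache[key] = prefill(prefix)
    kv = _cache[key]                  # ...then reuse it on later requests
    # A real server would copy/extend `kv` with `suffix` and decode.
    return kv

# The second call skips the expensive prefix prefill entirely.
system = [1, 2, 3]
generate_with_cache(system, [7, 8])
generate_with_cache(system, [9])
print(f"cached prefixes: {len(_cache)}")  # -> 1
```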