OpenClaw-like setup with local-only models - can I run this on an M1 Max with 64GB mem?
Posted by arnieistheman@reddit | LocalLLaMA
Hi there,
I have a Mac Studio M1 Max with 64GB of unified memory. I want to experiment with an agent like OpenClaw (or the multitude of alternatives) with local-only models.
What kind of setup would you recommend?
Thanks a lot in advance.
ai_guy_nerd@reddit
64GB on an M1 Max is a great spot to be in. Most of the heavy lifting for local agents is just the memory for the model and the context window. You can comfortably run quantized 30B or even 70B models via Ollama and still have plenty of RAM for the orchestrator.
The trick is picking a model that doesn't hallucinate the tool calls. Qwen 2.5 is surprisingly good at that. If the goal is a setup like OpenClaw, focus on the coordination layer and make sure the model you pick is optimized for function calling.
Looking into the Agent Client Protocol (ACP) might be worth it too. It helps keep the tool definitions standardized so you don't have to rewrite everything when you swap models.
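If it helps, here's a minimal sketch of what function calling against a local model looks like through Ollama's Python client. The model tag and the toy tool are placeholders, and the response shape assumes a recent ollama-python release:

```python
# Minimal tool-calling sketch against a local Ollama model.
# Assumes `pip install ollama`, a running Ollama server, and a pulled
# model that supports function calling ("qwen2.5:32b" is illustrative).
import shutil

import ollama


def get_disk_usage(path: str) -> str:
    """Toy tool the model can choose to call."""
    usage = shutil.disk_usage(path)
    return f"{usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB"


tools = [{
    "type": "function",
    "function": {
        "name": "get_disk_usage",
        "description": "Report free disk space for a filesystem path",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

response = ollama.chat(
    model="qwen2.5:32b",
    messages=[{"role": "user", "content": "How much disk space is left on /?"}],
    tools=tools,
)

# A model that's good at function calling returns structured calls
# instead of prose; a model that isn't is where the pain starts.
for call in response.message.tool_calls or []:
    if call.function.name == "get_disk_usage":
        print(get_disk_usage(**call.function.arguments))
```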
arnieistheman@reddit (OP)
Thanks for getting back to me. So is Qwen 2.5 better than 3.6?
kaal-22@reddit
64GB on M1 Max is solid for this. You'll want to run a quantized model — something like Qwen 2.5 32B Q4 or Llama 3.1 70B Q4 (tight but doable with 64GB). For the agent framework, OpenClaw works with local models through Ollama or LM Studio. The tricky part isn't the model, it's the tool calling — smaller local models are worse at structured tool use, so expect more retries and weird failures than you'd get with Claude or GPT. Start with a 32B model and work up from there if your memory handles it.
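To make the "expect more retries" part concrete, this is the usual workaround pattern: parse the structured output and feed any failure back so the model can self-correct. `call_model` here is a hypothetical stand-in for whatever client (Ollama, LM Studio) you wire in:

```python
# Retry wrapper for flaky structured output from smaller local models.
# `call_model` is a placeholder: any callable that takes a message list
# and returns the model's raw text reply.
import json


def call_with_retries(call_model, prompt: str, max_retries: int = 3) -> dict:
    """Ask for JSON; re-prompt with the parse error until it's valid."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries):
        raw = call_model(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            # Keep the bad reply in context and tell the model what broke.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That wasn't valid JSON ({err}). Reply with JSON only.",
            })
    raise RuntimeError(f"No valid JSON after {max_retries} attempts")
```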
egomarker@reddit
Good bot
arnieistheman@reddit (OP)
Hey, thanks for getting back to me. Would you recommend OpenClaw or an alternative? Can I expect anything actually useful from e.g. a Qwen 32B model? Can it really "read" the screen and use the computer?
SexyAlienHotTubWater@reddit
Both Qwen 2.5 and Llama 3.1 are extremely out of date. Use Qwen 3.6 or something, with DFlash. You don't need to go down to Q4 with 64GB, you'll be fine with Q6 (much better quality) or Q8. Qwen's dense models are pretty fast, especially with DFlash. You can also batch.
If you just want to run fire-and-forget tasks in the background, slow is fine, so go dense with the largest quant you can fit.
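For anyone wanting the memory math behind the quant advice, a quick back-of-envelope (the bits-per-weight figures are rough GGUF averages, and KV cache / OS headroom are ignored, so treat these as ballpark numbers):

```python
# Ballpark model-weight footprint at common GGUF quants.
# Bits-per-weight values are approximate averages, not exact.
QUANT_BITS = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}


def model_gb(params_billion: float, quant: str) -> float:
    # GB ~= billions of params * (bits per weight / 8 bits per byte)
    return params_billion * QUANT_BITS[quant] / 8


for quant in QUANT_BITS:
    print(f"32B @ {quant}: ~{model_gb(32, quant):.0f} GB | "
          f"70B @ {quant}: ~{model_gb(70, quant):.0f} GB")
```

By that math a 32B model fits comfortably even at Q8 (~34 GB), while 70B at Q6 (~58 GB) is already tight on 64GB once you leave room for the OS and context, which lines up with the "largest quant you can fit" advice.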
arnieistheman@reddit (OP)
Didn't even know about DFlash. Thanks.