Running OpenClaw with Gemma 4 TurboQuant on MacAir 16GB
Posted by gladkos@reddit | LocalLLaMA | 21 comments
Hi guys,
We’ve implemented a one-click app for OpenClaw with Local Models built in. It includes TurboQuant caching, a large context window, and proper tool calling. It runs on mid-range devices. Free and open source.
The biggest challenge was enabling a local agentic model to run on average hardware like a Mac Mini or MacBook Air. Small models run well on these devices, but agents require more capable models like QWEN or GLM. OpenClaw prepends a large context to every request, which made the MacBook Air struggle with prompt processing. TurboQuant cache compression made this workable, even on 16 GB of memory.
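To see why cache compression matters so much on a 16 GB machine, here is a rough back-of-the-envelope sketch of KV-cache memory at a large agent context. The model dimensions below are illustrative assumptions, not the actual Gemma 4 or QWEN 3.5 configurations:

```python
# Rough KV-cache memory estimate. All model dimensions here are
# illustrative assumptions, not real Gemma 4 / QWEN 3.5 configs.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value):
    # 2x accounts for the separate key and value tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

ctx = 32_768  # a large agent context window

fp16 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                      context_len=ctx, bytes_per_value=2)    # 16-bit cache
q4 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                    context_len=ctx, bytes_per_value=0.5)    # 4-bit cache

print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # 4.0 GiB
print(f"4-bit KV cache: {q4 / 2**30:.1f} GiB")   # 1.0 GiB
```

With the model weights themselves already taking most of a 16 GB machine, shrinking the cache from ~4 GiB to ~1 GiB is the difference between swapping and running smoothly.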
We found a llama.cpp TurboQuant implementation by Tom Turney. However, it didn’t work reliably with agentic tool calling in many cases with QWEN, so we had to patch it. Even then, the model still struggled to start reliably. We decided to implement OpenClaw context caching, a kind of “warming-up” process: it takes a few minutes after the model starts, but after that, requests are processed smoothly on a MacBook Air.
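The "warming-up" idea can be sketched roughly as a prefix cache: process the large, fixed agent context once, save the resulting KV-cache state, and reuse it for every request that shares that prefix. The names below (`PrefixCache`, `evaluate_prompt`, the saved-state value) are hypothetical stand-ins, not the actual OpenClaw or llama.cpp API:

```python
import hashlib

# Hypothetical prefix cache, a minimal sketch of the "warm-up" idea.
# `evaluate_prompt` stands in for whatever the runtime exposes to fill
# the model's KV cache; it is an illustrative name, not a real API.

class PrefixCache:
    def __init__(self):
        self._states = {}

    def _key(self, prefix: str) -> str:
        # Hash the prefix text so lookups don't store the full prompt.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def warm_up(self, prefix: str, evaluate_prompt):
        # Slow one-time step: run the shared context through the model
        # and save the resulting KV-cache state.
        self._states[self._key(prefix)] = evaluate_prompt(prefix)

    def state_for(self, prefix: str):
        # Fast path: return the saved state, or None on a cache miss.
        return self._states.get(self._key(prefix))

cache = PrefixCache()
agent_prefix = "SYSTEM PROMPT + tool definitions"
cache.warm_up(agent_prefix, lambda p: f"kv-state[{len(p)} chars]")
print(cache.state_for(agent_prefix))  # reused on every later request
```

The few minutes of startup cost in the post corresponds to the `warm_up` call; after that, every request that begins with the same agent context skips reprocessing it.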
Recently, Google announced the new reasoning model Gemma 4. We were interested in comparing it with QWEN 3.5 on a standard M4 machine. Honestly, we didn’t find a huge difference. Processing speeds are very similar, with QWEN being slightly faster. Both give around 10–15 tps, and reasoning performance is quite comparable.
Final takeaway: agents are now ready to run locally on average devices. Responses are still 2–3 times slower than powerful cloud models, and reasoning can’t yet match Anthropic models—especially for complex tasks or coding. However, for everyday tasks, especially background processes where speed isn’t critical, it works quite well. For a $600 Mac Mini, you get a 24/7 local agent that can pay for itself within a few months.
Is anyone else running agentic models locally on mid-range devices? Would love to hear about your experience!
Sources:
OpenClaw + Local Models setup. Gemma 4, QWEN 3.5
https://github.com/AtomicBot-ai/atomicbot
Compiled app: https://atomicbot.ai/
Llama CPP implementation with TurboQuant and proper tool-calling:
https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant
Anonymous_Unkown@reddit
Will you make it for Linux as well?
gladkos@reddit (OP)
Sure! Gonna release Linux too
Comrade_United-World@reddit
I hope it stays free forever :(
Wildcard355@reddit
It will, but that's not the main issue. More powerful and efficient models are coming out in the future, the hope is those models become free as well.
gladkos@reddit (OP)
We will support any frontier models for sure
Vegetable-Cold-6603@reddit
+1
MrPinguv@reddit
Sorry if it's too dumb, but is there any setup needed for enabling TurboQuant? I checked both GitHub READMEs but I'm not sure if any of the steps are needed for this.
gladkos@reddit (OP)
Hi! Turboquant is enabled by default in the app
jacobgt8@reddit
Is it also for Windows or only MacOS?
Also, what did you use to create this video?
gladkos@reddit (OP)
Hi! Atomic Bot is available for both. However, Local Models are available only on macOS for now. We're working to make it possible on Windows very soon.
video created with inshot and kling ai.
jacobgt8@reddit
Great, thank you! I’ll keep an eye out on the local models for Windows, I don’t have a Mac Mini like everyone out there, but I do have a windows machine with NPU and 96gb unified memory.
Currently using LM Studio for local models but would love to try this out as well!
Open-Impress2060@reddit
Is openclaw safe though? I feel like I keep seeing it disobey orders from people
gladkos@reddit (OP)
hi! OpenClaw is an experimental product. It depends on the permissions you gave your agent.
iamck_dev@reddit
Hey. How did you make that video!! It’s great!
Sensitive-Fruit-7789@reddit
There are free solutions like OpenScreen; there are better paid ones, which I don't remember right now
FarmerQueasy8588@reddit
Context caching and cache compression are definitely needed if you're trying to push agentic tasks on 16GB of RAM. Standard machines usually struggle once the agent adds a large context to every request. I ended up trying bluestacks.ai mostly because I wanted the agent running in a more contained space instead of wiring the whole local environment together myself.
Comrade_United-World@reddit
I tested it, this software is too good, it will replace lmstudio :0. Omg so good looking and fast af
gladkos@reddit (OP)
Great you like atomic! Ty
Alternative_One_1736@reddit
hi. I can't find a way to add a custom local model that is already running in MLX, for example. Will there be an option for this?
gladkos@reddit (OP)
Great question, thank you! It’s not available yet. However, we're considering adding an option for custom local models. At the moment you can try atomic.chat; they have custom models and connect to OpenClaw or any other agent with a local server API.
Comrade_United-World@reddit
Thank you big dong, that's so nice.