Running OpenClaw with Gemma 4 TurboQuant on MacAir 16GB
Posted by gladkos@reddit | LocalLLaMA | 21 comments
Hi guys,
We’ve implemented a one-click app for OpenClaw with Local Models built in. It includes TurboQuant caching, a large context window, and proper tool calling. It runs on mid-range devices. Free and open source.
The biggest challenge was enabling a local agentic model to run on average hardware like a Mac Mini or MacBook Air. Small models run well on these devices, but agents require more capable models like QWEN or GLM. OpenClaw prepends a large context to every request, which made the MacBook Air struggle with prompt processing. TurboQuant cache compression made this workable, even on 16 GB of memory.
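To see why cache compression matters so much on a 16 GB machine, here is a rough back-of-the-envelope sketch of KV-cache memory at a large agent context. The model dimensions below are illustrative assumptions, not the actual Gemma 4 or QWEN 3.5 configurations:

```python
# Rough KV-cache memory estimate. All model dimensions here are
# illustrative assumptions, not real Gemma 4 / QWEN 3.5 configs.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value):
    # 2x accounts for the separate key and value tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

ctx = 32_768  # a large agent context window

fp16 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                      context_len=ctx, bytes_per_value=2)    # 16-bit cache
q4 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                    context_len=ctx, bytes_per_value=0.5)    # 4-bit cache

print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # 4.0 GiB
print(f"4-bit KV cache: {q4 / 2**30:.1f} GiB")   # 1.0 GiB
```

With the model weights themselves already taking most of a 16 GB machine, shrinking the cache from ~4 GiB to ~1 GiB is the difference between swapping and running smoothly.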
We found a llama.cpp TurboQuant implementation by Tom Turney. However, it didn’t work reliably with agentic tool calling in many cases with QWEN, so we had to patch it. Even then, the model still struggled to start reliably. We decided to implement OpenClaw context caching, a kind of “warming-up” process: it takes a few minutes after the model starts, but after that, requests are processed smoothly on a MacBook Air.
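The "warming-up" idea can be sketched roughly as a prefix cache: process the large, fixed agent context once, save the resulting KV-cache state, and reuse it for every request that shares that prefix. The names below (`PrefixCache`, `evaluate_prompt`, the saved-state value) are hypothetical stand-ins, not the actual OpenClaw or llama.cpp API:

```python
import hashlib

# Hypothetical prefix cache, a minimal sketch of the "warm-up" idea.
# `evaluate_prompt` stands in for whatever the runtime exposes to fill
# the model's KV cache; it is an illustrative name, not a real API.

class PrefixCache:
    def __init__(self):
        self._states = {}

    def _key(self, prefix: str) -> str:
        # Hash the prefix text so lookups don't store the full prompt.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def warm_up(self, prefix: str, evaluate_prompt):
        # Slow one-time step: run the shared context through the model
        # and save the resulting KV-cache state.
        self._states[self._key(prefix)] = evaluate_prompt(prefix)

    def state_for(self, prefix: str):
        # Fast path: return the saved state, or None on a cache miss.
        return self._states.get(self._key(prefix))

cache = PrefixCache()
agent_prefix = "SYSTEM PROMPT + tool definitions"
cache.warm_up(agent_prefix, lambda p: f"kv-state[{len(p)} chars]")
print(cache.state_for(agent_prefix))  # reused on every later request
```

The few minutes of startup cost in the post corresponds to the `warm_up` call; after that, every request that begins with the same agent context skips reprocessing it.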
Recently, Google announced the new reasoning model Gemma 4. We were interested in comparing it with QWEN 3.5 on a standard M4 machine. Honestly, we didn’t find a huge difference. Processing speeds are very similar, with QWEN being slightly faster. Both give around 10–15 tps, and reasoning performance is quite comparable.
Final takeaway: agents are now ready to run locally on average devices. Responses are still 2–3 times slower than powerful cloud models, and reasoning can’t yet match Anthropic models—especially for complex tasks or coding. However, for everyday tasks, especially background processes where speed isn’t critical, it works quite well. For a $600 Mac Mini, you get a 24/7 local agent that can pay for itself within a few months.
Is anyone else running agentic models locally on mid-range devices? Would love to hear about your experience!
Sources:
OpenClaw + Local Models setup. Gemma 4, QWEN 3.5
https://github.com/AtomicBot-ai/atomicbot
Compiled app: https://atomicbot.ai/
Llama CPP implementation with TurboQuant and proper tool-calling:
https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant
Anonymous_Unkown@reddit
Will you make it for Linux as well?
gladkos@reddit (OP)
Sure! Gonna release Linux too
Comrade_United-World@reddit
I hope it stays free forever :(
Wildcard355@reddit
It will, but that's not the main issue. More powerful and efficient models are coming out in the future, the hope is those models become free as well.
gladkos@reddit (OP)
We will support any frontier models for sure
Vegetable-Cold-6603@reddit
+1
MrPinguv@reddit
Sorry if it's too dumb, but is there any setup needed for enabling TurboQuant? I checked both GitHub READMEs but I'm not sure if any of the steps are needed for this.
gladkos@reddit (OP)
Hi! Turboquant is enabled by default in the app
jacobgt8@reddit
Is it also for Windows or only MacOS?
Also, what did you use to create this video?
gladkos@reddit (OP)
Hi! Atomic Bot is available for both. However, Local Models are available only on macOS for now. We're working to make it possible on Windows very soon.
video created with inshot and kling ai.
jacobgt8@reddit
Great, thank you! I’ll keep an eye out on the local models for Windows, I don’t have a Mac Mini like everyone out there, but I do have a windows machine with NPU and 96gb unified memory.
Currently using LM Studio for local models but would love to try this out as well!
Open-Impress2060@reddit
Is openclaw safe though? I feel like I keep seeing it disobey orders from people
gladkos@reddit (OP)
hi! OpenClaw is an experimental product. It depends on the permissions you gave your agent.
iamck_dev@reddit
Hey. How did you make that video!! It’s great!
Sensitive-Fruit-7789@reddit
There are free solutions like OpenScreen; there are better paid ones, which I don't remember right now
FarmerQueasy8588@reddit
Context caching and cache compression are definitely needed if you're trying to push agentic tasks on 16GB of RAM. Standard machines usually struggle once the agent adds a large context to every request. I ended up trying bluestacks.ai mostly because I wanted the agent running in a more contained space instead of wiring the whole local environment together myself.
Comrade_United-World@reddit
I tested it, this software is too good, it will replace lmstudio :0. Omg so good looking and fast af
gladkos@reddit (OP)
Great you like atomic! Ty
Alternative_One_1736@reddit
hi. I can't find a way to add a custom local model that is already running in MLX, for example. Will there be an option for this?
gladkos@reddit (OP)
Great question, thank you! It’s not available yet. However, we're considering adding an option for custom local models. At the moment you can try atomic.chat; they have custom models and connect to OpenClaw or any other agent with a local server API.
Comrade_United-World@reddit
Thank you big dong, that's so nice.