I tried running Gemma 4 on my phone. llama.cpp failed, LiteRT‑LM didn’t.
Posted by GeeekyMD@reddit | LocalLLaMA | 12 comments
I wanted Gemma 4 as a usable local model on my Android phone, not a benchmark screenshot.
- llama.cpp in Termux: ~2–3 tok/s, CPU pegged, basically unusable
- Google’s on‑device LiteRT runtime with Gemma 4: suddenly smooth on the same phone
- I wrapped it in a local HTTP server and pointed my Termux agent (OpenClaw) at it
If you’re thinking about serious local models on phones, I wrote up the full experiment and open‑sourced the Android side and the Termux side.
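For throughput figures like the ones above, tok/s is just generated tokens divided by wall-clock time. A generic measurement sketch (the backend here is a stand-in with an artificial delay, not llama.cpp or LiteRT):

```python
import time

def tokens_per_second(generate_fn, prompt: str) -> float:
    """Time one generation call and report throughput.
    `generate_fn` must return a sequence of tokens."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

def fake_generate(prompt):
    # Stand-in backend: "generates" 30 tokens with a small delay each.
    out = []
    for _ in range(30):
        time.sleep(0.001)
        out.append("tok")
    return out

rate = tokens_per_second(fake_generate, "hello")
```

Swapping `fake_generate` for a real call into your runtime gives comparable numbers across backends on the same phone.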

SupremeLisper@reddit
Sounds good. Have you checked the off-grid app? On another note, are you sure it's using both the CPU and GPU for generation? The parameters only say CPU or GPU for generation.
I get ~4 tok/s on average with CPU vs ~10 tok/s in Edge Gallery AI. The only issue is stability: if you do anything in the background that needs the GPU, it may cut off the generation.
CPU is much more stable but about half the speed of GPU.
GeeekyMD@reddit (OP)
Yes, it runs smoothly now, exactly like it does in Edge Gallery.
SupremeLisper@reddit
How? Can you share your setup? The repo you linked only covers the OpenClaw setup.
GeeekyMD@reddit (OP)
https://github.com/Mohd-Mursaleen/LiteRT-Server
arnaudfr78@reddit
How do you connect the LiteRT-Server with OpenClaw on Termux on Android? What are the parameters at the onboarding stage?
mapleaikon@reddit
Can you share how to implement LiteRT with an HTTP server wrapper? I'm trying to build an Android app but haven't finished yet.
GeeekyMD@reddit (OP)
https://github.com/Mohd-Mursaleen/LiteRT-Server
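For the general shape of such a wrapper: a minimal self-contained sketch in Python, where `generate` is a placeholder standing in for the real LiteRT-LM call (the linked repo's actual implementation may differ):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Placeholder: replace with a real call into the LiteRT-LM runtime.
    return "echo: " + prompt

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run it through the backend.
        n = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(n) or b"{}")
        out = json.dumps({"text": generate(body.get("prompt", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 asks the OS for a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One round-trip through the wrapper, acting as the client.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"prompt": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())["text"]
server.shutdown()
```

Any local agent can then hit `http://127.0.0.1:<port>` with a JSON prompt; the field names here are illustrative, not the repo's API.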
Ok_Warning2146@reddit
Try compiling llama.cpp with Vulkan. That can give you a few more tok/s.
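For reference, a Vulkan build of llama.cpp in Termux looks roughly like this (package names and flags may vary with your Termux and llama.cpp versions):

```shell
# Inside Termux; package names may differ by Termux version.
pkg install git cmake vulkan-headers

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# GGML_VULKAN enables the Vulkan backend in current llama.cpp trees.
cmake -B build -DGGML_VULKAN=ON
cmake --build build -j
```

Whether this helps depends on the phone's Vulkan driver; some Android GPU drivers are incomplete or unstable under load.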
GeeekyMD@reddit (OP)
Even so, the performance gap between llama.cpp and LiteRT is huge.
GeeekyMD@reddit (OP)
Details + code:
Experiment write‑up: https://geekymd.me/blog/running-local-llm-on-android
Termux / OpenClaw setup: https://github.com/Mohd-Mursaleen/openclaw-android
Android automation agent: https://github.com/Mohd-Mursaleen/android-automation-agent
New_Comfortable7240@reddit
So to be clear, you use your computer via ADB to run the model on the phone? Maybe the next step is to build the APK and add it to the releases on your repo.
GeeekyMD@reddit (OP)
No, the app is on the phone. Everything runs on the phone, no computer needed.