I tried running Gemma 4 on my phone. llama.cpp failed, LiteRT‑LM didn’t.
Posted by GeeekyMD@reddit | LocalLLaMA | 12 comments
I wanted Gemma 4 as a usable local model on my Android phone, not a benchmark screenshot.
- llama.cpp in Termux: ~2–3 tok/s, CPU pegged, basically unusable
- Google’s on‑device LiteRT runtime with Gemma 4: suddenly smooth on the same phone
- I wrapped it in a local HTTP server and pointed my Termux agent (OpenClaw) at it
If you’re thinking about serious local models on phones, I wrote up the full experiment and open‑sourced the Android side and the Termux side.
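For throughput figures like the ones above, tok/s is just generated tokens divided by wall-clock time. A generic measurement sketch (the backend here is a stand-in with an artificial delay, not llama.cpp or LiteRT):

```python
import time

def tokens_per_second(generate_fn, prompt: str) -> float:
    """Time one generation call and report throughput.
    `generate_fn` must return a sequence of tokens."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

def fake_generate(prompt):
    # Stand-in backend: "generates" 30 tokens with a small delay each.
    out = []
    for _ in range(30):
        time.sleep(0.001)
        out.append("tok")
    return out

rate = tokens_per_second(fake_generate, "hello")
```

Swapping `fake_generate` for a real call into your runtime gives comparable numbers across backends on the same phone.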

SupremeLisper@reddit
Sounds good. Have you checked the off-grid app? On another note, are you sure it's using both the CPU and GPU for generation? The parameters only say CPU or GPU for generation.
I get ~4 tok/s on average with CPU vs ~10 tok/s in Edge Gallery AI. The only issue is stability: if you do anything in the background that needs the GPU, it may cut off the generation.
CPU is much more stable but about half the speed of GPU.
GeeekyMD@reddit (OP)
Yes, it runs smoothly now, exactly like it does in Edge Gallery.
SupremeLisper@reddit
How? Can you share your setup? The repo you linked only covers the OpenClaw setup.
GeeekyMD@reddit (OP)
https://github.com/Mohd-Mursaleen/LiteRT-Server
arnaudfr78@reddit
How do you connect the LiteRT-Server with OpenClaw on Termux on Android? What are the parameters at the onboarding stage?
mapleaikon@reddit
Can you share how to implement LiteRT with an HTTP server wrapper? I'm trying to build an Android app but haven't finished yet.
GeeekyMD@reddit (OP)
https://github.com/Mohd-Mursaleen/LiteRT-Server
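For the general shape of such a wrapper: a minimal self-contained sketch in Python, where `generate` is a placeholder standing in for the real LiteRT-LM call (the linked repo's actual implementation may differ):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt: str) -> str:
    # Placeholder: replace with a real call into the LiteRT-LM runtime.
    return "echo: " + prompt

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run it through the backend.
        n = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(n) or b"{}")
        out = json.dumps({"text": generate(body.get("prompt", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 asks the OS for a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One round-trip through the wrapper, acting as the client.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"prompt": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())["text"]
server.shutdown()
```

Any local agent can then hit `http://127.0.0.1:<port>` with a JSON prompt; the field names here are illustrative, not the repo's API.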
Ok_Warning2146@reddit
Try compiling llama.cpp with Vulkan. That can give you a few more tok/s.
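For reference, a Vulkan build of llama.cpp in Termux looks roughly like this (package names and flags may vary with your Termux and llama.cpp versions):

```shell
# Inside Termux; package names may differ by Termux version.
pkg install git cmake vulkan-headers

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# GGML_VULKAN enables the Vulkan backend in current llama.cpp trees.
cmake -B build -DGGML_VULKAN=ON
cmake --build build -j
```

Whether this helps depends on the phone's Vulkan driver; some Android GPU drivers are incomplete or unstable under load.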
GeeekyMD@reddit (OP)
Even so, the performance gap between llama.cpp and LiteRT is huge.
GeeekyMD@reddit (OP)
Details + code:
Experiment write‑up: https://geekymd.me/blog/running-local-llm-on-android
Termux / OpenClaw setup: https://github.com/Mohd-Mursaleen/openclaw-android
Android automation agent: https://github.com/Mohd-Mursaleen/android-automation-agent
New_Comfortable7240@reddit
So to be clear, you use your computer via ADB to run the model on the phone? Maybe the next step is to build the APK and add it to the releases on your repo.
GeeekyMD@reddit (OP)
No, the app is on the phone. Everything runs on the phone, no computer needed.