We just shipped Gemma 4 support in Off Grid 🔥: open-source mobile app, on-device inference, zero cloud. Android live, iOS coming soon.
Posted by CamusCave@reddit | LocalLLaMA | 16 comments
We shipped Gemma 4 (E2B and E4B edge variants) in Off Grid today — our open-source, offline-first AI app for Android and iOS.
What makes this different from other local LLM setups:
→ No server, no Python, no laptop. Runs entirely on your phone's NPU/CPU.
→ Gemma 4's 128K context window, fully on-device — finally useful for long docs and code on mobile.
→ Native vision: point your camera at anything and ask Gemma 4 about it.
→ Whisper speech-to-text, Stable Diffusion image gen, tool calling — all in one app.
→ ~15–30 tok/s on Snapdragon 8 Gen 3 / Apple A17 Pro.
→ Apache 2.0 model, MIT app — genuinely open all the way down.
Gemma 4's E2B variant running in under 1.5GB RAM on a phone is honestly wild. The E4B with 128K context + vision is what we've been waiting for.
Android (live now): https://play.google.com/store/apps/details?id=ai.offgridmobile
iOS: coming soon
GitHub (MIT): https://github.com/alichherawalla/off-grid-mobile-ai
Would love to hear tok/s numbers people are seeing across different devices. Drop them below.
kmorg80@reddit
This is a great app, but please improve VoiceOver screen reader accessibility for blind users; there are lots of unlabelled buttons. Run Apple's accessibility tooling over it. Cheers.
MrSilencerbob@reddit
The context window isn't 128K. The model supports it, but not in Off Grid, or Google's Edge app either; it's only 32K.
AddendumHot6863@reddit
What are you using the local models for?
Omnimum@reddit
Is there a way to use MCP servers or search the web from the models?
TheWaywardOne@reddit
Bonsai support next? 👀
CamusCave@reddit (OP)
Interesting! We're absolutely figuring out ways to extend our coverage to more models. Curious: are you currently using Bonsai, and what are you using it for?
TheWaywardOne@reddit
Mostly that it's an 8B 1-bit model that can fit on a phone!
Was primarily using it as a general text/tool-calling agent, and figured it was worth trying as a "bigger" model on mobile.
mtmttuan@reddit
No NPU support. Seems like you guys are using llama.cpp under the hood, so claiming NPU inference is just lying.
mr_Owner@reddit
Does it also provide an HTTP API endpoint?
CamusCave@reddit (OP)
Hey, you can run Ollama, LM Studio, etc. on your laptop and use more powerful models over the network through an HTTP endpoint.
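For anyone wanting to try this, here's a minimal sketch of calling an OpenAI-compatible endpoint (Ollama exposes one at port 11434) from another device on the same network. The host address and model name are placeholders; swap in whatever server and model you're actually running:

```python
import json
import urllib.request

def build_chat_request(prompt, model="llama3.2"):
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, host="http://192.168.1.10:11434", model="llama3.2"):
    """POST to the /v1/chat/completions endpoint and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Same request shape works against LM Studio or any other OpenAI-compatible server; only the host and port change.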
mr_Owner@reddit
The reason I was asking was to serve a smartphone as an OpenAI-compatible LLM HTTP API endpoint.
Models are getting smaller and more efficient; I find it worthwhile to repurpose smartphones for small, specific tasks.
CamusCave@reddit (OP)
Bang on! I think there's a use case here. Noting this down, thank you!
Broughtbynot@reddit
It's not very good. It just crashed every time I tried to load any version of Gemma 4 E4B from the download list, and then the downloader suddenly told me no models were compatible with my phone, despite having downloaded one only a few minutes before. I had to spend over five minutes importing my local model, only to be told I can't import an mmproj for it because the repair feature doesn't work. To add insult to injury, when I did finally load my local text-only version of E4B anyway, it just refused to give me a response or ever process a token. Do you just not support the 8 Elite Gen 5? Either way, I'm going back to PocketPal. Please try harder next time.
CamusCave@reddit (OP)
I'm sorry you had that experience! We are seeing some issues with the Snapdragon 8 Elite Gen 5 and are trying to fix them in the next release!
austhrowaway91919@reddit
How does this compare to the official Edge Gallery release?
CamusCave@reddit (OP)
Honestly they are gods and I don't think we can compete with them on inference at this point.