Unused phone as AI server
Posted by Ok_Fig5484@reddit | LocalLLaMA | View on Reddit | 27 comments
If you have an unused phone lying around, you might be sitting on a tiny AI server
I’ve been working on a project where I modified Google AI Edge Gallery and turned it into an OpenAI-compatible API server: [Gallery as Server](https://github.com/xiaoyao9184/gallery)
Your phone can run local AI inference
You can call it just like an OpenAI API (chat/completions, etc.)
Instead of letting that hardware collect dust, you can turn it into a lightweight inference node.
So yeah—if you have more than one old phone, you can literally build yourself a cluster.
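To show what "call it just like an OpenAI API" means in practice, here is a minimal Python sketch of the chat/completions request shape. The phone's LAN IP, port, and model name below are assumptions for illustration, not values from the project:

```python
import json

# Hypothetical address -- substitute your phone's LAN IP and whatever
# port the Gallery server actually listens on (both assumed here).
PHONE_URL = "http://192.168.1.50:8080/v1/chat/completions"

def build_chat_request(prompt, model="gemma", stream=False):
    """Build an OpenAI-style chat/completions payload for the phone server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = build_chat_request("Summarize this paragraph: ...")
body = json.dumps(payload).encode()
# POST `body` to PHONE_URL with Content-Type: application/json
# (e.g. via urllib.request) and read choices[0].message.content
# from the JSON reply, exactly as you would with the OpenAI API.
```

Because the request and response shapes match OpenAI's, any existing OpenAI client should work by just pointing its base URL at the phone.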
Mac_NCheez_TW@reddit
I've been looking for something like this to run small local LLMs on a ROG 8 with 24GB of RAM. I have a bunch of phones I wanted to do this with. Tool usage with them would be nice.
Qwen30bEnjoyer@reddit
I'd be more interested in using old Androids as an Ubuntu VPS, especially if they have 24GB of RAM.
Has anyone done similar?
Mac_NCheez_TW@reddit
That would be cool.
Ok_Fig5484@reddit (OP)
The officially recommended model, Gemma-4-E4B-it, requires 12GB of memory. Because of the Gallery app's design, it can only load one model at a time, and concurrent inference isn't supported either, so 24GB is really more than it can use.
Mac_NCheez_TW@reddit
No, more ram! More! Lol.
moneylab_ai@reddit
This is a really clever use of hardware that would otherwise just sit in a drawer. The OpenAI-compatible API layer is the smart part -- it means you can slot it into existing toolchains without rewriting anything. I am curious about the practical throughput though. Even with something like a Snapdragon 8 Gen 3 and 12GB+ RAM, you are probably limited to smaller models (3-7B). For a phone cluster setup, have you looked into any kind of load balancing or request routing across multiple devices? That could make the aggregate throughput actually useful for lightweight local inference tasks like classification or summarization.
AtypicalComputers@reddit
This is great! I spent some time trying to get Ollama deployed as a Docker container via the built-in terminal on a Pixel. This seems to be a much easier way of accomplishing the same thing. Excited to try it out!
Ok_Fig5484@reddit (OP)
One of the more challenging issues is that the model has to be in the LiteRT-LM format, and there aren't many converted models available on https://huggingface.co/litert-community.
AtypicalComputers@reddit
I'm not seeing the server option when downloading the app from the play store. Is the apk in the GitHub more up to date?
Ok_Fig5484@reddit (OP)
The original repository does not currently accept community contributions. Please use version 1.0.11-as0.1.0 released from my forked repository.
AtypicalComputers@reddit
Yup, got it! Running and inferring. Much easier than having to go through the terminal! If there's any way to add metrics similar to llama.cpp, that would be a great addition! Looking forward to the project!
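Until the app grows built-in metrics, a rough llama.cpp-style eval rate can be measured client-side. A sketch, assuming the server reports a `usage.completion_tokens` field in its response (an assumption, since the fork's response schema isn't documented here):

```python
import time

def tokens_per_second(completion_tokens, elapsed_s):
    """Client-side throughput, similar in spirit to llama.cpp's
    eval-rate line. `completion_tokens` would come from the
    response's `usage` field if the server reports one."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0

start = time.perf_counter()
# ... send the chat/completions request to the phone here ...
elapsed = time.perf_counter() - start
# rate = tokens_per_second(response["usage"]["completion_tokens"], elapsed)
```

This ignores prompt-processing time vs. generation time, which llama.cpp reports separately, but it's enough to compare phones in a cluster.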
Dazzling_Equipment_9@reddit
I saw you posted this yesterday: "Open source Android app for native tool calling with Claude", but I noticed you deleted it today. Your demo video also used the same one :)
Ok_Fig5484@reddit (OP)
What are you talking about? You've got it wrong, that's not me.
Dazzling_Equipment_9@reddit
Isn't that right? Uh, maybe someone else is doing the same thing as you. Don't worry about it, bro.
Ok_Fig5484@reddit (OP)
It's definitely not that, because this app has very limited ability to call native Android system features unless a lot of additional code is written. When used as an API server, it only returns structured function outputs. From what I've observed, the model doesn't return a text response; instead, the tool function gets called directly. I have to admit I haven't fully figured this out yet, so if anyone has solved this, I'd be very interested in taking a look.
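For reference, "structured function outputs" in an OpenAI-compatible server usually means the reply carries `tool_calls` instead of text content. A sketch of what that looks like on the wire; the `get_weather` tool and the response shape are made-up illustrations, not taken from this project:

```python
import json

# An OpenAI-style request advertising one (hypothetical) tool.
request = {
    "model": "gemma",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

def extract_tool_call(response):
    """When the model picks a tool, the message carries `tool_calls`
    and `content` is typically empty -- matching what OP observes."""
    msg = response["choices"][0]["message"]
    calls = msg.get("tool_calls") or []
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls]

# Example reply with a structured function output (shape assumed):
fake = {"choices": [{"message": {"content": None, "tool_calls": [
    {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}]}}]}
```

In a full tool loop, the client would execute the extracted call itself and send the result back as a `tool` role message; on-device, the app short-circuiting that loop would explain why the tool "gets called directly."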
Danmoreng@reddit
I would recommend not using the Edge Gallery app as a base, but only as a reference, and implementing a much simpler server app from scratch. With whatever you used to make your modifications (I assume Claude/Codex/Gemini), a clean from-scratch implementation should be just as easy. For example, I did something similar for my transcription app: I had Codex first analyze the Edge AI Gallery app against what my app already had, to figure out how to implement the new Gemma models: https://github.com/Danmoreng/vox-transcribe/tree/main/docs
Ok_Fig5484@reddit (OP)
Yes, I started building it directly without first analyzing the Gallery's core design. Only during development did I discover that the model's loading lifecycle is tied to the UI and that only one model can be loaded at a time. That's ultimately why I added a custom task icon.
Illustrious-Lake2603@reddit
I'm interested in the cluster idea. Will this work to link 4 phones together?
Ok_Fig5484@reddit (OP)
A cluster only helps if you put the phones behind a load balancer: it increases concurrency, but each individual request still runs on a single phone.
Uriziel01@reddit
Yeah, I also love the idea. I discussed this yesterday, as I think it could be really useful as a generic agent for basic Google-style questions: https://www.reddit.com/r/LocalLLaMA/comments/1sfvy4x/could_gemma_4_breathe_new_life_into_cheap/
Uriziel01@reddit
Hahaha u/Ok_Fig5484 we did the exact same thing :D https://github.com/Uriziel01/gallery/
niyandathaal@reddit
Good idea, I'm currently using a Raspberry Pi.
ArcadiaBunny@reddit
Pretty genius
Lumienca@reddit
Good idea 😊
ghulamalchik@reddit
Really nice idea.
Ok_Fig5484@reddit (OP)
Since there's no such thing as a quiet GPU, a mobile phone it is.