PAI: your personal AI, 100% local, inspired by Google's Project Astra
Posted by Such_Advantage_6949@reddit | LocalLLaMA | View on Reddit | 17 comments
Inspired by Google's Project Astra, I have created an app for an audio + video chatbot that is 100% local and open source.
Features:
- iOS app
- 100% locally hosted
- Open Source
- Visual question answering
- Streaming via WebRTC & LiveKit for low latency
- Screen Sharing
- Live transcription
- Change the LLM to any model supported by ExLlamaV2
Here is a short 2-minute demo: https://youtu.be/pNksZ_lXqgs
Repo: https://github.com/remichu-ai/pai.git
This is an STT + LLM + TTS pipeline, so feel free to skip it if that is a deal breaker for you.
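For readers unfamiliar with the term, a cascaded STT + LLM + TTS setup can be sketched roughly as below. This is a minimal illustration and not PAI's actual code: the `VoicePipeline` class and the three `fake_*` stages are hypothetical placeholders standing in for a real speech-to-text model, language model, and text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains three independent stages: speech -> text -> reply -> speech."""
    stt: Callable[[bytes], str]   # audio in, transcript out
    llm: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]   # reply text in, audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)


# Placeholder stages (assumptions for illustration only).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.respond(b"hello")
```

Because the model only ever sees the transcript, tone of voice and other non-verbal cues are lost between stages, which is the usual objection to this design compared with a native speech-to-speech model.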
Numerous-Aerie-5265@reddit
A bit confused about the backend: do I just install gallama on my PC and enter its server and authentication IP into the mobile app?
Such_Advantage_6949@reddit (OP)
You can refer to the pai-agent repo. On top of the gallama backend, the pai agent needs to be running, as well as LiveKit and an authentication server. The setup is more complicated than a normal LLM backend because it needs to handle live audio and video streaming (which goes over the WebRTC protocol).
I suggest the easiest way to start is to look at the README and docker compose file of the pai-agent repo; the docker-compose file outlines everything needed to run. If you have further questions, just raise an issue on GitHub and I will try to assist as much as I can.
bennmann@reddit
Will you make an HTML5 front-end GUI that is future-proof regardless of handheld OS?
Such_Advantage_6949@reddit (OP)
Thanks for the idea. I think it is a good direction, but it might not be a priority at the moment.
That is because I think most of the value to be unlocked right now is at the backend level, such as having the chatbot trigger function calls to complete tasks like sending email or checking the calendar, building a memory system so that it can remember the conversation, etc.
Once the backend is solid, I think the front end can be developed further.
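To make the function-calling idea concrete, here is a minimal sketch of how a backend might route a model's tool request to real code. It is not taken from pai-agent: the `TOOLS` registry, the `tool` decorator, the JSON shape with `name` and `arguments`, and the stubbed `send_email` / `check_calendar` functions are all assumptions for illustration.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping tool names to Python callables (hypothetical design).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def send_email(to: str, subject: str) -> str:
    # Stub: a real version would talk to a mail service.
    return f"email to {to} queued: {subject}"


@tool
def check_calendar(date: str) -> str:
    # Stub: a real version would query a calendar API.
    return f"no events on {date}"


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the LLM and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**call.get("arguments", {}))


result = dispatch('{"name": "check_calendar", "arguments": {"date": "2025-01-01"}}')
```

In a voice assistant the string returned by `dispatch` would typically be fed back to the LLM, which then phrases the spoken answer.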
bennmann@reddit
It would be good to implement a secondary "better text" backend for the text-only domain.
Maybe have a simple toggle so a user could elect to have a 3-4 bit Qwen2.5 Instruct 32B load up on the server for text-only use.
Such_Advantage_6949@reddit (OP)
For text only, there are some good ones around, like Enchanted. Do check it out. Of course, I would like to add it someday too; however, the text-only API and the audio/video API work in totally different ways.
Barry_Jumps@reddit
Impressive work!
winkler1@reddit
Really well done video! It's rare someone puts that much care/attention into one
Such_Advantage_6949@reddit (OP)
Really appreciate your kind words!
Puzzled-Purple5@reddit
Sorry for the noob question: What input do I provide for Main server & Authentication?
Such_Advantage_6949@reddit (OP)
In the repo, there is a URL to another repo named pai-agent, which contains the services you need to run on your machine. The setup is more complicated because it uses WebRTC, similar to OpenAI. The benefit is that it works well even outside your house: you can use Tailscale and use the app away from home.
ProfessorCentaur@reddit
Does it support vocal interrupt? Way cool!
Such_Advantage_6949@reddit (OP)
Yes, it supports that.
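For context, vocal interrupt (often called barge-in) generally means cancelling the assistant's audio playback as soon as the user starts talking. The sketch below shows one generic way to do that with `asyncio`. It is not how PAI or LiveKit implements it: `speak`, `respond_with_barge_in`, and the `user_started_speaking` event (which stands in for a voice activity detector) are hypothetical names.

```python
import asyncio
from typing import List, Tuple


async def speak(chunks: List[str], played: List[str]) -> None:
    """Play audio chunks one by one, yielding control between chunks."""
    for chunk in chunks:
        await asyncio.sleep(0)  # stands in for sending one chunk to the speaker
        played.append(chunk)


async def respond_with_barge_in(
    chunks: List[str], user_started_speaking: asyncio.Event
) -> Tuple[List[str], bool]:
    """Play a reply, but stop as soon as the user starts speaking."""
    played: List[str] = []
    playback = asyncio.create_task(speak(chunks, played))
    interrupt = asyncio.create_task(user_started_speaking.wait())

    # Wake up when either playback finishes or the user interrupts.
    await asyncio.wait({playback, interrupt}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = not playback.done()

    # cancel() is a no-op on a task that has already finished.
    playback.cancel()
    interrupt.cancel()
    await asyncio.gather(playback, interrupt, return_exceptions=True)
    return played, interrupted


async def demo(user_speaks: bool) -> Tuple[List[str], bool]:
    event = asyncio.Event()
    if user_speaks:
        event.set()  # simulate the detector firing right away
    reply = [f"chunk{i}" for i in range(100)]
    return await respond_with_barge_in(reply, event)


full_played, full_interrupted = asyncio.run(demo(user_speaks=False))
cut_played, cut_interrupted = asyncio.run(demo(user_speaks=True))
```

When the event never fires, all chunks are played; when it fires, playback is cancelled partway through and the remaining chunks are dropped.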
Mandelaa@reddit
Nice!
In the future, are you planning to make an Android app?
Such_Advantage_6949@reddit (OP)
Thanks, but probably not at the moment, as I would like to focus on building up the personal agent part, e.g. memory and function calling. I can work on expanding the app to Android once the agent is more mature.
You_Wen_AzzHu@reddit
Genuine question: why did you choose gallama as the backend?
GreatBigJerk@reddit
This is super cool, and looks like it works well. I don't use iOS, otherwise I'd give it a spin.
... That said, you should probably cut your nails. You're going to take an eye out with those claws.