Python package for working with LLM's over voice

Posted by Ok_Train_9768@reddit | Python | View on Reddit | 10 comments

Hi All,

Have setup a python package that makes it easy to interact with LLMs over voice

You can set it up on local, and start interacting with LLMs via Microphone and Speaker

What My Project Does

The idea is to abstract away the speech-to-text and text-to-speech parts, so you can focus on just the LLM/Agent/RAG application logic.

Currently it is using AssemblyAI for speech-to-text and ElevenLabs for text-to-speech, though that is easy enough to make configurable in the future

Setting up the agent on local would look like this

voice_agent = VoiceAgent(
   assemblyai_api_key=getenv('ASSEMBLYAI_API_KEY'),
   elevenlabs_api_key=getenv('ELEVENLABS_API_KEY')
)

def on_message_callback(message):
   print(f"Your message from the microphone: {message}", end="\r\n")
   # add any application code you want here to handle the user request
   # e.g. send the message to the OpenAI Chat API
   return "{response from the LLM}"

voice_agent.on_message(on_message_callback)
voice_agent.start()

So you can use any logic you like in the on_message_callback handler, i.e not tied down to any specific LLM model or implementation

I just kickstarted this off as a fun project after working a bit with Vapi

Has a few issues, and latency could defo be better. Could be good to look at some integrations/setups using frontend/browsers also.

Would be happy to put some more time into it if there is some interest from the community

Package is open source, as is available on GitHub and PyPI. More info and installation details on it here also

https://github.com/smaameri/voiceagent

Target Audience

Developers working with LLM/AI applications, and want to integrate Voice capabilities. Currently project is in development phase, not production ready

Comparison

Vapi has a similar solution, though this is an open source version