Building a chatbot with ASR - Need Advice

Posted by Excellent-Couple-394@reddit | LocalLLaMA | 5 comments

I’ve been working on building a chatbot, and one of the features I want to include is speech-to-text. Since I’m part of a startup, budget is definitely a constraint. At the same time, due to security and compliance requirements, I’d prefer to avoid relying on external APIs.

For an MVP or pilot launch, I’m trying to figure out which ASR approach or architecture would make the most sense to start with. I’ve been looking into options like Whisper, Parakeet, etc., but I’m unsure of the best starting point given my constraints and the need for low latency.

Would really appreciate any suggestions or insights from people who’ve worked on something similar, especially around trade-offs between self-hosted models vs APIs, performance, and ease of deployment (I am ready to take on the challenge for deployment).

jtjstock@reddit

ASR and TTS can run on a potato. Pick small, dedicated models that are optimized for the task, and they will be fast and highly accurate.

Forgot to mention: Moonshine is good for ASR (it does diarization pretty well), and Kokoro is good for TTS.

Excellent-Couple-394@reddit (OP)

I was checking out moonshine. It seems pretty impressive. But how well does it work with mobile applications? Does it stay smooth when multiple apps are running in the background, or does it tend to slow down?

So far I’ve been hosting it on a server rather than on the edge device, though. I may give it a go to see how it does.
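
For comparing candidates on the low-latency requirement, one common yardstick is the real-time factor (processing time divided by audio duration; below 1.0 means faster than real time). Here is a minimal sketch of how that could be measured. It assumes each model is wrapped in a `transcribe(audio)` callable, which is a hypothetical wrapper and not any library's actual API; the 16 kHz sample rate is also an assumption.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    transcribe: Callable[[Sequence[float]], str],  # hypothetical model wrapper
    audio: Sequence[float],
    sample_rate: int = 16_000,  # assumed sample rate
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the best-of-`runs` processing time divided by audio duration."""
    audio_seconds = len(audio) / sample_rate
    if audio_seconds == 0:
        raise ValueError("audio must not be empty")

    best = float("inf")
    for _ in range(runs):
        start = clock()
        transcribe(audio)
        best = min(best, clock() - start)  # keep the fastest run
    return best / audio_seconds
```

Running this with the same clips on the server and on a phone under load would give comparable numbers, although real-world latency also depends on streaming behavior and network hops that this does not capture.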

KokaOP@reddit

There is nano-parakeet, a pure PyTorch implementation; I would use it with:

(RNNoise or DeepFilterNet2) + VAD (Silero) -> Parakeet
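
The chain above (denoise, then voice activity detection, then ASR) can be sketched roughly as follows. This is only an illustration of how the stages connect, not a real integration: `denoise` is a pass-through placeholder for where RNNoise or DeepFilterNet would go, `energy_vad` is a crude RMS threshold standing in for Silero VAD, and `transcribe` is a hypothetical callable wrapping whichever ASR model is used. The sample rate, frame size, and threshold are all assumed values.

```python
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumed sample rate
FRAME_MS = 30         # assumed VAD frame size


def denoise(audio: Sequence[float]) -> List[float]:
    """Placeholder: a real pipeline would call a denoiser here."""
    return list(audio)


def energy_vad(
    audio: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    frame_ms: int = FRAME_MS,
    threshold: float = 0.01,  # assumed RMS threshold
) -> List[Tuple[int, int]]:
    """Stand-in VAD: return (start, end) sample ranges whose RMS exceeds threshold."""
    frame_len = sample_rate * frame_ms // 1000
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(audio), frame_len):
        frame = audio[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            if start is None:
                start = i  # speech begins
        elif start is not None:
            segments.append((start, i))  # speech ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(audio)))
    return segments


def run_pipeline(
    audio: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # hypothetical ASR wrapper
) -> List[str]:
    """Denoise, keep only detected speech segments, and transcribe each one."""
    clean = denoise(audio)
    return [transcribe(clean[s:e]) for s, e in energy_vad(clean)]
```

The point of putting VAD before the ASR model is that silence never reaches it, which saves compute and usually helps latency on a shared server.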