Building a chatbot with ASR - Need Advice
Posted by Excellent-Couple-394@reddit | LocalLLaMA | 5 comments
I’ve been working on building a chatbot, and one of the features I want to include is speech-to-text. Since I’m part of a startup, budget is definitely a constraint. At the same time, due to security and compliance requirements, I’d prefer to avoid relying on external APIs.
For an MVP or pilot launch, I'm trying to figure out which ASR approach or architecture makes the most sense to start with. I've been looking into options like Whisper and Parakeet, but I'm unsure which is the best starting point given these constraints, especially since I also need low latency.
Would really appreciate any suggestions or insights from people who've worked on something similar, especially around the trade-offs between self-hosted models and APIs, performance, and ease of deployment (I'm ready to take on the deployment challenge myself).
jtjstock@reddit
ASR and TTS can run on a potato. Pick small, dedicated, optimized models and it will be fast and highly accurate.
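Whether a small model is "fast" on your own hardware is easy to check before committing to it. A common yardstick is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. This is a minimal sketch that assumes mono float samples at 16 kHz (a common ASR input rate; check your model's docs) and takes a hypothetical `transcribe` callable, so Whisper, Moonshine, Parakeet, or anything else can be wrapped behind it.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    transcribe: Callable[[Sequence[float]], str],
    audio: Sequence[float],
    sample_rate: int = 16_000,
) -> float:
    """Return processing time divided by audio duration.

    RTF < 1.0 means the model transcribed the clip faster than real time.
    `transcribe` is a hypothetical hook: wrap whichever ASR model you are testing.
    """
    duration_s = len(audio) / sample_rate
    if duration_s == 0:
        raise ValueError("audio must contain at least one sample")
    start = time.perf_counter()
    transcribe(audio)  # the transcript is ignored; only the timing matters here
    elapsed_s = time.perf_counter() - start
    return elapsed_s / duration_s


# Usage with a dummy model: one second of silence at 16 kHz.
def dummy_transcribe(samples: Sequence[float]) -> str:
    return ""


rtf = real_time_factor(dummy_transcribe, [0.0] * 16_000)
print(f"RTF: {rtf:.4f}")
```

Running the same measurement on the target device under realistic load gives a rough picture of how much headroom a model has; for streaming use, per-chunk latency matters as well as whole-clip RTF.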
jtjstock@reddit
Forgot to mention: Moonshine is good for ASR and does diarization pretty well, and Kokoro is good for TTS.
Excellent-Couple-394@reddit (OP)
I was checking out Moonshine. It seems pretty impressive, but how well does it work in mobile applications? Does it stay smooth when multiple apps are running in the background, or does it tend to slow down?
jtjstock@reddit
So far I've been hosting it on a server rather than on an edge device, though. I may give it a go to see how it does.
KokaOP@reddit
There's nano-parakeet, a pure PyTorch implementation of Parakeet. I would use it with this pipeline:
(RNNoise or DeepFilterNet2) + VAD (Silero) -> Parakeet
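KokaOP's chain can be read as three stages where only audio flagged as speech ever reaches the ASR model. This is a minimal, dependency-free sketch of that gating under stated assumptions: `denoise` is an identity placeholder where RNNoise or DeepFilterNet2 would go, `is_speech` is a plain RMS energy threshold standing in for Silero VAD (it is not Silero's API), and `fake_asr` is a stub where nano-parakeet or another model would be called. The frame size, threshold, and `min_silence_frames` values are hypothetical.

```python
import math
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000
FRAME_LEN = 512  # 32 ms per frame at 16 kHz (hypothetical choice)


def denoise(samples: Sequence[float]) -> List[float]:
    # Placeholder for RNNoise or DeepFilterNet2: returns the audio unchanged.
    return list(samples)


def is_speech(frame: Sequence[float], threshold: float = 0.01) -> bool:
    # Stand-in for Silero VAD: a plain RMS energy gate, NOT Silero's API.
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms > threshold


def speech_segments(
    samples: List[float], frame_len: int = FRAME_LEN, min_silence_frames: int = 8
) -> List[List[float]]:
    """Split audio into speech segments.

    A segment opens on the first speech frame and closes once
    `min_silence_frames` non-speech frames arrive in a row; those trailing
    silent frames are kept so words are not clipped at the end.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    silence = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if is_speech(frame):
            current.extend(frame)
            silence = 0
        elif current:
            current.extend(frame)
            silence += 1
            if silence >= min_silence_frames:
                segments.append(current)
                current, silence = [], 0
    if current:  # flush a segment still open at the end of the audio
        segments.append(current)
    return segments


def run_pipeline(
    samples: Sequence[float], transcribe: Callable[[List[float]], str]
) -> List[str]:
    # denoise -> VAD segmentation -> ASR, one transcript per speech segment
    return [transcribe(seg) for seg in speech_segments(denoise(samples))]


# Usage with synthetic audio: silence, a 440 Hz tone, silence, a shorter tone.
def tone(n_frames: int) -> List[float]:
    n = n_frames * FRAME_LEN
    return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def silent(n_frames: int) -> List[float]:
    return [0.0] * (n_frames * FRAME_LEN)


def fake_asr(segment: List[float]) -> str:
    # Stub standing in for a real ASR call such as nano-parakeet.
    return f"{len(segment) / SAMPLE_RATE:.2f}s of speech"


audio = silent(16) + tone(32) + silent(32) + tone(16)
print(run_pipeline(audio, fake_asr))
```

In a real deployment each stub would be swapped for the actual component and frames would be fed in as they arrive rather than from a complete buffer; the gating idea stays the same, and skipping silent audio saves ASR compute.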