Nvidia Parakeet-Realtime-EOU-120m-v1
Posted by nuclearbananana@reddit | LocalLLaMA
Parakeet-Realtime-EOU-120m-v1 is a streaming speech recognition model that also performs end-of-utterance (EOU) detection. It achieves low latency (80 to 160 ms) and signals EOU by emitting an
ShengrenR@reddit
Somebody go test it: how does it compare to Kyutai's solution?
Miserable-Dare5090@reddit
If it's Parakeet, it's not TTS... it's speech TO text.
ShengrenR@reddit
Lol... continue with your smug look, please: https://kyutai.org/next/stt
nuclearbananana@reddit (OP)
At first glance, Kyutai's model is ~8 times larger, but it supports French.
ShengrenR@reddit
Yeah, the size certainly slows it down. To my eyes, the real win in their STT package was the built-in VAD; it looks like this new model is going to fill a similar niche, but with much faster response and inference times thanks to its smaller size... at the cost of higher WER, etc.
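For anyone unfamiliar with VAD-based endpointing, here is a minimal sketch of the classic energy-threshold approach. It is not Kyutai's or NVIDIA's method, just the generic idea: frames louder than a threshold count as speech, and end of turn is declared after a run of silent frames. The 16 kHz mono float input, 20 ms frames, `energy_threshold`, and `silence_ms` values are all hypothetical choices for illustration.

```python
import math

def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def detect_end_of_turn(samples, sample_rate=16000, frame_ms=20,
                       energy_threshold=0.02, silence_ms=600):
    """Return the sample index where the turn is judged to end, or None.

    Hypothetical rule: a frame is speech if its RMS exceeds energy_threshold;
    the turn ends after silence_ms of consecutive non-speech frames that
    follow at least one speech frame.
    """
    frame_len = sample_rate * frame_ms // 1000
    frames_needed = silence_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) > energy_threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return start + frame_len  # end of the last silent frame
    return None
```

A fixed silence timeout like this is exactly what makes pure VAD endpointing feel sluggish or trigger-happy; a model that predicts EOU from the speech content itself can, in principle, react sooner.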
Its-all-redditive@reddit
Kyutai is great for the most part, but having end of turn baked into the training makes it very hard to correct shortcomings without retraining. For example, it cannot consistently determine end of turn when the user speaks consecutive single digits, such as a phone number. Most of the time, the model signals end of turn while the user is still reading out the numbers. I have worked extensively on this issue alongside some of the team/contributors, and it's still not figured out.
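One way to try patching this outside the model is a post-processing guard. The sketch below is a hypothetical heuristic (not something Kyutai ships, and not tested against their model): if the model signals end of turn while the partial transcript ends in a digit, hold the turn open until a longer pause has elapsed. `DIGIT_WORDS`, `accept_end_of_turn`, and the 1200 ms hold are illustrative assumptions.

```python
# Hypothetical vocabulary of spoken single digits ("oh" is often said for zero).
DIGIT_WORDS = {"zero", "oh", "one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine"}

def ends_with_digit(transcript: str) -> bool:
    """True if the last word is a numeral or a spoken single digit."""
    words = transcript.strip().lower().split()
    if not words:
        return False
    last = words[-1].strip(".,?!")
    return last.isdigit() or last in DIGIT_WORDS

def accept_end_of_turn(model_says_eot: bool, transcript: str,
                       silence_ms: int, digit_hold_ms: int = 1200) -> bool:
    """Decide whether to act on the model's end-of-turn signal.

    Hypothetical rule: trust the model normally, but while the user appears
    to be reading out digits, also require digit_hold_ms of silence.
    """
    if not model_says_eot:
        return False
    if ends_with_digit(transcript):
        return silence_ms >= digit_hold_ms
    return True
```

A rule layer like this is easy to tune without retraining, but it only masks the symptom and adds latency whenever a turn genuinely ends on a number.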
Miserable-Dare5090@reddit
Parakeet runs faster and has a 5% WER on the same benchmarks.
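For context on the WER figures: word error rate is the word-level edit distance between hypothesis and reference (substitutions + deletions + insertions) divided by the number of reference words. A 5% WER therefore means roughly one error per 20 reference words. Here is a minimal sketch; real leaderboards also normalize text (casing, punctuation, numbers) before scoring, which this skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed with a standard Levenshtein dynamic program over words.
    No text normalization is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the ref words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the sat")` is one deletion over three reference words, or about 0.333.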
Miserable-Dare5090@reddit
OK, plant for Kyutai or whatever. I'm not running this model, but I know Kyutai is not beating Parakeet V2/V3. RTFx is 3332 for Parakeet and 80 for Kyutai; check the Hugging Face leaderboard.
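RTFx (inverse real-time factor) is audio duration divided by wall-clock processing time, so higher is faster. A quick back-of-the-envelope calculation using the figures quoted in the comment above (taken as stated, not re-measured):

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock time to transcribe audio at a given RTFx (audio time / compute time)."""
    return audio_seconds / rtfx

ONE_HOUR = 3600.0
for name, rtfx in [("Parakeet", 3332.0), ("Kyutai", 80.0)]:
    print(f"{name}: {processing_seconds(ONE_HOUR, rtfx):.1f} s per hour of audio")
```

This works out to roughly 1.1 s versus 45 s per hour of audio. Keep in mind that a leaderboard throughput number measures offline batch speed, which is not the same thing as streaming latency.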
Who's the smug motherf**ker now?
Flashy_Management962@reddit
Voice-activated teleprompter, please.
no_witty_username@reddit
Bruh, this is perfect! EOU is such an issue with speech-to-text models. I was NOT looking forward to hacking together my own implementation of this feature for my agent, so this is perfect. I hope it's good. Thanks.
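The OP's description says the model signals EOU by emitting a token in its output stream. The sketch below shows how an agent loop could split streamed output into complete utterances. It assumes the marker arrives as its own token string and that the decoder yields text pieces that already carry their leading spaces. The name `<EOU>` is a hypothetical placeholder; check the model card for the real marker.

```python
from typing import Iterable, Iterator

EOU_TOKEN = "<EOU>"  # hypothetical placeholder for the model's end-of-utterance marker

def utterances(tokens: Iterable[str], eou_token: str = EOU_TOKEN) -> Iterator[str]:
    """Group streamed text pieces into utterances, yielding one per EOU marker.

    Text after the last marker is held back, since the speaker may still be talking.
    """
    buffer = []
    for tok in tokens:
        if tok == eou_token:
            text = "".join(buffer).strip()
            if text:  # skip empty utterances (e.g. back-to-back markers)
                yield text
            buffer.clear()
        else:
            buffer.append(tok)
```

For example, feeding it `["Hello", " there", "<EOU>", " What's", " the", " weather", "<EOU>", " and"]` yields `"Hello there"` and `"What's the weather"`, while the trailing `" and"` stays buffered until the next marker. Each yielded utterance is the point where an agent would hand the turn to the LLM.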