Gemma 4 vs Whisper
Posted by HuntKey2603@reddit | LocalLLaMA | 6 comments
Working on building live Closed Captions for Discord calls for my TTRPG group.
With Gemma being able to do voice transcription and translation, does it still make sense to run Whisper + a smaller model for translation? Is that setup better or faster, or does it have some non-obvious upside?
Total noob here, just wondering. Asking what the consensus is before tackling it.
PersonalityBusy9022@reddit
I’ve had great luck with NVIDIA Parakeet v3. It can do 25 languages. For live closed captions you would need streaming though, so maybe check out this one based on the same technology? https://huggingface.co/nvidia/multitalker-parakeet-streaming-0.6b-v1
Looks cool. Thinking of using it for a meeting notes feature in my local speech-to-text app.
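To make the "you would need streaming" point concrete: for live captions the audio arrives in small pieces, and text has to come out as you go instead of after the call ends. Here is a rough sketch of that loop, not how Parakeet or any particular model does it internally. `stream_captions`, `transcribe_chunk`, and `chunk_size` are hypothetical names, and `transcribe_chunk` is a placeholder for whatever ASR call you end up using.

```python
from typing import Callable, Iterable, Iterator, List


def stream_captions(
    frames: Iterable[List[float]],
    transcribe_chunk: Callable[[List[float]], str],  # hypothetical ASR stand-in
    chunk_size: int,
) -> Iterator[str]:
    """Yield caption text each time enough audio samples have accumulated."""
    buffer: List[float] = []
    for frame in frames:
        buffer.extend(frame)
        # Emit one caption per full chunk; keep the remainder for the next round.
        while len(buffer) >= chunk_size:
            chunk, buffer = buffer[:chunk_size], buffer[chunk_size:]
            text = transcribe_chunk(chunk)
            if text:
                yield text
    # Flush whatever is left when the audio source ends.
    if buffer:
        text = transcribe_chunk(buffer)
        if text:
            yield text
```

Smaller chunks mean lower caption latency but less context per call, so a model built for streaming will probably handle this better than naive chunking.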
HuntKey2603@reddit (OP)
I see, so there's still value in using specific models instead of Gemma?
Thanks for your response!
PersonalityBusy9022@reddit
Yes, dedicated ASR models will still beat Gemma at this task. The way I’m using Gemma in that speech-to-text app is:
Parakeet v3 -> Gemma for text cleanup (filler removal, formatting, self-correction, etc.)
For your use case, it will be interesting to see how you manage local translation + transcription running in real time. A fun challenge!
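The two-stage setup described above (ASR first, then a text cleanup pass) could be wired up roughly like this. This is only a sketch under assumptions: `caption_pipeline` and `rule_based_cleanup` are made-up names, `transcribe` is a placeholder for the ASR model call, and the regex cleanup is a crude stand-in for the LLM step so the example runs without any model.

```python
import re
from typing import Any, Callable

# A few common fillers; an LLM cleanup step would handle far more than this.
FILLERS = re.compile(r"\b(?:um+|uh+|er+|you know)\b,?\s*", re.IGNORECASE)


def rule_based_cleanup(text: str) -> str:
    """Stand-in for the LLM cleanup stage: strip fillers, tidy spacing."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Capitalize the first letter, since removing a filler may expose a lowercase word.
    return cleaned[:1].upper() + cleaned[1:]


def caption_pipeline(
    audio: Any,
    transcribe: Callable[[Any], str],  # hypothetical ASR call
    cleanup: Callable[[str], str] = rule_based_cleanup,
) -> str:
    """Run ASR first, then pass the raw transcript through the cleanup stage."""
    raw = transcribe(audio)
    return cleanup(raw)
```

Keeping the stages as plain callables means you can swap the cleanup function for a real LLM call, or add a translation stage after it, without touching the ASR side.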
HuntKey2603@reddit (OP)
Indeed, I hope it works! Would do wonders to help break the language barriers in my TTRPG game.
Adventurous-Paper566@reddit
Parakeet has the advantage of being able to run directly on CPU.