Any recent alternatives for Whisper large? English/Hindi STT
Posted by dnivra26@reddit | LocalLLaMA | 16 comments
I have been using Whisper large for the STT requirements in my projects. I wanted to get opinions on and experiences with:
- Microsoft VibeVoice
- Qwen3 ASR
- Voxtral Mini
Needs to support English and Hindi.
WhisperianBerries@reddit
There is Sarvam for Hindi/Hinglish but those are cloud models, not local
here's a small benchmark I found that has a couple of local models, but nothing recent:
https://github.com/AI4Bharat/vistaar
dnivra26@reddit (OP)
That repo is quite outdated, and I'm looking for open-source models.
WhisperianBerries@reddit
https://voice-of-india.ai.joshtalks.com/ lists AI4Bharat IndicConformer (the only local model in those rankings).
Anxious_Serve_8520@reddit
My own homemade TTS for Hinglish. It's not voice cloning; it's a serious TTS designed specifically for Hinglish in India. It sounds natural as hell and the architecture is novel. It took me 6 months to make, 5.5 of them just recording and transcribing audio. Please check it out:
https://x.com/ramanbose82/status/2042178238982783128
KokaOP@reddit
Cohere is there for English.
dnivra26@reddit (OP)
I mentioned that Hindi support is required :|
KokaOP@reddit
You mentioned both.
InitialFox8963@reddit
May I know whether you have compute resources, and if so, what exactly? Also, you can try the MMS models (mms-1b or mms-300m).
dnivra26@reddit (OP)
Yep, I have a p5.48xlarge.
InitialFox8963@reddit
The requirement is only Hindi and English, correct? Then I'd say go for the XLSR or MMS models. They are open source as well.
draconisx4@reddit
I've swapped out Whisper for some newer open-source ASR options and found they handle English and Hindi well with a bit of fine-tuning for dialects. Go with the lighter models if you're watching resources, but test for accuracy on real-world audio. Community benchmarks can point you to the best performers right now.
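For the "test for accuracy on real-world audio" part, a rough way to compare candidates is word error rate (WER) on your own recordings. Below is a minimal sketch, assuming you already have reference transcripts and each model's output as plain strings. The function names are made up for illustration, and the normalization (NFC plus lowercasing) is only an assumption; real Hindi/Hinglish evaluation usually needs more careful text normalization.

```python
import unicodedata


def _words(text: str) -> list[str]:
    # Assumed normalization: Unicode NFC + lowercase + whitespace split.
    return unicodedata.normalize("NFC", text).lower().split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Word-level Levenshtein distance (substitutions, insertions, deletions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER for one utterance: word edits divided by reference word count.
    ref = _words(reference)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, _words(hypothesis)) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    # Corpus-level WER: total edits over total reference words,
    # so long utterances weigh more than short ones.
    edits = sum(edit_distance(_words(r), _words(h)) for r, h in pairs)
    total = sum(len(_words(r)) for r, _ in pairs)
    if total == 0:
        raise ValueError("references must contain at least one word")
    return edits / total
```

Run the same set of (reference, hypothesis) pairs through each model and compare the `corpus_wer` values. Note that spelling variants such as हूँ vs हूं count as errors here, which is why the normalization step matters for Hindi.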
dnivra26@reddit (OP)
so helpful without mentioning the actual models that worked for you
draconisx4@reddit
Yeah, you're right. I kept it vague to avoid endorsing specifics, but Wav2Vec2 has been a solid pick for me with English and Hindi, especially after a little fine-tuning. Give it a shot and check the latest benchmarks.
TheActualStudy@reddit
I know Parakeet doesn't work in Hindi, but have you tried it for English? It's quite good.
dnivra26@reddit (OP)
Hindi is a must-have for some of my projects.