CrisperWhisper ranks #2 on Open ASR Leaderboard
Posted by vaibhavs10@reddit | LocalLLaMA | 10 comments
Hi All,
I'm VB, GPU Poor at Hugging Face. We ran the speech recognition benchmarks for a relatively new Whisper-large-v3 fine-tune and it now ranks #2 on the Open ASR Leaderboard. 🔥
CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts.
Fine-tuned from Whisper Large V3, it beats the base model by roughly 1 point of WER ⚡
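For readers unfamiliar with the metric: WER (word error rate) is the number of word substitutions, deletions and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. Leaderboards usually report it as a percentage, so a gap of about 1 presumably means about one percentage point. Below is a minimal sketch of the calculation; the function name `word_error_rate` is made up for illustration, and it skips the text normalization that real evaluations typically apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; no casing or punctuation normalization is done.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Multiply the result by 100 to get the percentage form. Note that a verbatim model that keeps fillers such as "um" will score worse against a cleaned-up reference and better against a verbatim one, so the choice of reference transcripts matters.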
Kudos NyraHealth team - Open Speech Recognition scene is heating up!
You can find the Leaderboard here: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
What would you like to see on the leaderboard next? Keen on your feedback!
keniget@reddit
CW is awesome, thanks a lot for providing it out in the open!
for my iOS app recently I went through all the transcription services (elevenlabs, etc) and models, and landed finally on crispy whisper + openai tts.
My only remaining challenge is how to highlight the text once it is rendered as markdown, since the transcribed text and the rendered markdown are different.
Zemanyak@reddit
Thanks for this fine-tune. We need Whisper 4 or any new model. Whisper 3 was not much of an improvement.
az226@reddit
They thought unsupervised data would help. It did not.
rangerrick337@reddit
How are people using these to record a meeting, for example? Do you just record the audio in a meeting and then feed the file to the llm via something like Open WebUI?
herozorro@reddit
you just provide it the file in the correct wav format (16000 Hz sample rate, if i recall) and it will output a transcript
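The 16000 figure presumably refers to the sample rate in Hz: Whisper-family models work on 16 kHz audio. A quick way to check a recording before transcribing it is sketched below using only Python's standard `wave` module. The helper name `check_wav_for_whisper` is hypothetical, and the mono check is an assumption about what a given transcription script expects; many front-ends resample and downmix for you, in which case none of this is needed.

```python
import wave


def check_wav_for_whisper(path: str, expected_rate: int = 16000) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file looks fine.

    Hypothetical helper: it only inspects the WAV header and does not
    modify or resample the audio.
    """
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()

    problems = []
    if sample_rate != expected_rate:
        problems.append(f"sample rate is {sample_rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

If the list is not empty, convert the recording to 16 kHz mono with your audio tool of choice before feeding it to the model.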
TheActualStudy@reddit
On the leaderboard, CrisperWhisper produces the best results on the AMI dataset, which is audio of meetings. That specific use case is the one I normally use Whisper for, so this sounds like something I should definitely try.
Psychedelic_Traveler@reddit
Is there any model that is pre-trained for removing fillers / being context-aware? I really like superwhisper but would prefer to run my own
grim-432@reddit
Yesss!!! We need more progress here!
NoJellyfish6949@reddit
Whisper with word-level timestamps. Great!
YearZero@reddit
Looks great! Adding that to my list of leaderboards!