Best model for speech-to-text transcription that includes filler words?
Posted by Similar-Camp9685@reddit | LocalLLaMA | 2 comments
Hey everyone, I want to do speech-to-text transcription that keeps filler words such as "um", "ah", and "so", since they signal how confident the speaker is. Is there a model that can do this? I tried WhisperX, but the results were not good. This is very important to me because I'm writing a research paper.
Elibroftw@reddit
Following. I just used Windows' built-in voice typing 10 minutes ago and the output is schizophrenic. I can't even use Copilot to rewrite it (i.e. Content Blocked) because I wrote about sex, drugs, and swearing. The voice typing is so garbage that when I said fuck, fucking, or fucked, it bleeped it out as ****.
I need a good open source or self-hosted solution to do dictation -> clean up -> rewrite with oomph.
thejoyofcraig@reddit
Filler words don’t seem to be a priority for most current models. Check out NVIDIA’s Parakeet series. In my personal experience, the 1.1B TDT model does better at filler words. Beyond that, API-based options like Speechmatics (not open) or ElevenLabs’ recent STT might give you better results.
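
If you end up comparing several of these models, one rough way to do it is to hand-transcribe a short clip with the fillers included and check how many of them each model's output keeps. The sketch below is only an illustration: the `FILLERS` list and the function names `count_fillers` and `filler_recall` are made up for this example and do not come from any of the tools mentioned above. "so" is left out of the default list because it is also an ordinary word, so counting it needs manual review.

```python
import re
from collections import Counter

# Hypothetical filler list for illustration; adjust it to your own data.
FILLERS = ("um", "uh", "ah", "er")

def count_fillers(transcript, fillers=FILLERS):
    """Count whole-word, case-insensitive occurrences of each filler."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w in fillers)
    return {f: counts[f] for f in fillers if counts[f]}

def filler_recall(reference, hypothesis, fillers=FILLERS):
    """Fraction of the reference's fillers that the hypothesis also contains.

    Counts are matched per filler type and capped at the reference count,
    so this ignores word order and position. Returns None when the
    reference contains no fillers.
    """
    ref = count_fillers(reference, fillers)
    hyp = count_fillers(hypothesis, fillers)
    total = sum(ref.values())
    if total == 0:
        return None
    kept = sum(min(n, hyp.get(f, 0)) for f, n in ref.items())
    return kept / total

if __name__ == "__main__":
    manual = "Um, I think, uh, it was, um, Tuesday."
    model_output = "I think, uh, it was, um, Tuesday."
    print(count_fillers(manual))
    print(filler_recall(manual, model_output))
```

Because it only compares counts, a score like this is a quick screening number rather than a proper alignment-based metric, but it is enough to see whether a model is silently dropping fillers.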