Best open STT/ASR model with accurate timestamps

Posted by pvrlek@reddit | LocalLLaMA | 5 comments

WhisperX with large-v2 works okay-ish for my use case for the most part; timestamp accuracy only dips on slightly chaotic audio. I haven't been able to keep up with what the SOTA is here, so I'm wondering what your real-world experiences have been.

I'd appreciate any info here; this community has been immensely helpful. Thank you all!