Speaker Diarization

Posted by mtwn1051@reddit | Python

Hi, I am building a transcription solution with AI analytics on top of it.

I need to perform speaker diarization on a call recording file.

I have explored cloud solutions from Azure and Google, but they are bad.

I then tried open-source solutions: first pyannote.audio, and I am currently trying NVIDIA NeMo.

In my testing so far, NVIDIA NeMo is better in terms of both accuracy and performance.
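For anyone wanting to put a number on "accuracy" when comparing tools like these, the usual metric is diarization error rate (DER): missed speech, false-alarm speech, and speaker confusion, divided by total reference speech. Below is a minimal sketch, not what was used in the testing above. It assumes each system's output has already been converted to `(start, end, speaker)` tuples in seconds, that only one speaker talks at a time, and that no forgiveness collar is applied. The function names `frame_labels` and `simple_der` are hypothetical. The brute-force speaker mapping is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_labels(segments, duration, step=0.01):
    """Turn (start, end, speaker) segments into one label per frame.

    Frames with no speech get None. Assumption: no overlapping speech,
    so a later segment simply overwrites an earlier one.
    """
    n = int(round(duration / step))
    labels = [None] * n
    for start, end, speaker in segments:
        first = int(round(start / step))
        last = min(n, int(round(end / step)))
        for i in range(first, last):
            labels[i] = speaker
    return labels


def simple_der(reference, hypothesis, duration, step=0.01):
    """Simplified frame-level DER (no collar, no overlap handling)."""
    ref = frame_labels(reference, duration, step)
    hyp = frame_labels(hypothesis, duration, step)

    total_speech = sum(r is not None for r in ref)
    if total_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Frames where both sides say someone is speaking.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]
    ref_speakers = sorted({r for r, _ in both})
    hyp_speakers = sorted({h for _, h in both})

    # Pad with None so extra hypothesis speakers can stay unmatched.
    slots = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Try every one-to-one mapping of hypothesis labels onto reference
    # labels and keep the one with the most agreeing frames.
    best_correct = 0
    for perm in permutations(slots, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Made-up example data: a 10-second call with two speakers.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 6.0, "spk1"), (6.0, 10.0, "spk0")]
print(simple_der(reference, hypothesis, duration=10.0))
```

Running the same hand-labelled recordings through each candidate and comparing the scores gives a like-for-like accuracy figure. For a production decision, a maintained implementation such as the `pyannote.metrics` package, which handles overlap and collars, is likely a better choice than this sketch.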

Can I take this to production? What other options should I try?