Speaker Diarization

Posted by mtwn1051@reddit | Python | 5 comments

Hi, I am building a transcription solution with AI analytics on top of it.

I need to perform speaker diarization on a call recording file.

I have explored cloud solutions from Azure and Google, but they performed poorly.

I then tried the open-source solution Pyannote Audio, and I am currently trying Nvidia NeMo.

In my testing so far, Nvidia Nemo is better in terms of accuracy as well as performance.

Can I take this to production? What other options should I try?
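One way to back up an accuracy comparison like this is to score each tool against a short hand-labeled reference. The sketch below is a simplified, frame-based stand-in for diarization error rate, not the standard metric: it ignores overlapping speech and collars, and it brute-forces the speaker label mapping, which is only practical for a handful of speakers. The function names and the `(start, end, speaker)` tuple layout are assumptions made for illustration and do not come from Pyannote or NeMo.

```python
from itertools import permutations

# Hypothetical segment layout: (start_sec, end_sec, speaker_label)
Segment = tuple[float, float, str]


def to_frames(segments: list[Segment], duration: float, step: float = 0.01) -> list:
    """Turn segments into one label per frame; None marks non-speech."""
    n = int(round(duration / step))
    frames = [None] * n
    for start, end, speaker in segments:
        lo = max(0, int(round(start / step)))
        hi = min(n, int(round(end / step)))
        for i in range(lo, hi):
            frames[i] = speaker  # later segments overwrite earlier ones
    return frames


def simple_error_rate(reference: list[Segment], hypothesis: list[Segment],
                      duration: float, step: float = 0.01) -> float:
    """Fraction of frames that are missed, falsely detected, or assigned
    to the wrong speaker, relative to the reference speech frames."""
    ref = to_frames(reference, duration, step)
    hyp = to_frames(hypothesis, duration, step)
    ref_labels = sorted({r for r in ref if r is not None})
    hyp_labels = sorted({h for h in hyp if h is not None})
    ref_speech = sum(1 for r in ref if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    # Pad with None so every reference speaker can be mapped to something.
    padded = hyp_labels + [None] * max(0, len(ref_labels) - len(hyp_labels))
    best = None
    # Try every mapping of reference labels to hypothesis labels, keep the best.
    for perm in permutations(padded, len(ref_labels)):
        mapping = dict(zip(ref_labels, perm))
        errors = 0
        for r, h in zip(ref, hyp):
            if r is None:
                if h is not None:
                    errors += 1  # false alarm: speech where there is none
            elif h is None or mapping[r] != h:
                errors += 1  # missed speech or speaker confusion
        if best is None or errors < best:
            best = errors
    return best / ref_speech


if __name__ == "__main__":
    reference = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
    tool_output = [(0.0, 1.0, "spk1"), (1.0, 2.0, "spk0")]
    print(simple_error_rate(reference, tool_output, duration=2.0))
```

Running the same reference through each candidate tool gives one comparable number per tool. For a production decision, an established metric implementation would be a safer choice than this sketch.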


Python-ModTeam@reddit

Hi there, from the /r/Python mods.

We have removed this post as it is not suited to the /r/Python subreddit proper, however it should be very appropriate for our sister subreddit /r/LearnPython or for the r/Python discord: https://discord.gg/python.

The reason for the removal is that /r/Python is dedicated to discussion of Python news, projects, uses and debates. It is not designed to act as a Q&A or FAQ board. The regular community is not a fan of "how do I..." questions, so you will not get the best responses over here.

On /r/LearnPython the community and the r/Python discord are actively expecting questions and are looking to help. You can expect far more understanding, encouraging and insightful responses over there. No matter what level of question you have, if you are looking for help with Python, you should get good answers. Make sure to check out the rules for both places.

Warm regards, and best of luck with your Pythoneering!


thatphotoguy89@reddit

Maybe try this? It works with pyannote as well: https://github.com/Vaibhavs10/insanely-fast-whisper


mtwn1051@reddit (OP)

I am looking for diarization solutions specifically. For speech-to-text I am using a Gemini LLM, which I found much better than OpenAI Whisper and Google's own Cloud Speech-to-Text.
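When diarization and speech-to-text come from separate tools like this, the two outputs still have to be merged. A minimal sketch of one common approach is to give each transcript segment the speaker whose diarization turns overlap it the most. It assumes the transcript comes with start and end timestamps, which the thread does not confirm for the Gemini output. `assign_speakers` and both tuple layouts are hypothetical names chosen for illustration.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the best-overlapping speaker.

    transcript: list of (start_sec, end_sec, text)        # assumed layout
    turns:      list of (start_sec, end_sec, speaker_id)  # assumed layout
    Returns a list of (speaker_id, start_sec, end_sec, text).
    """
    labeled = []
    for start, end, text in transcript:
        totals = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                # Sum the overlap per speaker across all of their turns.
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, start, end, text))
    return labeled


if __name__ == "__main__":
    turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
    transcript = [
        (0.5, 3.5, "Thanks for calling, how can I help?"),
        (3.8, 8.0, "I have a question about my invoice."),
    ]
    for speaker, start, end, text in assign_speakers(transcript, turns):
        print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

Segments that overlap no diarization turn are labeled `UNKNOWN` rather than guessed, so gaps between the two tools stay visible downstream.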


thatphotoguy89@reddit

Looks like this has its own diarization pipeline here: https://github.com/Vaibhavs10/insanely-fast-whisper/blob/main/src/insanely_fast_whisper/utils/diarization_pipeline.py


mtwn1051@reddit (OP)

That pipeline just uses Pyannote Audio.