How to transcribe a video and summarize it?

Posted by bullerwins@reddit | LocalLLaMA | 29 comments

Hi!

I’m looking for a tool to summarize a video locally. It doesn’t actually need to handle video; the audio track alone is fine. Since the videos are meeting recordings, what happens on screen doesn’t really matter.
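Since only the audio matters, one way to drop the video stream before transcribing is with ffmpeg. This is a minimal sketch, assuming ffmpeg is installed and on PATH. The 16 kHz mono WAV target is an assumption: it is a format many speech-to-text models expect. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that keeps only the audio track.

    -vn drops the video stream, -ac 1 downmixes to mono,
    -ar 16000 resamples to 16 kHz (an assumed target rate).
    """
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",
        "-ac", "1",
        "-ar", "16000",
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> Path:
    """Run ffmpeg (assumed to be on PATH) and return the path of the WAV file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(ffmpeg_extract_audio_cmd(video_path, wav_path), check=True)
    return Path(wav_path)
```

For example, `extract_audio("meeting.mp4", "meeting.wav")` writes an audio-only file. The file names here are placeholders.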

I guess the best approach would be to transcribe the audio to text first and then ask an LLM to summarize it (once you have the text, that part should be as easy as pasting it into any decent model and asking for a summary). It’s the audio/video-to-text step that I’m missing.
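The transcribe-then-summarize pipeline can be sketched in Python. This sketch assumes the open-source `openai-whisper` package (`pip install openai-whisper`, which also needs ffmpeg) handles the speech-to-text step. The model size `"base"`, the character-based chunk size, and the prompt wording are illustrative assumptions. Long meetings can produce transcripts larger than a local model's context window, so the sketch splits the text into chunks and builds one summary prompt per chunk. You can then paste each prompt into whatever frontend you already use.

```python
def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Speech-to-text with openai-whisper (assumed installed).

    The import is inside the function so the helpers below work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split on whitespace into chunks of at most max_chars characters.

    max_chars is a rough stand-in for the model's context budget (an assumption:
    real limits are counted in tokens). A single word longer than max_chars
    still becomes its own, oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # +1 for the space that joins this word to the previous one
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def summary_prompt(chunk: str, part: int, total: int) -> str:
    """Hypothetical prompt wording for summarizing one transcript chunk."""
    return (
        f"Below is part {part} of {total} of a meeting transcript.\n"
        "Summarize the key points, decisions and action items as bullet points.\n\n"
        f"{chunk}"
    )


def prompts_for_transcript(text: str, max_chars: int = 8000) -> list[str]:
    """Chunk a transcript and return one summarization prompt per chunk."""
    chunks = chunk_text(text, max_chars)
    return [summary_prompt(c, i, len(chunks)) for i, c in enumerate(chunks, start=1)]
```

For example, `prompts_for_transcript(transcribe("meeting.wav"))` returns one prompt per chunk, where `meeting.wav` is a placeholder. If a meeting yields several partial summaries, you can paste them together and ask the model for one final summary.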

Are there any decent ones? I’m mostly used to tools like text-generation-webui and SillyTavern.