How to transcribe a video and summarize it?

Posted by bullerwins@reddit | LocalLLaMA | View on Reddit | 29 comments

Hi!

I’m looking for a tool to locally summarize a video. It doesn’t really need to be a video, just the audio portion is fine. As the topic of the videos would be meetings, what’s going on in the visual part doesn’t really matter.

I guess the best way would be to first transcribe the audio to text. And then ask an LLM to summarize it (that should be as easy as copying, pasting and asking any decent model to do it once you have the text). It’s the audio/video to text part the one I’m missing.

Are there any decent ones? I’m mostly used to textgen-webui, silly tavern etc.

[-]

Majestic_Mission9845@reddit

If you’re already planning to feed the text into an LLM, you might want something that gives structure upfront. This online tool gives key-moments plus transcript, and then I do the summary separately. The Free Video Transcript piece handles casual stuff without needing local installs or GPU setups.

[-]

Previous_Button_3258@reddit

[-]

Away_You9725@reddit

Maybe you can try Vizard. I used to use Otter to transcribe the content of my podcast videos. Later, because I wanted to cut my blog content into shorts, I found Vizard to meet this editing need. It can use AI to generate the text content of the video while helping me edit the video quickly. Also, its transcription accuracy is good, and the text recognition precision is quite high.

[-]

SympathyAny1694@reddit

If you're mainly after transcription + summarization for video meetings, this transcribing app might be worth a look. You just drop in a YouTube link (or upload audio directly), and it transcribes the whole thing. no time limits. Then it can summarize, extract action items, or even chat with the transcript using GPT-4o.

[-]

Remarkable-Rub-@reddit

Yeah, that’s pretty much the flow I use too, transcribe the audio first, then summarize with GPT. I’ve been using this AI note taker that works well with audio files and YouTube links (it pulls and transcribes the audio automatically).

[-]

Stepfunction@reddit

You could look into using OpenAI's Whisper model for transcription. The instructions on their repo are pretty easy to follow. You can then just take the output and throw it at your LLM of choice.

[-]

Remarkable-Rub-@reddit

Whisper is definitely a solid choice for transcription. If you’re looking for a more hands-off solution, you might want to check out VOMO AI—it automatically transcribes videos (like YouTube) and generates structured summaries in one go.

[-]

Swimming_Treat3818@reddit

I’ve been using VOMO AI for this. It lets you import video or audio, transcribes it automatically, and even generates a structured summary with key points. Super useful for meetings or long discussions

[-]

Swimming_Treat3818@reddit

Yeah, ChatGPT can’t pull transcripts from YouTube directly. If you’re looking for something that does, VOMO AI lets you paste a YouTube link, automatically imports captions, and generates a structured summary. Super helpful for long videos

[-]

Hinged31@reddit

Do not want to steal u/kryptkpr 's thunder, but his tldw is what you're looking for: https://github.com/the-crypt-keeper/tldw

[-]

bullerwins@reddit (OP)

did he write the code? just to thank him

[-]

Swimming_Treat3818@reddit

For your use case, VOMO AI could be a great fit. It’s an all-in-one solution that transcribes audio from videos or meetings into text with high accuracy, and it can also summarize the content directly. Plus, it’s designed to handle meeting contexts, saving you the hassle of switching between tools.

[-]

kryptkpr@reddit

Yep this is my project. Glad you found it helpful!

[-]

Spiritbreake@reddit

Hi
Is this working with other languages as well? I want to get transcripts of some Youtube videos in Turkish

[-]

kryptkpr@reddit

Whisper supports lots of languages, as long as the LLM you use marches it should be ok but never tried not English

[-]

Spiritbreake@reddit

Thank you, I will check this out asap

[-]

orkutmuratyilmaz@reddit

Have you tried it on Turkish content too?

[-]

hotmerc007@reddit

hi, quick newbie question - the windows installer script seems to 404. Do you have an updated copy by chance?
Many thanks
Adam

[-]

can_a_bus@reddit

Hey there! This work you have done is amazing. I was hoping to install it and was wondering what kind of LLM's do you run in order to process videos? I am currently trying to get this to run in a docker container alongside my ollama instance and openweb-ui container. I believe it requires launching the webserver under 0.0.0.0 instead of 127.0.0.1 in order for someone to hit it from outside of docker. I will make a PR if I get it to work.

[-]

kryptkpr@reddit

Check out the "LLMs for Offline/Private Use" section of the README for a detailed list of suggested models, and yes you will need to either bind to 0.0.0.0 and use your LAN IP to hit your machine from inside docker.

[-]

timothy-102@reddit

try this: videochat.timcvetko.com

[-]

bhamada@reddit

Hey!

I was in the same boat, needing a way to transcribe videos easily. I ended up making Transcrib.ee—it’s free and super quick for turning YouTube videos into text. You just paste the link, and it gives you a transcript.

If you’re on Chrome, there’s also a Chrome extension that lets you grab transcripts right from the YouTube player.

Once you have the text, you’re right—it’s easy to throw it into any LLM for a summary. Hope this helps!

[-]

dinoleif@reddit

In case you're still searching, I'm the creator of TurboScribe (https://turboscribe.ai) which you might find useful for your videos.

It's free up to 3 files per day (30 minutes per file). If you need more, you can upgrade for unlimited transcriptions (up to 10 hours long each). We support all major video and audio formats. You can download as text transcripts or as subtitles depending on your needs as well.

Afterwards, you can also use the TurboScribe GPT to easily summarize or "chat" with any of your transcripts as well.

I hope that helps and I hope you land on a great solution 😃

[-]

klapauciosa@reddit

https://downsub.com/ me funciona súper. después puedes escuchar el video en rápido y verificar que se haya escrito todo ok, porque hay que corregir algunas cosas mínimas.

[-]

fourwindz123@reddit

If you are looking for an existing toll and don't want to code then www.nutshellpro.com will create a transcript and summarize it with bullet points.

[-]

AIToolsMaster@reddit

I've tried many tools personally as a working student. For transcribing and summarizing videos, Tactiq is a solid choice. It not only transcribes audio to text but also provides summaries and action items from your meetings. This makes it easier to get a quick overview without having to go through the entire transcript. 👌🏻

[-]

Public-Remove4229@reddit

Hey there! You might want to check out Lstn for transcribing your meeting videos. It’s really user-friendly and great for converting audio to text quickly. Plus, it uses advanced AI to ensure accuracy.

[-]

Jumper775-2@reddit

Easy peasy Use ffmpeg to convert the video to a 16khz mono wav formatted file, then download whisper.cpp and a whisper model (use whisper.cpp). Next just copy paste that into whatever llm you want or pipe it to llama.cpp with whatever model you want with a system prompt telling it to summarize the provided text as if it were the transcript of a video. This could probably be made in a sweet bash one liner.

[-]

ali0une@reddit

Thanks mate, i finally got started, i had wanted to try it for a long time. The results are fascinating!