Best way to process TikTok/IG Reels URLs -> Extract Audio -> Transcribe to Text for LLM?
Posted by Sea_Lawfulness_5602@reddit | learnprogramming
Hey everyone,
I'm currently building an AI-powered fact-checking app (Flutter frontend + Python/FastAPI backend). Right now, the app successfully analyzes long-form YouTube videos by fetching their transcripts and running them through an LLM pipeline.
However, I want to expand the app to support short-form content (TikTok, Instagram Reels, YouTube Shorts) where captions aren't always reliably available via APIs.
The desired workflow:
- User pastes a TikTok or IG Reel URL into the app.
- The backend downloads/extracts only the audio (e.g., MP3/M4A) from that URL.
- The backend runs the audio through a Speech-to-Text model (like Whisper) to get the transcript.
- The transcript is fed into my existing LLM fact-checking pipeline.
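To make the workflow concrete, here is a rough sketch of steps 2 and 3 as I currently picture them. It assumes `yt-dlp` and `faster-whisper` are installed and that the platform actually serves the URL; the function names (`download_audio`, `transcribe`, `url_to_transcript`), the `"small"` model size, and the CPU/int8 settings are my own placeholders, not a tested production setup.

```python
from functools import lru_cache
from pathlib import Path
from typing import Callable


def download_audio(url: str, out_dir: Path) -> Path:
    """Download the best audio stream yt-dlp can find for `url` into `out_dir`."""
    import yt_dlp  # third-party, imported lazily: pip install yt-dlp

    opts = {
        # Prefer an audio-only stream; fall back to the best combined file,
        # since short-form platforms may not expose audio separately.
        "format": "bestaudio/best",
        "outtmpl": str(out_dir / "%(id)s.%(ext)s"),
        "noplaylist": True,
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return Path(ydl.prepare_filename(info))


@lru_cache(maxsize=1)
def _load_model():
    """Load the Whisper model once per process (model size is an assumption)."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    return WhisperModel("small", device="cpu", compute_type="int8")


def transcribe(audio_path: Path) -> str:
    """Run speech-to-text on a local media file and return plain text."""
    segments, _info = _load_model().transcribe(str(audio_path))
    return " ".join(segment.text.strip() for segment in segments)


def url_to_transcript(
    url: str,
    work_dir: Path,
    download: Callable[[str, Path], Path] = download_audio,
    stt: Callable[[Path], str] = transcribe,
) -> str:
    """Steps 2 and 3 of the workflow: URL -> audio file -> transcript."""
    audio_path = download(url, work_dir)
    return stt(audio_path)
```

The `download` and `stt` parameters are only there so I can swap in a hosted STT API or a stub in tests without touching the rest of the pipeline.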
My questions for the community:
- Extraction: I know IG and TikTok are notoriously aggressive against scraping. Is `yt-dlp` still the most reliable tool for extracting audio from these platforms in a production backend, or are there better alternatives/APIs?
- Transcription: For the STT part, is it better (cost/speed-wise) to use OpenAI's Whisper API directly, or host a smaller Whisper model locally on my server (e.g., using `faster-whisper`), since these are just 15-60 second clips?
- Infrastructure: Any tips on handling the temporary audio files? Should I process this entirely in memory (RAM), or save to `/tmp` and delete after transcription?
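For the temp-file question, the disk-based option I have in mind looks roughly like the sketch below: each request gets its own scratch directory under the system temp location, and the directory is removed when the job finishes, even if transcription raises. `run_in_scratch_dir` and the `clip-` prefix are hypothetical names of mine, and I have not benchmarked this against an in-memory approach.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_in_scratch_dir(job: Callable[[Path], T], prefix: str = "clip-") -> T:
    """Run `job(scratch_dir)` and delete the directory afterwards.

    TemporaryDirectory cleans up on normal exit and on exceptions,
    so a failed download or transcription does not leave audio behind.
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        return job(Path(tmp))
```

In a request handler, `job` would be a small function that downloads the audio into the given directory and returns the transcript string, so nothing that outlives the request points back at the deleted files.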
Any advice, libraries, or architectural tips would be greatly appreciated. Thanks!