Best way to process TikTok/IG Reels URLs -> Extract Audio -> Transcribe to Text for LLM?

Posted by Sea_Lawfulness_5602@reddit | learnprogramming

Hey everyone,

I'm currently building an AI-powered fact-checking app (Flutter frontend + Python/FastAPI backend). Right now, the app successfully analyzes long-form YouTube videos by fetching their transcripts and running them through an LLM pipeline.

However, I want to expand the app to support short-form content (TikTok, Instagram Reels, YouTube Shorts) where captions aren't always reliably available via APIs.

The desired workflow:

  1. User pastes a TikTok or IG Reel URL into the app.
  2. The backend downloads/extracts only the audio (e.g., MP3/M4A) from that URL.
  3. The backend runs the audio through a Speech-to-Text model (like Whisper) to get the transcript.
  4. The transcript is fed into my existing LLM fact-checking pipeline.
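
For step 1, it may be worth validating the pasted URL against an allow-list before the backend fetches anything, because download tools like yt-dlp accept URLs from many sites and the server probably shouldn't download arbitrary links. A minimal sketch using only the standard library; `Platform`, `detect_platform`, and the host list are hypothetical names and assumptions, not part of any existing API:

```python
from enum import Enum
from urllib.parse import urlparse


class Platform(Enum):
    TIKTOK = "tiktok"
    INSTAGRAM = "instagram"
    YOUTUBE = "youtube"


# Assumed allow-list of hostname suffixes. Subdomains such as
# vm.tiktok.com or www.instagram.com are covered by the suffix match.
_HOST_SUFFIXES = {
    Platform.TIKTOK: ("tiktok.com",),
    Platform.INSTAGRAM: ("instagram.com",),
    Platform.YOUTUBE: ("youtube.com", "youtu.be"),
}


def detect_platform(url: str) -> Platform:
    """Return the platform for a pasted URL, or raise ValueError."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = (parsed.hostname or "").lower()
    for platform, suffixes in _HOST_SUFFIXES.items():
        for suffix in suffixes:
            # Exact domain or a real subdomain; rejects lookalikes
            # such as "eviltiktok.com".
            if host == suffix or host.endswith("." + suffix):
                return platform
    raise ValueError(f"unsupported host: {host!r}")
```

Matching on the parsed hostname rather than a substring of the raw URL avoids accepting links like `https://evil.example/?q=tiktok.com`.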
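
Steps 2 to 4 can be wired together so that the audio only lives for the duration of one request. This is a sketch under the assumption that each step is a plain function; `process_short_video` and the three step signatures are hypothetical placeholders for a real downloader, STT call, and the existing LLM pipeline:

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical step signatures; swap in real implementations.
DownloadAudio = Callable[[str, Path], Path]  # (url, work_dir) -> audio file
Transcribe = Callable[[Path], str]           # audio file -> transcript text
FactCheck = Callable[[str], dict]            # transcript -> LLM result


def process_short_video(
    url: str,
    download_audio: DownloadAudio,
    transcribe: Transcribe,
    fact_check: FactCheck,
) -> dict:
    """Run steps 2-4 for one URL; the scratch directory is always removed."""
    with tempfile.TemporaryDirectory(prefix="factcheck-") as tmp:
        audio_path = download_audio(url, Path(tmp))  # step 2
        transcript = transcribe(audio_path).strip()  # step 3
    # The directory and audio file are deleted here, even if a step raised.
    if not transcript:
        raise ValueError("empty transcript (no speech in the clip?)")
    return fact_check(transcript)  # step 4, runs on text only
```

Passing the steps in as arguments keeps the orchestration testable with fakes and makes it easy to switch between a hosted and a self-hosted transcriber later.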

My questions for the community:

  1. Extraction: I know IG and TikTok are notoriously aggressive against scraping. Is yt-dlp still the most reliable tool for extracting audio from these platforms in a production backend, or are there better alternatives/APIs?
  2. Transcription: For the STT part, is it better (cost/speed-wise) to use OpenAI's Whisper API directly, or to host a smaller Whisper model locally on my server (e.g., using faster-whisper), since these are just 15- to 60-second clips?
  3. Infrastructure: Any tips on handling the temporary audio files? Should I process this entirely in memory (RAM), or save to /tmp and delete after transcription?
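
For question 1, a common starting point is yt-dlp's Python API. The sketch below asks for the best audio-only format, falls back to the best combined format, and uses the `FFmpegExtractAudio` postprocessor (which needs `ffmpeg` on `PATH`) to keep only the audio track. The helper names are hypothetical. It does not solve the anti-scraping problem: Instagram in particular may require passing cookies (yt-dlp's `cookiefile` option), and extractors break often, so pinning and regularly updating yt-dlp is likely part of running it in production.

```python
from pathlib import Path


def build_ydl_opts(work_dir: Path) -> dict:
    """yt-dlp options: best audio stream, converted to m4a by ffmpeg."""
    return {
        # Audio-only if offered, else the best combined audio+video format.
        "format": "bestaudio/best",
        # Fixed file name inside a per-request directory.
        "outtmpl": str(work_dir / "audio.%(ext)s"),
        "noplaylist": True,
        "quiet": True,
        "no_warnings": True,
        # Requires ffmpeg; drops the video track if one was downloaded.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "m4a"},
        ],
    }


def download_audio(url: str, work_dir: Path) -> Path:
    """Download one URL's audio into work_dir and return the file path."""
    import yt_dlp  # imported lazily so the options can be built without it

    with yt_dlp.YoutubeDL(build_ydl_opts(work_dir)) as ydl:
        ydl.download([url])  # raises yt_dlp.utils.DownloadError on failure
    files = sorted(work_dir.glob("audio.*"))
    if not files:
        raise RuntimeError(f"yt-dlp produced no audio file for {url}")
    return files[0]
```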
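
For question 2, both options can sit behind the same `path -> text` signature so the choice stays a configuration detail. A sketch assuming the `faster-whisper` and `openai` (v1+) packages; the function names and the `"small"` model size are illustrative choices, not recommendations:

```python
from functools import lru_cache
from typing import Iterable, Protocol


class _Segment(Protocol):
    text: str


def join_segments(segments: Iterable[_Segment]) -> str:
    """Concatenate Whisper segment texts into one transcript string."""
    return " ".join(s.text.strip() for s in segments if s.text.strip())


@lru_cache(maxsize=2)
def _load_model(model_size: str):
    """Load the model once per process instead of once per request."""
    from faster_whisper import WhisperModel  # optional dependency

    return WhisperModel(model_size, device="cpu", compute_type="int8")


def transcribe_local(path: str, model_size: str = "small") -> str:
    """Self-hosted transcription with faster-whisper on CPU."""
    model = _load_model(model_size)
    # segments is a lazy generator; decoding happens while it is consumed.
    segments, _info = model.transcribe(path, beam_size=5, vad_filter=True)
    return join_segments(segments)


def transcribe_api(path: str) -> str:
    """Hosted transcription via OpenAI's API (reads OPENAI_API_KEY)."""
    from openai import OpenAI  # optional dependency

    client = OpenAI()
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text.strip()
```

For 15- to 60-second clips, model load time rather than inference tends to dominate a cold local call, which is why the model is cached here.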
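
The cost side of question 2 reduces to a break-even calculation: a hosted API costs roughly (clip seconds / 60) × price per minute per clip, while a self-hosted server is a mostly fixed monthly cost. The sketch assumes billing is proportional to audio length (check whether the provider rounds per request), and every number in it is a hypothetical input to replace with current prices:

```python
def monthly_api_cost(clips_per_month: int, avg_clip_seconds: float,
                     price_per_minute: float) -> float:
    """Hosted STT cost: billed audio minutes times the per-minute rate."""
    return clips_per_month * (avg_clip_seconds / 60.0) * price_per_minute


def break_even_clips(server_cost_per_month: float, avg_clip_seconds: float,
                     price_per_minute: float) -> float:
    """Monthly clip volume at which a fixed-cost server matches the API bill."""
    cost_per_clip = (avg_clip_seconds / 60.0) * price_per_minute
    return server_cost_per_month / cost_per_clip
```

With hypothetical inputs of 45-second clips, a $0.006/minute rate, and a $40/month server, each clip costs $0.0045 through the API and the break-even point is about 8,900 clips per month. Below that, the API is likely cheaper before even counting the time spent operating the server.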
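
For question 3, yt-dlp and ffmpeg work with file paths, so a per-request temporary directory is usually simpler than a purely in-memory pipeline. On Linux, `/dev/shm` is typically a RAM-backed tmpfs, so placing that directory there gets most of the speed of RAM while keeping a file-based interface (it counts against memory, so size limits still matter). A sketch with hypothetical helper names and an arbitrary size cap:

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

# Hypothetical cap; a 60-second audio clip should be far below this.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def scratch_base() -> Optional[str]:
    """Use RAM-backed /dev/shm when present and writable, else the default."""
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return None  # tempfile then falls back to TMPDIR, /tmp, etc.


@contextmanager
def job_dir(base: Optional[str] = None) -> Iterator[Path]:
    """Per-request directory that is deleted even if a step raises."""
    with tempfile.TemporaryDirectory(prefix="reel-", dir=base) as tmp:
        yield Path(tmp)


def check_audio_size(path: Path, limit: int = MAX_AUDIO_BYTES) -> None:
    """Reject unexpectedly large downloads before transcribing them."""
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"{path.name} is {size} bytes (limit {limit})")
```

Usage would look like `with job_dir(scratch_base()) as work: ...`, with the download, size check, and transcription all happening inside the block.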

Any advice, libraries, or architectural tips would be greatly appreciated. Thanks!