player2text – local audio/video transcription API with auto-compression (FastAPI + faster-whisper)

Posted by FishermanScared4585@reddit | Python

**What My Project Does**

player2text is a local REST API that transcribes audio and video files to text using faster-whisper. Before transcription, it automatically runs every file through ffmpeg — stripping the video stream and downsampling audio to 16 kHz mono WAV (Whisper's native rate). A 300 MB video becomes ~5 MB before the model ever touches it, which makes CPU transcription actually practical.
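The preprocessing step boils down to a single ffmpeg invocation. A minimal sketch of how it can be wired up from Python — the exact flags player2text uses may differ, but `-vn`, `-ac 1`, and `-ar 16000` are the standard way to drop video and get 16 kHz mono audio:

```python
import subprocess

def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that strips video and resamples to 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-i", src,        # input file (any container ffmpeg understands)
        "-vn",            # drop the video stream entirely
        "-ac", "1",       # downmix to a single channel
        "-ar", "16000",   # resample to Whisper's native 16 kHz
        "-f", "wav",      # force WAV output regardless of dst extension
        "-y", dst,        # overwrite the destination if it exists
    ]

def compress(src: str, dst: str) -> None:
    """Run the compression step; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Because the output carries only 16 kHz mono PCM, file size depends on duration alone (~32 KB/s), which is why even long videos shrink to a few megabytes.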

Send a file to `POST /api/v1/transcribe`, get back the full transcript, detected language, duration, and timestamped segments. No API keys, no cloud, nothing leaves your machine.
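A client can be as simple as a multipart POST. A stdlib-only sketch, assuming the service listens on `localhost:8000` and accepts the upload under the form field `file` — both are assumptions here, so check the repo for the actual host, port, and field name:

```python
import json
import urllib.request

def build_multipart(filename: str, data: bytes,
                    boundary: str = "player2textBoundary") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body under the field 'file'
    (field name is an assumption). Returns (body, content_type)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def transcribe(path: str,
               url: str = "http://localhost:8000/api/v1/transcribe") -> dict:
    """POST a media file and return the parsed JSON response
    (transcript, language, duration, segments)."""
    with open(path, "rb") as f:
        body, ctype = build_multipart(path, f.read())
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With `requests` installed this collapses to `requests.post(url, files={"file": open(path, "rb")}).json()`; the manual encoding above just keeps the example dependency-free.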

Supports: mp3, mp4, mov, avi, mkv, wav, m4a, flac, webm and more.

**Target Audience**

Developers who need a self-hosted transcription API they can drop into a project without a cloud dependency. Useful for transcribing meeting recordings, interview audio, lecture videos, or any content where you don't want to send data to a third party. Production-ready as a backend service — a React frontend is in progress.

**Comparison**

- **OpenAI Whisper API** — cloud-based, costs money per minute, data leaves your machine. player2text is fully local and free.

- **whisper (openai/whisper on PyPI)** — the original library, slower and heavier. player2text uses faster-whisper (CTranslate2 backend), which is ~4x faster with ~50% less RAM on CPU.

- **whisper.cpp** — great C++ implementation, but requires more setup. player2text wraps everything in a FastAPI service so it's immediately usable over HTTP with zero extra config.

GitHub: https://github.com/fashan7/audio-to-text