I built AmicoScript: A local-first Whisper UI with Speaker Diarization and Ollama integration for summaries.
Posted by seamoce@reddit | LocalLLaMA
Hey everyone,
I wanted a streamlined way to turn my audio recordings into usable data without touching the cloud. I put together AmicoScript, a FastAPI-based tool that glues together Whisper, Pyannote for speaker ID, and Ollama for the final LLM processing.
The Workflow:
- Transcription: Uses `faster-whisper` (supports everything from `tiny` to `large-v3`).
- Diarization: Labels speakers (Speaker 0, Speaker 1, etc.) locally via Pyannote.
- Inference: Once the transcript is ready, you can send it to your local Ollama instance.
- Refining: Use your favorite local models (Llama 3, Mistral, etc.) to summarize the meeting, extract action items, or clean up the transcript.
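The trickiest glue in a pipeline like this is merging Whisper's timestamped segments with Pyannote's speaker turns. A minimal sketch of that step, in pure Python (the `Segment`/`Turn` types and `label_segments` function are my own illustration, not AmicoScript's actual API):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A transcript chunk from faster-whisper, with start/end times in seconds."""
    start: float
    end: float
    text: str

@dataclass
class Turn:
    """A speaker turn from Pyannote diarization."""
    start: float
    end: float
    speaker: str

def label_segments(segments, turns):
    """Assign each transcript segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for turn in turns:
            # Overlap is the length of the intersection of the two intervals
            overlap = min(seg.end, turn.end) - max(seg.start, turn.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = turn.speaker, overlap
        labeled.append((best_speaker, seg.text))
    return labeled
```

A largest-overlap rule like this handles the common case where segment and turn boundaries don't line up exactly; segments with no overlapping turn fall back to "Unknown".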
Tech Specs:
- Backend: Python / FastAPI.
- Frontend: Clean Vanilla JS UI (no huge node_modules folders).
- Containerized: Docker-ready (`docker compose up --build`).
- Privacy: Everything stays on your own metal.
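For anyone curious what the containerized setup roughly looks like, a compose file for a stack like this tends to be a single service plus a volume for model weights. This is a hypothetical sketch, not the repo's actual file; check the repo's own `docker-compose.yml` for real service names and ports:

```yaml
# Illustrative only -- service name, port, and volume paths are assumptions.
services:
  amicoscript:
    build: .
    ports:
      - "8000:8000"          # typical FastAPI/uvicorn port
    volumes:
      - ./models:/app/models # cache Whisper/Pyannote weights between runs
```

Mounting the model cache as a volume keeps the multi-GB Whisper and Pyannote downloads from being repeated on every rebuild.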
I'm particularly interested in how you guys find the prompt-to-Ollama transition and if you'd like to see more structured output options (like JSON schema for the summaries).
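On the JSON-schema idea: recent Ollama versions accept a schema object in the `format` field of the chat API, which constrains the model's output to match it. A sketch of what the request payload could look like (the schema fields and the `build_ollama_request` helper are illustrative, not part of AmicoScript):

```python
import json

# Illustrative schema for a structured meeting summary
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "action_items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "action_items"],
}

def build_ollama_request(model, transcript):
    """Build a payload for POST /api/chat against a local Ollama instance."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Summarize this meeting:\n{transcript}"}
        ],
        "format": SUMMARY_SCHEMA,  # structured outputs (Ollama 0.5+)
        "stream": False,
    }
```

The response's `message.content` should then be parseable with `json.loads` straight into the summary structure, which is handy for feeding downstream tools.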
Repo: https://github.com/sim186/AmicoScript
Would love to hear what you think or if you have ideas on how to optimize the pipeline further!