I built AmicoScript: A local-first Whisper UI with Speaker Diarization and Ollama integration for summaries.

Posted by seamoce@reddit | LocalLLaMA


Hey everyone,

I wanted a streamlined way to turn my audio recordings into usable data without touching the cloud. I put together AmicoScript, a FastAPI-based tool that glues together Whisper, Pyannote for speaker ID, and Ollama for the final LLM processing.

The Workflow:

  1. Transcription: Uses faster-whisper (supports everything from tiny to large-v3).
  2. Diarization: Labels speakers (Speaker 0, Speaker 1, etc.) locally via Pyannote.
  3. Inference: Once the transcript is ready, you can send it to your local Ollama instance.
  4. Refining: Use your favorite local models (Llama 3, Mistral, etc.) to summarize the meeting, extract action items, or clean up the transcript.
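The glue between steps 1 and 2 is essentially timestamp alignment: each Whisper segment gets the label of whichever pyannote speaker turn overlaps it the most. A minimal, dependency-free sketch of that alignment (the `assign_speakers` helper and the data shapes are my illustration, not necessarily how AmicoScript does it internally):

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose
    diarization turn overlaps it the most (None if no overlap).

    segments: list of (start, end, text) tuples from Whisper
    turns:    list of (start, end, speaker) tuples from pyannote
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Overlap of [seg_start, seg_end] with [turn_start, turn_end]
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

segments = [(0.0, 2.0, "Hello there."), (2.1, 4.0, "Hi, how are you?")]
turns = [(0.0, 2.05, "Speaker 0"), (2.05, 4.5, "Speaker 1")]
print(assign_speakers(segments, turns))
```

Maximal-overlap assignment handles the common case where segment and turn boundaries don't line up exactly; segments that straddle two speakers get whichever side covers more of them.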


I'm particularly interested in how you guys find the prompt-to-Ollama transition and if you'd like to see more structured output options (like JSON schema for the summaries).
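On the JSON-schema idea: recent Ollama versions accept a JSON schema in the `format` field of `/api/chat`, which constrains the model's reply to that shape. A sketch of what a structured-summary request could look like (the `build_summary_request` helper and the schema fields are hypothetical, just one way AmicoScript might wire it up):

```python
import json

def build_summary_request(transcript, model="llama3"):
    """Build an Ollama /api/chat payload whose reply is constrained
    to a fixed JSON shape: a summary string plus action items."""
    schema = {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "action_items": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["summary", "action_items"],
    }
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": f"Summarize this meeting transcript:\n{transcript}"},
        ],
        "format": schema,  # Ollama >= 0.5 accepts a JSON schema here
        "stream": False,
    }

payload = build_summary_request("Speaker 0: Let's ship v1 on Friday.")
print(json.dumps(payload, indent=2))
```

POSTing that payload to `http://localhost:11434/api/chat` should yield a `message.content` string that parses as JSON matching the schema, so the UI could render action items without any regex scraping.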

Repo: https://github.com/sim186/AmicoScript

Would love to hear what you think or if you have ideas on how to optimize the pipeline further!