I built AmicoScript: A local-first Whisper UI with Speaker Diarization and Ollama integration for summaries.

Posted by seamoce@reddit | LocalLLaMA


Hey everyone,

I wanted a streamlined way to turn my audio recordings into usable data without touching the cloud. I put together AmicoScript, a FastAPI-based tool that glues together Whisper, Pyannote for speaker ID, and Ollama for the final LLM processing.

The Workflow:

  1. Transcription: Uses faster-whisper (supports everything from tiny to large-v3).
  2. Diarization: Labels speakers (Speaker 0, Speaker 1, etc.) locally via Pyannote.
  3. Inference: Once the transcript is ready, you can send it to your local Ollama instance.
  4. Refining: Use your favorite local models (Llama 3, Mistral, etc.) to summarize the meeting, extract action items, or clean up the transcript.
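The glue between steps 1 and 2 is essentially timestamp alignment: each Whisper segment gets the label of whichever pyannote speaker turn overlaps it the most. A minimal, dependency-free sketch of that alignment (the `assign_speakers` helper and the data shapes are my illustration, not necessarily how AmicoScript does it internally):

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose
    diarization turn overlaps it the most (None if no overlap).

    segments: list of (start, end, text) tuples from Whisper
    turns:    list of (start, end, speaker) tuples from pyannote
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Overlap of [seg_start, seg_end] with [turn_start, turn_end]
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

segments = [(0.0, 2.0, "Hello there."), (2.1, 4.0, "Hi, how are you?")]
turns = [(0.0, 2.05, "Speaker 0"), (2.05, 4.5, "Speaker 1")]
print(assign_speakers(segments, turns))
```

Maximal-overlap assignment handles the common case where segment and turn boundaries don't line up exactly; segments that straddle two speakers get whichever side covers more of them.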


I'm particularly interested in how you guys find the prompt-to-Ollama transition and if you'd like to see more structured output options (like JSON schema for the summaries).
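On the JSON-schema idea: recent Ollama versions accept a JSON schema in the `format` field of `/api/chat`, which constrains the model's reply to that shape. A sketch of what a structured-summary request could look like (the `build_summary_request` helper and the schema fields are hypothetical, just one way AmicoScript might wire it up):

```python
import json

def build_summary_request(transcript, model="llama3"):
    """Build an Ollama /api/chat payload whose reply is constrained
    to a fixed JSON shape: a summary string plus action items."""
    schema = {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "action_items": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["summary", "action_items"],
    }
    return {
        "model": model,
        "messages": [
            {"role": "user",
             "content": f"Summarize this meeting transcript:\n{transcript}"},
        ],
        "format": schema,  # Ollama >= 0.5 accepts a JSON schema here
        "stream": False,
    }

payload = build_summary_request("Speaker 0: Let's ship v1 on Friday.")
print(json.dumps(payload, indent=2))
```

POSTing that payload to `http://localhost:11434/api/chat` should yield a `message.content` string that parses as JSON matching the schema, so the UI could render action items without any regex scraping.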

Repo: https://github.com/sim186/AmicoScript

Would love to hear what you think or if you have ideas on how to optimize the pipeline further!