Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys

Posted by purellmagents@reddit | LocalLLaMA | 11 comments

Been building this for a while and finally cleaned it up enough to share.

voice-agents-from-scratch is a numbered, chapter-by-chapter repo that walks the full real-time pipeline: mic capture > Whisper STT > local LLM (GGUF) > Kokoro TTS > speaker playback.
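To make the data flow concrete, here's a minimal sketch of how the stages compose. The stage callables (`stt`, `llm`, `tts`, `play`) are hypothetical stand-ins for the real components (Whisper, a GGUF model, Kokoro), not the repo's actual API; the point is just that each stage feeds the next as a stream:

```python
def run_pipeline(mic_frames, stt, llm, tts, play):
    """Compose the stages: mic -> STT -> LLM -> TTS -> speaker.

    Each argument is a hypothetical callable standing in for the real
    component. Stages are iterables/generators so downstream work can
    begin before upstream work finishes.
    """
    text = stt(mic_frames)              # audio frames -> transcript
    for token_chunk in llm(text):       # transcript -> streamed reply chunks
        for audio in tts(token_chunk):  # text chunk -> audio chunks
            play(audio)                 # push to the speaker as it arrives
```

The generator shape is what matters: swap any stage for a better model and the composition doesn't change.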

Everything streams - you don't wait for the full LLM response before TTS starts speaking. That's the part that makes it feel like a real conversation instead of a chatbot with a voice skin.
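The usual trick for this (a sketch, not the repo's exact implementation) is to buffer streamed LLM tokens and flush a chunk to the speech engine at each sentence boundary, so audio starts while the model is still generating:

```python
import re

def sentence_chunks(token_stream):
    """Group streamed LLM tokens into sentence-sized chunks for TTS.

    Flushes a chunk as soon as sentence-ending punctuation followed by
    whitespace (or end of buffer) appears, instead of waiting for the
    full response. Hypothetical helper, not from the repo.
    """
    buf = ""
    for tok in token_stream:
        buf += tok
        m = re.search(r"[.!?](\s|$)", buf)
        if m:
            chunk, buf = buf[:m.end()], buf[m.end():]
            yield chunk.strip()
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream
```

Chunk size is the knob: sentence-level chunks keep prosody natural, while smaller chunks cut time-to-first-audio at the cost of choppier speech.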

Each chapter is a runnable script + a short CODE.md walkthrough. There's also a small shared library so you can see how the pieces compose into a real system, not just isolated calls.

Why fully local matters here: you can actually see where latency lives. Warm-up, first-audio time, streaming chunk size - these aren't abstractions when you're running it on your own machine.
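Instrumenting this is cheap. Here's one way to measure time-to-first-chunk for any streaming stage (the `stage` callable is a hypothetical stand-in for an STT/LLM/TTS stage, not the repo's API):

```python
import time

def time_to_first_chunk(stage, *args):
    """Measure how long a streaming stage takes to emit its first item.

    `stage` is any callable returning an iterator. Returns the first
    item, the latency in seconds, and the rest of the stream so the
    caller can keep consuming it.
    """
    t0 = time.perf_counter()
    stream = iter(stage(*args))
    first = next(stream)
    return first, time.perf_counter() - t0, stream
```

Wrapping the TTS stage with this gives you time-to-first-audio directly; wrapping the LLM stage separates model latency from synthesis latency.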

Repo: https://github.com/pguso/voice-agents-from-scratch

Happy to answer questions about the architecture or tradeoffs I ran into.