Fulloch V2: 100% Local Voice Assistant for Home Assistant & Obsidian (Runs on 16GB VRAM)
Posted by liampetti@reddit | LocalLLaMA | 7 comments
Hey everyone, this is a follow-up to my r/LocalLLaMA post from a while back. I have spent some time testing how far I can push my 5060 Ti as a personal voice assistant.
The stack is Qwen3.5-9B GGUF Q5_K_M, Qwen3-1.7B ASR, and Qwen3-1.7B TTS, delivering fast, real-time responses with acoustic barge-in and follow-up for better conversations. On top of driving your Home Assistant, V2 now features agentic long-term memory and integrates with your local Obsidian vault (or other markdown notes) to read, write and append notes (it won't delete or modify anything). Semantic search of your markdown notes is also available by voice, using the bge embedding model.
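For anyone unfamiliar with how embedding-based note search works in general, here is a minimal sketch of the idea. It is not code from the repo: `embed` is a hypothetical stand-in that uses plain word counts so the example runs without any model, where a real setup would call an embedding model such as bge. The `notes` dictionary and the file names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (NOT the bge model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, notes: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank notes by similarity to the query and return the best top_k."""
    q = embed(query)
    scored = [(name, cosine(q, embed(body))) for name, body in notes.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical notes standing in for markdown files in a vault.
notes = {
    "groceries.md": "Buy milk, eggs and coffee beans on Saturday.",
    "garden.md": "Water the tomato plants every morning.",
    "server.md": "GPU server runs the voice assistant models.",
}
results = search("when do I water the tomato plants", notes, top_k=1)
```

With a real embedding model the ranking step stays the same; only `embed` changes, and the note vectors would normally be computed once and cached rather than recomputed per query.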
Public repo at https://github.com/liampetti/fulloch
I've linked a quick video demo showing the response speed and the conversational, barge-in, and semantic note search features through an included Chat UI.
It also has a bash/bat file for creating your own voices, and you can add your own custom wakeword by just typing it into the config (no special wakeword models needed). Everything was tested on Linux, but Windows is supported.
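One way a typed wakeword can work without a dedicated wakeword model is to look for the phrase in the ASR transcript. The sketch below assumes that approach and is not taken from the repo; `heard_wakeword`, the `threshold` value, and the example phrases are all hypothetical. Fuzzy matching via the standard library's `difflib` tolerates small transcription errors.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).split()


def heard_wakeword(transcript: str, wakeword: str, threshold: float = 0.8) -> bool:
    """Return True if any window of the transcript roughly matches the wakeword."""
    words = normalize(transcript)
    target = normalize(wakeword)
    n = len(target)
    if n == 0:
        return False
    target_text = " ".join(target)
    # Slide a window of the same word count as the wakeword over the transcript.
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        ratio = difflib.SequenceMatcher(None, window, target_text).ratio()
        if ratio >= threshold:
            return True
    return False
```

For example, `heard_wakeword("Hey Jarvus, what time is it?", "hey jarvis")` still matches despite the misspelled transcription, while a sentence without the phrase does not.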
Cluzda@reddit
Cool project and impressive results for those small models! I just looked at the repo.
Would love to spin this up on my server. Unfortunately, I handle my voice with a Home Assistant Voice PE, which seems strictly tied to the HA ecosystem, meaning I can only use the Wyoming Protocol for TTS and STT.
If I host the models on my server anyway, how did you allocate the Qwen models? Do they all fit into 16GB VRAM at once?
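A back-of-envelope estimate of the weight memory helps frame the question. The numbers below are assumptions, not measurements from the project: roughly 5.5 bits per weight for Q5_K_M, 16-bit weights for the ASR and TTS models, and parameter counts read off the model names. KV cache, activations, and the embedding model are not included, so real usage will be higher.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# Assumed precisions; the post does not state them for ASR and TTS.
models = {
    "Qwen3.5-9B (Q5_K_M, ~5.5 bpw assumed)": weight_gib(9.0, 5.5),
    "Qwen3-1.7B ASR (16-bit assumed)": weight_gib(1.7, 16),
    "Qwen3-1.7B TTS (16-bit assumed)": weight_gib(1.7, 16),
}
total = sum(models.values())

for name, gib in models.items():
    print(f"{name}: {gib:.1f} GiB")
print(f"Total weights: {total:.1f} GiB of 16 GiB")
```

Under these assumptions the weights alone come to roughly 12 GiB, which leaves a few GiB for context and runtime overhead.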
synthmike@reddit
The Voice PE firmware is ESPHome and fully open source. If this project has an API, it should be possible to link the two.
Cluzda@reddit
I guess that would be easier than wiring it through Wyoming. But can it do both?
I would like to keep the Voice PE inside of HA too, because some of my automations run with the intent detection and speaker replay.
dangerous_inference@reddit
Qwen3 1.7B ASR is the shit. Did you know you can prompt it with weird words you use and it will properly transcribe them? I just found this out myself.
I made an assistant that runs on very thin context, because it's so much faster. Are you doing transient memory or tool responses? This is my own terminology for the practice of pulling information up for one turn only and then immediately dumping it from context. Should really help VRAM poors.
Infamous_Pause_3856@reddit
This is insanely cool, especially the Obsidian + semantic search bit. That basically turns your notes into an actual second brain you can talk to, locally, on a 5060ti lol.
Love that you went full Qwen stack too. Bookmarked the repo, this is exactly the kind of "Jarvis but actually practical" setup I’ve been wanting to mess with.
liampetti@reddit (OP)
Thanks bot 😁
liampetti@reddit (OP)
Thanks! Definitely give me your feedback once you have tested it.