I built an open-source, terminal-first voice-to-text tool for Linux desktops because most dictation tools are Mac-first
Posted by stengods@reddit | linux | View on Reddit | 27 comments
When switching to Linux from Mac, I missed having a nice, easy-to-use speech-to-text tool.
The apps I found either didn’t work very well, didn’t support many providers, or only supported local models, which doesn’t work well for me since I speak Swedish and those local models are mostly English. I also like the idea of it being terminal-first and scriptable. I couldn’t really find a good option, so I did the obvious thing and set out to build the tool myself. 😁
OSTT:
- open source and MIT licensed
- works well on Linux desktops, with setup docs for Hyprland/Omarchy, GNOME, KDE, and macOS too
- bring your own API key instead of being locked into one transcription provider
- output to clipboard, file, or stdout
- scriptable enough to fit into existing shell/CLI workflows
The recent release adds a few things that make the Linux workflow much better:
- ostt launch opens a small terminal popup that can be bound to a global hotkey: pressing the hotkey once starts recording, pressing it again stops and transcribes (example binding below)
- ostt process / -p can run the transcription through an AI prompt or a shell command
- .deb, .rpm, AUR, Homebrew, and shell installer paths are documented
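For example, on Hyprland the whole flow can be wired up with one keybinding in hyprland.conf. The key combo below is just an illustration; the setup docs cover any extra window rules your compositor might need:
# hyprland.conf: example binding, pick whatever key you like
# press once to start recording, press again to stop and transcribe
bind = SUPER, D, exec, ostt launch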
The provider-agnostic part is important, I think. OSTT currently supports OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, and ElevenLabs. The point is not that one provider is the right one, but that you should be able to choose based on quality, latency, price, language support, or data location. (I also plan to add support for local models)
The scriptable part is also a big part of why I wanted this to exist on Linux. OSTT can be used as a small transcription engine inside other workflows. You can pipe output to another CLI, write transcriptions to a file, copy them to the clipboard, use it from a script, process meeting recordings, or connect it to AI agent workflows like OpenClaw, Hermes, OpenCode, Claude Code, Codex CLI, etc.
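As a rough sketch of what that can look like (assuming a plain ostt run records and prints the transcript to stdout; check the docs for the exact flags and the process / -p syntax):
# record, transcribe, keep a copy, and put the text on the Wayland clipboard
ostt | tee -a notes.txt | wl-copy
# run the transcript through an AI prompt via process / -p
# (the exact argument form here is a guess; see the docs)
ostt -p "Clean up filler words and fix punctuation"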
This is not trying to be some polished GUI dictation app startup. It doesn't do streaming transcription or screen-aware text insertion. The niche is more: voice-to-text that behaves like a CLI tool.
Install:
curl -fsSL https://ostt.ai/install | bash
Docs: https://ostt.ai
GitHub: https://github.com/kristoferlund/ostt
Happy to hear feedback, especially from folks using different Linux desktops/window managers. I have not been able to test installation on more than a few Linux flavours so far.
Andr1yTheOne@reddit
Whisper is free. So is Vosk.
Mountain_Anxiety_461@reddit
Cool, NOW MAKE IT WORK ON WINDOWS OR I WILL MAKE IT WORK MYSELF (not really a good threat, but still)
PMCReddit@reddit
wrong subreddit, buddy. no one here really supports microslop/windows in any way. You'll have to figure it out on your own.
Necessary-Summer-348@reddit
Terminal-first makes so much sense for this—way easier to pipe into other tools or bind to hotkeys. What'd you use for the speech recognition backend?
dspdroid@reddit
That's why I built whisper-dictate.
But I needed it to work 100% offline; it runs local tiny models.
https://github.com/dalpat/whisper-dictate
Necessary-Summer-348@reddit
the "feels like it's built in" bar is exactly right bc if it's noticeable as a separate tool people stop using it. local tiny models is the right tradeoff for that kind of latency
stengods@reddit (OP)
Currently, various providers, from big ones (OpenAI) to smaller, more privacy-centric ones (Berget AI). Local models are coming.
Necessary-Summer-348@reddit
local model path is the one that'll matter most for this user base bc terminal users generally don't want audio going to an external api
dspdroid@reddit
Exactly.
stengods@reddit (OP)
Thanks, will look into that and see if I can make the setup easy enough. I want the tool to remain a one-line install.
Scheeseman99@reddit
I distrust any Linux application that focuses on third party outsourcing for core functionality over local, on-device processing. It's diametrically opposite to the core principles of open source.
DaftPump@reddit
Please detail in your posts that this util needs an API key to function.
stengods@reddit (OP)
"OSTT currently supports OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, and ElevenLabs. The point is not that one provider is the right one, but that you should be able to choose based on quality, latency, price, language support, or data location. (I also plan to add support for local models)"
DaftPump@reddit
You misunderstand my suggestion. :/
stengods@reddit (OP)
Sorry if I did. Help me understand. Do you mean one of the below?
- You would like to see support for local models? (That is coming.)
- Online models are fine, but you don't want to have to provide an API key yourself? That is, you would prefer something like a $5-a-month paid service?
- Something else?
DaftPump@reddit
Not everyone on this sub (or any sub) is going to automatically realize this current offering requires an API key. In other words, not all are going to be interested in signing up. This isn't a fault of yours. I originally made the suggestion because I installed it and then stopped once I got to the API part. Is it my bad for not following the docs? Sure. Still, my suggestion stands.
Hope this makes better sense.
tldr; not everyone is hip on AI prerequisites for such utils
JayTurnr@reddit
A good alternative could be to ship faster-whisper with it as a "no-key" and/or offline fallback.
stengods@reddit (OP)
Thanks, I will clarify the docs.
JayTurnr@reddit
If it's just speech-to-text, can you support Whisper?
aloobhujiyaay@reddit
Would be really interesting paired with tmux/neovim workflows or even terminal AI agents
stengods@reddit (OP)
I have used it a lot with tmux/nvim. Works great. Any particular way you would like to see it integrated beyond the popup recording window etc?
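One simple pattern is a tmux popup bound to a key, roughly like this (display-popup needs tmux 3.2+, and the key and size are just an example):
# ~/.tmux.conf: prefix + D opens a small popup running the ostt recorder
bind-key D display-popup -E -w 60 -h 12 "ostt launch"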
bew78@reddit
I see you mention many external providers, but what about local transcription with whisper?
stengods@reddit (OP)
Coming soon I hope. If time permits.
dspdroid@reddit
That's amazing, a lot of work you've put into it. I've built something around the same idea, but it's personal, based on what I use; yours is generic and provider-agnostic. My focus was on offline use with a hotkey toggle.
here is my version :)
https://github.com/dalpat/whisper-dictate
stengods@reddit (OP)
Nice tool you created there! Local models have been on my todo list as well; hope to find time soon.
marcellusmartel@reddit
I have been looking for this.
ZunoJ@reddit
Isn't this just a fancy replacement for piping mic input to an LLM CLI?