I built an open-source, terminal-first voice-to-text tool for Linux desktops because most dictation tools are Mac-first
Posted by stengods@reddit | linux | View on Reddit | 27 comments
When switching to Linux from Mac, I missed having a nice, easy-to-use speech-to-text tool.
The apps I found either didn’t work very well, didn’t support many providers, or only supported local models, which doesn’t work well for me since I speak Swedish and those local models are mostly English. I also like the idea of it being terminal-first and scriptable. I couldn’t really find a good option, so I did the obvious thing and set out to build the tool myself. 😁
OSTT:
- open source and MIT licensed
- works well on Linux desktops, with setup docs for Hyprland/Omarchy, GNOME, KDE, and macOS too
- bring your own API key instead of being locked into one transcription provider
- output to clipboard, file, or stdout
- scriptable enough to fit into existing shell/CLI workflows
The recent release adds a few things that make the Linux workflow much better:
- ostt launch opens a small terminal popup that can be bound to a global hotkey: pressing the hotkey once starts recording, pressing it again stops and transcribes (example binding below)
- ostt process / -p can run the transcription through an AI prompt or a shell command
- .deb, .rpm, AUR, Homebrew, and shell installer paths are documented
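For example, on Hyprland the whole flow can be wired up with one keybinding in hyprland.conf. The key combo below is just an illustration; the setup docs cover any extra window rules your compositor might need:
# hyprland.conf: example binding, pick whatever key you like
# press once to start recording, press again to stop and transcribe
bind = SUPER, D, exec, ostt launch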
The provider-agnostic part is important, I think. OSTT currently supports OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, and ElevenLabs. The point is not that one provider is the right one, but that you should be able to choose based on quality, latency, price, language support, or data location. (I also plan to add support for local models)
The scriptable part is also a big part of why I wanted this to exist on Linux. OSTT can be used as a small transcription engine inside other workflows. You can pipe output to another CLI, write transcriptions to a file, copy them to the clipboard, use it from a script, process meeting recordings, or connect it to AI agent workflows like OpenClaw, Hermes, OpenCode, Claude Code, Codex CLI, etc.
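As a rough sketch of what that can look like (assuming a plain ostt run records and prints the transcript to stdout; check the docs for the exact flags and the process / -p syntax):
# record, transcribe, keep a copy, and put the text on the Wayland clipboard
ostt | tee -a notes.txt | wl-copy
# run the transcript through an AI prompt via process / -p
# (the exact argument form here is a guess; see the docs)
ostt -p "Clean up filler words and fix punctuation"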
This is not trying to be some polished GUI dictation app startup. It doesn't do streaming transcription or screen-aware text insertion. The niche is more: voice-to-text that behaves like a CLI tool.
Install:
curl -fsSL https://ostt.ai/install | bash
Docs: https://ostt.ai
GitHub: https://github.com/kristoferlund/ostt
Happy to hear feedback, especially from folks using different Linux desktops/window managers. I have not been able to test installation on more than a few Linux flavours so far.
Andr1yTheOne@reddit
Whisper is free. So is Vosk.
Mountain_Anxiety_461@reddit
Cool, NOW MAKE IT WORK ON WINDOWS OR I WILL MAKE IT WORK MYSELF (not really a good threat, but still)
PMCReddit@reddit
wrong subreddit, buddy. no one here really supports microslop/windows in any way. You'll have to figure it out on your own.
Necessary-Summer-348@reddit
Terminal-first makes so much sense for this—way easier to pipe into other tools or bind to hotkeys. What'd you use for the speech recognition backend?
dspdroid@reddit
That's why I built whisper-dictate.
But I needed it to work 100% offline; it runs local tiny models.
https://github.com/dalpat/whisper-dictate
Necessary-Summer-348@reddit
the "feels like it's built in" bar is exactly right bc if it's noticeable as a separate tool people stop using it. local tiny models is the right tradeoff for that kind of latency
stengods@reddit (OP)
Currently, various providers, from big ones (OpenAI) to smaller, more privacy-centric ones (Berget AI). Local models are coming.
Necessary-Summer-348@reddit
local model path is the one that'll matter most for this user base bc terminal users generally don't want audio going to an external api
dspdroid@reddit
Exactly.
stengods@reddit (OP)
Thanks, will look into that and see if I can make the setup easy enough. I want the tool to remain a one-line install.
Scheeseman99@reddit
I distrust any Linux application that focuses on third party outsourcing for core functionality over local, on-device processing. It's diametrically opposite to the core principles of open source.
DaftPump@reddit
Please detail in your posts that this util needs an API key to function.
stengods@reddit (OP)
"OSTT currently supports OpenAI, Deepgram, Groq, DeepInfra, AssemblyAI, Berget, and ElevenLabs. The point is not that one provider is the right one, but that you should be able to choose based on quality, latency, price, language support, or data location. (I also plan to add support for local models)"
DaftPump@reddit
You misunderstand my suggestion. :/
stengods@reddit (OP)
Sorry if I did. Help me understand. Do you mean one of the below?
- You would like to see support for local models? (That is coming.)
- Online models are fine, but you don't want to have to provide an API key yourself? That is, you would prefer something like a $5-a-month paid service?
- Something else?
DaftPump@reddit
Not everyone on this sub (or any sub) is going to automatically realize this current offering requires an API key. In other words, not all are going to be interested in signing up. This isn't a fault of yours. I originally made the suggestion because I installed it and then stopped once I got to the API part. Is it my bad for not following the docs? Sure. Still, my suggestion stands.
Hope this makes better sense.
tldr; not everyone is hip on AI prerequisites for such utils
JayTurnr@reddit
A good alternative could be to ship faster-whisper with it as a "no-key" and/or offline fallback.
stengods@reddit (OP)
Thanks, I will clarify the docs.
JayTurnr@reddit
If it's just speech-to-text, can you support Whisper?
aloobhujiyaay@reddit
Would be really interesting paired with tmux/neovim workflows or even terminal AI agents
stengods@reddit (OP)
I have used it a lot with tmux/nvim. Works great. Any particular way you would like to see it integrated beyond the popup recording window etc?
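One simple pattern is a tmux popup bound to a key, roughly like this (display-popup needs tmux 3.2+, and the key and size are just an example):
# ~/.tmux.conf: prefix + D opens a small popup running the ostt recorder
bind-key D display-popup -E -w 60 -h 12 "ostt launch"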
bew78@reddit
I see you mention many external providers, but what about local transcription with whisper?
stengods@reddit (OP)
Coming soon I hope. If time permits.
dspdroid@reddit
That's amazing, a lot of work you've put into it. I've built something around the same idea, but it's personal, based on what I use; yours is generic and provider-agnostic. My focus was on offline use with a hotkey toggle.
here is my version :)
https://github.com/dalpat/whisper-dictate
stengods@reddit (OP)
Nice tool you created there! Local models have been on my todo list as well; hope to find time soon.
marcellusmartel@reddit
I have been looking for this.
ZunoJ@reddit
Isn't this just a fancy replacement for piping mic input to an LLM CLI?