Handy - a simple, open-source offline speech-to-text app written in Rust using whisper.cpp

Posted by sipjca@reddit | LocalLLaMA | View on Reddit | 41 comments

I built a simple, offline speech-to-text app after breaking my finger - now open sourcing it

TL;DR: Made a cross-platform speech-to-text app using whisper.cpp that runs completely offline. Press shortcut, speak, get text pasted anywhere. It's rough around the edges but works well and is designed to be easily modified/extended - including adding LLM calls after transcription.

Background

I broke my finger a while back and suddenly couldn't type properly. Tried existing speech-to-text solutions but they were either subscription-based, cloud-dependent, or I couldn't modify them to work exactly how I needed for coding and daily computer use.

So I built Handy - intentionally simple speech-to-text that runs entirely on your machine using whisper.cpp (Whisper Small model). No accounts, no subscriptions, no data leaving your computer.

What it does

Press keyboard shortcut → speak → press again (or use push-to-talk)
Transcribes with whisper.cpp and pastes directly into whatever app you're using
Works across Windows, macOS, Linux
GPU accelerated where available
Completely offline

That's literally it. No fancy UI, no feature creep, just reliable local speech-to-text.
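
The two shortcut behaviours above (press once to start and again to stop, or hold for push-to-talk) can be sketched as a tiny state machine. This is only an illustration: `Mode`, `KeyEvent`, `Action` and `Recorder` are made-up names, not types from Handy's codebase, and the real app also has to deal with audio capture and pasting.

```rust
/// How the shortcut controls recording (hypothetical names, not Handy's).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Toggle,     // press once to start, press again to stop
    PushToTalk, // record only while the key is held down
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyEvent {
    Pressed,
    Released,
}

/// What the caller should do in response to a key event.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Nothing,
}

struct Recorder {
    mode: Mode,
    recording: bool,
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false }
    }

    /// Update the recording state for one shortcut event.
    fn on_key(&mut self, event: KeyEvent) -> Action {
        match (self.mode, event, self.recording) {
            // Either mode: a press while idle starts recording.
            (_, KeyEvent::Pressed, false) => {
                self.recording = true;
                Action::StartRecording
            }
            // Toggle stops on the second press, push-to-talk on release.
            (Mode::Toggle, KeyEvent::Pressed, true)
            | (Mode::PushToTalk, KeyEvent::Released, true) => {
                self.recording = false;
                Action::StopAndTranscribe
            }
            // Key repeats and irrelevant releases are ignored.
            _ => Action::Nothing,
        }
    }
}
```

Feeding `Pressed`, `Released`, `Pressed` into a `Recorder::new(Mode::Toggle)` yields start, nothing, stop; in `Mode::PushToTalk` the stop comes on `Released` instead.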

Why I'm sharing this

This was my first Rust project and there are definitely rough edges, but the core functionality works well. More importantly, I designed it to be easily forkable and extensible because that's what I was looking for when I started this journey.

The codebase is intentionally simple - you can understand the whole thing in an afternoon. If you want to add LLM integration (calling an LLM after transcription to rewrite/enhance the text), custom post-processing, or whatever else, the foundation is there and it's straightforward to extend.
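
To give a rough idea of what a post-transcription hook could look like, here is a minimal sketch of a chain of text-rewriting steps. Everything in it is an assumption for illustration: `Pipeline`, `tidy` and `fake_llm_rewrite` are hypothetical names, not Handy's actual API, and the "LLM" step is a stand-in that only appends a full stop.

```rust
/// A post-processing step takes the current text and returns a new version.
type Step = Box<dyn Fn(&str) -> String>;

/// Hypothetical chain of steps run after transcription (not Handy's API).
struct Pipeline {
    steps: Vec<Step>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { steps: Vec::new() }
    }

    /// Append a step; steps run in the order they were added.
    fn add(mut self, step: impl Fn(&str) -> String + 'static) -> Self {
        self.steps.push(Box::new(step));
        self
    }

    /// Feed the transcript through every step in turn.
    fn run(&self, transcript: &str) -> String {
        self.steps
            .iter()
            .fold(transcript.to_string(), |text, step| step(&text))
    }
}

/// Example step: trim whitespace and capitalise the first letter.
fn tidy(text: &str) -> String {
    let mut chars = text.trim().chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Stand-in for an LLM call. A real step would send `text` to a model;
/// this one only adds a full stop when no end punctuation is present.
fn fake_llm_rewrite(text: &str) -> String {
    if text.is_empty() || text.ends_with(['.', '!', '?']) {
        text.to_string()
    } else {
        format!("{text}.")
    }
}
```

With `Pipeline::new().add(tidy).add(fake_llm_rewrite)`, running `"  hello world "` gives `"Hello world."`. Swapping the stand-in for a real model call would not change the rest of the chain.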

I'm hoping it might be useful for:

People who want reliable offline speech-to-text without subscriptions
Developers who want to experiment with voice computing interfaces
Anyone who prefers tools they can actually modify instead of being stuck with someone else's feature decisions

Project Reality

There are known bugs and architectural decisions that could be better. I'm documenting issues openly because I'd rather have people know what they're getting into. This isn't trying to compete with polished commercial solutions - it's trying to be the most hackable and modifiable foundation for people who want to build their own thing.

If you're looking for something perfect out of the box, this probably isn't it. If you're looking for something you can understand, modify, and make your own, it might be exactly what you need.

Would love feedback from anyone who tries it out, especially if you run into issues or see ways to make the codebase cleaner and more accessible for others to build on.

Less-External-1778@reddit

Hey, I love the app, thanks for building it!

Is it possible to add a mode similar to Windows voice typing, where:

I click into a text box, press my shortcut once, start speaking, and the transcription automatically appears in that box without pressing the shortcut again, and
if I start typing with the keyboard or click away/change focus, Handy automatically stops listening?

GreggBlazer@reddit

BUG ISSUE: Why does Handy just stop working after some time and fail to launch from hotkeys?
Handy is in taskbar and running processes, updated, and runs GREAT after a PC restart. But not long after the initial restart Handy just decided to give me a one finger wave instead of all five.. but joking aside,

I LOVE the app when it works. I'm just trying to find out whether this is a known bug and whether there is a fix. I need it long after a restart, not just right after one. It kinda defeats the purpose of saving time by talking instead of typing if I have to restart my entire Win11 PC just to speak into a text field.

Please let me know what I can do. Thanks, and thanks for Handy!

sipjca@reddit (OP)

Please report it on GitHub, open the debug menu (ctrl+shift+d), and upload the logs. Please provide as much system detail as possible, as this is the first report.

Linkto91@reddit

Hello! Really nice app. I also discovered it through Korben. I am French, and the app works really well. I haven't used it much; it was more out of curiosity and to test it.

Would it be possible to translate into languages other than English?

Thank you very much for your work and your really nice app.

WriterHorrible@reddit

Stumbled upon this when looking for a dictation solution, amazing stuff.
Years of computer use have made typing for extended periods of time quite painful, so I've been looking towards dictation.

Most stuff online indeed requires a subscription, which I was prepared to pay, but I vastly prefer local stuff.
This is solving all my issues at time of writing and I'm deeply grateful for it.

sipjca@reddit (OP)

glad you like it!

UnluckyAnt9048@reddit

nothing

Joshuazax@reddit

This is a really awesome project! The only thing is, as a bilingual person, I’d love to be able to select just the two languages I actually use. When I choose auto-detect, it often misidentifies short messages as an unrelated language. But I also can’t pick just one language, since I use both of mine almost equally often. Thank you again for this amazing program. After trying many local transcription tools, it’s one of the best out there, if not the best.
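
One way the "only my two languages" idea could work is to run language detection as usual and then pick the best-scoring language from an allowed list. A minimal sketch, assuming per-language detection scores are available from somewhere (where they come from is not shown here, and `pick_language` is a hypothetical helper, not something Handy provides):

```rust
/// Pick the most likely language among those the user allows.
///
/// `scores` pairs a language code with a detection probability. How those
/// probabilities are obtained is an assumption and is left out of this sketch.
/// Returns `None` when none of the allowed languages appears in `scores`.
fn pick_language<'a>(scores: &[(&'a str, f32)], allowed: &[&str]) -> Option<&'a str> {
    scores
        .iter()
        // Keep only the languages the user has whitelisted.
        .filter(|(code, _)| allowed.iter().any(|a| *a == *code))
        // Take the highest-scoring one among those.
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(code, _)| *code)
}
```

For example, with scores of 0.30 for "en", 0.25 for "fr" and 0.45 for an unrelated language, restricting to "en" and "fr" selects "en" even though the unrelated language scored highest overall.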

Coolraoulus@reddit

More practical than Speech Note, simpler than Voice Hotkeys; you have really done a wonderful job, thank you very much.

Coolraoulus@reddit

Hello, thank you for this application!
Small problem: I'm on Linux Mint 22 Cinnamon. When I launch the application the window opens, but nothing is displayed. When I hover the mouse over the window, however, the pointer changes depending on where I am, which makes me think it is just a display problem and the menu is actually there.
Edit: well, I ended up clicking at random, haha, and now when I press the ctrl+space keyboard shortcut, Handy listens to me and works. That said, I have no idea which language model I'm using and no way to change any settings, so I'm flying blind.

JuByr19@reddit

Hi! My dad just showed me your app and I'm amazed by how simple it is to use. I wanted to know if you are going to add the option to "ask" for punctuation, like saying "bla bla bla insert comma" and having it transcribe "bla bla bla , " (sorry for any mistakes, I'm French)

schmurtzm@reddit

I was about to ask the same question! I discovered Handy thanks to Korben, and yes it’s already way better than the famous “Win + H” shortcut!
It wouldn’t even take that much effort from Microsoft to make its STT truly usable on a daily basis, but as it stands, it’s just not practical. The main issues are the lack of proper shortcuts and poor handling of punctuation in different languages. (For example, in French you have to say precisely “saut de ligne” for a carriage return and not “à la ligne” or "retour à la ligne". Being able to choose our own phrases would be awesome!)

Microsoft actually lists the supported punctuation commands for various languages here; it’s far from perfect, but it might serve as inspiration.
I’m curious to see how punctuation handling could be implemented in the post-processing 😉

Awesome tool, thank you !

schmurtzm@reddit

Could we have a simple list of rules (that we can import) to replace one set of words with another?
It would allow managing punctuation in a very strict way (without using an LLM for post-processing, which could inject additional errors into the dictation).
I mean a simple file (JSON or whatever) like:

Open the parentheses -> (

Close the parenthesis -> )

It would be easily customizable by the user, and users could exchange their punctuation lists.
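
A minimal sketch of how such a rule list could be applied, assuming a plain-text file with one `spoken phrase -> symbol` rule per line as in the example above. `parse_rules` and `apply_rules` are hypothetical helpers, not part of Handy; matching here ignores ASCII case only, and spacing around the inserted symbols is left untouched.

```rust
/// Parse lines of the form "spoken phrase -> symbol" into rules.
/// Lines without "->" (including blank lines) are skipped.
fn parse_rules(source: &str) -> Vec<(String, String)> {
    source
        .lines()
        .filter_map(|line| line.split_once("->"))
        .map(|(phrase, symbol)| (phrase.trim().to_string(), symbol.trim().to_string()))
        .filter(|(phrase, _)| !phrase.is_empty())
        .collect()
}

/// Replace every occurrence of `needle` in `haystack`, ignoring ASCII case.
fn replace_ignore_ascii_case(haystack: &str, needle: &str, replacement: &str) -> String {
    if needle.is_empty() {
        return haystack.to_string();
    }
    // ASCII lowercasing keeps byte offsets identical to the original string.
    let lower_hay = haystack.to_ascii_lowercase();
    let lower_needle = needle.to_ascii_lowercase();
    let mut out = String::with_capacity(haystack.len());
    let mut pos = 0;
    while let Some(found) = lower_hay[pos..].find(&lower_needle) {
        let start = pos + found;
        out.push_str(&haystack[pos..start]);
        out.push_str(replacement);
        pos = start + needle.len();
    }
    out.push_str(&haystack[pos..]);
    out
}

/// Apply all rules to a transcript, in the order they are listed.
fn apply_rules(text: &str, rules: &[(String, String)]) -> String {
    let mut out = text.to_string();
    for (phrase, symbol) in rules {
        out = replace_ignore_ascii_case(&out, phrase, symbol);
    }
    out
}
```

With the two rules above, "see open the parentheses note close the parenthesis" becomes "see ( note )". Because the rules are plain data, a list like this could be shared between users without touching the code.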

sipjca@reddit (OP)

Hi, the next feature will allow post processing with an LLM which will allow for whatever you would like

JuByr19@reddit

Oh that's awesome! Thank you so much for your work! My dad is looking forward to the evolution of your app (he is a huge nerd bahaha)

Icy-Repair-1496@reddit

Hello,

I found Handy through an article on Korben. It's very promising. The problem I have is that accented characters don't end up in the right place within words. It's very frustrating. I'm on Ubuntu Linux. I suspect a configuration problem in LibreOffice. Indeed, I'm using Handy to dictate this very text and everything works correctly. In LibreOffice, I tried Parakeet and Whisper, but with the same disappointing results. Many thanks to anyone who can tell me what to do to fix this problem!

sipjca@reddit (OP)

This is caused by the "direct" input method. You can try the control+v input method instead. With Handy open, press ctrl+shift+d to open the debug menu. Click on the debug panel, then change the paste method to control+v.

vvhitecoder@reddit

Much, much easier than "speech notes". Are you considering using whisper-ctranslate2, which is faster than other Whisper implementations?

sipjca@reddit (OP)

I would like to at some point; I just don't have the bandwidth at the moment. If someone implements it in cjpais/transcribe-rs I would love to pull it in.

BoringSFWAccount@reddit

I understand that I'm late to the party and this is an old post, but I want to thank you. This is a very nice, lightweight package for Whisper, and I would say it's the best alternative I've seen to Windows speech-to-text recognition for general use.

sipjca@reddit (OP)

Oh no problem, thank you for the kind words, glad you like it! Spread the word!

Holiday-Contact8698@reddit

Great app, I use it and I couldn't be happier. But I would like to have a hotkey for Translate to English.

sipjca@reddit (OP)

Thanks, there are some open suggestions on hotkeys as well. For now it is a low-priority development item, as different people want different kinds of hotkeys and supporting them all requires some thought. Hopefully at some point I can get it in as well.

For now, if you fork the project, I think Claude Code could implement it.

xyapus@reddit

I was able to build & run it (was not easy on 22.04) but it's not inserting any text anywhere. I've tried debugging with RUST_LOG=enigo::platform=debug and I can see it's trying to do Ctrl+V using enigo, but there are several issues with this approach and I cannot really get it to work. Nothing is in the clipboard to paste, and in the terminal I have a different shortcut to paste: it's Ctrl+Shift+V instead.

sipjca@reddit (OP)

Could you add an issue to GitHub for this? There is some experimental support I have lying around for using an insert-text method, but it has some bugs on macOS so I’ve not been using it.

fflarengo@reddit

Woah, this is JUST what I was looking for! Thank you for this. You're the man!

sipjca@reddit (OP)

Hey! Thanks :)

What feature set would you want to have on the phone? It should be fairly feasible to have iOS and Android versions as well.

fflarengo@reddit

Usually, I open the ChatGPT application, click on the microphone icon, and use it to transcribe my prompts, emails, and sometimes even WhatsApp messages.

You know the native Apple speech-to-text? It's really bad. It would be great if it were possible to either replace that or install a third-party keyboard from the App Store that adds Whisper transcription behind the same microphone icon. I believe it could also be done locally on the device.

I am really looking forward to this!

maveduck@reddit

I was searching for a good app to use and this fits my needs perfectly! Thank you very much :)

sipjca@reddit (OP)

Awesome! So happy to hear

silenceimpaired@reddit

You should see if you can create an applet for the new COSMIC DE that is being created. Get it some visibility on that DE.

7657786425658907653@reddit

First, can I say I absolutely love this. I've recently lost the ability to type so I've been stuck using speech-to-text programs. I guess in a way I'm lucky that AI has become so prevalent during this time, as previous speech-to-text has been very unreliable.

That said, I'm very aware that every word I utter to friends or family using current AI speech to text is going through somebody else's computer. So I came looking for a completely offline solution that will use my own computing power.

I've not been able to get Handy to work on my computer. I'm not sure what the problem is. It's either macroing to a key that I can press or access to the microphone. Perhaps I need to use the GitHub repository rather than the exe.

That said, I'm currently using WhisperTyping, which is free for the moment, though I'm sure they will start charging soon. I do like that it has a small icon to show when you're using it. That might be my one wish for Handy, if you're granting wishes. <3

sipjca@reddit (OP)

Thank you, and I would absolutely love to help you get it working and learn what is going wrong so it can be fixed for others. It sounds like you're using Windows, and I've done less testing on Windows; however, I have gotten it to work. So hopefully we can resolve it together. Please reach out to me at contact@handy.computer

htrowii@reddit

may I know why send_paste is using hardcoded ctrl/cmd v? seems like a pretty inefficient method ;w; or is it so to be more "universal"

sipjca@reddit (OP)

I wrote a bit on this here: https://github.com/cjpais/Handy/issues/12

It's generally very universal. If there are better libraries to handle pasting cross platform I'm very open to contributions and suggestions.

DeProgrammer99@reddit

Very nice!

Okay, I have a feature request! Haha, of course.

I spoke to it in Japanese and was amused that it translated what I said into English. Apparently, Whisper has to be told the language in advance if you don't want it to translate.

sipjca@reddit (OP)

I may be able to mark 'translate' specifically to false, but I kind of feel like I tried this and it is as you say

shamen_uk@reddit

In whisper.cpp you set a language. The default might be "en". If it is set to "en", it will translate any lang to English.

You have two options: allow the user to set the language, or set it to "auto". This is a whisper param.

sipjca@reddit (OP)

It's using the default params in the whisper-rs bindings. I'll change it to auto and make sure translate is off.

no_witty_username@reddit

Thanks. I just finished my own implementation of speech-to-text that uses Nvidia's Parakeet v2 model and was going to make a Whisper one to compare the model's performance claims against the Whisper standard. This will be a good starting point for making something of my own.

You_Wen_AzzHu@reddit

👏👏