by daanzu 1601 days ago
I have been coding entirely by voice for approximately 10 years now (by hand long before that). Most of that time I have been using the Dragonfly (https://github.com/dictation-toolbox/dragonfly) library to construct my own customized voice coding system. The library is highly flexible and open source, allowing you to easily customize everything to suit what you need to be productive. It is perhaps the power user analogue to Dragon NaturallySpeaking. With it, you can certainly be highly productive coding by voice. However, it does require work to set up and customize to suit you, so it isn't really for the "general population" of computer users to just sit down and use. With regard to accuracy of speech recognition, being open allows you (with sufficient motivation) to train a custom acoustic speech model that recognizes your voice specifically extremely well.
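
To give a flavor of what "customizing everything" can mean, here is a toy sketch in plain Python of a spoken-command table mapped to actions. It is not Dragonfly's API: the names `COMMANDS`, `compile_spec`, and `dispatch`, the `<name>` placeholder syntax, and the `<ctrl-s>` output string are all hypothetical, and the input is assumed to be already-recognized text.

```python
import re

# Hypothetical command table: spoken spec -> function returning the text to emit.
COMMANDS = {
    "new function <name>": lambda name: f"def {name.replace(' ', '_')}():",
    "print <text>": lambda text: f"print({text!r})",
    "save file": lambda: "<ctrl-s>",  # placeholder standing in for a keypress
}

def compile_spec(spec):
    """Turn a spec such as 'print <text>' into a regex with named groups."""
    parts = []
    for word in spec.split():
        if word.startswith("<") and word.endswith(">"):
            # A placeholder captures one or more free-form words.
            parts.append(f"(?P<{word[1:-1]}>.+)")
        else:
            parts.append(re.escape(word))
    return re.compile(r"\s+".join(parts) + r"$")

RULES = [(compile_spec(spec), action) for spec, action in COMMANDS.items()]

def dispatch(utterance):
    """Return the output of the first matching command, or None if nothing matches."""
    text = utterance.strip().lower()
    for pattern, action in RULES:
        match = pattern.match(text)
        if match:
            return action(**match.groupdict())
    return None
```

For example, `dispatch("new function parse args")` returns `def parse_args():`. A real setup would bind such rules to a recognition engine and to actual keystrokes, and adding a command is just adding a row to the table.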

Regarding the software packages you referenced: Yes, Dragon is trash that I want nothing to do with, because of its inefficient interface, its complete inability to accurately understand my voice, and its generally shoddy software quality. Voice Computer (which I hadn't seen before) is therefore eliminated as well, though it doesn't look terrible as a front end to Dragon to better use the OS GUI-accessibility info. Many people like Talon, but I demand something open, which I can modify to suit my needs.

Background: I develop kaldi-active-grammar (https://github.com/daanzu/kaldi-active-grammar), a free and open source speech recognition backend usable by Dragonfly, and I develop it entirely by voice. There's also a community of voice coders using Dragonfly and other tools that build on top of it, such as Caster (https://github.com/dictation-toolbox/Caster).

1 comments

What SR engine do you use for your personal setup? Is it Kaldi? (Assuming so, since you helped develop it :-) )
Yep, I have been using my Kaldi backend through Dragonfly exclusively ever since I got v0.1.0 working.

I bootstrapped writing it initially using the Dragonfly WSR (Windows Speech Recognition) backend, because that gave me the best accuracy out of the available options at the time. All of my development of it since the initial working version has been done using each previous version, so by now it has basically bootstrapped itself. My productivity skyrocketed once I switched to Kaldi, because I could use a speech model custom trained just for my voice, which gave me orders of magnitude better accuracy plus dramatically lower latency. (And it freed me from being dependent on closed software out of my control.)

I bootstrapped my personal speech model by retaining the commands I spoke while using WSR. My voice is quite abnormal, and it took only 10 hours of speech data to train a model dramatically more accurate than any generic model I've ever used. And of course, I retain much of my usage now with Kaldi, so my model improves more and more over time. A virtuous flywheel!
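
The retention loop described above could be sketched roughly like this: log every recognized utterance next to its audio file, and track how many hours have accumulated. This is a minimal illustration in plain Python, not the actual kaldi-active-grammar mechanism. The tab-separated layout and the names `retain` and `total_hours` are assumptions.

```python
import csv
from pathlib import Path

def retain(log_path, wav_name, transcript, seconds):
    """Append one utterance record (audio file, transcript, duration) to a TSV log."""
    with open(log_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow([wav_name, transcript, f"{seconds:.2f}"])

def total_hours(log_path):
    """Sum the retained audio duration, in hours (0.0 if no log exists yet)."""
    if not Path(log_path).exists():
        return 0.0
    with open(log_path, newline="", encoding="utf-8") as f:
        rows = csv.reader(f, delimiter="\t")
        return sum(float(row[2]) for row in rows) / 3600.0
```

Calling `retain(...)` after each recognition and checking `total_hours(...)` now and then tells you when there is enough data (roughly 10 hours, in the case above) to try training a personal model.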