by JonathonW 4255 days ago
Watching the PyCon talk the author mentions as inspiration, the thing that seems like it would really kill this for me (even given a good macro set for the editors and applications I need) is the input lag. The whole time, it seems like he's either pausing between commands and able to react to incorrectly interpreted input, or he's speaking a long string of commands and something goes wrong in the middle that invalidates the remainder of his sentence (he ends up in the wrong mode, or in the wrong pane, or something along those lines).

The huge advantage that the keyboard has as an input device is that there's zero delay-- if I make a mistake, I can go back and fix it as soon as I'm able to see and react to it. Speech recognition has this inherent delay to it-- it has to delay execution of a command until it's concluded that there's no other possible interpretation of what you just tried to say. Speech is a lot slower than I can react-- the inherent lag there just seems intolerable.

I suppose one could get used to it if it came down to "use speech recognition or find another job", though.

4 comments

Yeah, it's pretty common for someone to explain why the keyboard is a better input method. And you're right, it probably is at this time for healthy developers. However, you should probably stop to consider that the solution isn't for you. It's for people who can't type. Many people have disabilities, for example. This opens up a new world to them. And as for the author, he suffers from RSI, as do many other developers.

http://ergoemacs.org/emacs/emacs_hand_pain_celebrity.html

Like the author of the article, I've been making the switch to programming by voice. At first, the delay was really jarring, but you get used to it.

You're also completely correct about how when you speak a chain of commands, one of the commands in the middle getting messed up can invalidate the rest. That happens a lot. You learn to speak in shorter chains, and also to make the commands phonetically distinct.
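
To make the chain problem concrete, here is a minimal sketch, not how Dragon or any real voice-coding setup works. The command words (`end`, `home`, `hop`), the one-line cursor model, and `run_chain` are all made up for illustration. It stops at the first word it doesn't know and throws away the rest of the chain, which is one way to keep a bad middle command from being followed by commands that no longer make sense. It can't catch a word that was misheard as a different valid command, which is why shorter chains still help.

```python
# Hypothetical command table: spoken word -> new cursor position on one line.
# Each action takes (current position, line length) and returns a position.
COMMANDS = {
    "end": lambda pos, length: length,                 # jump to end of line
    "home": lambda pos, length: 0,                     # jump to start of line
    "hop": lambda pos, length: min(pos + 1, length),   # one character right
}


def run_chain(words, pos, length):
    """Apply spoken words in order, stopping at the first unknown word.

    Returns (new_pos, executed_words, discarded_words).
    """
    executed = []
    for i, word in enumerate(words):
        action = COMMANDS.get(word)
        if action is None:
            # Everything from the unknown word onward is dropped, not guessed at.
            return pos, executed, list(words[i:])
        pos = action(pos, length)
        executed.append(word)
    return pos, executed, []


if __name__ == "__main__":
    # "xyzzy" stands in for a word the recognizer got wrong.
    print(run_chain(["end", "hop", "xyzzy", "home"], pos=3, length=10))
```

With a short chain, the discarded part is small and cheap to say again.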

That said, both the delay and the inaccuracy problems can be greatly ameliorated by a fast CPU and a good sound card. I don't have any benchmarks, but I have noticed the difference since upgrading.

I know someone who does a lot of voice input. He tends to use simple, distinct sounds as a trained syntax, like "woof" for moving to the end of a line and "bark" for moving back, along with other words and sounds that aren't common in conversation. He said it took some getting used to, but the accuracy got a lot better.
I use the word "bark" too, albeit for a different purpose. I think that sort of spec selection is just a habit that you naturally get into when creating a lot of voice commands.
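
If you want a rough sanity check when picking command words, something like the sketch below can flag pairs that look too alike. Big caveat: it compares spelling with Python's `difflib`, which is only a crude stand-in for how words actually sound, and it has nothing to do with how Dragon scores them. The `confusable_pairs` name and the 0.6 threshold are my own assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(words, threshold=0.6):
    """Return pairs of words whose spelling similarity is at or above threshold.

    Spelling similarity is only a rough proxy for phonetic similarity.
    """
    flagged = []
    for a, b in combinations(words, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((a, b))
    return flagged


if __name__ == "__main__":
    # Hypothetical vocabulary; "park" is deliberately close to "bark".
    print(confusable_pairs(["woof", "bark", "park"]))
```

Anything it flags is worth saying out loud a few times before you commit to it as a command.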

Incidentally, I've learned a bit of Korean, and it's caused me to notice that Dragon recognizes words which don't end in multiple unvoiced consonants more easily than those that do. (For example, "taze" is better than "taste" and "pad" is better than "pact".)

Yes, there is a little lag but I have not found it to be a big deal. If you are getting excessive lag just make sure that the environment is mostly quiet and you have given the Windows VM enough CPU and RAM. Turns out voice recognition is hard.
Corrections on a keyboard can actually be ahead of when you see them - particularly during laggy SSH sessions - when you realize you've struck a key incorrectly a tiny moment after doing so, and hit backspace quickly.