This guy's not wrong. You can speak clearly and comfortably at 250 words per minute. Most folks will type at less than half that.
Even shortcuts (which peer comments are relying upon) aren't all that fast - they require additional selection movement with the keyboard or mouse before they can be used.
People do much more than dictate natural language. They navigate menus, highlight text, launch apps, type commands in the terminal, etc. I don't see how voice can beat the keyboard and mouse when you consider all of those interactions.
Strange, I can picture it with no problems. Probably because I use Vim quite a bit, and its commands already read like fairly natural-language gestures:
Copy two words
Select line
Paste before word
etc.
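To make the Vim comparison concrete, here's a rough sketch of how utterances like the ones above could be parsed into structured editor commands. The grammar (`<action> [before|after] [number] [unit]`), the tiny vocabulary, and the `Command`/`parse_command` names are all hypothetical, invented for illustration; a real voice system would sit this on top of a speech recogniser and map the result onto actual editor operations.

```python
from typing import NamedTuple, Optional

# Hypothetical vocabulary, kept deliberately tiny for illustration.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
ACTIONS = {"copy", "select", "paste", "delete"}
UNITS = {"word", "words", "line", "lines"}


class Command(NamedTuple):
    action: str              # verb, e.g. "copy"
    count: int               # how many units; defaults to 1
    unit: Optional[str]      # singular unit ("word", "line") or None
    position: Optional[str]  # "before"/"after" for paste, else None


def parse_command(utterance: str) -> Command:
    """Parse '<action> [before|after] [number] [unit]' into a Command."""
    tokens = utterance.lower().split()
    if not tokens or tokens[0] not in ACTIONS:
        raise ValueError(f"unknown command: {utterance!r}")
    action, rest = tokens[0], tokens[1:]

    position = None
    if rest and rest[0] in ("before", "after"):
        position, rest = rest[0], rest[1:]

    count = 1
    if rest and rest[0] in NUMBER_WORDS:
        count, rest = NUMBER_WORDS[rest[0]], rest[1:]

    unit = None
    if rest and rest[0] in UNITS:
        # Normalise plurals: "words" -> "word", "lines" -> "line".
        unit = rest[0][:-1] if rest[0].endswith("s") else rest[0]
        rest = rest[1:]

    if rest:
        raise ValueError(f"unexpected words in command: {' '.join(rest)!r}")
    return Command(action, count, unit, position)


for phrase in ["Copy two words", "Select line", "Paste before word"]:
    print(phrase, "->", parse_command(phrase))
```

The point is that each phrase is short and unambiguous once the grammar is fixed, much like a Vim motion, so the learning curve is mostly about memorising a small vocabulary.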
Opening apps is even simpler: "open spotify". Compare the complexity and time required to say those two words against moving your hand to the mouse, moving the mouse to a 100x100 pixel target, and clicking twice within 100ms. Even compare it against using "Cmd-Space Spotify".
It'd require a learning period, but so does - for example - teaching the concept of the mouse to someone who's only ever used a tablet.
EDIT: And I'll copy this from another of my posts - getting good voice control won't take our keyboards and mice away from us.
When I did hands-free coding, I gave my variables names that I could say as words. So you'd be saying 'copy file-num-one file-num-two' or something, rather than spelling it out letter by letter. I actually ended up giving things more verbose names because I didn't have to type them all out. So it might be:
versus typing: 'cp gearyStreetFinancialReport divisaderoStreetFinancialReport'
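A minimal sketch of the designed-for-voice side of that comparison: if the naming convention is "spoken words joined in camelCase", the system can turn what you say straight into the identifier. The function names here are hypothetical, and it assumes the recogniser has already split the utterance into the two separate names, which is the genuinely hard part a real system would have to solve.

```python
def spoken_to_camel(phrase: str) -> str:
    """Turn a spoken phrase like 'geary street financial report' into camelCase."""
    words = phrase.lower().split()
    if not words:
        return ""
    # First word stays lowercase; every later word is capitalised.
    return words[0] + "".join(w.capitalize() for w in words[1:])


def spoken_cp(source: str, dest: str) -> str:
    """Build the shell command a typist would otherwise enter by hand."""
    return f"cp {spoken_to_camel(source)} {spoken_to_camel(dest)}"


print(spoken_cp("geary street financial report",
                "divisadero street financial report"))
```

The length of the name barely matters when speaking it, which is why verbose, pronounceable names stop being a cost.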
If you're trying to exactly replicate something designed (and named) for text input, you're absolutely right, but I thought we were talking about hypothetical designed-for-voice systems.
Tab completion relies on a limited context. If you're trying to type gearyStreetFinancialReport and the only two names in context are gearyStreetFinancialReport and something unrelated, you're right; but if there's a very large number of choices, it helps you less. And new names aren't going to be in context, so even in the best case of my example, you're going to end up typing:
'cp g-[TAB] divisaderoStreetFinancialReport'
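Here's a simplified model of why that happens, assuming bash-style behaviour where Tab extends what you've typed to the longest common prefix of all matching names (real shells also list the ambiguous matches on a second Tab, which this ignores). The extra names in `large` are made up for illustration.

```python
import os
from typing import List


def complete(prefix: str, candidates: List[str]) -> str:
    """Extend `prefix` to the longest common prefix of every candidate
    that starts with it; leave it unchanged if nothing matches."""
    matches = [c for c in candidates if c.startswith(prefix)]
    if not matches:
        return prefix
    # os.path.commonprefix compares character by character, not by path component.
    return os.path.commonprefix(matches)


small = ["gearyStreetFinancialReport", "unrelated"]
large = small + ["gearyStreetBudget", "geometryNotes"]

print(complete("g", small))   # one match: completes fully
print(complete("g", large))   # several matches: stalls early
print(complete("gea", large))
print(complete("d", large))   # new name, not in context: no help at all
```

With two names in context, one keystroke plus Tab gets the whole name; add a few similar names and completion stalls after a couple of characters, and a brand-new name gets nothing.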
I'd expect that to be an advantage of voice stuff; that you can go fast in new kinds of large scope contexts, maybe even whole-machine context. A system designed from the ground up could exploit that in interesting ways.