I think the problem comes from a gap in the distribution of efficiency levels for computer-human interfaces.
Take “Open Hacker News” for example. One user might click Browser > Open bookmarks tab > “Hacker News”.
Another, having set up a series of hotkeys, will go (on a Windows machine, with the browser pinned in taskbar position 1):
Win+1 > Ctrl+3
That is incredibly fast, much faster than saying it.
My guess is that much of the software engineering world is made up of either users who can do the first very quickly or don’t find it cumbersome, or users who set up hotkeys like the second and will outrace human speech on any given day. Thus the problem gets little attention.
My first guess would be that "Open Hacker News" requires clear, audible speech, while the keyboard method just requires pressing 'h' and 'enter'. Also, non-cloud speech recognition only recently got decent.
Context matters with such shortcuts. Even if I make sure that I have the location bar selected, focused and clear, 'h enter' takes me to a completely different website, because Hacker News is actually 'news.ycombinator.com', so it's not the default when typing 'ctrl-k h enter'.
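To make the 'h enter' point concrete, here is a toy model in Python. It assumes the location bar only completes on a hostname prefix, which is a simplification and not how any particular browser actually ranks suggestions. The `complete` function, the history list and the `homepage.example` host are all made up for illustration.

```python
def complete(typed, history):
    """Return the most-visited host whose name starts with `typed`, or None.

    Toy assumption: matching is on the hostname prefix only.
    """
    # Rank by visit count, most visited first.
    ranked = sorted(history, key=lambda item: item[1], reverse=True)
    for host, _visits in ranked:
        if host.startswith(typed):
            return host
    return None


# Hypothetical history: (host, visit count).
history = [
    ("news.ycombinator.com", 500),
    ("github.com", 200),
    ("homepage.example", 3),
]

print(complete("h", history))  # a rarely visited site wins on 'h'
print(complete("n", history))  # Hacker News only wins on 'n'
```

Under that assumption, how often you visit Hacker News doesn't matter: the hostname starts with 'n', so 'h' can never reach it.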
And let's be frank. Even if we do get voice control, it's not going to somehow take our keyboards and mice away from us.
In principle there could be voice shortcuts; currently there seems to be an expectation that voice interfaces should be entirely limited to natural language words and sentences, but if we're willing to let this constraint go (at least for "power user shortcuts") and just design bespoke syllables in IPA or whatever, we could probably come up with something more efficient.
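A rough sketch of what such a shortcut table could look like, in Python. It assumes some recognizer (not shown) already turns audio into a list of syllable tokens. The syllables, the command names and the `dispatch` function are all invented for illustration.

```python
# Hypothetical bespoke syllables mapped to commands, like hotkeys but spoken.
VOICE_SHORTCUTS = {
    "ka": "open_browser",
    "hin": "open_hacker_news",
    "zu": "close_tab",
}


def dispatch(tokens, shortcuts=VOICE_SHORTCUTS):
    """Map recognized syllables to command names, skipping unknown ones."""
    return [shortcuts[t] for t in tokens if t in shortcuts]


# Two syllables instead of the sentence "open the browser, then open Hacker News".
print(dispatch(["ka", "hin"]))
```

The idea is the same as Win+1 > Ctrl+3: a short, arbitrary sequence you learn once, rather than a natural-language sentence.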
It also ought to be possible to specifically design interleaved voice+keyboard, voice+mouse, voice+touch, voice+pen, etc. interactions that could be more expressive and efficient than either input method by itself.