> ... because it's not a good way to control a computer?
This comment speaks to a perception problem for aural methods. The state of the mainstream art doesn't seem much past Forstall's demo of 10 years ago. [0] Are generations of people accustomed to WIMP UI able to wrap their heads around a much smaller interaction set? [1]
Gentner and Nielsen's work, described in "The Anti-Mac Interface" [2], speaks to some of the differences people will have to bridge mentally, such as:
Mac | Anti-Mac
Direct Manipulation | Delegation
See and Point | Describe and Command
WYSIWYG | Represent Meaning
User Control | Shared Control
Feedback and Dialog | System Handles Details
Forgiveness | Model User Actions
The problem, I think, comes from a gap in how efficiently different people already use computer-human interfaces.
Take “Open Hacker News” for example. One user might click Browser > open bookmarks tab > “Hacker News”.
Another, having set up a series of hotkeys, will go (on a Windows machine, with the browser pinned in taskbar position 1):
Win+1 > Ctrl+3
That is incredibly fast, much faster than saying it.
My guess is that much of the software engineering world is either users who can do the first very quickly or don’t find it cumbersome, or users who set up hotkeys like the latter and will outrace the speed of human speech on any given day. Thus the problem gets little attention.
My first guess would be that "Open hacker news" requires clear, audible speech, while the keyboard method just requires pressing 'h' and 'enter'. Also, non-cloud speech recognition only recently got decent.
Context matters with such shortcuts. Even if I make sure that I have the location bar selected, focused and clear, 'h enter' takes me to a completely different website, because Hacker News is actually 'news.ycombinator.com', so it's not the default completion when typing 'ctrl-k h enter'.
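To make that concrete, here is a rough Python sketch. It assumes the location bar completes only on the start of the host name; real browsers also weigh things like page titles and visit counts, so treat this as an illustration rather than how any particular browser works. The history entries (other than news.ycombinator.com) and the `complete` helper are made up for the example.

```python
from typing import List, Optional

# Hypothetical browsing history, most preferred first.
# Only news.ycombinator.com comes from the discussion; the rest are placeholders.
HISTORY = [
    "news.ycombinator.com",
    "help.example.com",
    "example.org",
]


def complete(typed: str, history: List[str]) -> Optional[str]:
    """Return the first host in history that starts with the typed text."""
    typed = typed.lower()
    for host in history:
        if host.startswith(typed):
            return host
    return None  # nothing in history matches


print(complete("h", HISTORY))  # a host starting with "h", not Hacker News
print(complete("n", HISTORY))  # the host that actually serves Hacker News
```

Under that assumption, 'h' can never land on Hacker News, while 'n' does, which is the kind of context a spoken "Open Hacker News" would not depend on.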
And let's be frank. Even if we do get voice control, it's not going to somehow take our keyboards and mice away from us.
In principle there could be voice shortcuts; currently there seems to be an expectation that voice interfaces should be entirely limited to natural language words and sentences, but if we're willing to let this constraint go (at least for "power user shortcuts") and just design bespoke syllables in IPA or whatever, we could probably come up with something more efficient.
It also ought to be possible to specifically design interleaved voice+keyboard, voice+mouse, voice+touch, voice+pen, etc. interactions that could be more expressive and efficient than either input method by itself.
0. https://www.youtube.com/watch?v=SpGJNPShzRc
1. https://en.wikipedia.org/wiki/Post-WIMP
2. https://web.archive.org/web/20120904231532/http://www.useit....