by warrenm 1608 days ago
>The market is huge

Apparently ... it's not

Or, rather, it's not YET "huge"

Sure - half the planet is online, but they're speaking myriad languages in more combinations of enunciation, dialect, and accent than is probably even calculable

>the Natural Language Processing of "OK Google" and Siri are quite refined at this point

Asking for today's weather is totally different from telling a computer what to do - just as typing "what is Pluto's orbit" into your favorite search engine is totally different from writing the search engine that goes off and does what you asked. And even when it does go off and do it, it still returns multiple (often conflicting) results - which leads to the whole problem of identifying authority online (something I wrote about 15+ years ago: https://antipaucity.com/2006/10/23/authority-issues-online/#...)

Responding to variations on a theme of maybe a couple hundred search keywords (is it even that many?) is also worlds apart from handling the literally unlimited number of commands people issue to their computing devices every day. Let's even say Siri is That Good™ - you've got a MacBook, iPhone, and iPad on your desk ... which one should respond when you say, "Hey, Siri"? Why that one vs this one? Do you have to start every command with the name of the device? Maybe that's not so hard at home (maybe), but what about corporate environments with naming conventions like H5GG71WLD? ... or dozens/scores/hundreds of people within listening distance, with everyone's microphones getting triggered by other conversations in the room, conference calls, your cubemates' inability to attenuate their voices and aim only at their laptop when talking ...

It's a nightmare to think about - practically, let alone computationally

Most people look at the example of, say, Star Trek for voice commands to "the computer". Ever notice the computer only responds when the script demands it? Geordi's shouted commands to his team in Engineering, or his panicked messages to the bridge, are never misinterpreted by the computer as commands to it

That's mighty convenient - and not at all representative of anything resembling a reality we can create [yet]

Maybe in another few decades or centuries ... but I'd wager probably not

Another consideration: speaking is very slow compared to a click, tap, or typing a few characters at a prompt. Why would you want to intentionally make your human-to-device interactions more clumsy and error-prone?

1 comment

OP here. Great comments and ideas, all. A few notes:

* Talon is pretty great
* I think the market for text to speech and voice control is huge, and maybe Dragon/Nuance rules it because of their patents, but oh, does it suck. Like being stuck on Windows 95 or something.
* Voice Recognition is in fact currently good enough to get real work done efficiently
* Serious RSI can't be fixed with ergonomics or better devices
* If there were a modern alternative to Dragon, it would solve a chunk of the problem

It's true that computer control currently requires a lot of customization, but I see no practical reason why we can't at least make simple commands fast and accurate, e.g., 'create new html document in VS Code'.
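
To make the "simple commands" idea concrete, here is a rough sketch of what could happen after the speech has already been transcribed to text. Everything in it is an assumption for illustration only: the grammar, the `parse_command` helper, the `APP_LAUNCHERS` table, and the `untitled.<type>` filename are all hypothetical, and it says nothing about how Talon or Dragon actually work. The only real-world piece is that VS Code ships a `code` command-line launcher, which is assumed to be on the PATH.

```python
import re
from typing import Optional

# Hypothetical fixed grammar: "create [a] new <type> document|file in <app>".
COMMAND_PATTERN = re.compile(
    r"^create (?:a )?new (?P<doc_type>\w+) (?:document|file) in (?P<app>.+)$",
    re.IGNORECASE,
)

# Hypothetical table mapping spoken app names to a launcher command.
# "code" is VS Code's CLI launcher (assumed to be installed and on PATH).
APP_LAUNCHERS = {
    "vs code": "code",
    "visual studio code": "code",
}


def parse_command(utterance: str) -> Optional[dict]:
    """Map an already-transcribed utterance to an action, or None if unrecognized."""
    match = COMMAND_PATTERN.match(utterance.strip())
    if match is None:
        return None  # not a command in this tiny grammar
    launcher = APP_LAUNCHERS.get(match.group("app").strip().lower())
    if launcher is None:
        return None  # known sentence shape, unknown application
    doc_type = match.group("doc_type").lower()
    return {"launcher": launcher, "filename": f"untitled.{doc_type}"}


if __name__ == "__main__":
    action = parse_command("create new html document in VS Code")
    # A real tool might now run [action["launcher"], action["filename"]]
    # via subprocess; here the parsed action is only printed.
    print(action)
```

A closed grammar like this is fast and predictable precisely because anything outside it is rejected rather than guessed at; the customization cost mentioned above shows up as the work of growing the pattern and the app table.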