by gdne 2650 days ago
The problem is the interface. Voice commands and their responses are linear and one-dimensional. It’s difficult to represent complex interaction within that scope. Think of all the investment that has gone into telephone-based automated customer support. The best interface conceived so far is the dreaded phone tree. That’s essentially the same interface smart speakers are exposing.

The opportunity is to figure out how to better utilize the voice-based medium. No one has done it yet. When they do, it will also likely improve the experience around screen readers and accessibility.

5 comments

Well, imagine a goal-based system. The first utterance to a voice assistant provides some inputs and a goal, like booking an Uber, but if you haven't provided all the inputs it needs to meet the goal, the assistant knows how to ask for more information, and also saves enough state to let you modify (or cancel) requests after the fact. This is arguably more advanced than a phone tree; you're not just giving keywords to advance along the branches.
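To make that concrete, here is a rough Python sketch of the idea: a goal with required inputs, a prompt for whatever is still missing, and saved state that can be modified or cancelled later. Everything in it (the `Goal` class, the slot names, the wording of the prompts) is made up for illustration; there is no real speech recognition or ride-booking API involved.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A hypothetical goal that needs a fixed set of named inputs (slots)."""
    name: str
    required: tuple[str, ...]
    slots: dict[str, str] = field(default_factory=dict)
    cancelled: bool = False

    def missing(self) -> list[str]:
        # Required inputs the user has not supplied yet, in declared order.
        return [slot for slot in self.required if slot not in self.slots]

    def update(self, **values: str) -> None:
        # Fill new slots or overwrite earlier answers (modify after the fact).
        self.slots.update(values)

    def cancel(self) -> None:
        self.cancelled = True

    def next_prompt(self) -> str:
        # Decide what the assistant should say next, based on saved state.
        if self.cancelled:
            return f"Okay, I cancelled the request to {self.name}."
        missing = self.missing()
        if missing:
            return f"What is the {missing[0]}?"
        details = ", ".join(f"{k}={v}" for k, v in self.slots.items())
        return f"Ready to {self.name}: {details}"


# First utterance gives the goal and only one of the two inputs.
ride = Goal("book a ride", ("pickup", "destination"))
ride.update(destination="airport")
print(ride.next_prompt())   # asks for the pickup

ride.update(pickup="home")
print(ride.next_prompt())   # all inputs present, ready to act

# Later: change your mind, then cancel. The state is still there.
ride.update(destination="train station")
ride.cancel()
print(ride.next_prompt())
```

The difference from a phone tree is that the user isn't walking fixed branches: inputs can arrive in any order, and the assistant only asks for what the goal still lacks.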
I don't find it appealing to use voice as an interface. I'd much rather have a button to press.

I think the new feature in iOS where it guesses what I want to do (send a message to ABC, for example) based on previous patterns is promising. A whole screen of these actions would be great.

This is right. No one has yet written “The Design of Everyday Things” for voice interaction. We just don’t know what works yet, so we are redoing what was done before.
I think you're right, but on the other hand, a skill is faster to create than a complex 2D UI.

Sure, I won't use a conversational interface to build the next Photoshop.

But I could tell it things like what I ate, and it would calculate my kcals or macros and tell me how much I have left to eat that day, or what else I should eat to hit my macros.

I could tell it what I bought and it would categorize the bills.

Some things are just too bothersome to do with my hands.

The medium of voice is very rich and capable when talking to a human, so this is something limited by the intelligence of the system you're interacting with.
Is it? I mean, some folks do great talking. Lots of them. But most of that is narrative communication. If you get much beyond that, you jump to physical interaction quickly. Even telling benefits from gestures. Consider how vague most spoken directions are. When you can augment with pointing, things are easier.