HN Mirror


by dkrich 3329 days ago

My point is that the iPhone/iPad/Android-enabled tablet is a ubiquitous IoT control panel.

I disagree that voice is the most natural human interface. That's like saying listening to an audiobook will always be a better way to consume media than watching a video. Audio-only is great for some use cases, but not most. Humans developed the ability to see, touch, hear, and speak for a reason. After all, radio predated television by a long shot, yet once televisions were released they pretty much cannibalized sales of the home radio, because they added the ability to see in real time what you were listening to.

Speech is certainly an important part of human communication, but it's just one part, and it's only the best option for certain things. Simple queries like "what's the weather today?" or "play some music" are better said than typed, I would agree. But for lots of other tasks it's just not efficient. I'd rather pull out my phone and search for restaurant recommendations than fumble around trying to communicate by voice. When I pull up the Yelp app, I can instantly view a list of many restaurants, and because I'm used to the interface and its visual cues (the number of stars, location, and reviews), I can very quickly discern what I think I'd like. Now imagine someone trying to describe to me what they saw on that app. It would be impossible. It's just very difficult to convey subtleties with voice alone.

As an aside, if Apple wanted to get into the IoT business and pose a threat to Amazon, I think that releasing some sort of "Always On" listening mode and giving developers the ability to build apps that respond to it would be enough to catch up. If I could say "play some music" or "FaceTime Dave" and it solved the simple-query problem, I don't know that I'd ever use my Echo again (I maybe use it once or twice a week now).


dtien 3329 days ago

In terms of voice/speech, we may be talking about different things. I'm considering voice as an input mechanism, but the output mechanism certainly has to be dynamic and use whatever is most sensible in the given context. For example:

1. "Alexa, play song" --> output on built-in speaker

2. "Alexa, turn on Warriors game", "Alexa, play latest episode of Game of Thrones" --> output onto nearest TV

3. "Alexa, get directions to San Francisco" --> send to my phone screen

4. "Alexa, show me top sushi restaurants nearby" --> send to nearest display (TV or phone)

... and so on.
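The routing idea in the list above can be sketched as a small dispatch step: once a spoken command has been turned into an intent, pick the nearest device whose kind suits that intent. This is a minimal sketch under assumptions that are not in the comment: the intent labels (`play_music`, `show_results`, etc.), the device kinds, and the distance estimates are all hypothetical, and speech recognition and intent parsing are assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Device:
    name: str
    kind: str          # hypothetical kinds: "speaker", "tv", "phone"
    distance_m: float  # hypothetical estimate of distance from the user


# Hypothetical table of which output kinds suit each intent,
# loosely mirroring the four examples in the list above.
ACCEPTABLE_OUTPUTS: Dict[str, Set[str]] = {
    "play_music": {"speaker"},
    "play_video": {"tv"},
    "directions": {"phone"},
    "show_results": {"tv", "phone"},
}


def route_output(intent: str, devices: List[Device]) -> Optional[Device]:
    """Return the nearest device suitable for the intent, or None if there is none."""
    kinds = ACCEPTABLE_OUTPUTS.get(intent, set())
    candidates = [d for d in devices if d.kind in kinds]
    if not candidates:
        return None
    # "Nearest display" in the examples: the smallest estimated distance wins.
    return min(candidates, key=lambda d: d.distance_m)


room = [
    Device("Echo", "speaker", 1.0),
    Device("Living room TV", "tv", 4.0),
    Device("My phone", "phone", 0.5),
]
print(route_output("show_results", room).name)  # My phone (0.5 m beats the TV at 4.0 m)
```

The point of keeping the table separate from the selection logic is that input stays the same (voice) while the environment decides the output, which is the argument being made here.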

So yes, I definitely agree with you: voice as an output quickly becomes untenable. But again, that's what I mean by ubiquitous. You're no longer tied to a device for input/output; your environment and context define what your input/output mechanisms are. Outputs can be any displays, speakers, TVs, thermostats, lights, etc. And in most cases, voice is the simplest, most intuitive input mechanism for the simple queries that make up the majority of our daily interactions with our surroundings.

Just as touch devices required UI developers to simplify their interface design for touch access by removing layers of menus, pages, etc., voice will precipitate a similar simplification, to the point where the core elements are accessible through simple queries backed by strong, complex NLP and search.

And again, it's more about accessibility than it is about expressiveness through voice input.