Hacker News
by sixstringtheory 2448 days ago
I think the future is less/no screen. Typing on these folding phones seems like a worse experience. Typing at all isn't really natural, and neither is staring at planar, glowing glass.

I think the future is conversational computing. I don't own an Alexa/HomePod/etc (yet... maybe some open source on prem thing at some point), but I think that's where the puck is moving. It's just that today their capabilities are somewhere around a rotary phone vs. an iPhone. Better than a telegraph (which I guess in this analogy is _typing_ your words into a document) but still very rudimentary. All it needs is time and effort.

Similar to HomePods, we have AirPods and their equivalents. The phone is just a conduit through which can pass the data necessary for the OS to talk with you, to do what you need.

4 comments

Strongly disagree with this. As someone who does own a number of Google Home devices at home and uses Siri on my phone... voice is a terrible interface.

For one, there is zero discoverability. I can ask Google today's weather. I can ask tomorrow's weather. I cannot ask yesterday's weather. Leaving aside why that would be (I would find it useful to know that it's X degrees hotter/cooler than yesterday) there's no way for me to know that without asking. It's the audio equivalent of fumbling around on a keyboard in a pitch black room. Just imagine placing a food order. It's going to have to read a menu to you and you're going to have to remember it all. No amount of tech improvement is going to change that fundamental fact.

Secondly, you can't multi-task. Or have more than one person using it simultaneously. Right now my wife and I might be looking at our phones at the same time, perhaps looking stuff up, maybe tapping out an e-mail. We'd have to go to separate rooms to do that.

But if I want to know today's weather or play a song, it works fine. As long as it recognises my voice correctly and there isn't too much background noise.

To be fair, when I Google "yesterday's weather" in my desktop web browser I don't get a nice little Google info card. I do get some web results for sites that show historical weather, however.

We definitely agree that voice interfaces are very rudimentary today. I try to run lots of things through dictation first that normally I would type out with my thumbs on the smartphone or on a keyboard on my computer. Text messages, search terms, commit messages, Slack conversations. Still, it can't perform very basic tasks like changing or backspacing a word or phrase, either because it misheard it or because you want to change it. (And actually as I dictated this paragraph on my 2018 MacBook Pro, it typed out everything I said twice and still required typing interventions, and eventually I just fell back to typing everything.)

You've laid out some good criteria though. I wouldn't say voice interfaces have really "made it" until it gets to the point where you don't have to ask how to ask it to do something (discoverability). You just ask it to do something and it does it. Although that's just one of many criteria.

The food menu problem is interesting, but pretty much everything that prints out on a ticket in a kitchen is structured data–it should be able to be efficiently conversationalized (preference notwithstanding, of course). Certainly there are many ways you could talk to someone about a menu: what kinds of dishes are there? Appetizers, grilled entrees, pasta, salads, desserts. What kind of entrees? Vegetarian, pork, beef, seafood. OK, but what styles of cuisine? Jamaican, Italian, Szechuan. There's probably an analog to the 5 Why's for figuring out what someone wants to eat! Asking yesterday's weather, though, is a specific case that could probably be solved by an intern, provided that data is easy to find on the Internet (FWIW, I've searched for the very same thing many times and it's much harder to find vs forecasts).
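To make the "structured data" point concrete, here is a minimal sketch of that drill-down in Python. Everything in it is an assumption for illustration: the `MENU` items, the facet names (`course`, `protein`, `cuisine`), and the helpers `options` and `narrow` are made up, not any real assistant's API. The idea is only that each spoken question lists the distinct values of one facet, and each answer filters the remaining items.

```python
# Hypothetical menu data; every item and field name is invented for illustration.
MENU = [
    {"name": "Jerk pork", "course": "entree", "protein": "pork", "cuisine": "Jamaican"},
    {"name": "Shrimp scampi", "course": "entree", "protein": "seafood", "cuisine": "Italian"},
    {"name": "Kung pao shrimp", "course": "entree", "protein": "seafood", "cuisine": "Szechuan"},
    {"name": "Eggplant parmesan", "course": "entree", "protein": "vegetarian", "cuisine": "Italian"},
    {"name": "Caprese salad", "course": "salad", "protein": "vegetarian", "cuisine": "Italian"},
    {"name": "Tiramisu", "course": "dessert", "protein": "vegetarian", "cuisine": "Italian"},
]


def options(items, facet):
    """Distinct values of one facet, in first-seen order (what would be read aloud)."""
    seen = []
    for item in items:
        if item[facet] not in seen:
            seen.append(item[facet])
    return seen


def narrow(items, facet, value):
    """Keep only the items whose facet matches the spoken answer."""
    return [item for item in items if item[facet] == value]


# "What kinds of dishes are there?"
print(options(MENU, "course"))        # ['entree', 'salad', 'dessert']

# "Entrees." -> "What kind of entrees?"
entrees = narrow(MENU, "course", "entree")
print(options(entrees, "protein"))    # ['pork', 'seafood', 'vegetarian']

# "Seafood." -> "OK, but what styles of cuisine?"
seafood = narrow(entrees, "protein", "seafood")
print(options(seafood, "cuisine"))    # ['Italian', 'Szechuan']
```

Each round reads out a handful of choices rather than the whole menu, which is the part I'd expect a voice interface to handle reasonably well.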

I concede that there will always be a need for graphical interfaces. How do you "speak" a map, or a CAD model? I guess I was just thinking of things that can be accomplished with a keyboard. You can speak anything you can type, even if it's as rudimentary as today, where you have to say "period newline newline" to end a sentence at the end of a paragraph while dictating.

I agree it might seem tough to multitask. But consider WiFi routers serving multiple computers, or hell, even CPUs serving different processes, "simultaneously." If voice recognition and NLP become sufficiently sophisticated I could foresee being able to isolate multiple overlapping voices in an audio sample. If not, consider that you could ask it to look something up, immediately followed by your wife dictating an email to send–or one of you could even interrupt the other–and it could be able to handle the context switching and queuing at speed.

And I understand there's a lot I don't know, and I do remain skeptical that this could ever be perfected. Would it really be able to dictate poetry? Would the forms I create or creatively destroy in free verse just totally confuse the voice interface? Would it be smart enough to side step the confusion via some pseudo-meta-cognitive process and ask me what the hell I'm doing?

> Certainly there are many ways you could talk to someone about a menu: what kinds of dishes are there? Appetizers, grilled entrees, pasta, salads, desserts. What kind of entrees? Vegetarian, pork, beef, seafood. OK, but what styles of cuisine? Jamaican, Italian, Szechuan. There's probably an analog to the 5 Why's for figuring out what someone wants to eat!

To me this is the core of why voice interfaces will always be inferior. In the time it would take that voice conversation to happen I would have been able to scan a menu a dozen times over. Our brains are incredibly adept at picking out visual details - identifying the headers that note each section of the menu, picking out key words that may interest us and so on. There is no technological improvement that will help a voice interface rival that.

Have you ever watched a person with vision challenges using VoiceOver with the speed cranked up? I bet they could absorb the info they need to know about a menu before the average reader could, even before any hierarchical organization is exposed to the text-to-speech process. The visual hierarchical and keyword navigation you describe is just what I'm talking about with a voice interface, too.

Just yesterday a colleague I was pairing with was VoiceOvering JSON packed with API keys and stack traces. I, conversely, have many times stood with the fridge door open trying to find something that was plainly front and center. Of course, the answer for many things may be a combination of both hearing and vision.

I also wonder if this easily navigable menu you are thinking of is already cognitively mapped in your mind, and you know what to look for. What if the menu is in a foreign or second language that the voice assistant could translate for you? Or is a completely foreign-to-you cuisine, or just creatively organized in a way you aren't used to, like by seasonality, emotion or geography? I've sat and stared at some dense menus that I've had to reread multiple times to remember just a subset of the items. In the end I asked the waiter something like in my example: "something with shrimp" or "what do you like?"

I'm not so sure about the things you say will never or always be, and I don't even consider myself an optimist. Finally, thanks for taking this ride with me, it's definitely made me consider more things!

Except when you can't reply to a message on your phone cause you don't want people around you to hear that.

Let's be real, "conversational computing" may work in movies, but in reality you don't want people in the office or on the street hearing your interactions with your phone.

My thoughts too - it works in the home and at a push I can see it working for certain jobs, but it absolutely seems unworkable in public places.

But then I think again and wonder if maybe this is one of those things that seems unthinkable now, but in 10-20 years everyone everywhere will be doing it completely naturally. It will have become part of the background noise of life, and nobody will care enough to really listen to what you're telling your computer to search for or do.

Yes, on a silent train it might be awkward, but on the street I can honestly see it being fine, especially if attitudes and culture change a bit, as they're wont to do with new technology usage (see bluetooth headsets for a past example of this).

What are you going to watch your porn on without a screen?

Also, talking to any of those assistants is literally the worst imaginable mode of interaction with a computer, period. Touchscreens in cars are in close second place.

Headphones. Audio is a major part of porn, and audio porn is much healthier at night than video porn, because light makes you stay awake longer.

I commute by train early every morning. I would shoot the ones who start interacting with their devices by voice.