by skizm 2742 days ago
Lots of comments about improving Siri here. This is interesting to me for several reasons I won't address, but I will ask this: Do people really use voice commands for, well, anything?

From a purely functional standpoint, they seem super awkward to me. I like buttons that click and reassure me of every input. I like to feel confident my actions won't be misinterpreted. I like that no one else near me will get weirded out or annoyed when I'm having trouble interfacing with whatever app I'm currently using. The only reason I can envision using voice controls for anything is in the car while driving, and you would only begrudgingly use them because it is overwhelmingly safer to have both hands on the wheel and your eyes on the road.

Apart from not enjoying voice controls from a functional point of view, is no one else creeped out at always on mics and video cameras in their house? In this era of super crappy security, especially with consumer grade stuff, there's a not insignificant chance your stuff is currently being hacked by one or more non-government bad actors (I already assume the US government, and probably a few other governments, already have 24/7 access to every mic and camera that is connected to the web in any way).

I've been assuming voice commands will die out and that this Alexa / Siri hype (hype might not be the right word. buzz? rumblings?) was a result of Amazon and Apple pushing them from a marketing perspective. The number of comments about Siri in a thread about a random exec being added to Apple is making me reconsider that PoV.

32 comments

> Do people really use voice commands for, well, anything?

Siri plus shortcuts have made many mundane tasks easier. When I get in my car to come home from work I say "Hey Siri, heading home." That causes my phone to text my wife my arrival time and starts the last podcast I had playing.

It's a simple thing, but is so much easier than texting and then thumbing through the podcast player to start where I left off. I have others like logging my water intake or weight, but it was really adding shortcuts to Siri that made these possible.

Playing music or TV shows is also much easier/nicer. "Hey Google, play The Office on Netflix".

Timers. Another simple thing that is so much easier when you can use your voice when cooking.

I am skeptical when technologists say voice-assisted systems will become the dominant interface, at least in countries with a high rate of literacy.

I just look at TV vs radio, texting versus calling, or audio books versus written content. I believe most studies indicate that people are better at visual comprehension versus auditory:

https://news.nationalgeographic.com/news/2014/03/140312-audi...

I know scientists love working on voice and speech recognition, since it is a hard problem to solve, but it sometimes feels like it's a bit of a solution in search of a problem. I'm sure there are good use cases, I'm just skeptical that they are profound enough for voice to be our primary medium for interaction.

More generally, I think the thing you are noticing is that visual and physical items offer random access.

Compare trying to find a specific piece of information in a book, vs in some training DVD.

If I'm just learning how to cook, watching a professional demonstrate the whole thing is going to be very helpful, but if I already know how to cook in general it's easier to flick to the right section of a book and scan the page for the bit of information I need.

Or compare the difference between listening to a phone system's 7 different options vs seeing all the options available on a single screen.

The other side of this is precision. Not only do input methods like a keyboard allow you to give extremely explicit, high information, instructions with no need for interpretation, they also have extremely fast feedback loops. Imagine trying to use your voice to click on a specific part of an image, or draw a circle around it. Far, far easier to move a pointer with your hand, watch where it goes, and then click when it's in the right position.

So visual comprehension probably is better than auditory, but I think the main things that are important are random access, specific and information dense input, and low latency feedback loops on input - all things that we are far better at achieving with physical/visual methods than auditory or speech based methods.

This is very well said and a great point. A lot of this relates to random access and which interface has an O(1) lookup. “Play season 2, episode 3” could be better as voice, whereas “if you want to reach reception, dial 1” is much better as a visual interface.
I agree with your skepticism about voice becoming generally dominant, but it’s already very useful. It may also become the dominant form of usage for some systems.
It's also hard to imagine sound as a dominant interface because all we have are mediocre examples. We have to work within clunky command boundaries, rephrase commands, be in a quiet environment, not have an accent, etc.

I'm glad we're making progress, but I'll be a skeptic until I can give voice requests as naturally as I'd give them to a human. IMO there's no limit from there.

I agree with your main point, but your examples seem suspect to me. TV is video _and_ audio, audiobooks are a translation of an existing artform (that is, books were originally created to be read, not listened to), and I find texting to be extremely clunky as a concept and do not enjoy tapping out long or interesting messages on a tiny touchscreen.
> Siri plus shortcuts have made many mundane tasks easier. When I get in my car to come home from work I say "Hey Siri, heading home." That causes my phone to text my wife my arrival time and starts the last podcast I had playing.

Wait, you do what?

How can I get SIRI to do this for me? Can you explain how you got SIRI to do that?

Along with iOS 12, Apple released a new app called Shortcuts. It is not preinstalled as far as I know. I think they got it via acquisition. There’s a bunch of new hooks that app makers can/must use to integrate with it, so not everything is supported.

It’s basically IFTTT for iOS, and you can assign phrases in Siri to it.

https://support.apple.com/guide/shortcuts/welcome/ios

I never got Siri to work that well. I found it had problems with calling and some of its standard features. It's still on my iPhone, since there's no way to remove it, but I've found Google Assistant to work better. Plus, the Google Home integration is nice.
I use Siri for almost every task that doesn’t require me to physically look at my screen, e.g. reading, watching a video or typing. The #1 and #2 problems I have with Siri recognizing my input are in order: the quality of the mic I’m using and the ambient noise level in whatever environment I’m in. The #3 problem is the quality of my network connection because it won’t work if it can’t contact Apple.

In my experience, AirPods are the best Siri input device I own. EarPods are a distant second, and the built in mic on phone is a not very distant third. It is effectively unusable on my laptop’s built in mic.

The ambient noise basically means I can’t use Siri in noisy places, and I’m typically not inclined to. I might raise my voice a little if I’m putting in a podcast outside and it is windy.

#3 means disabling WiFi when I leave my house, until I’m in a location with a solid WiFi connection, in part because I make use of my cable WiFi. If I have no service and no WiFi then I have no Siri, not even to set a timer.

Beyond that, I find the basic feature set adequate, but not comprehensive. Siri shortcuts doubled Siri’s usefulness and I only use them for three or four apps.

That said, I appreciate Siri’s presence because it does enable me to leave my phone in my pocket a lot more than I used to, so there will occasionally be a week where I didn’t spend more than an hour or two looking at my phone’s screen the entire week (not per day, per week), with 90% of this time spent reading a book. Observing my friends’ obsessions and work habits, I appear to be the outlier in that regard.

>Do people really use voice commands for, well, anything?

Yes, daily.

I use it when cooking to get instructions on how long to bake a specific vegetable and at what temp. While I'm out driving to get directions hands free. When I want to do a search but can't be bothered to type the whole thing out on my mobile keyboard. When I'm at home and want to listen to radio on my speakers. I use a Pixel 2, so when I make a call I just squeeze it and say 'call {contact}', rather than find-and-open the phone app.

Don't look at voice commands as an interface replacement for keyboards. Instead, look at voice commands as a new interface for situations where using a keyboard or touch screen is a hassle.

20% of Google searches are made by voice. https://searchengineland.com/google-reveals-20-percent-queri...

That number is probably a little outdated now. But it’s definitely not targeting a niche audience.

I find this absolutely fascinating when I compare it to my personal experience. It begs the question: how frequently does the average person Google something compared to power users? I consider myself to have above average googling skills, and I'm a developer so there's obviously a lot of documentation searching regularly, but even excluding all of that I surely perform a good 10-20 non-programming searches a day and it could easily be an order of magnitude more if I'm actively researching something.

Does the average user perform significantly fewer searches, and so the novelty or occasional voice search moves the needle 20%, or are they performing as many searches as me but using voice for many of them? I personally only ever found voice search useful for things that are more like questions and not research ("how old is _______" is a classic example) so I find it difficult to believe the latter. The former would be quite the revelation though because I always assumed _everyone_ googled as much as I do but it seems that might not be the case.

I rarely use google outside of work. At home, I have a few regular sites I keep up with, I know the URLs and I just type them in.
The article says that it is 20% of mobile searches. I doubt many people are doing very lengthy research on mobile. I have seen Siri and "OK Google" being used pretty commonly for mobile searching. I'd be interested to see how many searches happen on mobile vs desktop though.

I would also be interested in seeing if people looking for places/directions factors into this in some way.

I thought the same, then visited family who use it constantly and it all made sense.
> Do people really use voice commands for, well, anything?

A friend of my partner uses voice commands for everything on her iPhone. She is almost blind in one eye, and has terrible eyesight in the other eye (albino trait).

There are some use-cases where voice is a better interaction method. Driving alone in a car is one -- so you can keep both hands on the wheel and both eyes on the road. Another one is in the kitchen. My hands are often full/busy/messy when I'm cooking -- which is a pain when I need to set a timer on my phone. Being able to say "Alexa - set a timer for 12 minutes" is great. The novelty of the Echo has faded almost completely beyond timers, Spotify and weather. Beyond that -- it's a novelty. But I do love those timers.
That's the thing though, given how cheap the Echo Dot is, especially during promotions, there's very little reason not to get one. They're the perfect small gift.

Virtually every other hot gadget from the last decade has been far more expensive at this stage, to the point where it slowed down the adoption rate (smartwatches, certain cameras), or made it never go anywhere near what the hype train led us to believe (VR/AR products).

> given how cheap the echo dot is ... they’re the perfect small gift.

At the Real Canadian Superstore you get a gift when you spend over $300. It changes weekly. Sometimes it’s a box of cereals or chocolates. Sometimes it’s houseware, a plant or a lawn chair. At thanksgiving it’s a frozen turkey. That one time it was 2kg of bacon. Last week it was an Echo Dot.

Based on popularity within the family, and compared to the alternatives, it’s far from being the perfect gift. That said, it does seem to be cheap.

I would take 2kg of bacon over an Echo Dot
I think you're more likely to use a smart assistant as their ability to understand and respond correctly improves.

Hiring away Google's head of AI seems to have made a material difference in how well Siri performs in an annual head to head comparison of how well various smart speakers responded to 800 sample requests.

>Google Home continued its outperformance, answering 86% correctly and understanding all 800 questions. The HomePod correctly answered 75% and only misunderstood 3, the Echo correctly answered 73% and misunderstood 8 questions, and Cortana correctly answered 63% and misunderstood just 5 questions.

>Note that nearly every misunderstood question involved a proper noun, often the name of a local town or restaurant.

https://loupventures.com/annual-smart-speaker-iq-test/

A 22% increase in correct responses over last year's performance.

They're very useful for very limited things - "Set a timer for 30 minutes" or "Wake me up at 8am" is easier than doing that yourself. Even dictating a short text in a pinch is nice. Think "in the kitchen" with wet or dirty hands.
I use this in the same way for location based reminders ie. "...when I leave work" or "...when I get home"
>Do people really use voice commands for, well, anything?

You mentioned this already, but I use them constantly in the car, and it's not even sort of begrudging. I got an early sale on the Echo Auto devices (Alexa for your car, basically), and love it. Everything I could do by dinking around with my phone, I now don't have to.

Outside of the car, voice commands for stuff like home control is natural, and almost kinda magical. Walking in with both hands full of groceries and barking "Alexa, turn on the kitchen lights" is awesome. Same for setting timers while cooking, turning on music, and so forth. So long as you remember that you're dealing with what amounts to a voice command line, not the Enterprise's computer, everything flows smoothly.

Conversely, I almost never use Siri even though by way of car Bluetooth it should have the same kind of functionality, but it's so limited and inaccurate as to be functionally useless.

>is no one else creeped out at always on mics and video cameras in their house?

Not for me, because a device that is local-only listening for a wake-word is not even sort of creepy. Your explanation, intentionally or not, paints it as a device that maintains a constant connection to the mothership and gives $company a live stream of everything happening around it.

This is an incredibly annoying misconception that I've grown weary of seeing.

Voice commands are useful if you are having a conversation with friends and a question comes up that the group wants an answer to. I find it more social to ask a phone with a voice command than to open the phone, type a query into Google, and search for an answer.

Of course Siri defaults to an incorrect google search on nearly every question, so this often doesn't work in practice...

I never got into voice control, for two reasons. If your job is day-to-day coding, office work, or paperwork, where your voice is used less, then maybe speaking a few commands doesn't feel like much. But if your day-to-day job is managing dozens of people, which involves lots of listening, arguing, and speaking as well as selling your ideas, the last thing I want to do at the end of the day is to speak again to get what I want. I can use the phone and press a few buttons. It might take a little longer, but it feels much BETTER.

The second is the commands themselves. I don't want to put "Siri" in front of every sentence. It is unnatural. If I had a maid, that is not how the conversation would go if I needed to get something done. The amount of work (turning something on or off, or text, or music) is relatively small compared to the number of commands I have to give. In other words, giving a command to Siri (4-6 words) is more troublesome than pressing 3-4 buttons.

Being able to hold up my watch and say “set a timer for 30 minutes” is significantly better than fumbling through menus.
In the kitchen, I use a real mechanical timer. Twist, set, done.
Yes. And for my 5 1/2 year old son, who’s always had voice around, it’s very natural. The first words I utter every day are “Alexa, start my day”, which triggers a whole bunch of automation (lights, plugs, news, etc) while I get my cereal and coffee ready.
The number of times I've seen people using voice commands since Siri launched isn't more than 5 (except when people were experimenting with jokes and stuff). This may be specific to where I live idk.

I'd say we need 10-20 more years for voice assistants to be smoothly integrated with our daily lives. Until then big tech companies have just started the race (collecting data, enhancing experience) to be the best voice interface in the future.

From my point of view, users of our generation are just experimental subjects for currently nonfunctional and uncommon but much-buzzed products like voice assistants and VR.

I frequently use voice commands for both my Android phone and for my Amazon Echo, which is the only way to interact with it. I even have my phone wake up and unlock from the sound of my voice because it's so convenient. I do it when I'm driving, when I'm in my bed and my phone is plugged in and out of reach and when I'm just too lazy. Outside of that I use my Echo for shopping, adding things to the shopping list as I'm looking through my fridge, timers, recipes, music, asking factual questions and more. I love voice activated features and don't find them super-awkward in the least.
> Do people really use voice commands for, well, anything?

I was recently getting a dental procedure done and the periodontist kept using Alexa while he was treating me. "Alexa, play the Eagles!" "Alexa, skip this song!" He seemed super impressed, like he was really excited to show it off... I thought it was annoying. I think voice command stuff is lame. I have Siri permanently disabled on my iPhone and Apple Watch.

I've heard some people like using Siri with the Apple Watch, but I never got into it. I would always accidentally set it off when I was weightlifting.

Have you tried Google assistant? I find it to be much more useful than Siri.
People tend to use it quite heavily, once it crosses the threshold where it understands you the overwhelming majority of the time. IOW the threshold Siri is yet to cross.
Ease of use and friendliness of the user interface is what the spirit of Apple Computers is all about. If I were to guess what Jobs might be focusing on if he were alive, it would be the convenience and ease of use of Apple products. I think voice communications would be something he would use. That is based on the various biographies about him. Whether it is ethical or morally right to have tech go this direction is another matter. Ease of use is the trend.
> The only reason I can envision using voice controls for anything is in the car while driving, and you would only begrudgingly use them because it is overwhelmingly safer to have both hands on the wheel and your eyes on the road.

Yup, that's pretty much the only reason I use Siri. Though I could see VR as another possible use case, since hand controls in VR are less precise and slower than keyboard/mouse for textual input.

> Do people really use voice commands for, well, anything?

On my phone it's rare unless I'm getting in my car and having it pull up directions.

Where I found I use it all the time is with the Amazon Fire Stick. When I have a show I want to watch I don't have to fumble with a stupid keyboard on the TV, I just say the name and it works. Also setting timers in the kitchen when I'm cooking, it saves me from getting raw meat on all the surfaces.

Two scenarios:

- when I’m driving is the big one

- and for reminders.

It’s a lot easier to say:

- remind me to call my mom when I get home.

- remind me to call my wife when I get in the car.

- remind me to get milk when I get to $grocery_store

Than to set up the reminder manually.

I use it for a few things only:

- Remind me to X in N minutes/hours/days

- Set an alarm for N minutes

- What’s the weather like today/tomorrow?

- Play X by Y (when driving and wanting a specific album or song to play)

- What is [insert some question of simple knowledge I’ve forgotten]?

Nothing else seems worth bothering with.

I used Google voice a handful of times every day when I was on Android. Now that I'm on iOS I have Siri disabled. It's infuriating that Apple supply subpar services, and don't let you choose something else as a default replacement.
I say "Wake me up at 7" and "Turn off my alarm" every work night and work morning. If I'm ever timing something (like how long to let my French press steep), I say "Set a timer for x minutes."
You can set an alarm once for M-F, etc.
I prefer to do it every night before going to bed. I like to feel like I'm choosing to get up at that time, as opposed to being under a pre-ordained schedule. (Which, of course, it is, but perception matters.)
I used to use voice commands extensively when I used Android. But in 2015 I got an iPhone and Siri is so much worse (both recognition accuracy and available functionality) that I stopped bothering. It’s a shame.
Yes, in the car via CarPlay. It makes for safer driving. I just plug my phone in and put it away so I can’t reach it to text and drive. I then ask it to text, look up directions or play whatever song I want.
This would be great if Siri ever reliably worked for me. It works just often enough that I keep trying to use it before giving up in frustration for the umpteenth time. I recently retrofitted CarPlay to my Mazda and it's better than Mazda's voice recognition, but that's a very low bar.

Here's an example of something that seems obviously should work: I'm driving to pick someone up. Apple maps is navigating. "Hey siri, text <person I'm picking up> my ETA."

Yeah, that doesn't work. The only things I reliably get out of Siri are setting a timer and opening the camera.

I mostly use it to set alarms and reminders, it's much easier to say "Remind me to do something in an hour" than go find the app, set up the correct time, and write the description.
Voice commands are the future. “Turn off kids bedroom”, “remind me about ... later today”, “call wife”, “set timer for 3 minutes”: it is so much faster than fumbling through the UI.
I only use it to spell words I don’t know the spelling of. :p

I speak with an accent so it was always hit and miss for me, although it has gotten better in recent years.

I’m also too lazy to talk ...

I dislike voice commands.

I have tried using speech-to-text for text messages and emails, and I usually spend more time correcting the mistakes than it saves.

Not to mention that this input can only be used in private spaces: Home or separate office cabins.
> Do people really use voice commands for, well, anything?

Setting alarms and reminders

Also to request playing Downton Abbey. Although Alexa is sketchy about specific episodes and seasons. And no way can she understand "play the bit where Mary and Matthew are standing outside as it begins to snow"...
There's a (perhaps apocryphal) story about Minsky and Engelbart: Minsky proudly proclaimed, "We're going to make machines intelligent! We're going to make them walk and talk! We're going to make them conscious!" Engelbart shot back, "You're going to do that for computers? Well, what are you going to do for people?"

https://books.google.nl/books?id=uNDW_dQ_dlAC&pg=PA167&lpg=P...

Ben Shneiderman's 1993 IEEE Software article, "Beyond Intelligent Machines: Just do it!" was prompted by discussion between Mark Weiser (father of Ubiquitous Computing) and Bill Hefley, and argues that users want a sense of direct and immediate control over computers that differs from how they interact with people.

http://www.cs.umd.edu/hcil/trs/93-03/93-03.html

[...]

WHY NOT INTELLIGENT? I am opposed to labeling computers as "intelligent" for several reasons. First, such a classification limits the imagination. We should have much greater ambition than to make a computer behave like an intelligent butler or other human agent. Computer-supported cooperative work, hypertext/hypermedia, multimedia, information visualization, and virtual reality are powerful technologies that enable human users to accomplish tasks that no human has ever done. If we describe computers in human terms, we run the risk of limiting our ambition and creativity in the design of future computer capabilities. In the same way that most of us have learned to use terminology not specific to any gender, we should now learn not to limit designers of computers with the tag "intelligent" or "smart."

Second, the qualities of predictability and control are desirable. If machines are intelligent or adaptive, they may have less of these qualities. Usability studies at the University of Maryland show that users want the feelings of mastery, competence, and understanding that come from a predictable and controllable interface. Most users seek a sense of accomplishment at the end of the day, not the sense that some intelligent machine magically did their job for them.

Another reason I'm concerned about this label is that it limits or even eliminates human responsibility. I am concerned that if designers are successful in convincing the users that computers are intelligent, then the users will have a reduced sense of responsibility for failures. The tendency to blame the machine is already widespread and I think we will be on dangerous ground if we encourage this trend. As part of my work, I collect newspaper articles about computers, some of which bear the headlines "Victims of Computer Error Go Hungry," "IRS Computers Err on Refund Reports," and "Computers That 'Hear' Taking Jobs" -- all of which seem to absolve human operators by implicating the machine.

Finally, I have a basic philosophical objection to the "intelligent" label. Machines are not people, nor can they ever become so. For me, computers have no more intelligence than a wooden pencil. If you confuse the way you treat machines with the way you treat people, you may end up treating people like machines, which devalues human emotional experiences, creativity, individuality, and relationships of trust. I know that many of my colleagues are quite happy to call machines intelligent and knowledgeable, but I prefer to treat and think about machines in very different ways from the way I treat and think about people.

[...]

+ Natural-language interaction seems clumsy and slow compared to direct manipulation and information-visualization methods that use rapid, high-resolution, color displays with pointing devices. Lotus HAL is gone, Artificial Intelligence Corp.'s Intellect hangs on but is not catching on. Although there are some interesting directions for tools that support human work through natural-language processing (aiding human translators, parsing texts, and generating reports from structured databases) this is different from natural-language interaction.

+ Speech I/O in talking cars and vending machines has not flourished. Voice recognition is fine for handicapped users and special situations, but doesn't seem to be viable for widespread use in office, home, or school settings. Our recent studies suggest that speech I/O has a greater interference with short term and working memory than hand-eye coordination for menu selection by mouse. Voice store and forward, phone-based information retrieval, and voice annotation have great potential but these are not intelligent applications.

[...]