> But I suspect that people don’t ask those questions because, after trying a time or two and getting no answers or wrong answers, they just give up on Siri.
Absolutely this. I mean, you try a few of these "long tail" queries and you eventually say "Fuck it!" I attempt these "long tail" queries (a term that, to me, sounds like a shitty play-it-down excuse from Apple) weekly just to see if Apple is finally getting its act together with regard to Siri's usefulness. I am consistently disappointed and never surprised and delighted.
I try to screenshot "Siri fails". Here are the last few in my screenshot album:
-------------------
"Open last screenshot" → "You don't seem to have an app named 'last screenshot' We could see if the App Store has it [App Store]"
"Share this with my wife" (Photo was open in Photos.app) → "I'm sorry Joe, I'm afraid I can't do that"
"When is the last time I exercised" → "Interesting Question, Joe"
"What year was this song recorded" (In Music app with song playing) → "Interesting question, Joe"
-------------------
We're living in a world where we can do more and more without using the screen as an interface. It's happening. I'm worried that Siri is a low-priority product for Apple right now and that they will soon be scrambling to play catch-up, if they aren't already.
EDIT: I'm hoping that Apple's recent buddying up with IBM allows for some of Watson's intellect to seep into Siri. Who knows, maybe in a future keynote we'll hear Tim say, "Ladies and gentlemen, Watson is coming to your Mac and iPhone!"
I believe they have added ML in the Assistant, along with some new features like context recognition. Granted, it's of course difficult to say how much is shared between Now and Assistant.
"It also can’t distinguish between the question of “who is” a person and a request for that person’s contact card. For instance, I have a contact card for Apple CEO Tim Cook. When I ask, “Who is Tim Cook?” Siri shows me the contact card, not his bio."
I don't see it as humblebragging (any of us could create a Cook contact, and ReCode readers likely know Walt Mossberg's relationship with Apple), but as something worse: a faulty premise.
Odds are, if you have them in your contact database, you already know them; you're not going to want Siri to give you their Wikipedia bio.
Siri has myriad faults, and thankfully someone of Mossberg's stature might push Apple to address them, but this is not one of them.
If you are asking specifically for _who_ somebody is, it should tell you who that person is, not give you the contact card, because that card does not tell you who somebody is.
At least, that's what I would expect if I asked who somebody is.
I'm presupposing that the contact card has their job title, and therefore still answers the question. I'm also presupposing the original (odd) premise of wondering who someone already saved in your phone is.
The Venn overlap between the set of "contacts in your phone" and "people with Wikipedia bios" is likely rather small. That's why I think it's a faulty premise to complain about Siri defaulting to the contact card when these two sets do intersect.
That's not the point either. The point is that if I asked you, a human, "Who is Tim Cook" and you replied "123-555-1212" or "tim@apple.com" I would be a little dumbfounded.
If I'm asking the question (maybe a friend is nearby who doesn't know them and I'm too lazy to explain), then I want to know who they are, not what their contact info is.
I could imagine some scenarios in which I ask a human assistant "who is Joe Bloggs" and it would be quite reasonable for them to answer "oh, you've met him, he even gave you his business card".
I don't think anyone's saying that the question "who is x" ever means "give me the contact information for x". They are just arguing as to whether or not Siri's counter-intuitive behavior may be a reasonable response, given that it's a computer and not a human. This is not a hard question for humans.
It's a very hard question for humans. Back in the '70s, AI research had already worked out that conversations take place in "frames", which include a ton of implied state. It turns out that state is essential to make sense of human conversations, because words and constructs have different meanings in different frames.
Even simple questions like "Who is..." have many different interpretations. A human will understand the context. An AI won't, because you can't derive the context from the words themselves. It's a function of social setting, physical setting, relationship, previous conversations, and so on.
At the moment conversational interfaces are more like a Bash shell with a speech recogniser on the front. The shell needs a precisely formed command and has almost no concept of state or context at all. (I think Siri actually has some, but not much.)
So it's completely unrealistic to expect CIs to be able to do this today. It will only be possible when NLP gets a whole lot more sophisticated and starts tracking context and state - although even that will still be a hard problem, because social state is defined as much by location, physical surroundings, time of day, and custom as by the words being used.
Except the contact card has metadata such as the company they work at, where they're located, etc. (including custom fields you can create).
A contact card could definitely be used for that. If one exists, Siri should give the info from it and then wait to see if the user also wants external information (from Wikipedia or elsewhere).
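A rough sketch of that card-first behavior, assuming a hypothetical `Contact` record with job title, company and city fields (real address books expose similar data, but these names are made up here) and leaving the actual web lookup out:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Contact:
    """Hypothetical address-book entry (field names are illustrative)."""
    name: str
    job_title: Optional[str] = None
    company: Optional[str] = None
    city: Optional[str] = None


def answer_who_is(name: str, contacts: Dict[str, Contact]) -> str:
    """Answer "who is <name>" from a contact card first, then offer more."""
    contact = contacts.get(name.lower())
    if contact is None:
        # No card at all: go straight to an external source.
        return f"Looking up {name} on the web."
    details = []
    if contact.job_title and contact.company:
        details.append(f"{contact.job_title} at {contact.company}")
    elif contact.job_title or contact.company:
        details.append(contact.job_title or contact.company)
    if contact.city:
        details.append(f"based in {contact.city}")
    summary = ", ".join(details) if details else "in your contacts"
    # Answer from the card, then wait to see if the user wants external info.
    return f"{contact.name} is {summary}. Want me to look them up on the web too?"


contacts = {"tim cook": Contact("Tim Cook", job_title="CEO", company="Apple")}
print(answer_who_is("Tim Cook", contacts))
# Tim Cook is CEO at Apple. Want me to look them up on the web too?
```

The point of the design is ordering: local data answers first, and the external lookup is offered rather than forced.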
I'm not convinced this one should even be considered a failure with Siri. If I have a contact card for someone, it's perfectly reasonable to be given that contact card when I ask who they are. If I want to see a Wikipedia article, I should just ask for the Wikipedia article in the first place!
My wife is exactly the customer who has given up on asking Siri questions and uses it for timers. I agree that Google does a better job at answering questions, but I don't like that Google requires access to just about everything in my life to be at all worthwhile, which is generally why I avoid Google services. I realize I'm in the minority, but I don't like their all-or-nothing approach to slurping my data.
Does Android really answer questions? The only one I know to work is "How high is Mount Everest?". For everything else, it just reads the first sentence of the Wikipedia article, or presents a search. I'm using a German phone, so an English one might work better. But mine doesn't even know "What color is the sky?"
The thing that always impresses me about Google Now and their voice parsing in general isn't so much about speaking answers back to spoken queries but just how good it can be at guessing what you mean.
My favorite example was a couple of years ago when I was at a bar. There was a song on the jukebox ("Coffee Pot" by Cajmere) which, for reference, is a house music song with a repeated vocal sample that says "It's time for the perculator [sic]". I assume they mean "percolator" but that's how the guy pronounces it in the sample.
Anyway, we were joking around, and even though the bar was very loud, with people talking all around and music playing, I pulled out my phone and asked "Ok, Google: is it time for the perculator?" I pronounced it incorrectly like the guy in the song, and within seconds I had results. The top one was a link to the video for "Coffee Pot" by Cajmere on YouTube and the rest were links to other stuff about the song.
Now, this was at least 2 years ago and possibly more. The response to my silly joke of a question wasn't so witty as to give me a spoken "it's time for the perculator" or anything like that... but it could parse my words despite all the noise and competing speech around me, it understood a mispronounced word, and it knew that this lyric sample referred to a song with a different title (Coffee Pot) and linked me to the video.
That was seriously the first time I was really impressed by their voice assistant even if it was a ridiculous request. No idea what the result would be on today's Google Now or tomorrow's Assistant (guess I'll find out when my Pixel gets delivered). Still, even if just from an engineering and software angle, that made me grin a bigger geeky grin than I might have imagined.
I find all of the voice responses from both to be tediously slow. I'm a fast speaker, and I like it when people speak at a fast pace with me.
I like that Google seems to just display information for many of the responses, rather than trying to cram in some overly long witty reply or a verbatim reading of text that takes too long to speak back.
Having some sort of speed/succinctness setting would be helpful in that regard.
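A minimal sketch of what such a setting could look like, assuming the speech engine accepts SSML. The `<prosody rate="...">` element is part of the W3C SSML spec; `shorten` and `to_ssml` are hypothetical helpers, and the sentence splitter is deliberately naive:

```python
import re
from xml.sax.saxutils import escape


def shorten(answer: str, max_sentences: int) -> str:
    """Keep only the first few sentences (naive split on ., ! or ? plus whitespace)."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return " ".join(sentences[:max_sentences])


def to_ssml(answer: str, max_sentences: int = 1, rate: str = "fast") -> str:
    """Wrap a shortened answer in SSML; rate is a named SSML value such as x-fast, or a percentage."""
    text = escape(shorten(answer, max_sentences))  # escape &, < and > for XML
    return f'<speak><prosody rate="{rate}">{text}</prosody></speak>'


print(to_ssml("The sky is blue. It looks blue because of Rayleigh scattering."))
# <speak><prosody rate="fast">The sky is blue.</prosody></speak>
```

Two knobs, length and speaking rate, cover both complaints: fewer words and faster delivery of the words that remain.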
My dad uses Google Now like this all the time. The first sentence of the Wikipedia article is generally a pretty useful summary of whatever you were searching for; maybe this is a difference between the English and the German Wikipedia articles? Google Now is pretty good at trivia in general, though. I just asked mine who won the 1995 NCAA Basketball Championship, and learned that the UCLA Bruins won. My Google Now says the sky is blue when I ask it as well.
I can't wait until personal agent AI has advanced to the point where I can ask that and hear "Cerulean blue. Cerulean makes me think of a breeze. A gentle breeze..."
Dangit. I shouldn't have said anything, then. I want my personal AI agent to watch all the movies and television shows I have ever watched, and read all the books I have ever read, and play all the video games I have ever played, so that it can make obscure references that I can understand at the appropriate times.
In this case, it would have to know the various names of sky-blue colors, like "azure", "cerulean", "process cyan", or technical answers like "transparent with Rayleigh scattering" or "hue value between 190 and 200 degrees". Then it would have to cross reference that with things I have already seen or read, and then prioritize several potential responses before finally playing me an audio clip from X-Files 3:17 "Pusher", when the eponymous character is being transported in the back of a cop car.
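For what it's worth, the hue range holds up. A quick check with Python's standard `colorsys` module on the CSS named color `skyblue` (#87CEEB), chosen here as a stand-in for "sky blue":

```python
import colorsys


def hue_degrees(hex_color: str) -> float:
    """Return the HSV hue of an #RRGGBB color, in degrees (0-360)."""
    hex_color = hex_color.lstrip("#")
    # Split into R, G, B channels and scale each to 0..1 as colorsys expects.
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360


print(round(hue_degrees("#87CEEB"), 1))  # 197.4
```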
Google has its Knowledge Graph, a semantic search engine for the most commonly asked questions.
The Wikipedia page for Knowledge Graph cites Thomas Jefferson as an example, so I asked 'what years was Thomas Jefferson in office?', and Google deferred to the first paragraph of TJ's Wikipedia article.
Then I asked 'What year was Thomas Jefferson born?', and it gave me a direct answer.
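That split looks like the difference between a stored fact and a fallback. A toy sketch of the idea, assuming a hand-built fact table keyed by (entity, property); this is not how Google's Knowledge Graph is actually implemented:

```python
from typing import Dict, Optional, Tuple

# Toy fact table: (entity, property) -> value. Only one fact is stored here.
FACTS: Dict[Tuple[str, str], str] = {
    ("thomas jefferson", "birth year"): "1743",
}

# Fallback text per entity; a placeholder, not real article text.
SUMMARIES: Dict[str, str] = {
    "thomas jefferson": "<first paragraph of the encyclopedia article>",
}


def answer(entity: str, prop: str) -> str:
    key = (entity.lower(), prop.lower())
    if key in FACTS:
        # Structured fact available: give a direct answer.
        return FACTS[key]
    # No matching fact: fall back to a summary, then to a web search.
    summary: Optional[str] = SUMMARIES.get(entity.lower())
    return summary if summary is not None else f"Searching the web for {entity}."


print(answer("Thomas Jefferson", "birth year"))       # 1743
print(answer("Thomas Jefferson", "years in office"))  # <first paragraph of the encyclopedia article>
```

Whether a question gets a crisp answer depends entirely on whether that exact property was stored, which matches the behavior described.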
The German version is much worse than the English one. It's worth switching the phone language to English to try it there.
It's not that different with Siri: they always show off features in English, and localization seems to take a while and get far fewer resources.
The eye-opener for me was the movie Her. Its take on personal assistants was amazing, but it seemed to come with the assumption of personal privacy between you and the assistant, and unbridled access to ALL of your data.
So a good exercise is to rewatch that movie, and in each scene where the assistant does something useful or cool, take a moment to reflect on what sort of data they would need access to, and how that data might be used.
I think where things get murky is that the movie didn't really touch on the company behind it at all, and if anything it almost gave the impression of being more like Apple than Google. As such, it neatly skirted today's concerns, where the clear leader in the space (Google) is an advertising company that monetizes its users' data and is constantly raising privacy concerns.
It is a fine line to walk, but ultimately everyone will need to weigh the decision in their mind as to the right balance of privacy and convenience.
> When I asked, “What is the weather on Crete?” Siri gave me the weather for Crete, Illinois, a small village which — while I’m sure it’s great — isn’t what most people mean
Most of the narrative about Apple's maps products has simply been "Apple bad, Google good" and hasn't looked much deeper than this (with the exception of Justin O'Beirne's cartographic commentary, which is remarkably detailed though much of it is just a matter of taste). But Mossberg really hits on something here:
Apple's geocoding is way below par. The cartography is superb IMO. The routing is very good[1]. The source data isn't bad at all. But the geocoding is really error-prone. I've asked it for directions to Milton Keynes, a town with a population of 250,000 just 40 miles from here, and it won't give me anything other than the nearest branch office (in a completely different town) of a company whose HQ happens to be in MK. I've asked it for directions to the village of Brize Norton and it flat-out refuses, sending me instead to the RAF base of the same name, even if I ask for "Brize Norton village", "Brize Norton School" and so on.
Yes, geocoding is famously hard. But as an OpenStreetMap volunteer I see that for anything other than granular street-level addresses (where we don't have the data), our grassroots geocoder is more reliable than that built by the most valuable company on earth. And as feedback to OSM shows, bad geocoding is the easiest way to make people think your entire map product is no good.
[1] except there's no bike routing... but hey Apple, if you want top-notch bike routing, give me a call ;)
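To make the geocoding complaint concrete, here is a toy ranker: filter by name, honor an explicit type word like "village", then prefer the more prominent place. The `Place` type and the prominence scores are made-up placeholders, not real gazetteer data, and real geocoders (OSM's Nominatim included) are far more involved:

```python
from dataclasses import dataclass
from typing import List

# Type words a user might add to disambiguate ("Brize Norton village").
KIND_WORDS = {"village", "town", "city", "island"}


@dataclass
class Place:
    name: str
    region: str
    kind: str
    prominence: float  # made-up placeholder score, not real data


def geocode(query: str, gazetteer: List[Place]) -> List[Place]:
    words = query.lower().split()
    kind_hint = next((w for w in words if w in KIND_WORDS), None)
    name = " ".join(w for w in words if w not in KIND_WORDS)
    matches = [p for p in gazetteer if p.name.lower() == name]
    if kind_hint:
        # Honor an explicit type word if any candidate has that type.
        matches = [p for p in matches if p.kind == kind_hint] or matches
    # Otherwise prefer the most prominent candidate.
    return sorted(matches, key=lambda p: p.prominence, reverse=True)


GAZETTEER = [
    Place("Crete", "Greece", "island", 0.9),
    Place("Crete", "Illinois, US", "village", 0.1),
    Place("Brize Norton", "Oxfordshire, UK", "air base", 0.6),
    Place("Brize Norton", "Oxfordshire, UK", "village", 0.4),
]

print(geocode("Crete", GAZETTEER)[0].region)               # Greece
print(geocode("Brize Norton village", GAZETTEER)[0].kind)  # village
```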
Route information from Siri doesn't seem to be useful even when it knows the location. I asked Siri (in central London) how to get to Heathrow the next day. It just opened Apple Maps and showed me how to walk there. I guess I had used Maps in pedestrian mode before, but that answer was absolutely useless.
I'll disagree about them having good source data, or if they have it they're not showing it.
Public transport info is only available for a very small set of predefined cities (e.g. SF, Paris). Apple Maps doesn't even show the stops on the map, even though every OSM app has them.
While I ask Android for occasional content (while watching TV: "How tall is blah blah actor", etc.), I mostly use Android's voice commands for my usual directives: "Directions to ..., Weather in..., Call <business> in <location>..., Set timer for...". It does both very well.
I stay signed out and opted out of all Google Now features and tracking, so this is just what it does by default. Voice is my primary UI for my phone. It is very useful, and I find it parses my input correctly nearly every time; it's very rare for it to err.
Meanwhile, my wife's iPhone 6 gets basic commands wrong frequently, and she spends minutes typing all the time. Siri isn't losing; it has lost.
I don't think this actually matters to the iOS population, however. It's well known they don't buy iPhones for features.
edit: A sample question from the article "when is the presidential debate" is handled fine as an example
On the occasions when I have witnessed it, every attempt my spouse makes at using Siri in an ordinary, reasonable way always ends in cursing and shouting.
It reminds me in many ways of the handwriting recognition on the Newton--which is to say it appears to be only a hairsbreadth from absolutely useless.
Also, accents. Google sells phones all around the world and has worked very very hard to get training data from everywhere. For example, it has no trouble with my Indian accent at all. Siri... She's very American.
The main gripe I have with Siri is that it still can't understand bilingual (German/English) conversation. This makes the service unusable for me. How do you ask Siri for a song, a movie, or a person's name that happens to be English when the device is set to German?
Yes, I'm a native Spanish speaker and Google Now works great with bilingual queries for me; it has been improving in recent years.
I always select English as the default language on my phones. Some years ago, if I wanted to call someone with a Spanish name by saying something like "Call Ramon Hernandez", or ask for directions with "Take me to Periférico de la Juventud", I had to fake an American accent. Today I don't have to do that anymore: I can speak natural Spanish, and it can understand my Mexican accent when I speak English. So it's not just improving at bilingual queries; they have also focused on accents and mispronunciations.
Google's voice recognition works well when it's English plus one other language. If neither is English (eg. Japanese + Cantonese) it doesn't work at all. Neither does the triple combination of English + Japanese + Cantonese work, even if explicitly specified in settings.
The Apple TV Siri actually can cope with two languages when it comes to movie names. Using Siri here I don't have to "hack" it by pronouncing English words in German.
At least the iOS keyboard understands more than one language simultaneously now, something macOS and Android have done for a long time.
Siri seems so dumb because she is dumb, incredibly annoyingly dumb.
Not compared to humans, but compared to what Apple could have achieved, and to what others achieved some time ago.
For examples, see the other comments, and: ... “Stop searching the web” “OK” … 1 minute later: “open my email” “I found this on the web for ‘open my email’” (... fixed by now? ...)
"here is what I found on the web for 'what is Yen's email address'"
I wish Siri wouldn't ever show me web results; if I'm talking to my phone, I don't want to click around or often even look at it. Every time Siri offers to search the web for something, it's a fail.
Siri is great for setting alarms and timers -- much faster than using the UI.
I recently switched back to iOS after using Android for a couple of years, and one thing I found shocking is how little Siri had improved. I only use her for creating timers these days. I'll try to schedule events, but I find her success rate at adding things to my calendar accurately is < 50%. Try correcting the time of a just-added event. She can't do it.
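Correcting a just-added event mostly needs a sliver of conversational state: remembering what "it" refers to. A minimal sketch, with a hypothetical `CalendarSession` class standing in for whatever Siri actually does internally:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    title: str
    start: datetime


class CalendarSession:
    """Hypothetical dialogue state that remembers the most recently added event."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self.last_event: Optional[Event] = None

    def add_event(self, title: str, start: datetime) -> Event:
        event = Event(title, start)
        self.events.append(event)
        self.last_event = event  # remember it so follow-ups can say "it"
        return event

    def change_time(self, hour: int, minute: int = 0) -> Event:
        """Handle a follow-up like "change it to 3 pm" by editing the last event."""
        if self.last_event is None:
            raise LookupError("no recent event to change")
        self.last_event.start = self.last_event.start.replace(hour=hour, minute=minute)
        return self.last_event


session = CalendarSession()
session.add_event("Dentist", datetime(2016, 10, 20, 14, 0))
print(session.change_time(15).start)  # 2016-10-20 15:00:00
```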
I asked her for the nearest gas station the other day while driving and I think she responded with a list of google search results? Which is almost comical. I'm not going to grab my phone and sift through Google search results while behind the wheel.
"Nearest gas station" works for me. I use it all the time. Perhaps it's the phrasing. Either way, I agree with the general sentiment: I only use it for timers and directions while driving.
>It puts much less emphasis on what it calls “long tail” questions, like the ones I’ve cited above, which in some cases, Apple says, number in only the hundreds each day.
And every one of those "hundreds per day" is another person one step closer to disabling Siri. Which is what I did last month. Siri got things wrong so often, it was worse than useless.
It strikes me that they are still struggling to solve the questions that get asked thousands of times a day. So, yeah, it'd be nice to have the long tail work too. But clearly this is still very much a work in progress for all vendors.
I'd like AIs to have settings so they start engaging me, rather than me always having to engage them, e.g. asking me what my tasks are today, or other kinds of interactions they initiate at a propitious time.
Because voice recognition from all the major providers goes at it completely backwards, being ready to answer any of the world's questions in every situation.
In the real world nothing goes like this. We start with context and work outwards from there. And I don't mean know-everything-about-your-personal-life context, there's no need for that either.
I'd say more, but that is what I am solving at Optik. We're a bit over a year into development and things are really starting to pick up. Using Cortana on the HoloLens makes it painfully obvious just how close, yet how far off the mark, most voice command software remains.
What I don't understand is why we don't try to solve this using a huge fact database derived from natural language parsing of the web. Basically the same approach as deep learning - something that was unfeasible 20 years ago, but due to massive improvements in processing power now works.
What I'm thinking of is basically SHRDLU [1] on steroids. Parse a ton of web pages. There are great natural language parsers that can parse most well-formed English sentences. Start with simple sentences like "Golden delicious is an apple" or "Barack Obama is the President". Then you store this in a Subject-Verb-Object database (I just learned that this is called a Triplestore [2]).
Every statement gets a plausibility value. Deal with ambiguity by adding multiple interpretations of a sentence (with different plausibilities if available). Assign an origin (e.g. website, author, quoted person ...) to each statement. Then, you could query this by asking "What do mice like?"... and it would make "Subject: Mice, Verb: like (enjoy), Object: ???" and return a list of solutions, ordered by plausibility.
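The storage-and-query half of this idea is easy to sketch; the hard half, reliably parsing web pages into statements, is not shown. This is a toy in-memory index, not a real RDF triplestore: the statements are hand-entered, made-up examples, the synonym table is a hypothetical stand-in for "like (enjoy)", and `None` plays the role of "???":

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical verb normalization so "enjoy" and "like" land in the same bucket.
VERB_SYNONYMS = {"enjoy": "like", "love": "like"}


@dataclass(frozen=True)
class Statement:
    subject: str
    verb: str
    obj: str
    plausibility: float  # 0..1, e.g. parser confidence times source trust
    origin: str          # website, author, quoted person, ...


class TripleStore:
    def __init__(self) -> None:
        self._index: DefaultDict[Tuple[str, str], List[Statement]] = defaultdict(list)

    @staticmethod
    def _norm(verb: str) -> str:
        return VERB_SYNONYMS.get(verb.lower(), verb.lower())

    def add(self, stmt: Statement) -> None:
        self._index[(stmt.subject.lower(), self._norm(stmt.verb))].append(stmt)

    def query(self, subject: str, verb: str, obj: Optional[str] = None) -> List[Statement]:
        """obj=None acts as the '???' wildcard; results are sorted by plausibility."""
        hits = self._index.get((subject.lower(), self._norm(verb)), [])
        if obj is not None:
            hits = [s for s in hits if s.obj == obj]
        return sorted(hits, key=lambda s: s.plausibility, reverse=True)


store = TripleStore()
# Made-up example statements, not parsed from real pages.
store.add(Statement("mice", "like", "cheese", 0.6, "example.com/cartoons"))
store.add(Statement("mice", "enjoy", "seeds", 0.7, "example.org/pets"))
store.add(Statement("mice", "like", "peanut butter", 0.8, "example.net/traps"))

print([s.obj for s in store.query("mice", "like")])  # ['peanut butter', 'seeds', 'cheese']
```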
Does anybody have any insight into why this isn't done or wouldn't work? It seems weird that I can't ask my phone simple facts about the world, other than those whose form has been hardcoded.
(Now that I think about it, the opposite would also work: hardcode a ton more commands. Hire 100 people and let them sift through the most common queries. Watch a few dozen testers and add all the queries they try to use. Instead of throwing computing resources at the problem, this would throw cheap labor at it.)
Anyway, it boggles the mind that I can't shout "OK Google, play 'Itsy Bitsy Spider' on YouTube" when my toddler demands it but won't release my phone :-). It opens a search and shows what I want as the first result (probably customized from my history), but I have to go the last mile myself.
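The hardcoded-command route for exactly this request might look like the following. It only gets as far as YouTube's public search page (`/results?search_query=`); actually playing the top result, the last mile, would need something like the YouTube Data API and is not shown. The pattern and function name are hypothetical:

```python
import re
from typing import Optional
from urllib.parse import quote_plus

# Hypothetical hardcoded command: "[OK Google,] play <something> on YouTube".
PLAY_ON_YOUTUBE = re.compile(
    r"^(?:ok google,?\s+)?play\s+(.+?)\s+on\s+youtube$", re.IGNORECASE
)


def youtube_search_url(utterance: str) -> Optional[str]:
    """Turn a matching utterance into a YouTube search URL, or None if it doesn't match."""
    match = PLAY_ON_YOUTUBE.match(utterance.strip())
    if match is None:
        return None
    query = match.group(1).strip("'\"")  # drop quotes around the title
    return "https://www.youtube.com/results?search_query=" + quote_plus(query)


print(youtube_search_url("OK Google, play 'Itsy Bitsy Spider' on YouTube"))
# https://www.youtube.com/results?search_query=Itsy+Bitsy+Spider
```

This is the "cheap labor" approach in miniature: every phrasing someone wants supported is another pattern someone has to write.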
Obviously there's still a long way to go, but yes, Google is working on everything you suggest and more.
This is actually Google's bread and butter, and why Mossberg found so many cases where Google Now does better than Siri. It's not clear how Apple can catch up, either, given Google's massive advantage in training data.
Triplestores are definitely cool; I used one once where I put in our users' birthdays, took a publicly available triplestore dump of celebrities (I think it was from DBpedia), and matched them up, saying things like "hey, you share a birthday with X".
What was cool about that was that I didn't have to know how to get their info nor what format it was in, provided that they followed some standard namespaces.
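A sketch of that birthday matcher, assuming DBpedia-style predicates (`dbo:birthDate`, `foaf:name`); the resources and dates below are made up for illustration:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Predicates from shared vocabularies (DBpedia ontology and FOAF).
DBO_BIRTH_DATE = "http://dbpedia.org/ontology/birthDate"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

Triple = Tuple[str, str, str]


def shared_birthdays(users: Dict[str, date], triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each user to the names of people in the dump born on the same month and day."""
    triples = list(triples)
    names = {s: o for s, p, o in triples if p == FOAF_NAME}
    by_day: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == DBO_BIRTH_DATE:
            born = date.fromisoformat(o)
            by_day[(born.month, born.day)].append(names.get(s, s))
    return {user: by_day.get((d.month, d.day), []) for user, d in users.items()}


# Made-up resources and dates, in the shape a public dump might have.
TRIPLES = [
    ("http://example.org/resource/Celebrity_A", FOAF_NAME, "Celebrity A"),
    ("http://example.org/resource/Celebrity_A", DBO_BIRTH_DATE, "1970-04-13"),
    ("http://example.org/resource/Celebrity_B", FOAF_NAME, "Celebrity B"),
    ("http://example.org/resource/Celebrity_B", DBO_BIRTH_DATE, "1985-12-01"),
]
print(shared_birthdays({"alice": date(1990, 4, 13), "bob": date(1991, 1, 1)}, TRIPLES))
# {'alice': ['Celebrity A'], 'bob': []}
```

The code never needs to know where the data came from, only which predicate IRIs mean "name" and "birth date", which is the benefit of standard namespaces.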
Look up JSON-LD while you are at it. This method can be extended to the IoT and then we will get some interesting outcomes.
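For reference, a schema.org `Person` in JSON-LD looks like the document below. The `naive_triples` helper is a simplification that just glues the context onto each key; a conformant processor (for example the PyLD library) performs proper expansion:

```python
import json
from typing import Any, Dict, List, Tuple

# A schema.org Person expressed as JSON-LD (example data).
DOC = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.org/people/ada",
  "name": "Ada Example",
  "jobTitle": "Engineer",
  "birthDate": "1990-04-13"
}
""")


def naive_triples(node: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Simplified: prefix each plain key with the context IRI. Not real JSON-LD expansion."""
    vocab = node["@context"].rstrip("/") + "/"
    subject = node.get("@id", "_:b0")  # blank node if there is no @id
    triples = [(subject, "rdf:type", vocab + node["@type"])]
    for key, value in node.items():
        if not key.startswith("@"):
            triples.append((subject, vocab + key, value))
    return triples


for triple in naive_triples(DOC):
    print(triple)
```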
Apple and Google are both shooting themselves in the foot by insisting on keeping their voice products within walled gardens.
Amazon is doing an amazing job of allowing the community to extend the voice functionality of their Amazon Echo. You can build an app that does just about anything you want and release it on their Amazon Echo store. There are already hundreds on there, and it's only a matter of time before someone releases a truly conversational A.I. integration for it.
My family uses our Amazon Echoes (all 6 of them, scattered strategically around the house) for hundreds of little things throughout the week. I control all of my lights, my thermostat, and my entire home theater (via custom voice-activated scripts). I even use it to make my Tempur-Pedic bed vibrate ("Alexa, turn on bed vibration"). I use Echo for timers and for my morning alarm (which also triggers my bed vibration). I use it to query Wikipedia subjects, perform quick math calculations, order new paper towels...
If you gave me a couple of hours, I could even whip up an Alexa integration that would let me open my garage door and remote start my car in the morning ("Alexa, turn on my car's air conditioner")! It would just be an Alexa command that triggered a custom script which would log into the web interface for my car link and would click the remote start button. Easy. I love building this sort of thing.
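Roughly what the skill side of that could look like as an AWS Lambda handler. The request and response shapes follow the Alexa Skills Kit JSON format; the intent names and the two action functions are hypothetical placeholders for the custom scripts:

```python
from typing import Callable, Dict


def remote_start_car() -> str:
    # Placeholder: the real script would log into the car's web portal and press "remote start".
    return "Starting your car."


def open_garage() -> str:
    # Placeholder for a call to a garage-door controller.
    return "Opening the garage door."


# Hypothetical intent names, as they would be defined in the skill's interaction model.
INTENT_ACTIONS: Dict[str, Callable[[], str]] = {
    "RemoteStartCarIntent": remote_start_car,
    "OpenGarageIntent": open_garage,
}


def speak(text: str) -> dict:
    """Build a minimal Alexa Skills Kit response that says `text` and ends the session."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event: dict, context: object) -> dict:
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        # e.g. a LaunchRequest ("Alexa, open <skill>") with no specific intent
        return speak("What would you like me to do?")
    action = INTENT_ACTIONS.get(request.get("intent", {}).get("name"))
    return speak(action() if action else "Sorry, I don't know how to do that yet.")
```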
The possibilities are endless when you have an open, extensible platform. I don't understand why Apple and Google are being so dumb and close-minded and still don't get this!
I have mixed feelings about Siri. The voice recognition has definitely gotten much better, but the set of domains it knows about isn't growing nearly fast enough. That's not even a matter of AI; it's just a matter of doing the work, for instance knowing what's on TV.
It does come in handy when I'm driving and for reminders. It seems to do pretty well with directions, playing songs, playing podcasts, and messages.
I'd much rather tell Siri "remind me not to forget my lunch when I get out of the car" than try to do that with the Reminders app.
The difference between Siri and Google Now is the real downfall of the iPhone (at least for me). I'm comfortable enough with Google Now today to just fling any question at it when I'm in hands-free mode (say, when driving) and expect a useful answer of some sort. Whenever I try Siri on friends' or colleagues' phones, it's always a frustrating experience.
And this doesn't even include the automatic actions taken by Google Now. Alerting me about travel times, flight delays, sports scores etc. etc. without even being asked.
Ultimately, in Google's vision, phones are rapidly becoming just a tiny window (no, Microsoft, not you) into all the knowledge and power that actually resides in the cloud. Not so much for Apple, whose core competency is, and always has been, hardware design and supply chain management.
I also think Siri fails with text messaging. I should be able to have an IM conversation using just my voice, but the Siri interface simply isn't that involved. I'm constantly having to say "Read Latest Text Message".
Best use for Siri: it can turn off all alarms in one go. That's helpful when I occasionally set five or more for something, since I don't need to keep scrolling the list and disabling them one by one.
Add even a faint hint of an ESL speaker accent, and Siri becomes outright useless even for the mundane tasks... like setting a timer or sending a message.
Serious question: is it possible that Apple keeps Siri artificially (and annoyingly) dumb because they still have to pay Nuance billions for all this 'useless chatter' on a usage basis?
I don't know about the new Google Assistant, but I found Siri to work much better than Google Now.
I've been an Android user for about 10 years and just switched to iOS within the last week.
Siri is really the only feature on iPhone (so far) that I have found I prefer to Android.
While it's far from perfect, so far I have been quite pleased with Siri. It is able to let me do my most common tasks via voice commands. 9 times out of 10 Google Now couldn't even tell what I was saying. I attributed it to my somewhat strange accent, but Siri has been nearly perfect on this front. Good thing though as I loved the Swype keyboard on Android and hate typing on the iPhone; I fat finger everything on that tiny virtual keyboard.