Hacker News
by joebadmo 5346 days ago
I'm skeptical about Siri. My wife got an iPhone 4S on opening weekend, and we both did the requisite amount of playing with the new feature, and it was amusing, but after the first couple of days she pretty much abandoned it.

There are some reliability problems, but I think the main problem is that Siri still lives firmly in the AI uncanny valley. Which is exacerbated by the way that Apple presents Siri, i.e. as AI.

With a clearly defined set of commands, you can be confident about what's going to work and what isn't. And if you try something that isn't a command, you attribute the failure to "oh, that's not a command." But with Siri, because it's presented as an anthropomorphized, intelligent agent, the failure feels a lot more brittle and frustrating.

For example, "What's my next appointment?" works, but "When's my dentist appointment?" doesn't. Why not? Well, I know why not, and you probably know. It's because it's really, really, really not AI, and unless we make some kind of breakthrough on strong AI, it's not going to be for a long time. But my wife doesn't know that. All she knows is that Siri is cool when it works, but is actually pretty stupid a lot of the time. Which means she's not reliable, which is important because Siri is most useful when it must be most reliable, like when you're in a rush.
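
To make the brittleness concrete, here is a minimal sketch of a fixed command table. The patterns and command names are made up for illustration and are not how Siri is actually implemented; the point is only that a near-synonymous phrasing falls straight through.

```python
import re

# Hypothetical command table; these patterns are illustrative assumptions,
# not anything taken from Siri.
COMMANDS = {
    r"^what'?s my next appointment\??$": "next_appointment",
    r"^set a timer for (\d+) minutes\??$": "set_timer",
}

def match_command(utterance):
    """Return the name of the first command whose pattern matches, else None."""
    text = utterance.strip().lower()
    for pattern, name in COMMANDS.items():
        if re.match(pattern, text):
            return name
    return None

print(match_command("What's my next appointment?"))     # prints next_appointment
print(match_command("When's my dentist appointment?"))  # prints None
```

With an explicit command list the second result reads as "not a command"; presented as an assistant, the same miss reads as stupidity.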

Apple will certainly continue to add commands and make Siri smarter and smarter, but this will necessarily be incremental, and that failure will always feel brittle to lay users.

[edit: btw, John Siracusa talks a bit about this on the most recent Hypercritical: http://5by5.tv/hypercritical/39-quasimodo-backpack]


I can't shake the feeling that the Siri of today is like the app ecosystem of the iPhone 1.

• That Apple has a really solid plan for this feature, and we're only seeing the very beginning of it now — the phase where we are introduced to the interface, before they blow the lid off and open it up to every imaginable use-case.

• That it will be significantly improved before most people ever buy a supporting device, so the handful of customers being burned by the somewhat-lacking version 1 product are vastly outnumbered by the people who get their first taste of the mature, fully-realized vision.

Unlike some of Apple's other features, UI, Apps, and operating software, Siri is not something that will be easy to improve. Siri represents the cumulative efforts of decades of computer science research by numerous public and private entities.

While it's easy to add more voice actions, making advances on the underlying technologies will require additional decades of hard computer science research. Apple, having no R&D division, will not likely even contribute to this.

Unless your main complaint is a lack of canned question types that it can answer, you won't likely see the fast improvements you are expecting in the next few years.

I'd also observe that "let's create an 'AI' by piling on the special cases until we have a generally-capable tool" has been tried numerous times, and it's a known failure case. After a certain point, the piled-on rules begin negatively interacting with each other, and it requires one of 1. true AI (thus begging the question) or 2. treating the set of rules as one of the quirkiest programming languages ever to make effective use of it.
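
A toy sketch of that interaction, assuming a first-match-wins keyword table (the rules and action names are hypothetical): adding one special case to fix the dentist question silently breaks an unrelated request.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("call", "place_call"),
    ("appointment", "show_calendar"),
]

def interpret(utterance, rules):
    """Return the action of the first rule whose keyword appears in the text."""
    text = utterance.lower()
    for keyword, action in rules:
        if keyword in text:
            return action
    return "unknown"

# A special case bolted on to handle "When's my dentist appointment?"
patched = [("dentist", "show_dentist_appointment")] + RULES

print(interpret("Call my dentist", RULES))    # prints place_call
print(interpret("Call my dentist", patched))  # prints show_dentist_appointment
```

Every fix of this kind needs its own exceptions, which is how the rule set turns into a quirky programming language.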

Many people are speculating about how wonderful Siri will be in the (near) future; I'd submit that the evidence suggests that it has pretty much come out of the gate with all the power it's going to have for the foreseeable future. Natural language querying seems to have been stuck at the same plateau for a long time, just like voice recognition technology has been.

Maybe for domain independent stuff, but for a domain specific thing (like scheduling), I think heuristics could go a long way. Yes, it's like a programming language, and yes, it has quirks.

Just making sure the weak AI is clear about its interpretation through explicit confirmation ("Sir, I understand that you would like to launch the missiles at Russia in 15 minutes, am I correct?" "No. Siri, please book lunch with my sisters at the Russian restaurant on the 15th.") would probably make up for a lot.
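
Roughly what I have in mind, as a sketch: echo the interpretation back and act only on an explicit yes. The function and its `ask`/`run` callbacks are hypothetical stand-ins for the speech prompt and the real action.

```python
def confirm_then_run(summary, ask, run):
    """Read the parsed request back to the user; act only on an explicit yes."""
    reply = ask(f"I understand that you would like to {summary}. Am I correct?")
    if reply.strip().lower() in {"yes", "y", "correct"}:
        return run()
    return None  # anything else is treated as "no"; nothing happens

# Scripted replies stand in for the user's spoken answer.
print(confirm_then_run("launch the missiles", lambda prompt: "No", lambda: "launched"))  # prints None
print(confirm_then_run("book lunch on the 15th", lambda prompt: "Yes", lambda: "booked"))  # prints booked
```

It costs one extra round trip per request, but a misparse becomes a question instead of an action.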

Is voice recognition technology really on a plateau? I have highly accurate speaker independent speech recognition in my pocket now. I'm using it to dictate this response. It didn't require any training, and it's nearly perfect. I may be mistaken, but I believe this capability is relatively new. Even if it's not, it's so close to being perfect that there's little room for improvement, or so it seems.

And yes, I know that all of the smarts are not in my pocket, but rather in the cloud.

I bet you're enunciating clearly and that the mic is not picking up much background noise. (Note that you may be in an environment with some noise but there are easy ways to create noise-cancelling microphones or directional microphones that are very effective. You'd have to check the recording to see how much noise is on it.) I could get the same results from Dragon's voice recognition software with careful enunciation and a bit of practice 10 years ago. It is also well-known how to get very good accuracy on a restricted dictionary. What has not been solved is improving beyond that. In situations where humans extract speech easily, so easily that we casually lay music tracks over a speaker without much thought, software still fails miserably, as far as I know.
Was Dragon's software speaker-independent ten years ago, and did you have to train it? I looked into this recently and couldn't find any speaker-independent PC software now, and I think it all required training. Being able to just pick up and talk without any preliminaries is still a pretty big deal.

I'm sure you're right about the other deficiencies. "Almost perfect" is very strong, after all. Still, it's really excellent, and in my experience is much better than it was just a few years ago.

People often get carried away thinking about linear progression from the current state when the general problem is NP-hard.

With Siri, however, I'm not interested in it being a person, but something that can help set reminders, timers, appointments, and dictate text messages while I drive. That's huge for me. Rather than breadth, if Apple focuses on depth then the problem is more tractable because you have more context with which to reduce complexity.

Apple's R&D is applied, so it will have a product focus and get to market quicker. If they can make Siri really good at a specific number of tasks then people will understand what it is good for, rather than be disappointed, and it will improve faster.

A thing like Siri might benefit tremendously from being open source.

Everybody with an itch to scratch and time on their hands would help to incrementally improve Siri for thousands of specialist applications.

Apple touted Voice Actions in OS X 10 years ago. Why is it different/better this time?
That you don't usually have a Mac in your pocket. Voice is only really good when your hands are busy and you don't have a chance to sit down and pull out a laptop.
I'm not sure it's a valid comparison.

Apple sold more iPhone 4 units than all previous iPhones combined. I expect that the iPhone 4S will sell even more. Releasing Siri as a beta to such a large customer base is not the same as releasing the "somewhat-lacking version 1" iPhone, and then iterating it forward.

I really don't get this notion of a lot of Apple supporters that just because they put something out and call it AI, Apple has now solidly invented AI. Weren't they usually praised for not announcing things before they are ready? According to your theory, with Siri they would have broken the rule. The current Siri would merely be an announcement that one day they would deliver real AI, and in the meantime they would deliver the current broken version.
In 5 years it might add up to something, but I don't think people should be buying an iPhone 4S because of it right now, when they're probably only going to use it in the first week.
What about the feasibility of using voice commands in a public environment, e.g. at work or on a train? I'm interested in hearing how people deal with that.
That's definitely another issue too. My wife has specifically said that she doesn't like to do that, even in semi-private (e.g. work), because she doesn't necessarily want to say stuff about her private schedule out loud, or broadcast her next text message, or even who she's calling.
As you may already know, you should be able to hold the 4S up to your ear, so at least it looks like you’re talking to another human, and Siri won't speak aloud. (I have not actually used a 4S, so my apologies if I'm mistaken.)
This doesn't change the fact that she has to talk out loud. I don't like to take personal calls at work, either. I usually go out into the hall or somewhere more private. I don't think this is rare behavior.
It doesn't talk out loud if you hold the phone up to your ear. To use Siri with minimal noise (aside from your own speech):

Push the power button to turn on the screen. Hold the phone to your ear. Wait for the be-beep sound, played quietly through the ears-only speaker. Speak. Keep the phone against your ear. Siri will respond through the ears-only speaker.

That's exactly what the parent said he doesn't want to do.
You're correct. I tried this with my girlfriend's 4S this evening—if the screen is on and you hold the phone to your ear, you will hear the ding prompting you to start speaking.
Battery life reportedly improves when this feature is turned off, due to Siri's constant monitoring of the accelerometer.
Minor nitpick; it's the proximity sensor which is used for this function, not the accelerometer.

I can verify that my iPhone 4S is polling the proximity sensor continuously, activating Siri whenever the phone is brought to the ear, while my 3GS only appears to poll the proximity sensor during phone calls.

[Edit] After further testing, it does seem to be a combination of gyroscope and proximity sensor, as you do need to perform the motion of bringing the phone to a position with its front facing somewhere between sideways and upwards and its bottom somewhere between straight down and sideways on its edge while covering the upper portion of the device to activate Siri. It does not work if you're laying down horizontally on your back or on your side, nor when you bring it to a straight vertical for example. It seems to have a fairly narrow tolerance for the angle to which the phone must be brought in order to activate Siri, that being the angle at which you naturally hold a phone to make a call while sitting or standing straight up. [/Edit]
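
The behavior I observed could be sketched like this. The single tilt angle and the 20 to 70 degree window are my own placeholder assumptions, not measured values or anything Apple has documented.

```python
def should_activate(proximity_covered, tilt_from_vertical_deg,
                    min_tilt=20.0, max_tilt=70.0):
    """Trigger only when the sensor is covered AND the phone is tilted like a call.

    min_tilt and max_tilt are hypothetical thresholds chosen for illustration.
    """
    in_call_posture = min_tilt <= tilt_from_vertical_deg <= max_tilt
    return proximity_covered and in_call_posture

print(should_activate(True, 45.0))   # prints True: covered, held like a call
print(should_activate(True, 0.0))    # prints False: straight vertical
print(should_activate(True, 90.0))   # prints False: lying flat
print(should_activate(False, 45.0))  # prints False: sensor not covered
```

Requiring both signals would explain why covering the sensor alone, or tilting the phone alone, does nothing.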

I'm pretty sure the accelerometer has to be monitored continuously, with or without Siri. After all, how do you think geo-fenced reminders work? The mostly-low-powered accelerometer can wake up power-hungry location services to see if you've left your geofence. There's no reason to have the GPS running all the time (and your battery life would suffer significantly from that).
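
As a sketch of the idea (my guess at the scheme, not documented iOS behavior): sample the cheap sensor all the time and request a location fix only when the readings suggest the phone is moving. The 0.15 g tolerance is an arbitrary assumption.

```python
import math

def is_moving(accel_g, tolerance=0.15):
    """Treat the phone as moving when total acceleration departs from 1 g (rest).

    tolerance is a hypothetical threshold in units of g.
    """
    magnitude = math.sqrt(sum(a * a for a in accel_g))
    return abs(magnitude - 1.0) > tolerance

def location_fixes_needed(accel_samples):
    """Count the samples that would wake the power-hungry location check."""
    return sum(1 for sample in accel_samples if is_moving(sample))

# Two samples at rest and one jolt: only the jolt triggers a location fix.
samples = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.02), (0.3, 0.2, 1.3)]
print(location_fixes_needed(samples))  # prints 1
```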
Why would it monitor constantly, and not just when you are using it, like how the phone behaves during a call with the face-disables-touchscreen?
They thought of that. You can put it up to your head like you're on a phone call. You don't feel self-conscious talking on the phone in public do you?
While on the bus, I don't call my personal assistant and ask her to change my gynecologist appointment, using carefully paced, clear language.
Can't you relabel things with Siri? I.e. at home say "Siri, when I say 'water polo' I mean gynecologist" (replace 'water polo' with a sport you would never actually play under any circumstances, otherwise you could end up with a big surprise at some point). Then you can have siri change your water polo appointment on the bus without issue. :)
IBM: Watson.

Now, if Apple could buy IBM...

With enough commands in its library, Siri will not only be indistinguishable from real AI, it will be intelligent (please see also the behaviorist view of AI, Descartes' language test and the Turing Test).
Aside from the fact that this has been tried and not yet worked (see Lenat's Cyc, for example), having to use the Turing Test as a measure of sapience is an artifact of having no idea what sapience is. If we knew (or, once we know) what algorithm produces consciousness, we can test for that directly, and it seems unlikely to me that a giant lookup table will have an internal experience of consciousness in the way that I do.
"When's my next dentist appointment?" works for me for some reason, but I agree that when Siri fails, she fails spectacularly and it's jarring.