Everyone seems to forget that Watson would never have answered Toronto if it didn’t have to. It wasn’t at all confident, so you can’t even really say that it made a mistake. It just didn’t know the answer and was – correctly so – very sure that it didn’t know the answer.
Excellent point. But that "Toronto" was its highest guess makes you wonder if it could confidently select something that exploits whatever errors led to Toronto. And if you asked it a medical question, would it say "I don't know" if it wasn't confident, or would it give an answer anyway (along with a confidence level that might be ignored or rationalized away)?
My concern about its utility, and I read they would like it to answer medical questions, is that
Watson's performance reminded me of chess computers. They play fantastically well in maybe 90% of
positions, but there is a selection of positions they do not understand at all. Worse, by definition
they do not understand what they do not understand and so cannot avoid them. A strong human Jeopardy! player,
or a human doctor, may get the answer wrong, but he is unlikely to make a huge blunder or category error--
at least not without being aware of his own doubts. We are also good at judging our own level of certainty.
A computer can simulate this by an artificial confidence measurement, but I would not like to be
the patient who discovers the medical equivalent of answering "Toronto" in the "US Cities" category,
as Watson did.
I would not like to downplay the Watson team's achievement, because clearly they did something most
did not yet believe possible. And IBM can be lauded for these experiments. I would only like to wait
and see if there is anything for Watson beyond Jeopardy!.
If IBM wants to fix the "Toronto" problem, have at it. But those sorts of "embarrassing" errors could be quite costly in medical situations. During the show they showed Watson's progression from giving really stupid answers very frequently to giving them less frequently, which makes me personally believe their fundamental process is flawed (though not necessarily irreparably) and that their current algorithms are just a bunch of hacks thrown together on top of Google rather than something more sophisticated like Wolfram Alpha.
Watson was not confident in that answer - only 30% (http://asmarterplanet.com/blog/2011/02/watson-on-jeopardy-da...). Had that been a normal question, it wouldn't have buzzed in. It only answered because Final Jeopardy is the only time when not answering and answering incorrectly have the same penalty.
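To make the buzz-in logic described above concrete, here is a minimal Python sketch. It is not IBM's actual code: the function name `choose_response`, the 0.50 threshold, and the runner-up score are all made up for illustration. Only the 30% figure for "Toronto" comes from the comment above.

```python
from typing import Dict, Optional


def choose_response(
    candidates: Dict[str, float],
    buzz_threshold: float,
    must_answer: bool,
) -> Optional[str]:
    """Return the top-scoring candidate, or None to stay silent.

    In a regular round (must_answer=False) the top candidate is only
    returned when its confidence clears the threshold. When an answer
    is forced (as in Final Jeopardy), the top candidate is returned
    no matter how low its confidence is.
    """
    if not candidates:
        return None
    best, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if must_answer or confidence >= buzz_threshold:
        return best
    return None


# 0.30 is the figure quoted in the comment; the second entry and the
# threshold are hypothetical placeholders.
candidates = {"Toronto": 0.30, "runner-up guess": 0.20}

regular = choose_response(candidates, buzz_threshold=0.50, must_answer=False)
final = choose_response(candidates, buzz_threshold=0.50, must_answer=True)

print(regular)  # None: it would not have buzzed in
print(final)    # Toronto: forced to answer, so the best weak guess comes out
```

Same candidates, same scores; the only thing that changes between the two calls is whether silence is allowed.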
> but I would not like to be the patient who discovers the medical equivalent of answering "Toronto" in the "US Cities" category, as Watson did.
Surprise, that kind of mistake happens far too frequently in the medical field now.
Why is Kasparov commenting on something so far out of his recognized area of expertise relevant anyway? I don't go to Knuth for advice on chess, nor Hawking for snarky banter on economics, etc. (Although if I had access to either of those 2, I might try it.)
Would Watson make that mistake happen more or less often? (Bringing in Watson can lead to blindly trusting or blindly ignoring the "stupid computer" depending on the doctor; seems like a problem with doctors rather than a lack of tools?)
> Why is Kasparov commenting on something so far out of his recognized area of expertise relevant anyway?
Isn't the asking obvious? (I won't comment on the relevance; people do and read many irrelevant things every day.) People asked for his thoughts 'cause he got beat by IBM's Deep Blue and he's had a lot of experience with computers in their relationship with chess (specifically combining humans and computers to make really strong opponents). People also asked for Ken Jennings' thoughts and AI isn't his expertise. And people recently asked Hawking for his thoughts on aliens...
Part of 'the "Toronto" problem' is simply the format of the challenge. In Jeopardy, Watson can give at most one response, in that particular situation exactly one. In a medical diagnosis situation, Watson's responses wouldn't be so constrained. He could give a list of 20 possibilities, with confidence margins for each one, and even as far as a list of possible additional tests designed to favor one possible diagnosis over another. This sort of information, utilized by a competent doctor, has a far, far smaller potential for disaster than the "what is a lobotomy?" scenarios that people are scared of.
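As a rough illustration of the "list of possibilities with confidences" idea above, here is a small Python sketch. Everything in it is hypothetical: the function name `rank_hypotheses`, the raw scores, and the placeholder condition names are invented, and real diagnostic scoring would be far more involved.

```python
from typing import Dict, List, Tuple


def rank_hypotheses(
    scores: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Normalize raw scores so they sum to 1 and return the top_k
    hypotheses, highest confidence first."""
    total = sum(scores.values())
    if total <= 0:
        return []
    normalized = {name: score / total for name, score in scores.items()}
    ordered = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[:top_k]


# Made-up raw scores for placeholder conditions.
scores = {"condition A": 6.0, "condition B": 3.0, "condition C": 1.0}

ranked = rank_hypotheses(scores, top_k=2)
for name, confidence in ranked:
    print(f"{name}: {confidence:.0%}")
```

The point of returning a ranked list instead of a single answer is that a low-confidence top guess shows up next to its competitors, so whoever reads the output can see how thin the margin is.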
My point is that the "Toronto" error is constantly cited as proof that Watson is fundamentally flawed, when it's actually a fairly reasonable bug if you understand the process it goes through to reach the answer. It's just seen as a stupid answer because it misses a key filter that humans would pick up.
In the medical case, it's actually better for the answer to be obviously, embarrassingly wrong than slightly wrong. Like the other commenter said, people aren't going to be getting amputations for headaches just because Watson says so. There's much more danger in something like prescribing medications with a fatal interaction, something that a hypothetical "Dr. Watson" would pick up.
> there is a selection of positions they do not understand at all. Worse, by definition they do not understand what they do not understand and so cannot avoid them.
This is almost certainly true for humans too in terms of general problems rather than specifically chess. There are probably concepts which we have so little understanding and comprehension of that we can't even see our own ignorance. Rumsfeld's unknown unknowns.
I didn't up or down vote you, but meta-edits about downvotes seem to either get downvoted to oblivion because of perceived whining or upvoted a lot because the downvotes are perceived as unjust. Also I think your comment is rather condescending. "You think Windows sucks? How about you build something better?"
Only added after I watched it go up and down twice. So it's not just the meta-whining; apparently several people found it highly offensive on content alone, and I was mystified as to why.
Thanks for the outside input regarding the Windows comparison. I don't think that's quite the same as this, though. The people talking down Watson aren't saying it sucks as much as they're saying it's trivial. Vista sucked but nobody would have called it trivial or inconsequential.
The problem is that the whole event was orchestrated to showcase IBM. Jeopardy didn't offer an open call. There's been no series of open competitions in Jeopardy-style trivia, as there was with gradually-improving chess computers.
Instead, IBM wanted a forum to show off its multi-million-dollar QA technology, and approached Jeopardy. (They may have also, though I haven't seen definitive information either way, offered Jeopardy promotional payments.) IBM then spent 3+ years optimizing for the Jeopardy domain. (In the Reddit QA, the Watson team answered: "At this point, all Watson can do is play Jeopardy and provide responses in the Jeopardy format.")
And in the matches, Watson dominated on one dimension of Jeopardy play – quickly pressing a button after a light goes off – that's the least interesting technical challenge. (Yes, it's an important part of any champion's skills, but a machine would have won that button-pressing competition 50 years ago, so it obscures rather than highlights any other 'breakthroughs' Watson may represent.)
While impressive in several dimensions, and drawn from much deeper research by IBM, the only thing we can say for sure about Watson is that it was a "Horse for the Course" in Jeopardy. And unfortunately, no other computer horses were invited to play, and offered the same prizes (in money and fame).
I suspect, now that the pattern has been set, we'll see leaner teams showing they can do as well or better than Watson with far less funding/hardware, over the next few years. Still, in the popular imagination, these efforts will live in the shadow of Watson, when a fair competitive process might have given them a chance to upstage Watson.
Quickly pressing a button after a light goes off is pretty unimpressive. Figuring out the answer to the question and measuring your confidence in order to decide whether to press the button is impressive.
Agree on your suspicion. Simply quartering the cost of memory and copying the approach from the paper with some home-grown improvements will get people ahead of IBM and probably inside IBM's decision loop so they're permanently ahead. But plowing something the first time is often the hardest. These weren't dumb people working on this thing for 3+ years.
I actually told my wife (who knew the correct answer - Chicago) half-jokingly that perhaps it was the sportsmanship built into Watson to throw away an easy answer once in a while, because he was on a hot streak up to that point.