by ethbro 2616 days ago
The difference is the expectation of fallibility with human docs. Which is why checks are built into the system.

I have zero trust in IBM to market their ML products correctly so that proper checks are maintained.

Especially since technically, explainability is still an active area of ML research.


>I have zero trust in IBM to market their ML products correctly so that proper checks are maintained.

That's a problem, yes, but it's not a new one. Vendor management has been around for decades, centuries, millennia maybe. If I contract out part of my job, it's still my responsibility to make sure the contractors are doing their job right. "But their marketing said..." or "but their sales guys said..." is not an excuse and everyone knows that.

Doctors noticing that Watson is wrong is expected. Doctors missing the fact that Watson is wrong is a failure of that doctor, and the doctor who didn't check the results is the responsible party. The checks don't come from Watson, the checks come from humans who oversee Watson.

If Watson is wrong often enough that it's hindering the doctors, then kicking it out is the right call. But there can never be an argument of "Watson got the diagnosis wrong and that's why the patient died" because ultimately IBM is still just a vendor and Watson is still just a contractor.

The difference is like:like vs like:unlike, and seems to be one of the more dangerous ML application challenges.

If I as a medical provider hire a remote vendor, who has medical teams in India look over initial results to flag issues, those humans will fail in human ways. I can anticipate that: I'm a human.

If I use a similar ML product, it's very difficult for me to anticipate (or even understand) the ways in which it might fail or does fail. Which makes it unlike my previous experience. Which gives it a fundamentally different risk profile.

It's the Boeing issue in a nutshell: the failure case that unfolded was unlike the scenarios the pilots were trained for. Unfortunately, in the two crashes they were unable to do root-cause analysis (RCA) on the fly quickly enough to solve the problem.

My point was that coupled with IBM's inept and inaccurate marketing, it seems unlikely the appropriate risk information is in the hands of those responsible for managing risk.

And honestly, if a system has unlimited failure modes, and I can't learn and limit them in practice, it's useless.

Because in that scenario I should be duplicating all the work it's done to ensure it didn't go off the rails. In practice (and guided by labor cost savings promised in the contract signed with management), that full verification doesn't happen (because the vendor is incentivized to recommend it doesn't), and people die.

It's a good point that it's a failure of management. It almost always is in situations like this.

I design and deploy customized automation systems for customers, and it's part of the standard process that we run the automation side-by-side with the old process for several months in order to learn the new failure modes and synchronize the process. Yes, for a few months we're duplicating the machine's work, but without the machine we'd be doing the work anyway. And no one is going to die if my automation fails, but we do this anyway. It's crazy to think anyone would believe they didn't need to do side-by-side verification no matter what sales and marketing told them.
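A minimal sketch of that kind of side-by-side run, in Python. Everything here is hypothetical and for illustration only: `old_process` and `new_automation` stand in for whatever the existing workflow and the new tool actually are, and the old process is assumed to be the reference answer during the trial period.

```python
from collections import Counter


def shadow_run(cases, old_process, new_automation):
    """Run both processes on every case and collect the disagreements.

    The old process is treated as the reference (an assumption of this
    sketch). A crash in the new automation is recorded as a disagreement
    rather than stopping the run.
    """
    disagreements = []
    for case in cases:
        expected = old_process(case)
        try:
            actual = new_automation(case)
        except Exception as exc:
            actual = f"error: {type(exc).__name__}"
        if actual != expected:
            disagreements.append((case, expected, actual))
    return disagreements


def summarize(disagreements):
    """Count disagreements by (expected, actual) pair to expose failure patterns."""
    return Counter((expected, actual) for _, expected, actual in disagreements)
```

The point of grouping by `(expected, actual)` is that recurring pairs are the failure modes you get to learn before the old process is switched off.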

I don't know enough about Watson or IBM sales to say if Watson is good or bad, but I'm not trying to defend Watson or IBM. Watson may very well be a complete failure. But that aside, it's not the only failure in this story. No one should expect to implement a new tool and never verify if it's working correctly.

> Doctors noticing that Watson is wrong is expected.

Well as a doctor if I’m still ultimately responsible then nothing has fundamentally changed, this is just another tool, possibly one I’ll be forced to use by someone with not one day of medical training.

And medicine in the US is in a precarious position. Software engineers are generally not licensed, and they’re not sued (it’s virtually unheard of). It’s different for doctors. There’s a complex dynamic of removing autonomy from providers to ostensibly improve outcomes (and revenue cycle) while also still holding those same providers personally liable.

So much of what I do is indirectly dictated by wonks in IT and billing, but they have virtually no liability. And one wonders why burnout in medicine continues climbing with no end.

>Well as a doctor if I’m still ultimately responsible then nothing has fundamentally changed

If your stethoscope fails and you mistakenly pronounce the patient dead because of that, who is responsible? Do you blame the stethoscope salesman for claiming it's an accurate medical instrument that you never need to second-guess? Do you stop using stethoscopes altogether because they're "just another tool"?

If your x-ray machine fails and you say the patient's leg isn't broken because of that, who is responsible? It's "just another tool", do you stop using x-ray machines?

If your physician's assistant measures the patient's blood pressure wrong and you never double check their work, do you fire all of your PAs? And go back to seeing every patient for every procedure yourself?

Everything you have is "just another tool" and as with any tool, it's up to the human doctor to interpret the output. The idea is tools make you faster and more accurate, but everyone knows tools fail so you need to be able to double check their work. If the tool is consistently inaccurate, sure, throw it away. But if your argument is "if the tool can't completely replace me it's worthless" I think you're selling yourself a little short there.

Of course you're ultimately responsible. Your stethoscope didn't go to college for 8 years, it's just there to make your job easier.

> the expectation of fallibility with human docs

Not just the expectation but the understanding. A doctor might very well forget which leg to amputate, so we know to Sharpie "NOT THIS LEG" on the one being kept. But a doctor is very unlikely to see a patient with a broken wrist and prescribe antipsychotics, so we don't do much to prevent that error. Human fallibility happens along fairly predictable channels, and we've spent a very long time committing resources to controlling those channels.

Watson, though, thought Toronto was a city in the USA. Anyone who's dealt with ML output knows that the errors are often quite surprising, even before dealing with adversarial inputs. Even in a system where Watson's outputs are subject to checks, the checks we have today are human-specific and developed at a significant human cost. ML answers can't just outperform individual human doctors to add value; they need to either be gracefully integrated with them or be able to outperform the entire system which keeps those doctors on track.