Hacker News
by ethbro 2619 days ago
The difference is like:like vs like:unlike, and seems to be one of the more dangerous ML application challenges.

If I as a medical provider hire a remote vendor, who has medical teams in India look over initial results to flag issues, those humans will fail in human ways. I can anticipate that: I'm a human.

If I use a similar ML product, it's very difficult for me to anticipate (or even understand) the ways in which it might fail, or does fail. Which makes it unlike my previous experience. Which gives it a fundamentally different risk profile.

It's the Boeing issue in a nutshell: the failure case that unfolded was unlike the scenarios the pilots were trained for. Unfortunately, in the two crashes they were unable to do root cause analysis on the fly quickly enough to solve the problem.

My point was that coupled with IBM's inept and inaccurate marketing, it seems unlikely the appropriate risk information is in the hands of those responsible for managing risk.

And honestly, if a system has unlimited failure modes, and I can't learn and limit them in practice, it's useless.

Because in that scenario I should be duplicating all the work it's done to ensure it didn't go off the rails. In practice, guided by the labor cost savings promised in the contract signed with management, that full verification doesn't happen (the vendor is incentivized to recommend against it), and people die.

1 comment

It's a good point that it's a failure of management. It almost always is in situations like this.

I design and deploy customized automation systems for customers, and it's part of the standard process that we run the automation side-by-side with the old process for several months in order to learn the new failure modes and synchronize the process. Yes, for a few months we're duplicating the machine's work, but without the machine we'd be doing the work anyway. And no one is going to die if my automation fails, but we do this anyway. It's crazy to think anyone would believe they didn't need to do side-by-side verification no matter what sales and marketing told them.
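
For what it's worth, a side-by-side run like that can be sketched roughly as below. This is only a minimal Python illustration, assuming both the old process and the automation can be called on the same inputs and produce comparable results. The names `shadow_run`, `summarize`, `legacy_process`, and `automated_process` are hypothetical and don't come from any real product.

```python
def shadow_run(cases, legacy_process, automated_process):
    """Run both processes on every case and collect the disagreements.

    The legacy result stays authoritative; the automation is only observed.
    Returns a list of (case, legacy_result, automated_result_or_error).
    """
    disagreements = []
    for case in cases:
        expected = legacy_process(case)
        try:
            actual = automated_process(case)
        except Exception as exc:
            # A crash in the automation is itself a failure mode worth logging.
            disagreements.append((case, expected, f"error: {type(exc).__name__}"))
            continue
        if actual != expected:
            disagreements.append((case, expected, actual))
    return disagreements


def summarize(disagreements, total):
    """Report how often the automation diverged from the old process."""
    rate = len(disagreements) / total if total else 0.0
    return {"total": total, "disagreements": len(disagreements), "rate": rate}
```

The point is less the disagreement rate than the list itself: each entry is a concrete example of how the new system fails, which is exactly what you can't anticipate up front.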

I don't know enough about Watson or IBM sales to say if Watson is good or bad, but I'm not trying to defend Watson or IBM. Watson may very well be a complete failure. But that aside, it's not the only failure in this story. No one should expect to implement a new tool and never verify if it's working correctly.