by ArchD 1214 days ago
And someone will decide that a 0.0001% probability of an airplane crash caused by a bug is acceptable? Maybe 0.0000001% is more reasonable? In any case, how do you accurately determine the probability of a rare failure of a black box without doing many real experiments with real inputs?

That's how safety-critical devices are already built, so yes. We have standardized acceptable probabilities of failure from unexpected causes (e.g. SIL [0]), because mitigating 100% of risk is somewhere between impractical and impossible.

[0] https://en.wikipedia.org/wiki/Safety_integrity_level

From a quick reading of the Wikipedia article, the associated methodology seems rather limited:

"System complexity, particularly in software systems, making SIL estimation difficult to impossible"

"The requirements of these schemes can be met either by establishing a rigorous development process, or by establishing that the device has sufficient operating history to argue that it has been proven in use."

You could prove that normal code satisfies some specs, but you can't do that with neural nets unless the number of possible inputs is tiny. So, the only way to establish that the black box neural net meets some SIL target is through "sufficient operating history".
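
To put a rough number on "sufficient operating history", here is a back-of-the-envelope sketch. It assumes every trial (say, one flight) is independent, is drawn from the same distribution as real operational inputs, and that zero failures were observed. None of those assumptions comes for free with a black box, and the function names are made up for illustration. Under those assumptions, n failure-free trials rule out a per-trial failure probability above p with confidence C once (1 - p)^n <= 1 - C.

```python
import math


def failure_free_trials_needed(p_max: float, confidence: float = 0.95) -> int:
    """Smallest n such that (1 - p_max)**n <= 1 - confidence.

    Assumes independent, identically distributed trials and zero
    observed failures (hypothetical helper, for illustration only).
    """
    if not (0.0 < p_max < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("p_max and confidence must be strictly between 0 and 1")
    # log1p(-p_max) computes log(1 - p_max) accurately for tiny p_max.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_max))


def upper_bound_after(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-trial failure probability
    after n failure-free trials (same assumptions as above)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    # The 0.0001% figure from upthread is a probability of 1e-6 per trial.
    print(failure_free_trials_needed(1e-6, 0.95))
    # The "rule of three" shortcut: the 95% bound is roughly 3 / n.
    print(upper_bound_after(3000), 3 / 3000)
```

For the 0.0001% target mentioned upthread, that works out to roughly three million failure-free trials at 95% confidence, and the count scales inversely with the target probability. A single observed failure, or any change to the model or its input distribution, invalidates the count.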

To clarify, I wasn't offering SIL up as an example of how we should validate ML systems, but instead to demonstrate that "software 1.0" systems are already designed the way GP is questioning. Best practices for applying integrity-level concepts to ML are still a topic of active debate.

Indeed. Some of the commenters on this article should really examine their assumptions about how reliable and correct “Software 1.0” is.

And what’s a postmortem going to look like?

“Don’t worry, we stuck the flight data recorder in the training set, and rebuilt the model. Should be good to go now”?

We already know what the postmortems look like. AI black boxes will look a lot like declarative programming black boxes. We don't really know how the code runs; we just ask it nicely to do what we want and then stare at the config files and the docs if it doesn't.

Low-code and AI are going to have many of the same failure modes. That is, until someone combines them, at which point they'll have exactly the same failure modes.