Hacker News
by sp821543 3178 days ago
http://web.mit.edu/2.75/resources/random/How%20Complex%20Sys...

7) Post-accident attribution to a ‘root cause’ is fundamentally wrong. Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessary but insufficient in itself to create an accident. Only jointly are these causes sufficient to create an accident. Indeed, it is the linking of these causes together that creates the circumstances required for the accident. Thus, no isolation of the ‘root cause’ of an accident is possible. The evaluations based on such reasoning as ‘root cause’ do not reflect a technical understanding of the nature of failure but rather the social, cultural need to blame specific, localized forces or events for outcomes.


Damn right. Security forensics should operate more like the NTSB. There are policy, cultural, process, organizational, team, and other factors to consider in a thorough, MECE-style (mutually exclusive, collectively exhaustive) structured investigation, ideally ending in a report with recommendations. Political or timid audits that jump to a narrow conclusion too quickly won't correct deficiencies, wherever those deficiencies may exist.

That quote sounds good, but I don't think it's necessarily applicable to this situation. The author seems to be talking about complex systems that are designed and operated to be robust against failure, like the space shuttle. For example, saying that Challenger blew up because of an O-ring is technically correct but also horribly wrong. Equifax IT, by contrast, does not appear to be operating at a level where a single failure is prevented from causing terrible damage all on its own.

That aside, it's hardly true that one person can bear all the blame for not patching their systems, even if they did successfully prevent patches from being applied. For one thing, how the hell did they keep their job after doing that? Unless it was the CEO (well, now that they have a new CEO, maybe they'd like to put all the blame on him), there was someone up the chain who could have insisted that the patch be applied. I think you definitely could apply root cause analysis techniques here, and I strongly suspect that such an analysis would uncover numerous serious deficiencies in Equifax's IT operations. Of course, guessing that a large, boring corporation has terrible IT practices is like guessing that a given duck quacks and has wings, so there's that.

> Equifax IT does not appear to be operating at a level to prevent a single failure from causing terrible damage all on its own.

They're operating at a level where over a hundred and thirty million people could have their ability to get a mortgage, open a bank account, or start a business harmed. If you think that such responsibility does not demand the highest standards of data safety, you should not work in this industry.

> That quote sounds good, but I don't think it's necessarily applicable to this situation. The author seems to be talking about complex systems ...

Companies, and the people, teams, and processes that make up those companies, are complex systems in exactly the sense the paper is discussing.

Great quote. It's always worth mentioning Sidney Dekker's book "The Field Guide to Understanding Human Error". A quote from that book:

> Throwing out the Bad Apples, lashing out at them, telling them you are not happy with their performance may seem like a quick, nice, rewarding fix. But it is like peeing in your pants. It gets nice and warm for a little while, and you feel relieved. But then it gets cold and uncomfortable and you look like a fool.

An explanation of why the bad-apple theory doesn't work: https://goo.gl/LPKMns