

by westajay 5332 days ago

When I wrote my comment I wasn't thinking of lots of up-front planning. I was thinking along simpler lines, like root cause analysis using a human factors or equipment taxonomy (much more effective than 5-whys), and simple logging of incidents for later analysis.
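To make "simple logging" concrete, here is a rough Python sketch of the kind of thing I mean. The category names, fields, and sample entries are all made up for illustration; a real human factors or equipment taxonomy would be far more detailed.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    # Hypothetical top-level categories, not taken from any standard taxonomy.
    HUMAN_FACTORS = "human factors"
    EQUIPMENT = "equipment"
    PROCESS = "process"


@dataclass
class Incident:
    summary: str
    cause: Cause
    layer: str  # free-form label such as "network" or "app" (assumed field)


def tally(incidents, key):
    """Count incidents grouped by whatever `key` extracts from each one."""
    return Counter(key(i) for i in incidents)


# Made-up entries, only to show the shape of the log.
log = [
    Incident("config change applied to wrong device", Cause.HUMAN_FACTORS, "network"),
    Incident("switch failed after firmware update", Cause.EQUIPMENT, "network"),
    Incident("untested change skipped review", Cause.PROCESS, "network"),
    Incident("operator mistyped a command", Cause.HUMAN_FACTORS, "app"),
]

by_cause = tally(log, lambda i: i.cause)
by_layer = tally(log, lambda i: i.layer)
```

Even something this small lets you ask later which categories or layers keep recurring, which is most of the value.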

I think some of these kinds of processes can be adopted with small investments in training and change.

Also, a lot of these kinds of failures seem to stem from changes at the networking layer, which should be more carefully planned and tested given that layer's place in the stack (we're not talking about crazy app behaviour).