by StavrosK
3316 days ago
I've found that not assigning blame in postmortems is not only good practice, it's also accurate: the culprit is usually the checks and balances, since mistakes will happen and the goal should be to have failsafes and detection. I'm reminded of airplane accidents: whenever you hear of one, it's always some amazingly crazy series of things going exactly wrong to get the plane to crash. We have a tendency to think "wow, what bad luck", but a better way to think about it is that airplanes are so safe that an accident can't occur unless a whole series of things go very specifically wrong. A company's goal should be to increase the number of things that all need to go wrong before there is downtime.
|
While there are always technical causes for larger technical failures, I've far too often seen RCA post-mortems turn into witch hunts instead of a solemn contemplation of how everyone could do things better. Such an RCA may ignore that a normally careful engineer was overworked by managers, a lack of relevant monitoring and testing due to budget cuts is never cited, and you'll certainly never see "teams X and Y collaborated too much" as a reason for failure in these places. That's because, in a typical workplace, the company's values and culture are never considered related to a failure. You can't objectively measure how bad or how good a culture is either. Why make it part of post-mortems when you don't think it's a cause of failure?