| These sorts of deeply apologetic and hyper-transparent post-mortems have become commonplace, but sometimes I wonder how beneficial they are. Customers appreciate transparency, but perhaps delving into the fine details of the investigation (various hypotheses, overlooked warning signs, yada yada) might actually end up leaving the customer more unsettled than they would have been otherwise. Today I learned that Asana had a bunch of bad deploys and put the icing on the cake with one that resulted in an outage the next day. This is coming from someone who runs an ad server - if that ad server goes down, it's damn near catastrophic for my customers and their customers. When we do have a (rare) outage, I sweat it out, reassure customers that people are on it, and give a brief, accurate, high-level explanation without getting into the gruesome details. I'm not saying my approach is best, but I do think trying to avoid scaring people in your explanation is worth considering. |
They require us to actually do the work of identifying the issues and writing up what happened and why. I realize that having a customer contract to do this shouldn't be a requirement, but human psychology is a funny thing. I can turn to my PM and say "I have to do this, it's part of the contract" and they immediately back off.
I agree it might not be the best solution, but it's definitely better than not doing them.