by Smerity
1596 days ago
What I find most interesting in this is the pseudo-detective story of hunting down disappearing post-mortems and "lessons learned" documentation. Optimistically, we'd hope the older systems no longer reflect the existing ones in any meaningful way (possibly as the org structures and/or software stacks shift and change), so those documents are no longer relevant. I'd imagine most lost knowledge is not the result of an explicit decision, however, which means such historical scenarios / documentation / ... are just lost as part of doing business. Lost knowledge is the default for companies. Twitter is likely better than most, given that their documentation is all digital and explicit processes exist to catalogue such incidents. I'd also be curious to see how much of this knowledge has been implicitly exported to their open source codebases.
As you say, the default tendency in many companies when failures occur is information loss. That can be attributed to too many communication tools, cultural expectations that problems should be hidden, siloed or disparate documentation stores, or a lack of process.
Intentional, open, thorough, and replicated note-taking with cross-references before, during, and after incidents can create radically different environments, ones that allow for querying, recovery, and improvement regardless of failure mode(s). Kudos to Dan for moving in that direction with these writeups (and to you for raising the subtext).