by Smerity 1596 days ago
What I find most interesting in this is the pseudo detective story of hunting down disappearing post-mortems and "lessons learned" documentation. Optimistically, we'd hope that the older systems those documents describe no longer resemble the existing systems in any meaningful way (possibly as the org structures and/or software stacks shift and change), so the documents are no longer relevant.

I'd imagine most lost knowledge is not the result of an explicit decision, however, which means such historical scenarios / documentation / ... are just lost as part of doing business. Lost knowledge is the default for companies.

Twitter is likely better than most given their documentation is all digital and there exist explicit processes to catalogue such incidents. I'd also be curious to see how much of this knowledge has been implicitly exported to their open source codebases.

1 comment

What you've said is, in my opinion, likely to be a difference between the technology companies that become tomorrow's infrastructure and the ones that disappear (even if it takes decades).

As you say, the default tendency in many companies when failures occur is information loss. That can be attributed to using too many communication tools, cultural expectations that problems should be hidden, siloed or disparate documentation stores, or a lack of process.

Intentional, open, thorough and replicated note-taking with cross-references before, during and after incidents can create radically different environments which allow for querying, recovery and improvement regardless of failure mode(s). Kudos to Dan for moving in that direction with these writeups (and to you for raising the subtext).