by db48x 1099 days ago
It sounds like none of those things have an owner who is tasked with keeping them up to date and correct. All the work that needs to be done needs to have a specific well-documented owner, otherwise diffusion of responsibility ensures that it will eventually fall through the cracks.

Management's job is to be "the owner". They are ultimately responsible for making sure that there is no diffusion of responsibility.

In our weekly meetings, recurring problems were identified and fixes implemented. No call was considered completed and closed until all relevant documents had been updated as appropriate. At the yearly review, the quality of your documentation was as important as your time to respond and time to fix. That is how mission-critical on-call work should be handled.

That's great. I think one of the issues in our process is that we use a wiki for on-call summary/hand-off notes. That's not ALWAYS very helpful, as it depends on what engineers add to it. The time and severity of the alerts make a difference as well. E.g. if an alert is triggered at night or at some other unfriendly time, the engineer's first instinct is to fix it, not to make a note or write documentation, unless there is an easy way to do so. We use PagerDuty and I don't think it provides an easy way to add those notes or comments. So that leaves it to the engineers, who need to do it after the fact. Some teammates do it rigorously while some don't. I think Management's challenge is also that they can only push so much before it becomes an attrition risk :(
Yeah, that’s not uncommon. Personally I prefer to give each document a specific owner, but either way, someone has to be tasked with ensuring that the documentation is correct.