| HN Mirror



	by bArray 3567 days ago
	Was this incident really recorded minute by minute, or is that made up? I've noticed that a lot of companies giving this kind of detail like to publish a minute-by-minute report. I just don't understand how they get that accuracy.

6 comments

gjtorikian 3567 days ago

Oh, man. Most definitely that's real.

If you're working in Slack or chat, you've got a minimum of half a dozen people typing and putting out suggestions and offering to investigate something. That's all time stamped. And even if you're not doing that real-time, you may be using something like a GitHub issue to discuss the problem via comments, which are also time-stamped.

No one at the moment of the incident is probably going "Ah, it's 8:01, better write down that I identified the problem." It's most likely "hay I think I got it one sec" and then that works. Or doesn't. But hopefully it does.


jwatte 3567 days ago

Yes, Slack and IRC timestamps are common. Ideally your shell and auditing give you that for commands, too!


stephengillie 3567 days ago

It's from details gathered from tickets and chat history, customer reports, and server logs. My team is developing a set of tools to manage our incidents, and automating the gathering of details like this is central to the reporting element.


gbin 3567 days ago

As a tool for that, chatops is pretty cool because you can easily record your conversations but also your actions.


jon-wood 3567 days ago

Generally it's not recorded minute by minute in the moment. When I write post mortems like this I'll piece together the timeline after things have calmed down through a combination of metrics, logs, and the ongoing discussion that takes place on Slack. To assist in that I'll tend to keep a running commentary of what I'm doing in Slack even if I'm the only engineer dealing with the incident. It helps with putting the timeline together later, and it also means other people coming to see what's up and offer help can get caught up without interrupting.
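
As a rough illustration of that "piece it together afterwards" step, here is a minimal Python sketch. It assumes each source (chat export, server logs, and so on) has already been parsed into timestamped events; the `Event` type, the helper names, and all of the sample data are hypothetical, not taken from any real incident or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    when: datetime   # timezone-aware timestamp of the event
    source: str      # where it came from, e.g. "chat" or "logs"
    text: str        # what happened


def build_timeline(*sources):
    """Combine events from every source and sort them chronologically."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event.when)


def format_timeline(events):
    """Render each event as an 'HH:MM [source] text' line."""
    return [f"{e.when:%H:%M} [{e.source}] {e.text}" for e in events]


def at(hour, minute):
    # Hypothetical incident date; only the time of day matters here.
    return datetime(2016, 1, 1, hour, minute, tzinfo=timezone.utc)


# Made-up sample data standing in for a chat export and a log search.
chat = [
    Event(at(8, 1), "chat", "I think I found it"),
    Event(at(8, 7), "chat", "fix deployed"),
]
logs = [
    Event(at(7, 58), "logs", "error rate spike"),
    Event(at(8, 5), "logs", "deploy started"),
]

timeline = format_timeline(build_timeline(chat, logs))
for line in timeline:
    print(line)
```

Normalizing every timestamp to one timezone (UTC here) before sorting is what keeps events from different systems in the right order.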


dcosson 3567 days ago

Often one person will be in charge of taking notes while the rest diagnose (using things like server logs or email timestamps to get these times as precise as possible). Not just for the post mortem, it can be very helpful in figuring out what happened, making sure the timing of events plausibly lines up with your hypothesis, extrapolating based on the length of a particular part of the incident to decide what to do next, etc.


dgcoffman 3567 days ago

We reconstruct history from timestamps in Slack and our logging and monitoring systems.


beachstartup 3567 days ago

they probably just looked at the chat history and wrote a timestamped summary narrative.

judging from the number of 'sorry's in the text, seems like post mortems have been slowly adapted into a very specialized form of semi-fictional stage drama in which the audience is pandered to excessively through the use of hyperbolic apology.
