by slipperyp 4697 days ago
I doubt this is the internal audit of the outage event. At least I hope it isn't, but I could be wrong. The template from Google isn't super uncommon and I bet most organizations that really want to drill in and understand root causes and prevent recurrence use something very similar to it (if the organization is serious about uptime and if they're successful at reducing recurrence).

It doesn't seem crazy to me that Facebook's public-facing summary of this is as casual as it seems to be. They owned up to breaking their platform and indicated they're taking measures to not do it again. But if the person who's internally accountable for analyzing this and preventing recurrence told me "we're building better tools" without any specifics about those tools, who's accountable for them, or the timeline for putting them in place, I'd say they should pack their bags, so I bet there's a more detailed plan internally. I'm also not a Facebook app developer, though, and if I had any revenue depending on not being shut down like this, I might be more frustrated with either a) the poor level of transparency (giving them the benefit of the doubt) or b) the poor depth of analysis.

1 comment

> At least I hope it isn't, but I could be wrong.

I used to work at Facebook, and this is most definitely not the internal audit. A lot of Facebook engineers are former Googlers, and they bring a lot of the culture and practices with them. You can rest assured that people are hunkering down in a conference room as we speak.

That said, Google's postmortems are a thing of awe, and they are distributed widely within the company.