by adbachman 1469 days ago
We just picked them up across an engineering/tech org of ~350 to do precisely what they describe here.

PagerDuty for notifications and on-call rotations, Datadog for monitoring, Slack for in-the-moment communication, Google Docs for post-mortem documentation; Blameless is the glue and automation that removes a lot of the incidental mental overhead of communicating and documenting while the incident is happening.

Super encouraging to see competition, though. A former teammate turned me on to https://how.complexsystems.fail/, and I'm willing to believe that in a sufficiently complex system, the closest we will get to understanding how it actually works is during and after incident response.


One of my favourite sites!

And that is great. We know plenty of happy Blameless customers; they're certainly one of the better competitors we face in this market.