by alexsolo
5819 days ago
We've taken steps to minimize outages as much as possible. The system is distributed across 3 data centers, with fast automatic rollover in case of a data center outage. We've architected the system to ensure we never drop alerts. PagerDuty integrates with monitoring via email or API; if we receive the message on our end, we guarantee you will be alerted. We've had a few incidents where we have delayed sending out the phone call or SMS alert for a few minutes, but we've never dropped an alert.

In terms of setting a formal SLA, we haven't done so mainly because we're not sure how to go about implementing this. I've checked the SLAs of a few hosting and cloud providers including AWS, Rackspace, Linode and Slicehost, and I haven't found a compelling example to work from. Some of these guys don't have an SLA (they try their best) and the others give you only a portion of your money back. The whole point of an SLA is to incentivize us to never go down. In our case, we know that if we ever go down, we will lose our customers; that's incentive enough :). Having said that, we may still add an SLA guarantee as part of a larger "enterprise" pricing plan.

We definitely plan on adding plugins for all the popular monitoring systems. We've also released an integration API to allow PagerDuty to integrate with any system that can make an HTTP API call (or call a command-line script that can do this). I'm pretty sure Zabbix will work with PagerDuty right now, via the integration API. We'd love to work with you to set this up. Please send me an email at alex@pagerduty.com.
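
To give a rough idea of the "command-line script that makes an HTTP API call" route, here is a minimal sketch of what a monitoring system such as Zabbix could invoke. Everything specific in it is an assumption for illustration only: the endpoint URL is a placeholder, and the field names (service_key, event_type, description) are hypothetical, not taken from the integration API docs. Check the actual documentation before using anything like this.

```python
import json
import sys
import urllib.request

# Placeholder URL (assumption): substitute the real integration API endpoint.
ENDPOINT = "https://example.invalid/create_event"


def build_alert_request(service_key, description, endpoint=ENDPOINT):
    """Build (but do not send) a JSON POST request describing one alert.

    The payload field names are hypothetical, chosen for illustration.
    """
    payload = {
        "service_key": service_key,
        "event_type": "trigger",
        "description": description,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(service_key, description, endpoint=ENDPOINT, timeout=10):
    """Send the alert and return the HTTP status code.

    urllib raises urllib.error.HTTPError for non-2xx responses, so the
    caller can treat an exception as "alert not accepted" and retry.
    """
    request = build_alert_request(service_key, description, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def main(argv):
    """Entry point for a wrapper script: returns a process exit code."""
    if len(argv) != 2:
        print("usage: alert.py SERVICE_KEY DESCRIPTION", file=sys.stderr)
        return 2
    try:
        send_alert(argv[0], argv[1])
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print(f"alert failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))`, so the monitoring system sees a non-zero exit code when the call fails and can retry or fall back to email.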