Hacker News new | ask | show | jobs
by partiallattice 1861 days ago
That's an SLA where they expect to end up paying customers back for outages. They've done an admirable job with only a few global outages, but subsets of customers have experienced plenty of outages.

A single 5 min outage would blow through more than a decade of SLO at 6 9s. As far as I'm aware, there does not exist a service that has been up for more than a decade with fewer than 5 min of downtime, and that definitely includes route 53.

1 comments

In order to achieve 6 9s you need a two levels of replication, you need synchronous replication with instant failover, plus asynchronous replication to handle site failover. RonDB provides both. NDB is used in lots of telecom services where you can't make a phone call unless NDB is up. These services definitely can at times run for decades without downtime. RonDB is built on top of NDB.
Data durability and high availability are two different metrics.

iirc, S3 SLAs availability at 99.99; durability at 99.999999999 and 99.99999999999999 for cross-region replicated buckets.

Correct, Availability is the amount of time you are available to read and write data. Durability is the amount of time your data is not lost. The metrics mentioned in this post are about availability. Most cloud vendors provide SLAs of 99.95% availability. One problem to solve when working with a cloud vendor is that they need to upgrade their OS images every now and then, so to get the highest availability one must integrate with the cloud APIs announcing those changes.