by kichik 1206 days ago
True. But cloud makes it a lot easier. In some cases it's built-in, like S3. In others it's a checkbox like RDS Multi-AZ. And if you need to roll your own, multi-AZ or even multi-region is much more straightforward than renting another rack somewhere.

I have personally seen Stack Overflow "under maintenance" or straight-up down a lot more often than I have seen all of us-east-1 go down.

Keep in mind that the "cloud" relies on an opaque control plane with undocumented failure modes (some of which even the provider doesn't know about).

Just because you tick a checkbox doesn't mean it'll actually work as planned. With infrastructure under your own control you can test failover for real (pull the network or power cable from a live server if you need to), but you can't simulate a cloud provider outage.

> multi-AZ or even multi-region is much more straightforward than renting another rack somewhere.

Assuming that enough of the AWS control plane is alive to actually let you log in and administer the services in your backup region.

Furthermore, cloud providers are businesses in their own right and are constantly in motion (introducing new features, etc.). That's good for their business but bad for yours, because it means they may be making risky changes that could affect you if something goes wrong.

Exactly. I run a large enterprise service in a single datacenter with 5 years of 100% uptime. Our design goal is 99.97% measured monthly.
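For a sense of scale, here is a rough sketch of what a 99.97% monthly target allows in downtime. It assumes a 30-day month, and the function name and default window are illustrative choices, not anything from the service described here.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` days at the given availability.

    Assumption: the measurement window is a flat 30-day month by default.
    """
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.97)
print(f"{budget:.2f} minutes of allowed downtime per 30-day month")
```

Under that assumption the budget comes to about 13 minutes a month, so a single unplanned reboot could use up most of it.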

We have that because we have complete control end to end. We made an engineering decision not to have geo-redundancy because many of the dependent services aren’t available that way either.

Because of the compute requirements, running that service in AWS or GCP would cost about 80% more, inclusive of all costs (equipment, labor, utilities, etc.).