by Nextgrid 1205 days ago
Keep in mind that the "cloud" relies on an opaque control plane with undocumented failure modes (some of which even the provider may not know about).

Just because you tick a checkbox doesn't mean it'll actually work as planned. Unlike infrastructure within your control, which you can actually test (pull the network or power cable from a live server if you need to), you can't simulate a cloud provider outage.

> multi-AZ or even multi-region is much more straightforward than renting another rack somewhere.

Assuming that enough of the AWS control plane is alive to actually allow you to log in and administer the services in your backup region.

Furthermore, cloud providers are their own businesses and are constantly in motion (introducing new features, etc.). That's good for their business but bad for yours, as it means they might be making risky changes that could affect you should they go wrong.

1 comment

Exactly. I run a large enterprise service in a single datacenter with 5 years of 100% uptime. Our design goal is 99.97% measured monthly.
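
For a sense of scale, a 99.97% monthly target leaves roughly 13 minutes of downtime per month. A minimal sketch of that arithmetic, assuming a 30-day month (the function name and the 30-day default are illustrative, not something from the service described here):

```python
def downtime_budget_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime allowed in one month at the given availability target.

    Assumes the target is measured over a calendar month of `days_in_month` days.
    """
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare the 99.97% goal with two other commonly quoted targets.
    for target in (99.9, 99.97, 99.99):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/month")
```

At 99.97% the budget works out to about 12.96 minutes per 30-day month, so a single short incident can consume the whole allowance.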

We have that because we have complete control end to end. We made an engineering decision not to have geo-redundancy because many of the dependent services aren’t available that way either.

Because of the compute requirements, running that service in AWS or GCP would cost about 80% more, inclusive of all costs (equipment, labor, utilities, etc.).