Hacker News
by myusernameisok 3244 days ago
Especially because AWS regions are broken up into multiple availability zones (data centres in the same area). So taking out a single data centre won't do much if AWS customers have correctly designed their systems for high availability (i.e. with redundant instances in other AZs/regions and their data backed up elsewhere).
4 comments

A single availability zone can be spread across many physical data centres.

According to this rackspace article, the largest AZ has 5 data centres.

https://blog.rackspace.com/aws-101-regions-availability-zone...

Take this for what it is - I'm a software developer who works in the cloud, not a cloud expert.

The abstraction behind AZs is that every AZ counts for at least 1 data centre. So for every region there are at least 2 AZs, and every AZ means at least 1 data centre (or 5 in your link).

This just means it's harder to take out a whole region by destroying individual data centres. Since most regions consist of 2-5 AZs, and AZs consist of 5+ data centres, that means destroying dozens of data centres.
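
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures are the ones quoted in the comment above (2-5 AZs per region, 5 data centres per AZ taken as a floor), not official AWS numbers, and the function name is made up for illustration.

```python
# Figures quoted in the comment above, not taken from AWS documentation.
AZS_PER_REGION = (2, 5)    # "most regions consist of 2-5 AZs"
DATA_CENTRES_PER_AZ = 5    # "5+ data centres" per AZ, treated here as a floor


def min_data_centres_in_region(az_count: int, per_az: int = DATA_CENTRES_PER_AZ) -> int:
    """Fewest data centres in a region with az_count AZs, given the per-AZ floor."""
    return az_count * per_az


low = min_data_centres_in_region(AZS_PER_REGION[0])
high = min_data_centres_in_region(AZS_PER_REGION[1])
print(f"{low} to {high}+ data centres per region")
```

Under those assumptions a two-AZ region has at least 10 data centres and a five-AZ region at least 25, so "dozens" mostly applies to the larger regions.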

That's a big if. The S3 outage in February this year showed that there are a lot of sites that aren't designed for this.

The S3 outage was across the whole US-EAST-1 region, not just one data centre.

I know my company and a few others that are redundant within a region (i.e. if one AZ goes down), but not if a whole region goes down.
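
The difference between those two levels of redundancy can be sketched as a simple check. This is only an illustration: the placement list and both function names are hypothetical, and it ignores data replication, DNS failover and everything else a real setup needs.

```python
from typing import Iterable, Tuple

# A placement is a (region, availability zone) pair for one running instance.
Placement = Tuple[str, str]


def survives_az_failure(placements: Iterable[Placement]) -> bool:
    """True if at least one instance remains after losing any single AZ."""
    return len(set(placements)) >= 2


def survives_region_failure(placements: Iterable[Placement]) -> bool:
    """True if at least one instance remains after losing any single region."""
    return len({region for region, _ in placements}) >= 2


# Hypothetical deployment: two AZs, both in the same region.
multi_az = [("us-east-1", "us-east-1a"), ("us-east-1", "us-east-1b")]

print(survives_az_failure(multi_az))      # True
print(survives_region_failure(multi_az))  # False
```

A deployment like `multi_az` is the case described above: fine if one AZ goes down, gone if the whole region does.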

This all assumes that "taking out" datacenters is a physical/hardware operation.

When you widen the potential attack surface to include software vulnerabilities, unauthorized access, process flaws and other "soft" vectors, a much wider--possibly coordinated--attack that is potentially far more crippling can be imagined.

The question is how load would behave if more than 1 AZ fails for several days. If 2 out of 5 AZs in US-East are down for a week, everyone would redistribute to the other 3 (and probably onto all three to be sure). I'm not sure if they have enough spare capacity to handle that. The AZ model is designed for single failures and for failures that last a few hours, not days.
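
As a rough illustration of the capacity question, the sketch below assumes that load is spread evenly across AZs before the failure, that total demand stays the same, and that it redistributes evenly afterwards. None of that is known about Amazon's actual setup, and the function name is made up.

```python
def load_factor_after_failure(total_azs: int, failed_azs: int) -> float:
    """Multiple of its normal load that each surviving AZ must carry.

    Assumes evenly spread load and unchanged total demand.
    """
    if not 0 <= failed_azs < total_azs:
        raise ValueError("failed_azs must be at least 0 and less than total_azs")
    return total_azs / (total_azs - failed_azs)


factor = load_factor_after_failure(total_azs=5, failed_azs=2)
print(f"Each surviving AZ carries {factor:.2f}x its normal load")
```

With 2 of 5 AZs down, each remaining AZ would need about 67% headroom just to absorb the existing load, before counting customers who over-provision across all three to be safe.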

Of course this is just speculation; I have no knowledge of DR plans at Amazon.