by packetslave 1429 days ago
Update from AWS: they lost power to (part of?) a single DC in the use2-az1 availability zone.

10:25 AM PDT We can confirm that some instances within a single Availability Zone (USE2-AZ1) in the US-EAST-2 Region have experienced a loss of power. The loss of power is affecting part of a single data center within the affected Availability Zone. Power has been restored to the affected facility and at this stage the majority of the affected EC2 instances have recovered. We expect to recover the vast majority of EC2 instances within the next hour. For customers that need immediate recovery, we recommend failing away from the affected Availability Zone as other Availability Zones are not affected by this issue.


Interesting to see it's been a loss of power that caused this. Usually the better datacenters have multiple levels of power redundancy including emergency backup generators.
It depends entirely on how AWS architected their power redundancy. Given that the outage affected a portion of one DC in one AZ, we can make some assumptions, but the truth is we just don't know.

It could be that their shared-fate scope is an entire data hall, or a set of rows, or even an entire building given that an AZ is made up of multiple datacenters. I don't know that AWS has ever published any kind of sub-AZ guarantees around reliability.

Datacenter power has all kinds of interesting failure modes. I've seen outages caused by a cat climbing into a substation, rats building a nest in a generator, fire-fighting in another part of the building causing flooding in the high-voltage switching room, etc.

Our best was a bird landing on a transformer up on a pole. Installed a fake Eagle after that.
Given the scope of the effort invested in attempting to prevent duck and goose crap on the world's docks, I'm skeptical that this tactic is effective.
Shrug... the datacenter is landlocked (different animal species) and the problem hasn't happened again in multiple years.

I think you're taking the Eagle a bit too seriously though... if we didn't do anything how would we know? It isn't like this was an expensive thing to try out.

OK. It's just that I am one of those people who have tried to solve the duck/goose problem and would be delighted if a fake eagle or owl worked.
Insert clip of O'Brien explaining to the Cardassians why there are backups for backups
In case anyone is unaware of the reference, that’s taken from Star Trek: Deep Space Nine: https://youtu.be/UaPkSU8DNfY
Yeah, I am surprised by this too. Like, you have one job: keep the power on.
I thought that AWS availability zones were intentionally not canonically named to prevent everyone from adding stuff to AZ “A”. So my us-east-1 zone “A” might be your “B”.

But that system breaks down here when you need to know whether you are in an affected zone. Is there a way to map an account’s AZ name to the canonical one which apparently exists?

They gave up on that; now there's an extra "zone ID" you can read that maps to an absolute address. They used to be extremely cagey about giving out those mappings for your account.

Examples of how these relate: https://stackoverflow.com/questions/63283340/aws-map-between...

I'm also pretty sure that GCP's identifiers are absolute (and this time, throughout) as well, since their documentation (which renders the same in incognito mode or whatever) makes reference to what zones have what microarchitectures and instance types.

The mapping from AZ Name (account-specific) to AZ ID (global) shows up on the EC2 overview page in the dashboard.
This is true, at the AWS account level. us-east-2a for my account may map to the internal use2-az1, but in your account us-east-2a may map internally to use2-az2.
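
For anyone who wants to check this from a script rather than the dashboard, here is a minimal sketch of the name-to-ID lookup. It assumes boto3 is installed and credentials are configured; the helper names (zone_name_to_id, names_in_zone, fetch_mapping) and the sample data are made up for illustration, and the sample mapping is not any real account's. The only AWS call is EC2's DescribeAvailabilityZones, whose response entries carry both ZoneName and ZoneId.

```python
from typing import Dict, List


def zone_name_to_id(response: dict) -> Dict[str, str]:
    """Map account-specific AZ names to global AZ IDs.

    `response` is expected to have the shape returned by EC2's
    DescribeAvailabilityZones: {"AvailabilityZones": [{"ZoneName", "ZoneId", ...}]}.
    """
    return {
        az["ZoneName"]: az["ZoneId"]
        for az in response.get("AvailabilityZones", [])
    }


def names_in_zone(mapping: Dict[str, str], zone_id: str) -> List[str]:
    """Return this account's AZ names that resolve to the given AZ ID."""
    return sorted(name for name, zid in mapping.items() if zid == zone_id)


def fetch_mapping(region: str = "us-east-2") -> Dict[str, str]:
    """Query the live mapping for the current account (needs boto3 + credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    return zone_name_to_id(ec2.describe_availability_zones())


# Hypothetical sample response for one account; another account's
# us-east-2a could just as well point at use2-az2.
SAMPLE_RESPONSE = {
    "AvailabilityZones": [
        {"ZoneName": "us-east-2a", "ZoneId": "use2-az1", "State": "available"},
        {"ZoneName": "us-east-2b", "ZoneId": "use2-az2", "State": "available"},
        {"ZoneName": "us-east-2c", "ZoneId": "use2-az3", "State": "available"},
    ]
}

if __name__ == "__main__":
    mapping = zone_name_to_id(SAMPLE_RESPONSE)
    # Which of "my" zone names correspond to the zone named in the status post?
    print(names_in_zone(mapping, "use2-az1"))
```

With real credentials you would swap zone_name_to_id(SAMPLE_RESPONSE) for fetch_mapping("us-east-2") and compare the result against the AZ ID in the status update.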