us-east-1 is the region that has the most issues. Every time you hear about an AWS outage it's typically just us-east-1, and it brings down half the internet. It's really puzzling why everyone keeps hosting their projects in us-east-1.
The advantage of that region is that it tends to get new instance types and services first. And if you need to be on the bleeding edge, then you have to agree to deal with some risks. But for everyone else that doesn't need the bleeding edge, why not just run your instances in us-west-2 or us-east-2 for example? I've run services in us-west-2 for years and I've never had to deal with fallout from an AWS outage.
Because we want to be colocated with everybody else who's on us-east-1. It makes for lower latency, higher transfer speeds, and when shit hits the fan none of your customers notice that your SaaS product is down, because they can't access their own sites either!
> It's really puzzling why everyone keeps hosting their projects in us-east-1.
I've been using Amazon for 4 years, and this is the first time I've heard that their us-east-1 is more "cutting edge" and less reliable than all the other zones. This isn't even listed on their page describing these zones: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-reg...
The AWS Global Infrastructure guide to all the Regions is pretty informative. When you look at the table here[1], you realize pretty quickly that every idea that AWS has ever had gets an initial deployment in Northern Virginia (us-east-1).
It's also one of the reasons why so many people continue to use it. If you really need one of these services for your infrastructure, then you're very likely going to be stuck using us-east-1. It may be quite some time before the service reaches a second region.
Of course it's not listed. They're not going to say "this is our first region and it runs on some pretty old hardware and hacked-together configurations so we recommend you don't use it."
I agree it's "common knowledge" but is it true? Or is it just that because 80+% (made up) of everyone hosts in us-east-1, that's the only one we hear about in HN (and other sites) threads?
I go by the outage reports published by AWS. I haven't done an exhaustive analysis but there is a trend to support the common knowledge. I spend most of my time in us-west-1 and us-west-2 and have seen very few, if any, outages over the past 3 years.
I think you answered your own question. In the case of AWS, sometimes being "bleeding edge" means you're allowing Amazon to scaffold infrastructure with a new service that you'd otherwise have to deal with yourself. For a small company, an AWS service being bleeding edge is still a lot more resilient than doing it yourself.
I doubt that the vast majority of customers in us-east-1 are using bleeding edge services. By the time you figure out how to integrate some new AWS offering into your infrastructure it has probably been rolled out to the other regions and the kinks worked out. Unless you need the latest and greatest GPU offerings for DL, in which case maybe.
Pricing between us-east-1 and us-west-2 has been largely the same in my experience. us-west-1 had a price premium, presumably due to higher costs in California.
"04:36 PM PDT We are investigating network connectivity issues for some instances in a single Availability Zone in the US-EAST-1 Region.
04:58 PM PDT We can confirm that some instances are unreachable and some EBS volumes are experiencing degraded performance in a single Availability Zone in the US-EAST-1 Region. Engineers are engaged and we are working to resolve the issue.
05:05 PM PDT We have identified the root cause and are beginning to see recovery for instances and EBS volumes in the affected Availability Zone in the US-EAST-1 Region. We continue to work toward full resolution."
This is exactly why you should design your services to run in multiple availability zones to mitigate issues like this. We run our most critical services in at least 3 availability zones and we are moving the rest of our services soon as well. While these problems are unfortunate, it is part of relying on Amazon to manage resources.
Always plan for service degradation and look for ways to mitigate against issues like this.
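As a rough illustration of the multi-AZ advice above, here is a minimal sketch that spreads replicas round-robin across zones and checks how many survive if any single zone drops out. The function names and zone labels are hypothetical, and it only models placement; it does not call any AWS API.

```python
from itertools import cycle


def spread_across_zones(replica_count, zones):
    """Assign replicas to zones round-robin so the load is as even as possible."""
    if not zones:
        raise ValueError("need at least one zone")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(replica_count)]


def surviving_replicas(placement, failed_zone):
    """Count replicas still running if one zone becomes unreachable."""
    return sum(1 for zone in placement if zone != failed_zone)


# Hypothetical zone labels; the letters an account sees are account-specific.
zones = ["us-east-1a", "us-east-1c", "us-east-1d"]
placement = spread_across_zones(6, zones)

# Worst case over every possible single-zone failure.
worst_case = min(surviving_replicas(placement, zone) for zone in zones)
print(placement)
print("replicas left after losing any one zone:", worst_case)
```

With three zones, losing one still leaves two thirds of the replicas, which is the point of running critical services in at least three zones.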
They randomise the AZ letters, because humans tend to shove things in 'a' before anywhere else - it's a psychological load-balancer.
If you want to figure out which of your AZs corresponds to another account's AZs, you can compare spot-prices, which are individual per AZ. Also, for some reason, my account doesn't have a 'b', just a-c-d-e. Weird.
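The spot-price comparison could look something like the sketch below. It assumes you have already collected a short spot-price series per zone label from each account (the numbers here are made up), and it pairs each label in one account with the label in the other account whose series is closest. The helper name `match_zones` is hypothetical, and fetching the prices is left out.

```python
def match_zones(prices_a, prices_b):
    """Pair each zone label in account A with the account-B label
    whose spot-price series is closest (sum of absolute differences)."""

    def distance(xs, ys):
        return sum(abs(x - y) for x, y in zip(xs, ys))

    mapping = {}
    for label_a, series_a in prices_a.items():
        mapping[label_a] = min(
            prices_b, key=lambda label_b: distance(series_a, prices_b[label_b])
        )
    return mapping


# Made-up price series, sampled at the same timestamps in both accounts.
account_a = {
    "us-east-1a": [0.031, 0.034, 0.030],
    "us-east-1c": [0.052, 0.050, 0.055],
    "us-east-1d": [0.041, 0.040, 0.043],
}
account_b = {
    "us-east-1a": [0.052, 0.050, 0.055],
    "us-east-1b": [0.041, 0.040, 0.043],
    "us-east-1c": [0.031, 0.034, 0.030],
}

mapping = match_zones(account_a, account_b)
for label_a, label_b in sorted(mapping.items()):
    print(label_a, "in account A looks like", label_b, "in account B")
```

This only works if the series are taken for the same instance type over the same time window, and if prices actually differ between zones during that window.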
Interesting that the idea is to distribute load across the zones (because people default to using A), but Google Cloud zones are uniform across all accounts and it seems to work for Google.
I heard that AWS actually scrambles availability zones, so your 1a might be my 1c, etc. I haven't confirmed it, but the comments seem to bear that out.
Cronitor saw our customers first impacted at 4:29, largely recovered by 5:04.
Another interesting thing is that AZ identifiers are randomized from customer-to-customer so when people report their failures, like here in this thread, it can sometimes seem like a problem is region-wide when in fact it's isolated to an AZ.
I don't follow your point about randomized AZs. I agree they are randomized and have observed and correlated zone letters across multiple customers/accounts. How does that make it seem like a problem is region-wide though?