Ask HN: Anybody else having EC2 issues on us-east-1?
42 points by oliverfriedmann 3245 days ago
Originally just us-east-1d was not reacting well, but now most EC2 instances don't do much anymore.
15 comments

us-east-1 is the region that has most issues. Every time you hear about an AWS outage it's typically just us-east-1, and it brings down half the internet. It's really puzzling why everyone keeps hosting their projects in us-east-1.

The advantage of that region is that it tends to get new instance types and services first. And if you need to be on the bleeding edge, then you have to agree to deal with some risks. But for everyone else that doesn't need the bleeding edge, why not just run your instances in us-west-2 or us-east-2 for example? I've run services in us-west-2 for years and I've never had to deal with fallout from an AWS outage.

Because we want to be colo with everybody else that's on us-east-1. It makes for lower latency, higher transfers, and when shit hits the fan none of your customers notice that your SaaS product is down because they can't access their entire site!
> It's really puzzling why everyone keeps hosting their projects in us-east-1.

I've been using Amazon for 4 years, and this is the first time I've heard that us-east-1 is more "cutting edge" and less reliable than all the other regions. This isn't even listed on their page describing these zones: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-reg...

The AWS Global Infrastructure guide to all the Regions is pretty informative. When you look at the table here[1], you realize pretty quickly that every idea that AWS has ever had gets an initial deployment in Northern Virginia (us-east-1).

It's also one of the reasons why so many people continue to use it. If you really need one of these services for your infrastructure, then you're very likely going to be stuck using us-east-1. It may be quite some time before you get a 2nd region.

[1] https://aws.amazon.com/about-aws/global-infrastructure/regio...

Of course it's not listed. They're not going to say "this is our first region and it runs on some pretty old hardware and hacked together configurations so we recommend you don't use it."
It's pretty common knowledge that us-east-1 is the most unreliable region. It's also the oldest, which helps explain why.
I agree it's "common knowledge" but is it true? Or is it just a case that because 80+% (made up) of everyone hosts in us-east-1, that's the only one we hear about in HN (and other sites) threads?
I go by the outage reports published by AWS. I haven't done an exhaustive analysis but there is a trend to support the common knowledge. I spend most of my time in us-west-1 and us-west-2 and have seen very few, if any, outages over the past 3 years.
I think you answered your own question. In the case of AWS, sometimes being "bleeding edge" means you're allowing Amazon to scaffold infrastructure with a new service that you'd otherwise have to deal with yourself. For a small company, an AWS service being bleeding edge is still a lot more resilient than doing it yourself.
I doubt that the vast majority of customers in us-east-1 are using bleeding edge services. By the time you figure out how to integrate some new AWS offering into your infrastructure it has probably been rolled out to the other regions and the kinks worked out. Unless you need latest and greatest GPU offerings for DL, in which case maybe.
Only two regions have good availability for AWS services.

For example, email (SES). Only Virginia and Oregon. And Oregon goes down too.

If all you needed was VPS, you'd probably find better offerings elsewhere.

us-east-1 pricing (for EC2, S3, etc.) is usually lower than other regions
Pricing between us-east-1 and us-west-2 has been largely the same in my experience. us-west-1 had a price premium, presumably due to higher costs in California.
check https://phd.aws.amazon.com/phd/home?region=us-east-1#/event-... when this happens...

As of now it says for me

"04:36 PM PDT We are investigating network connectivity issues for some instances in a single Availability Zone in the US-EAST-1 Region.

04:58 PM PDT We can confirm that some instances are unreachable and some EBS volumes are experiencing degraded performance in a single Availability Zone in the US-EAST-1 Region. Engineers are engaged and we are working to resolve the issue.

05:05 PM PDT We have identified the root cause and are beginning to see recovery for instances and EBS volumes in the affected Availability Zone in the US-EAST-1 Region. We continue to work toward full resolution.

"

Had a similar issue earlier today with Rackspace's version of EBS.
This is exactly why you should design your services to run in multiple availability zones to mitigate issues like this. We run our most critical services in at least 3 availability zones and we are moving the rest of our services soon as well. While these problems are unfortunate, it is part of relying on Amazon to manage resources.

Always plan for service degradation and look for ways to mitigate against issues like this.
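
To make the multi-AZ point concrete, here is a minimal sketch in plain Python (no AWS calls) of spreading instances round-robin across zones and checking how much capacity is left if one zone drops out. The zone names and the helper names `spread_across_zones` and `surviving_capacity` are made up for illustration; a real setup would more likely lean on an Auto Scaling group or similar rather than hand-rolled placement.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so the load is as even as possible."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_capacity(placement, failed_zone):
    """Fraction of instances still reachable if one zone becomes unavailable."""
    if not placement:
        return 0.0
    alive = sum(1 for zone in placement if zone != failed_zone)
    return alive / len(placement)


# Hypothetical zone names; the letters differ from account to account.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
placement = spread_across_zones(6, zones)

print(Counter(placement))                           # instances per zone
print(surviving_capacity(placement, "us-east-1b"))  # share left after losing one zone
```

With six instances over three zones, losing any single zone leaves two thirds of the fleet running, whereas a single-zone deployment would lose everything.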

Some services, like AWS Redshift, do not allow multi-AZ deployments.

Not very helpful when Redshift didn't have a single note in their status page for 30+ minutes after it went down.

Any bets on whether the status page will show up as Red for EC2/Redshift tomorrow? I'll take 100 to 1 odds for $1 that it won't be red.

Yes: https://status.aws.amazon.com/

us-east-1b seems to be affected for me.

4:36 PM PDT We are investigating network connectivity issues for some instances in a single Availability Zone in the US-EAST-1 Region.

I'm having problems in us-east-1c
The AZ names are different for each account.
Thanks! Did not know that.
Interesting! Do you know anywhere I can read more about this?
Wow, how did I not know this? Thanks for the info.
They randomise the AZ letters, because humans tend to shove things in 'a' before anywhere else - it's a psychological load-balancer.

If you want to figure out which of your AZs corresponds to another account's AZs, you can compare spot-prices, which are individual per AZ. Also, for some reason, my account doesn't have a 'b', just a-c-d-e. Weird.
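
A rough sketch of that spot-price comparison, in plain Python: given the per-AZ spot price series seen by two accounts, pair up the zone names whose series are identical. All of the prices below are made up, the function name `match_zones_by_spot_price` is hypothetical, and it assumes both accounts sampled the same instance type at the same times. Fetching the real price history is left out.

```python
def match_zones_by_spot_price(prices_a, prices_b):
    """Map each zone name in account A to the zone in account B with the same price series.

    Zones with no match, or with more than one candidate, are left out of the result.
    """
    # Index account B's zones by their price series.
    by_series = {}
    for zone, series in prices_b.items():
        by_series.setdefault(tuple(series), []).append(zone)

    mapping = {}
    for zone, series in prices_a.items():
        candidates = by_series.get(tuple(series), [])
        if len(candidates) == 1:  # skip missing or ambiguous matches
            mapping[zone] = candidates[0]
    return mapping


# Made-up spot price series (USD per hour) for one instance type.
account_a = {
    "us-east-1a": [0.031, 0.034, 0.040],
    "us-east-1c": [0.027, 0.027, 0.029],
    "us-east-1d": [0.052, 0.049, 0.047],
}
account_b = {
    "us-east-1b": [0.052, 0.049, 0.047],
    "us-east-1c": [0.031, 0.034, 0.040],
    "us-east-1e": [0.027, 0.027, 0.029],
}

print(match_zones_by_spot_price(account_a, account_b))
```

If two zones happen to show the same prices, the match is ambiguous and the sketch simply skips it rather than guessing.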

They also introduced us-east-1f, and it's the same for everyone because it's new :).
Interesting that the idea is to distribute load across the zones (because people default to using A), but Google Cloud zones are uniform across all accounts and it seems to work for Google.
1d here

I heard that AWS actually scrambles availability zones, so your 1a might be my 1c, etc. Haven't confirmed it, but the comments seem to bear that out.

Cronitor saw our customers first impacted at 4:29, largely recovered by 5:04.

Another interesting thing is that AZ identifiers are randomized from customer-to-customer so when people report their failures, like here in this thread, it can sometimes seem like a problem is region-wide when in fact it's isolated to an AZ.

I don't follow your point about randomized AZs. I agree they are randomized and have observed and correlated region letters across multiple customers/accounts. How does that make it seem like a problem is region wide though?
When several people report failures, each in their own AZ, if you don't realize they're randomized, you can assume many/all AZs are impacted.
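
A toy illustration of that effect, in plain Python: three accounts report three different AZ letters, but once each letter is translated through that account's own mapping, every report lands on the same physical zone. The account names, the mappings, and the `zone-X`/`zone-Y`/`zone-Z` labels are all invented for the example; nothing here reflects how AWS actually assigns letters.

```python
# Hypothetical per-account mapping from the visible AZ letter to a physical zone.
ACCOUNT_ZONE_MAP = {
    "alice": {"us-east-1b": "zone-X", "us-east-1c": "zone-Y", "us-east-1d": "zone-Z"},
    "bob": {"us-east-1b": "zone-Z", "us-east-1c": "zone-X", "us-east-1d": "zone-Y"},
    "carol": {"us-east-1b": "zone-Y", "us-east-1c": "zone-Z", "us-east-1d": "zone-X"},
}


def affected_physical_zones(reports, zone_map):
    """Translate (account, visible AZ name) reports into the set of physical zones hit."""
    return {zone_map[account][az_name] for account, az_name in reports}


# Three reports that look like three different zones failing.
reports = [("alice", "us-east-1d"), ("bob", "us-east-1b"), ("carol", "us-east-1c")]

visible = {az_name for _, az_name in reports}
physical = affected_physical_zones(reports, ACCOUNT_ZONE_MAP)

print(len(visible), "distinct AZ names reported")
print(len(physical), "physical zone(s) actually affected:", physical)
```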
Got your point now. Thanks for replying.
Yep, phone just exploded with alerts
Amazon are starting to push 'new' features and services to Ohio pretty quickly if you still need a non-Virginia AWS East Coast location.
us-east-1 is the most common region for failures.

It's literally a joke between me and my friend, when either of us is unavailable for social time.

Is this related to the L3 fiber cut and related Comcast outages in the northeast?
Yes, seeing issues with the autoscaling API for a single zone in us-east-1.
Highly recommend switching to Google Cloud.
always.
lol
yes
Yes
yup