Hacker News
by themitigating 1260 days ago
"It's kinda a shame they reduced it to this. A single machine in a colo center is going to be far more reliable than single availability zone"

I think that depends on the colo honestly. What is so unreliable about a single EC2 instance in a zone?


Faster disks, no control plane to fail, simpler network, etc.

This isn't bias speaking: I work on Fly.io, and our VMs are less reliable than EC2 VMs. AWS's pitch is that all the extra complexity in their infrastructure benefits you. So is ours! But it is, in fact, extra complexity that will bite you in the ass if you don't build your apps the right way.

The fact that, under the hood, EC2 is a massively complex infrastructure, unlike most colos.
Yeah, I’m left scratching my head too. Is there really a difference in reliability between an EC2 instance and colocated hardware?
Yes. In my experience, it's substantial. 200 servers in colo == maybe 1 failure every 6 months. 200 EC2 instances, one per month.
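For a per-machine view of those figures, here is a rough sketch. It assumes failures are spread evenly across the fleet and over time, which the comment does not claim, and the function name is only for illustration. Under that assumption the colo figure works out to about 1% of servers failing per year versus about 6% of EC2 instances.

```python
def annual_failure_rate(failures, months, fleet_size):
    """Failures per machine per year, given fleet-wide failures seen over `months`.

    Assumes failures are uniform across machines and time (an assumption).
    """
    return failures * (12 / months) / fleet_size


# Figures taken from the comment: 200 machines in each fleet.
colo = annual_failure_rate(failures=1, months=6, fleet_size=200)
ec2 = annual_failure_rate(failures=1, months=1, fleet_size=200)

print(f"colo: {colo:.1%} per server per year")
print(f"EC2:  {ec2:.1%} per instance per year")
print(f"ratio: {ec2 / colo:.0f}x")
```

These are anecdotal numbers from one person's fleets, so treat the output as a way to compare the two claims, not as a measured failure rate.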

These are different things, though. If you're using AWS, you would build to account for this.

Your personal experience has no value when discussing reliability as a whole.
thank you.
Your personal experience is no match for the awesome power of AXIOMS.
us-east-1 is pretty famous for being unreliable. The other regions tend to be a lot more reliable in comparison.
There’s also a lot of selection bias: that region is the most popular and people remember hearing about problems a lot more than the people who were unaffected but didn’t say anything about it.

I’ve had plenty of instances in us-east-1 for over a decade without downtime other than the 17 minutes in 2011 where they had a network routing issue which kept the entire region running but off of the internet. I never had that with a colo - power outages & backhoes - but several came close.

For me, I’d tend to focus the question on how screwed you are if something goes down. You can save a ton of money for a bandwidth-heavy service if you use a colo so it’d really be a question of how easy it is to make it redundant (short outage) and rebuild (long outage or permanent equipment failure).

It's well known that us-east-1 (the very first) is a pet among AWS's cattle regions.

It has failure modes that none of the other regions have.

I’m aware, but my point was simply that people are prone to overstating the extent of those problems. If it was as bad as lore would have it, it’d be far less popular.
Why would you think it would be less popular? Most everyone that chooses us-east-1 chooses it because they're close to it and 1 is the first number. They don't research it before they start using it.
If people were experiencing significant downtime they’d leave us-east-1 or AWS. There’s no sign of that happening so I’d suggest that there’s a tendency to over-weight the degree to which people complaining in forums constitutes representative data.
"Well known"

That's not sufficient evidence