by bcrosby95 1251 days ago
> Why is Joe’s closet computer a bad choice? Because it’s a single point of failure and we won’t be able to ship fast if it breaks, which it will.

It's kinda a shame they reduced it to this. A single machine in a colo center is going to be far more reliable than a single availability zone in AWS, which is all many people resort to. And maybe we've just gotten lucky, but in general our very simple 20-machine colo center setup has been more reliable than our single-region AWS setup.

But yeah, if you literally put Joe's computer in a closet it isn't gonna be too reliable for all sorts of reasons completely unrelated to the reliability of modern computer hardware.


I think he literally means Joe's computer closet.

Colo is fine if you can set it up extremely quickly and it takes minimal ongoing support time. Depending on what you mean by Colo, that may or may not be possible.

Staying online is one part of it; returning online after a failure is another, and there AWS is simply incomparable to your single machine, which will have burned.
"It's kinda a shame they reduced it to this. A single machine in a colo center is going to be far more reliable than single availability zone"

I think that depends on the colo honestly. What is so unreliable about a single EC2 instance in a zone?

Faster disks, no control plane to fail, simpler network, etc.

This isn't bias speaking, I work on Fly.io, our VMs are less reliable than EC2 VMs. AWS's pitch is that all the extra complexity in their infrastructure benefits you. So is ours! But it is, in fact, extra complexity that will bite you in the ass if you don't build your apps the right way.

The fact that, under the hood, EC2 is a massively complex infrastructure, unlike most colos.
Yeah, I’m left scratching my head too. Is there really a difference in reliability between an EC2 instance and colocated hardware?
Yes. In my experience, it's substantial. 200 servers in colo == maybe 1 failure every 6 months. 200 EC2 instances, one per month.
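
To put those two numbers on the same scale, here's a rough back-of-envelope sketch. It assumes failures are spread evenly over time and that each failure takes out exactly one machine; neither is stated above, and the function name is just made up for illustration.

```python
def annual_failure_rate(fleet_size, failures, period_months):
    """Failures per machine per year, assuming failures are evenly spread
    and each one affects a single machine (both are assumptions)."""
    failures_per_year = failures * 12 / period_months
    return failures_per_year / fleet_size

# Figures quoted in the comment above: 200 machines in each fleet.
colo_rate = annual_failure_rate(200, failures=1, period_months=6)
ec2_rate = annual_failure_rate(200, failures=1, period_months=1)

print(f"colo: {colo_rate:.1%} per machine-year")
print(f"EC2:  {ec2_rate:.1%} per machine-year")
print(f"ratio: {ec2_rate / colo_rate:.0f}x")
```

So on those anecdotal figures it's roughly 1% vs 6% per machine-year. That says nothing about how long each outage lasts or how quickly you can replace the box.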

These are different things, though. If you're using AWS, you would build to account for this.

Your personal experience has no value when discussing reliability as a whole.
thank you.
Your personal experience is no match for the awesome power of AXIOMS.
us-east-1 is pretty famous for being unreliable. The other regions tend to be a lot more reliable in comparison.
There’s also a lot of selection bias: that region is the most popular and people remember hearing about problems a lot more than the people who were unaffected but didn’t say anything about it.

I’ve had plenty of instances in us-east-1 for over a decade without downtime, other than the 17 minutes in 2011 when they had a network routing issue that kept the entire region running but cut off from the internet. I never had that kind of record with a colo (power outages and backhoes), though several came close.

For me, I’d tend to focus the question on how screwed you are if something goes down. You can save a ton of money on a bandwidth-heavy service if you use a colo, so it’d really be a question of how easy it is to make it redundant (short outage) and to rebuild (long outage or permanent equipment failure).

It's well known that us-east-1 (the very first) is a pet among AWS's cattle regions.

It has failure modes that none of the other regions have.

I’m aware, but my point was simply that people are prone to overstating the extent of those problems. If it was as bad as lore would have it, it’d be far less popular.
Why would you think it would be less popular? Most everyone that chooses us-east-1 chooses it because they're close to it and 1 is the first number. They don't research it before they start using it.
"Well known"

That's not sufficient evidence

> our very simple 20 machine colo center setup has been more reliable than our single region AWS setup.

What happens when there's an outage at the rack level where the 20 machines are?

Or when the whole colo is on fire? [0]

[0] https://www.datacenterdynamics.com/en/news/fire-destroys-ovh...