HN Mirror


by bcrosby95 1251 days ago

> Why is Joe’s closet computer a bad choice? Because it’s a single point of failure and we won’t be able to ship fast if it breaks, which it will.

It's kinda a shame they reduced it to this. A single machine in a colo center is going to be far more reliable than a single availability zone in AWS, which is all many people settle for. And maybe we've just gotten lucky, but in general our very simple 20-machine colo setup has been more reliable than our single-region AWS setup.

But yeah, if you literally put Joe's computer in a closet, it isn't gonna be too reliable, for all sorts of reasons completely unrelated to the reliability of modern computer hardware.


andrewmutz 1251 days ago

I think he literally means Joe's computer closet.

Colo is fine if you can set it up extremely quickly and it takes minimal ongoing support time. Depending on what you mean by colo, that may or may not be possible.

iLoveOncall 1251 days ago

Staying online is one part of it; coming back online after a failure is another, and there AWS is simply incomparable to your single machine, which will have burnt out.

themitigating 1251 days ago

"It's kinda a shame they reduced it to this. A single machine in a colo center is going to be far more reliable than single availability zone"

I think that depends on the colo, honestly. What is so unreliable about a single EC2 instance in a zone?

mrkurt 1251 days ago

Faster disks, no control plane to fail, simpler network, etc.

This isn't bias speaking: I work on Fly.io, and our VMs are less reliable than EC2 VMs. AWS's pitch is that all the extra complexity in their infrastructure benefits you. So is ours! But it is, in fact, extra complexity that will bite you in the ass if you don't build your apps the right way.

dilyevsky 1251 days ago

The fact that under EC2's hood there's a massively complex infrastructure, as opposed to most colos.

arcturus17 1251 days ago

Yeah, I’m left scratching my head too. Is there really a difference in reliability between an EC2 instance and colocated hardware?

mrkurt 1251 days ago

Yes. In my experience, it's substantial. 200 servers in colo == maybe 1 failure every 6 months. 200 EC2 instances, one per month.

These are different things, though. If you're using AWS, you would build to account for this.
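A quick back-of-envelope check of those fleet numbers, as a minimal Python sketch. It assumes each "failure" is one machine-level incident and that the rates hold steady over a year; `annualized_failure_rate` is a hypothetical helper for illustration, not anything from AWS or Fly.io.

```python
# Back-of-envelope annualized failure rate (AFR) from the quoted fleet numbers.
# Assumption: one "failure" = one machine-level incident, at a steady rate.

def annualized_failure_rate(fleet_size: int, failures: float, months: float) -> float:
    """Fraction of the fleet expected to fail per year."""
    failures_per_year = failures * (12 / months)
    return failures_per_year / fleet_size

# "200 servers in colo == maybe 1 failure every 6 months"
colo = annualized_failure_rate(fleet_size=200, failures=1, months=6)
# "200 EC2 instances, one per month"
ec2 = annualized_failure_rate(fleet_size=200, failures=1, months=1)

print(f"colo AFR: {colo:.1%}")        # colo AFR: 1.0%
print(f"EC2 AFR: {ec2:.1%}")          # EC2 AFR: 6.0%
print(f"ratio: {ec2 / colo:.0f}x")    # ratio: 6x
```

Under these assumptions, the anecdote works out to roughly a 6x difference in per-machine failure rate, which is the gap you would have to design around on EC2.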

themitigating 1250 days ago

your personal experience has no value when discussing reliability as a whole

mrkurt 1247 days ago

thank you.

tptacek 1247 days ago

Your personal experience is no match for the awesome power of AXIOMS.

CuriousCosmic 1251 days ago

us-east-1 is pretty famous for being unreliable. The other regions tend to be a lot more reliable in comparison.

acdha 1251 days ago

There’s also a lot of selection bias: that region is the most popular, and people remember hearing about the problems, not the many users who were unaffected and didn’t say anything about it.

I’ve had plenty of instances in us-east-1 for over a decade without downtime, other than the 17 minutes in 2011 when a network routing issue kept the entire region running but off the internet. I never had that record with a colo (power outages and backhoes), but several came close.
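To put "17 minutes in over a decade" in availability terms, here is a minimal Python sketch. It assumes exactly 10 years of 365.25-day years; since the comment says "over a decade", the real figure is at least this good. The `availability` helper is hypothetical, for illustration only.

```python
# Rough availability implied by ~17 minutes of downtime over ~10 years.
# Assumption: exactly 10 years of 365.25 days each.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def availability(downtime_minutes: float, years: float) -> float:
    """Fraction of wall-clock time the service was reachable."""
    total_minutes = years * MINUTES_PER_YEAR
    return 1 - downtime_minutes / total_minutes

a = availability(downtime_minutes=17, years=10)
print(f"availability: {a:.5%}")  # availability: 99.99968%

# For comparison: the yearly downtime budget at "five nines" (99.999%).
five_nines_budget = (1 - 0.99999) * MINUTES_PER_YEAR
print(f"{17 / 10:.1f} vs {five_nines_budget:.1f} min/year")  # 1.7 vs 5.3 min/year
```

Averaged out, that single incident stays well inside a five-nines budget, though averages say nothing about how painful one 17-minute outage is when it hits.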

For me, I’d focus the question on how screwed you are if something goes down. You can save a ton of money on a bandwidth-heavy service by using a colo, so it really comes down to how easy it is to make it redundant (for a short outage) and to rebuild (for a long outage or permanent equipment failure).

karmakaze 1251 days ago

It's well known that us-east-1 (the very first region) is a pet among AWS's cattle regions.

It has failure modes that none of the other regions have.

acdha 1251 days ago

I’m aware, but my point was simply that people are prone to overstating the extent of those problems. If it were as bad as lore would have it, it’d be far less popular.

karmakaze 1251 days ago

Why would you think it would be less popular? Most everyone who chooses us-east-1 does so because they're close to it and 1 is the first number. They don't research it before they start using it.

themitigating 1251 days ago

"Well known"

That's not sufficient evidence

908B64B197 1251 days ago

> our very simple 20 machine colo center setup has been more reliable than our single region AWS setup.

What happens when the rack holding those 20 machines has an outage?

Or when the whole colo is on fire? [0]

[0] https://www.datacenterdynamics.com/en/news/fire-destroys-ovh...