by imiric 1661 days ago
> This seems like an insane stance to have, it's like saying businesses should ship their own stock, using their own drivers, and their in-house made cars and planes and in-house trained pilots.

> Heck, why stop at having servers on-site? Cast your own silicon wafers, after all you don't want Spectre exploits.

That's an overblown argument. Nobody is saying that, but it's clear that businesses that maintain their own infrastructure would've avoided today's AWS outage. So just removing a single level of abstraction would've kept your company running today.

> Because you are worse at it. If a specialist is this bad, and the market is fully open, then it's because the problem is hard.

The problem is hard mostly because of scale. If you're a small business running a few websites with a few million hits per month, it might be cheaper and easier to colocate a few servers and hire a few DevOps engineers or old-school sysadmins to administer the infrastructure. The tooling is there, and it is not much more difficult to manage than a hundred different AWS products. I'm actually more worried about the DevOps trend where engineers are trained purely on cloud infrastructure and don't understand the low-level tooling these systems are built on.

> AWS has fewer outages in one zone alone than the best self-hosted institutions, your Facebooks and Pentagons. In-house servers would lead to an insane number of outages.

That's anecdotal and would depend on the capability of your DevOps team and your in-house / colocation facility.

> And guess what? AWS (and all other IaaS providers) will beg you to use multiple regions because of this. The team/person that has millions of dollars a day staked on a single AWS region is an idiot and could not be entrusted to order a gaming PC from Newegg, let alone run an in-house datacenter.

Oh great, so the solution is to put even more of our eggs in a single provider's basket? The real solution would be having failover to a different cloud provider, and the infrastructure changes needed for that are _far_ from trivial. Even with that, there are only three major cloud providers you can pick from. Again, colocation in a trusted datacenter would've avoided all of this.

4 comments

> it's clear that businesses that maintain their own infrastructure would've avoided today's AWS outage.

When Netflix was running its own datacenters in 2008, they had a three-day outage caused by database corruption and couldn't ship DVDs to customers. That was the disaster that pushed CEO Reed Hastings to get out of managing his own datacenters and migrate to AWS.

The flaw in the reasoning that running your own hardware would avoid today's outage is that it doesn't also consider the extra unplanned outages on other days because your homegrown IT team (especially at non-tech companies) isn't as skilled as the engineers working at AWS/GCP/Azure.

The flaw in your reasoning is that the complexity of the problem is even remotely the same. Most AWS outages are control plane related.

> it's clear that businesses that maintain their own infrastructure would've avoided today's AWS outage.

Sure, that's trivially obvious. But how many other outages would they have had instead because they aren't as experienced at running this sort of infrastructure as AWS is?

You seem to be arguing from the a priori assumption that rolling your own is inherently more stable than renting infra from AWS, without actually providing any justification for that assumption.

You also seem to be under the assumption that any amount of downtime is always unacceptable, and worth spending large amounts of time and effort to avoid. For a lot of businesses, systems going down for a few hours every once in a while just isn't a big deal, and is far preferable to spending thousands more on cloud bills, or hiring more full-time staff to ensure X 9s of uptime.

You and GP are making the same assumption that my DevOps engineers _aren't_ as experienced as AWS' are. There are plenty of engineers capable of maintaining an in-house infrastructure running X 9s because, again, the complexity comes from the scale AWS operates at. So we're both arguing with an a priori assumption that the grass is greener on our side.

To be fair, I'm not saying never use cloud providers. If your systems require the complexity cloud providers simplify, and you operate at a scale where it would be prohibitively expensive to maintain yourself, by all means go with a cloud provider. But it's clear that not many companies are prepared for this type of failure, and protecting against it is not trivial to accomplish. Not to mention the conceptual overhead and knowledge required to deal with the provider's specific products, APIs, etc. The knowledge needed to maintain these systems yourself, by contrast, is transferable across any datacenter.

This feels like a discussion that could sorely use some numbers.

What are good examples of

>a small business running a few websites with a few million hits per month, it might be cheaper and easier to colocate a few servers and hire a few DevOps or old-school sysadmins to administer the infrastructure.

and how often do they go down?

Depends, I guess. I am running an on-prem workstation for our DWH. So far, in 2 years, it has only gone down for minutes at a time, and only when I decided to take it down for hardware updates. I have no idea where this narrative came from, but the hardware you have is usually very reliable and doesn't turn off every 15 minutes.

Heck, I use an old T430 for my home server and even that doesn't go down on completely random occasions (but that's a very simplified example, I know).

But was it always accessible from the internet, and serving requests in an acceptable amount of time?

The one at work, yes, but only on the internal network, as we are not exposed to the internet. But to be honest, we are probably one of the few companies that make it a priority to always have electricity and internet in the office (with a UPS, a generator, and multiple internet providers).

No idea what the standards are for other companies.

There are at least 6 cloud providers I can name that I've used which run their own data centers with capabilities similar to AWS's core products (EC2, Route 53, S3, CloudWatch, RDS):

OVH, Scaleway, Online.net, Azure, GCP, AWS

Those are the ones I've used in production. I've heard of a dozen more, including big names like HP and IBM, and I assume they can match AWS for the most part.

...

That being said I agree multi tenant is the way to go for reliability. But I was pointing out that in this case even the simple solution of multi region on one provider was not implemented by those affected.

...

As for running your own data center as a small company: I have done it, buying components, building servers and all.

Expenses and ISP issues aside, I can't imagine running in-house infrastructure without at least a few outages a year for anywhere near the price of hiring a DevOps person to build a MT solution for you.

If you think you can, you've either never tried doing it OR you are being severely underpaid for your job.

Competent teams that can build and run reliable in-house infrastructure exist, and they can get you an SLA similar to multi-region AWS or GCP (aka 100% over the last 5 years)... But the price tag has 7 to 8 figures in it.
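
For anyone who wants to put rough numbers on the "X 9s" and multi-region claims in this thread, here is a back-of-the-envelope sketch. It is not anyone's measured data: the function names are made up for illustration, and the redundancy formula assumes the sites fail independently of each other, which real regions and providers often do not (shared control planes, shared DNS, and so on).

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_from_nines(nines: int) -> float:
    """Turn a count of nines into an availability fraction (3 -> 0.999)."""
    return 1 - 10 ** (-nines)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, sites: int) -> float:
    """Availability of `sites` redundant sites that can each serve all traffic.

    ASSUMPTION: failures are independent, so the service is down only
    when every site is down at the same time.
    """
    return 1 - (1 - availability) ** sites


if __name__ == "__main__":
    for nines in (2, 3, 4, 5):
        a = availability_from_nines(nines)
        print(f"{nines} nines: {downtime_hours_per_year(a):.2f} hours down per year")

    # Two hypothetical sites at 99% each, under the independence assumption.
    print(f"two sites at 99%: {combined_availability(0.99, 2):.4%}")
```

Under these assumptions, three nines works out to a bit under nine hours of downtime a year, which is the kind of budget the "a few hours every once in a while isn't a big deal" argument above is talking about. The independence assumption is the weak point: correlated failures make the real combined figure lower than the formula suggests.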