| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lwb 2548 days ago
	Do people ever worry that an entire cloud provider may go down, or is that too unlikely of a case?

6 comments

jsty 2548 days ago

However much we technical people might salivate at the prospect of designing a multi-cloud solution, for the vast majority of businesses it simply isn't worth the cost / complexity. I'd wager 90-something percent of applications could suffer multi-hour outages without impacting business function to any measurable degree.

Plus the fact that without serious investment, you're probably more liable to decrease availability by going multi-cloud thanks to the increased system complexity.

link

hinkley 2548 days ago

The real trick here, which many people don’t want to look at, is to avoid overly centralizing your workflow.

I can get a lot of work done while Outlook is down. Hell, probably more work done.

If our build server is down I can work for a couple hours (unless we’ve done something very bad). Same for git or our bug database or wiki or or or. When I get stuck on one thing I can swap to something else every couple of hours. And there is always documentation (writing or consuming).

But if some idiot, hypothetically speaking of course, puts most of these services into the same SAN, then we are truly and utterly screwed if there is a hardware failure.

Similarly if you make one giant app that handles your whole business, if that app goes down and there are no manual backups you might as well send everybody home.

I went to get a drink the other day and the place looked funny. They’d tripped a circuit breaker and the whole kitchen lost power. But the registers and the beverage machines were on a separate circuit. And since they sold drinks and food in that order, they stayed open and just apologized a lot. Whoever wired that place knew what they were doing.

link

hunter2_ 2548 days ago

Probably lost 1 of 3 phases. You're quite right in that the decision of what phase a circuit is on has a lot to do with business, and hopefully no major repurposing of the space without rewiring the space has occurred. For lighting, you'd want 1/3 of fixtures per room to go out, not 1/3 of your rooms in their entirety. For appliances and receptacles, you'd rather lose a whole function (the kitchen) than be able to cook but not do dishes, with every function trying to figure out oddball workarounds.

link

StreamBright 2548 days ago

The chance that AWS goes down is much smaller than anything else going down. There are many SPOFs in a typical smaller company setup, most of those are not even obvious to the operators.

link

jsjohnst 2548 days ago

In the past ten years:

It’s happened more than once with Azure and GCP. I think it happened once with AWS, but not positive there.

link

dodobirdlord 2548 days ago

AWS had a multi-hour total S3 outage in us-east-1 in February 2017 that knocked out a huge number of things mostly because it turns out that a huge share of their customers run in only 1 region and it's us-east-1. Things mostly continued to work in other regions.

I recall Azure had some sort of multi-region database failover disaster that took several regions offline, and GCP has had several global elevated latency/error rate events, but I don't think that any cloud provider has been "down" in the sense that the word is usually used.

link

jsjohnst 2548 days ago

GCP (and all of Google) was down worldwide in 2013 as one example:

https://www.theregister.co.uk/2013/08/17/google_outage/

Here’s one that’s on Azure. Not a 100% total outage like above, but bad enough most I know in the industry would call it being down:

https://www.zdnet.com/article/windows-azure-suffers-worldwid...

If I get a free moment, I’ll dig up other examples, but those were ones that were easy to find.

link

sneak 2548 days ago

Billing issues can take down your entire account at a given cloud provider all at once.

link

Spooky23 2548 days ago

It’s a legit concern, but it adds complexity that will probably cause more outages than the thing you are worried about.

IMO, you’re better off with a private data center or colo and separate integrations with cloud.

link

debaserab2 2548 days ago

I don’t think it’s happened (yet) although some of the earlier outages when AWS was younger were pretty far reaching. I think all of S3 has gone down a time or two.

link

jsjohnst 2548 days ago

All of S3 has, but that’s because S3 had a single choke point in a single region for a long time.

link

ti_ranger 2548 days ago

> All of S3 has, but that’s because S3 had a single choke point in a single region for a long time.

The only S3 event here was limited to us-east-1: https://aws.amazon.com/premiumsupport/technology/pes/

Some APIs were impacted, because they are global by nature (e.g create-bucket). But S3 was working fine in all other regions, for existing buckets.

However, many websites were affected, because they didn't use any of the existing S3 features that allow for regional redundancy, simply because S3 had been so reliable they didn't know/think they needed to have critical assets in a bucket in a 2nd region that they could fail over to.

Admittedly, even the AWS status page was impacted, because it also relied on S3 in us-east-1.

S3 has done a lot of work to improve matters since, and mechanisms have been put in place to ensure that all AWS services don't have inter-region dependencies for "static" operation.

However, it is still incorrect to claim that it was all of S3. Many customers who use S3 only in other regions were totally unaffected.

link

tyingq 2548 days ago

All of S3 create-bucket is "all of S3" for a lot of use cases and customers.

link