by tomlagier 1651 days ago
I wonder if AWS will make more or less money from these outages?

Will large players flee because of excessive instability? Or will smaller players go from single-AZ to more expensive multi-AZ?

My guess is that no one will leave, and lots of single-AZ tenants who should be multi-AZ will use this as the impetus to do it.

Honestly, having events like this is probably good for the overall resilience of distributed systems. It's like an immune system: you don't usually fail in the same way repeatedly.

7 comments

* Free chaos monkey installed in every AZ
> * Free chaos monkey installed in every AZ

Only during this beta period; AWS will start charging for this feature soon enough.

We (Netflix) begged them for years to create a Chaos Monkey that we could pay for. There were things we just couldn't do ourselves, like simulate a power pull or just drop all network packets on the bare metal. I guess not enough people asked.
CMaaS sounds amazing for resiliency engineering. There's so much I want to be doing to perturb our stack, but I don't know all the ways stuff can go wrong. Sure, I can DDoS it, kick services and servers offline, etc., but that's what, a few dozen failure modes? Expertise in chaos would be valuable by itself. Not to mention being able to shake parts of the system I normally can't touch.

Side note: Terraform is pretty good for causing various kinds of chaos, deliberately or otherwise.

If my company is any indication, they're going to make more money since everyone will simply check the multi-AZ or multi-region checkboxes they didn't before and throw more money at the problem instead of doing proper resiliency engineering themselves.
It doesn’t matter how much resiliency engineering you do. Having everything in a single AZ is a risk. If that risk is acceptable, then it’s fine; if not, you need to think about multi-AZ from day 1.
Auth0 ran in six AZs in two regions[1] and went down today[2], because they picked the wrong two regions. How many regions and AZs should someone pay for before they get reliability?

1: https://auth0.com/blog/auth0-architecture-running-in-multipl...
2: https://twitter.com/auth0/status/1471159935597793290

At a minimum, they should have chosen regions not in the same time zone or general geographic area. Running in US-West 1 and US-West 2 might well safeguard against a server failure, but it is not a disaster plan. If your customers are global, choosing multiple continents is probably prudent.
Whelp, I guess you're not using Cognito then. It has no user account syncing feature so you can't have a user group in more than one region. Grrrrr!
No one just "moves off" AWS. Once your apps are spaghetti coded with lambdas, buckets and all sorts of stuff, it's basically impossible to get off. More than likely, as you noticed, it will increase spending since multi-AZ/multi-region will become the norm.
>I wonder if AWS will make more or less money from these outages?

There is no possibility that outages are good for AWS. Nor is there more money to be made from "publicity" of the outages.

I think GP has a point with,

>Or will smaller players go from single-AZ to more expensive multi-AZ?

No -- if they needed to, they would already have migrated to multi-region. If they don't need it -- they won't have. The reason is simple -- it's expensive, as you say. I'm not a fanboi or evangelist of AWS either -- I do have pet theories that they gave their products shit names in order to make more money by making AWS skills less transferable to Google Cloud etc. S3 should be Amazon FTP, RDS should be Amazon SQL, etc.
> S3 should be Amazon FTP

I... don't think you know what S3 is. Or maybe what FTP is.

(Also S3, EC2, RDS, etc. were named long before GCP had competing services)

I mean, lots of people put off doing something expensive but safer just because it’s expensive, but rethink after the consequences show.
S3 is nothing like FTP? RDS stands for Relational Database Service. You have a valid point but picked the worst examples.
S3 is Simple Storage Service, RDS is Relational Database Service, EC2 is Elastic Compute Cloud.

All of these make sense.

If you're gonna complain about names, at least pick the really sucky ones, like Athena, Snowball, etc.

You’re saying businesses always make the right decisions and never put them off?
Not at all the case. It was a regional outage that got Netflix to more than double our AWS spend going multi-region, so that outage netted them millions of extra dollars per year just from Netflix.
You’re underestimating the ability of eng leadership to not take these issues seriously. Only when there’s sufficient pressure from the very top, or even from customers, does it become a priority.
> There is no possibility that outages are good for AWS.

Do you know how many non-technical CEOs/boards/bosses have told their tech people that they need to go multi-region/cloud because that's what the one-paragraph blog and/or tweet told them to do in response to last week's event?

The actual answer?

In the next 5 calendar years the bottom line will still grow.

However, the brand damage means they permanently lose market share, which impacts their growth ceiling.

I would not go with multiple Availability Zones within the same infra/cloud provider...
"Or will smaller players go from single-AZ to more expensive multi-AZ"

Yes! When you have a service interruption, pay 2x more! With a region down, I am sure other regions won't have any interruptions either! /s