	by happymellon 1429 days ago
	Considering almost all of the services are multi-zone, it's not hard to add in a couple of lines to make them resilient against this. People are just unaware, and probably making bad calls in the name of being "portable".
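
The resilience argument comes down to simple probability: a multi-zone deployment stays up as long as at least one zone is up. Here is a rough sketch that assumes zone failures are independent and uses a hypothetical 99.5% per-zone availability. Neither figure comes from AWS, and correlated failures (a shared control plane, a failover that itself breaks) make real-world numbers worse than this model predicts.

```python
def multi_zone_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` replicas is up.

    ASSUMPTION: zone failures are independent. Correlated failures
    make the real figure worse than this estimate.
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be in [0, 1]")
    if zones < 1:
        raise ValueError("zones must be >= 1")
    # P(all zones down) = (1 - a) ** n, so P(at least one up) = 1 - (1 - a) ** n
    return 1.0 - (1.0 - zone_availability) ** zones


MINUTES_PER_YEAR = 365 * 24 * 60

if __name__ == "__main__":
    # Hypothetical per-zone availability of 99.5%
    for n in (1, 2, 3):
        downtime = (1.0 - multi_zone_availability(0.995, n)) * MINUTES_PER_YEAR
        print(f"{n} zone(s): ~{downtime:.2f} min/year expected downtime")
```

Under these assumptions, going from one zone to two cuts expected downtime from days per year to minutes. That is the intuition behind "a couple of lines" buying a lot of resilience, provided failover actually works.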

blamarvt 1429 days ago

If your application and infra can magically utilize multiple zones with “a couple lines”… then I would say you are miles ahead of just about every other web company.

happymellon 1429 days ago

> you are miles ahead of just about every other web company.

I'm curious who these web companies are.

Use something like Lambda and you get multi-az for free.

https://docs.aws.amazon.com/lambda/latest/dg/security-resili...

Dynamo is another service that wouldn't be impacted as it is multi-az.

Getting Postgres RDS multi-region would require the extra couple of lines in your CDK, but is fairly straightforward.

leesalminen 1429 days ago

Today, a SaaS I’m familiar with that runs ~10 Aurora clusters in us-east-2 with 2-3 nodes each (1 writer, 1-2 readers) in different AZs had prolonged issues.

At least 1 cluster had a node on “affected” hardware (per AWS). Aurora failed to fail over properly and the cluster ended up in a weird error state, requiring intervention from AWS. Could not write to the db at all. This took several hours to resolve.

All that to say that it’s never straightforward. In today’s event, it was pure luck of the draw as to whether a multi-AZ Aurora cluster was going to have >60 seconds of pain.

That SaaS has been running Aurora for years and has never experienced anything similar. I was very surprised when I heard the cluster was in a non-customer-fixable state and required manual intervention. I’ve shilled Aurora hard. Now I’m unsure.

Thank goodness they had an enterprise support deal or who knows if they’d still have issues now.

twistedpair 1429 days ago

It's that easy for a lot of managed services.

Want GKE to run multi-zone, or Spanner to run multi-region, just check a box (and insert coin).

the_fury 1429 days ago

Or how about "I'm fully aware, I've done the math taking into account both cost and complexity of implementation and cost of downtime, and I'm probably making fantastic calls based on my actual needs."
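
"Doing the math" here can be framed as an expected-cost comparison: is the downtime that HA avoids worth more per year than HA costs? A minimal sketch follows. Every number in it (availabilities, hourly cost of downtime, yearly cost of HA) is a hypothetical placeholder, and the model deliberately ignores harder-to-price factors such as reputational damage or SLA penalties.

```python
HOURS_PER_YEAR = 365 * 24


def expected_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def ha_worth_it(base_availability: float, ha_availability: float,
                cost_per_hour_down: float, ha_extra_cost_per_year: float) -> bool:
    """True if the downtime avoided by HA is worth more than HA costs per year."""
    saved = (expected_downtime_cost(base_availability, cost_per_hour_down)
             - expected_downtime_cost(ha_availability, cost_per_hour_down))
    return saved > ha_extra_cost_per_year


if __name__ == "__main__":
    # Hypothetical: single-AZ at 99.5%, multi-AZ at 99.95%,
    # HA costs an extra 30,000/year (infrastructure plus engineering time).
    for cost_per_hour in (500, 2_000):
        print(cost_per_hour, ha_worth_it(0.995, 0.9995, cost_per_hour, 30_000))
```

With these made-up figures, HA does not pay for itself when an hour of downtime costs 500, but it does at 2,000 per hour. That is the point: the answer depends on the inputs, not on a blanket rule.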

happymellon 1429 days ago

If you had "done the math" then you would have gone serverless and gained multi-az for free, as it is almost always the cheapest option.

the_fury 1429 days ago

This has quickly grown to more than adding in a couple of lines! Now I need to architect my legacy app so that I can deploy into lambdas, then I can get resiliency I don't really need!

Not all systems require high availability. Some systems are A-OK with downtime. Sometimes, I'm perfectly fine with eventual consistency. You really do have to look at the use cases and requirements before making sweeping statements.

happymellon 1429 days ago

I thought we were talking about cloud architects making poor decisions when designing solutions.

Where did legacy apps come from?

> Some systems are A-OK with downtime.

And those ones would not have cared about this outage. Your point isn't that clear.

the_fury 1429 days ago

No, we were talking about architects making decisions that you characterised as poor. I was pointing out that your statement was over-general and that there are many instances where making the informed decision to ignore HA is a completely reasonable thing to do.

By your last sentence, it appears you agree with me.

If you meant to say that your statement only applies to cloud architects who are attempting to maintain an uptime SLA with multi-az/region redundancy, then sure, AWS has lots of levers you can pull and those complaining really should spend some time studying them.

As for legacy applications, I would not have brought them up at all if you hadn't suggested pushing things into lambdas as a solution to multi-az. Once again, there are many, many situations where this is not appropriate. Not everything is greenfield, and re-architecting existing applications in an attempt to shoehorn them into a different deployment model seems a bit much. Unless I'm misunderstanding what you meant.

yibg 1429 days ago

Right, because magically serverless is the right answer for every application.
