If your application and infra can magically utilize multiple zones with “a couple lines”… then I would say you are miles ahead of just about every other web company.
Today, a SaaS I’m familiar with had prolonged issues. It runs ~10 Aurora clusters in us-east-2, each with 2-3 nodes (1 writer, 1-2 readers) spread across different AZs.
At least 1 cluster had a node on “affected” hardware (per AWS). Aurora failed to fail over properly, and the cluster ended up in a weird error state that required intervention from AWS. The db was not writable at all. This took several hours to resolve.
All that to say that it’s never straightforward. In today’s event, it was pure luck of the draw as to whether a multi-AZ Aurora cluster was going to have >60 seconds of pain.
That SaaS has been running Aurora for years and has never experienced anything similar. I was very surprised when I heard the cluster was in a non-customer-fixable state and required manual intervention. I’ve shilled Aurora hard. Now I’m unsure.
Thank goodness they had an enterprise support deal, or who knows if they’d still be having issues now.
I'm curious who these web companies are.
Use something like Lambda and you get multi-AZ for free.
https://docs.aws.amazon.com/lambda/latest/dg/security-resili...
DynamoDB is another service that wouldn't be impacted, as it is multi-AZ.
Getting Postgres RDS multi-region would require the extra couple of lines in your CDK, but it is fairly straightforward.
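For the multi-AZ case discussed upthread, the "couple of lines" would be roughly the `multiAz` flag on the instance. A minimal sketch, assuming aws-cdk-lib v2; the stack and construct names (`DbStack`, `Vpc`, `Postgres`) and the engine version are made up for illustration. Cross-region is a separate setup (e.g. read replicas) and is not shown here.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';

// Hypothetical app/stack names, for illustration only.
const app = new App();
const stack = new Stack(app, 'DbStack');

// A VPC spanning two AZs so the standby has somewhere to live.
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });

new rds.DatabaseInstance(stack, 'Postgres', {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_15, // assumed version
  }),
  vpc,
  multiAz: true, // provisions a standby in a second AZ
});

app.synth();
```

This only provisions the standby. Whether failover actually behaves during a bad hardware event is a separate question, which is the point the parent comment is making.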