If your application and infra can magically utilize multiple zones with “a couple lines”… then I would say you are miles ahead of just about every other web company.
Today, a SaaS I’m familiar with had prolonged issues. It runs ~10 Aurora clusters in us-east-2, each with 2-3 nodes (1 writer, 1-2 readers) spread across different AZs.
At least 1 cluster had a node on “affected” hardware (per AWS). Aurora failed to fail over properly and the cluster ended up in a weird error state, requiring intervention from AWS. They could not write to the db at all. This took several hours to resolve.
All that to say that it’s never straightforward. In today’s event, it was pure luck of the draw as to whether a multi-AZ Aurora cluster was going to have >60 seconds of pain.
That SaaS has been running Aurora for years and has never experienced anything similar. I was very surprised when I heard the cluster was in a non-customer-fixable state and required manual intervention. I’ve shilled Aurora hard. Now I’m unsure.
Thank goodness they had an enterprise support deal, or who knows whether they’d still be having issues now.
Or how about "I'm fully aware, I've done the math taking into account both cost and complexity of implementation and cost of downtime, and I'm probably making fantastic calls based on my actual needs."
This has quickly grown to more than adding in a couple of lines! Now I need to architect my legacy app so that I can deploy into lambdas, then I can get resiliency I don't really need!
Not all systems require high availability. Some systems are A-OK with downtime. Sometimes, I'm perfectly fine with eventual consistency. You really do have to look at the use cases and requirements before making sweeping statements.
No, we were talking about architects making decisions that you characterised as poor. I was pointing out that your statement was overly general and that there are many instances where making the informed decision to ignore HA is a completely reasonable thing to do.
By your last sentence, it appears you agree with me.
If you meant to say that your statement only applies to cloud architects who are attempting to maintain an uptime SLA with multi-AZ/region redundancy, then sure, AWS has lots of levers you can pull and those complaining really should spend some time studying them.
As for legacy applications, I would not have brought them up at all if you hadn't suggested pushing things into lambdas as a solution to multi-AZ. Once again, there are many, many situations where this is not appropriate. Not everything is greenfield, and re-architecting existing applications in an attempt to shoehorn them into a different deployment model seems a bit much. Unless I'm misunderstanding what you meant.