| But does it really matter whether the incident is a flood or a cascading software failure if the likelihood and severity are the same? Being in the same building is an "implementation detail" from a customer perspective; what matters is the consequences of this decision. For example, maybe this decision allows for better network connectivity at a lower cost for inter-zone traffic while, on the other hand, not protecting against some classes of risks. In the end, you can have a similar multi-zone outage keeping the region down for an extended period of time just because of a bad network config push (see the massive Facebook outage in 2021). As a customer, I don't care if it's a flood or a network outage. Imho, what matters the most is clear documentation of how these abstractions work for users and the corresponding contractual agreements (costs, SLAs, etc.). Users can then decide if they are ready to pay the price of protecting themselves against an extended outage impacting a single region. |
The MTTR for outages caused by physical damage is way higher than for a software failure or a bad config push, and resiliency against physical disasters is a major selling point of availability zones as a fault container.
Hosting every zone of your region (if that's actually the case here) in the same building is simply negligent.
Besides the obvious risks like this incident, even if the zones have physical fire barriers, the chances that operators will be allowed into one "zone" after another has had a fire are slim to none.