| Outages like these don't resolve instantly. Any production system that works has the capacity needed for normal demand, plus some safety margin. Unused capacity is expensive, so you won't see a very high safety margin. In fact, as you pool more and more workloads, it becomes possible to run with smaller safety margins without running into shortages. These systems have some capacity to onboard new workloads; call it X. They have the sum of all onboarded workloads; call that Y. Then there is the demand for the services of Y; call that Z. As you may imagine, Y is bigger than X, by a lot. And when Y falls, the capacity to handle Z falls with it. So in a disaster recovery scenario, you start with: * the same demand, Z, possibly increased by retry logic and people mashing F5, * zero of the capacity Y available, and * only X of capacity-increase throughput. As it recovers you get thundering herds, slow warmups, systems struggling to find each other and become correctly configured, and so on. Show me a system that can "instantly" recover from an outage of this magnitude and I will show you a system that's squandering gigabucks and gigawatts on idle capacity. |
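To put rough numbers on the X/Y/Z argument, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `hours_until_demand_served`, the `retry_factor` parameter, and the example figures are made up, and the model assumes capacity comes back linearly at X units per hour, which ignores the thundering herds and slow warmups mentioned above (so real recoveries would likely be slower).

```python
import math


def hours_until_demand_served(y, x_per_hour, z, retry_factor=1.0):
    """Estimate hours until restored capacity covers demand after a total outage.

    y            -- total capacity that was onboarded before the outage (Y)
    x_per_hour   -- how much capacity can be brought back per hour (X)
    z            -- normal demand (Z)
    retry_factor -- hypothetical multiplier for retries and F5-mashing

    Returns None if the inflated demand can never be served, because it
    exceeds Y or because nothing can be onboarded at all.
    """
    effective_demand = z * retry_factor
    if x_per_hour <= 0 or effective_demand > y:
        return None
    # Linear rebuild from zero: capacity after t hours is x_per_hour * t.
    return math.ceil(effective_demand / x_per_hour)


if __name__ == "__main__":
    # Illustrative numbers only: Y = 1000 units, X = 50 units/hour, Z = 800 units.
    print(hours_until_demand_served(1000, 50, 800))
    print(hours_until_demand_served(1000, 50, 800, retry_factor=1.2))
    print(hours_until_demand_served(1000, 50, 800, retry_factor=1.5))
```

Even in this optimistic model, the only ways to shorten the recovery are a larger X or a smaller Z, and a larger X is exactly the idle capacity that costs money when nothing is broken.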
If it were possible to have this fixed sooner, I’m sure they would have done that. That’s not the point of my comment, though.