This seems like a pretty common response to a breaking incident for an app at scale. Requests flow through to a failing system and trigger HTTP 500s. Those requests may pachinko through the stack, making a variety of calls that compound the degradation of a system already weathering an unplanned failure state.
Engineers stop the bleeding by 503'ing requests at the perimeter or putting up a static maintenance page. This gives things like caches, DBs, and app servers a chance to cool off while a rollback or a revert goes out. Then, once the system is stable, they let requests flow through again (slowly, of course).
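For anyone who hasn't seen this pattern up close, here is a rough Python sketch of the idea: a gate at the edge that can shed everything with a 503 and then re-admit traffic in steps. This is only an illustration under my own assumptions, not how any particular company does it. `PerimeterGate`, `admit_fraction`, `retry_after_s`, and the 30-second default are all made-up names and values; only the 503 status and the `Retry-After` header are standard HTTP.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PerimeterGate:
    """Hypothetical edge gate: either forwards a request or answers 503."""

    admit_fraction: float = 1.0  # 1.0 = normal traffic, 0.0 = shed everything
    retry_after_s: int = 30      # assumed hint for clients, sent as Retry-After
    rng: random.Random = field(default_factory=random.Random)

    def shed_all(self) -> None:
        """Stop the bleeding: no request reaches the backend."""
        self.admit_fraction = 0.0

    def ramp(self, step: float = 0.1) -> None:
        """Let a bit more traffic through, capped at 100%."""
        self.admit_fraction = min(1.0, self.admit_fraction + step)

    def handle(self, backend):
        """Return (status, headers, body); call the backend only if admitted."""
        # rng.random() is in [0, 1), so 0.0 sheds every request and 1.0 admits all.
        if self.rng.random() >= self.admit_fraction:
            headers = {"Retry-After": str(self.retry_after_s)}
            return 503, headers, "Service Unavailable"
        return backend()
```

During an incident you would call `shed_all()`, ship the rollback, and then call `ramp()` every so often while watching error rates. Shedding a random fraction is the simplest choice; a real setup might instead admit by user ID or region so the same clients get a consistent experience.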