|
|
|
|
|
by jetru
1653 days ago
|
|
There's definitely miscommunication around this. I know I've miscommunicated impact, or my communication was misinterpreted across the 2 or 3 people it had to jump before hitting the status page. For example, The meaning of "S3 was affected" is subject to a lot of interpretation. STS was down, which is a blocker for accessing S3. So, the end result is S3 is effectively down, but technically it is not. How does one convey this in a large org? You run S3, but not STS, it's not technically an S3 fault, but an integration fault across multiple services. If you say S3 is down, you're implying that the storage layer is down. But it's actually not. What's the best answer to make everyone happy here? I cant think of one. |
|
I can.
"S3 is unavailable because X, Y, and Z services are unavailable."
A graph of dependencies between services is surely known to AWS; if not, they ought to create one post-haste.
Trying to externalize Amazon's internal AWS politicking over which service is down is unproductive to the customers who check the dashboard and see that their service ought to be up, but... well, it isn't?
Because those same customers have to explain to their clients and bosses why their systems are malfunctioning, yet it "shows green" on a dashboard somewhere that almost never shows red.
(And I can levy this complaint against Azure too, by the way.)