| You don’t know what you’re talking about. AWS spends a lot of time thinking about this problem in service to their customers. How do you reduce the status of millions of machines, the software they run, and the interconnectedness of those systems to a single graphical indicator? It would be dumb and useless to turn something red every single time anything had a problem. Literally there are hundreds of things broken every minute of every day. On-call engineers are working around the clock on these problems. Most of the problems either don’t affect anyone due to redundancy or affect only a tiny number of customers: a failed memory module or top-of-rack switch, or a random bit flip in one host for one service. Would it help anyone to tell everyone about all these problems? People would quickly learn to ignore the indicator because it would have no bearing on their experience. What you’re really arguing is that you don’t like the thresholds they’ve chosen. That’s fine, everyone has an opinion. The purpose of health dashboards like these is mostly to let customers quickly get an answer to “is it them or me” when there’s a problem. As others on this thread have pointed out, AWS has done a pretty good job of making the SHD align with the subjective experience of most customers. They also have personal health dashboards unique to each customer, but I assume thresholding is still involved. |
A good piece of low-hanging fruit: when the outage is significant enough to have reached the media, turn the dot red.
Dishonesty is what we're talking about here, not the gradient at which you change colors. This is hardly the first major outage where the AWS status board was a bald-faced lie. This deserves calling out and shaming the responsible parties, nothing less, and certainly not a defense of blatantly deceptive practices that most companies not named Amazon don't dip into.