by discodave 1652 days ago
The AWS summary says: "As the impact to services during this event all stemmed from a single root cause, we opted to provide updates via a global banner on the Service Health Dashboard, which we have since learned makes it difficult for some customers to find information about this issue"

This seems like bad faith to me based on my experience when I worked for AWS. As they repeated many times at re:Invent last week, they've been doing this for 15+ years. I distinctly remember seeing banners like "Don't update the dashboard without approval from <importantSVP>" on various service team runbooks. They tried not to say it out loud, but there was very much a top-down mandate for service teams to make the dashboard "look green" by:

1. Actually improving availability (this one is fair).

2. Using the "Green-I" icon rather than the blue, orange, or red icons whenever possible.

3. Building out the "Personal Health Dashboard" so they could post about many issues there without having to acknowledge them publicly.

1 comment

Eh, I mean, at least when DeSantis was lower on the food chain than he is now, the normal directive was that the EC2 status wasn't updated unless a certain X percent of hosts were affected. Which is reasonable, because a single rack going down isn't relevant enough to constitute a massive problem with EC2 as a whole.