by 0x0nyandesu 1654 days ago
Saying "S3 is down" can mean anything. Our S3 buckets that served static web content stayed up no problem. The API was down though. But for the purposes of whether my organization cares I'm gonna say it was "up".
3 comments

> We are currently experiencing some problems related to FOO service and are investigating.

A generic, utterly meaningless message, which is still a hell of a lot more than usually gets approved, and approved far too late.

It is also still better than "all green here, nothing to see" which has people looking at their own code, because they _expect_ that they will be the problem, not AWS.

Most of what they actually said via the manual human-language status updates was "Service X is seeing elevated error rates".

While there are still decisions to be made in how you monitor errors and what sorts of elevated rates merit an alert -- I would bet that AWS has internally-facing systems that can display service health in this way based on automated monitoring of error rates (as well as other things). Because they know it means something.
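To make the "elevated error rates" idea concrete, here is a minimal sketch of a sliding-window error-rate check. It says nothing about how AWS actually monitors its services; the class name `ErrorRateMonitor`, the window size, and the 5% threshold are all hypothetical choices for illustration.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the most recent N requests (hypothetical sketch)."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05):
        # Only the last `window_size` outcomes are kept; older ones fall off.
        self.window = deque(maxlen=window_size)
        # Assumed cutoff: above this fraction of failures we call it "elevated".
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        # Store 0 for a successful request and 1 for a failed one.
        self.window.append(0 if ok else 1)

    def error_rate(self) -> float:
        # With no data yet, report 0.0 rather than dividing by zero.
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)

    def status(self) -> str:
        if self.error_rate() > self.threshold:
            return "elevated error rates"
        return "operating normally"
```

The decisions mentioned above show up as the two parameters: how much history to look at, and what rate counts as worth an alert.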

They apparently choose to make their public-facing service health page only show alerts via a manual process that often results in an update only several hours after lots of customers have noticed problems. This seems like a choice.

What's the point of a status page? To me, the point of it is, when I encounter a problem (perhaps noticed because of my own automated monitoring), one of the first things I want to do is distinguish between a problem that's out of my control on the platform, and a problem that is under my control and I can fix.

A status page that does not support me in doing that is not fulfilling its purpose. The AWS status page fails to help customers do that, by regularly showing all green with no alerts hours after widespread problems occurred.

As mentioned in the article, internal metrics were fubar most of the day.
Who cares if it worked for your use case?

Being unable to store objects in an object store means that it’s broken.