by ckozlowski 3397 days ago
(Disclaimer: I work for AWS.)

The dashboard is not changing color due to the S3 issue. We're updating the banner in place of that.

Edit: Update at 11:35 AM PST: We have now repaired the ability to update the service health dashboard. The service updates are below. We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.

http://status.aws.amazon.com/


For some reason, reading about "believe we understand root cause" made me think of: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
Maybe you could encourage your colleagues to host the status page outside of AWS?
We'll have to wait for the postmortem, but I bet it was an unintentional dependency on S3 that no one realized had come into place until S3 went down -- especially considering how fast they were able to remove the dependency and fix it.
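One way to surface a hidden dependency like the one guessed at here is to walk the service dependency graph transitively. The following is a minimal sketch under stated assumptions: the service names and edges in `DEPENDENCIES` are hypothetical and say nothing about AWS's real internals. It shows how a breadth-first search finds a chain from the status dashboard down to S3 that no single team may have noticed.

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services it calls directly.
# These edges are illustrative only and do not describe any real AWS system.
DEPENDENCIES = {
    "status-dashboard": ["dashboard-renderer"],
    "dashboard-renderer": ["static-assets"],
    "static-assets": ["s3"],
    "s3": [],
}


def dependency_path(graph, start, target):
    """Return one shortest chain of services from start to target, or None.

    Breadth-first search over direct dependencies, so indirect
    (transitive) dependencies are found as well.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


print(dependency_path(DEPENDENCIES, "status-dashboard", "s3"))
# ['status-dashboard', 'dashboard-renderer', 'static-assets', 's3']
```

Running a check like this against a real service inventory would flag that the dashboard cannot report an S3 outage if it needs S3 to render.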
This reminds me of a GitHub outage from them having a build dependency on GitHub. IIRC, they tried to roll back to building a prior version but since the site was offline, the build failed.
S3 gets used to store a lot of static content. Can't speak for that team, but I'm sure they'll take that feedback. Happy the banner functionality remained unimpeded.
Possibly AWS status page wisely relied on a third party, which relied on a fourth party, which relied on S3.
It took them ~30 minutes.
It took them 2 hours actually.
Maybe with GCP? :)
I'm happy to offer up some spare space on my godaddy hosted Linux plan if that helps...
Could you go more in depth? What does S3 have to do with it?
I think the most reasonable guess is that they have some backend system that continuously pushes a status JSON/XML file to an S3 bucket.

Then there's the frontend, which apparently reads this file from S3 periodically and caches the results.

I guess the comment they added at the top after two hours of being in the dark was likely added manually to the web frontend.

Obviously all of this would be hilariously badly designed if it was made this way. Still...
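The guessed design above can be sketched as follows, assuming a single JSON object named `status.json` and using an in-memory `FakeBucket` in place of S3. All names here are hypothetical, and nothing here reflects how AWS actually builds its dashboard. It shows why such a setup would stay "green" during an outage: the backend cannot write the new status, and the frontend keeps serving its last cached copy.

```python
import json
import time


class FakeBucket:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self.objects = {}
        self.available = True

    def put(self, key, body):
        if not self.available:
            raise ConnectionError("object store unavailable")
        self.objects[key] = body

    def get(self, key):
        if not self.available:
            raise ConnectionError("object store unavailable")
        return self.objects[key]


def publish_status(bucket, statuses):
    """Backend side: serialize current service health and push it to the bucket."""
    bucket.put("status.json", json.dumps(statuses))


class StatusFrontend:
    """Frontend side: re-read status.json at most every `ttl` seconds and cache it."""

    def __init__(self, bucket, ttl=60.0, clock=time.monotonic):
        self.bucket = bucket
        self.ttl = ttl
        self.clock = clock
        self.cached = None
        self.fetched_at = None

    def current_status(self):
        now = self.clock()
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            try:
                self.cached = json.loads(self.bucket.get("status.json"))
                self.fetched_at = now
            except ConnectionError:
                # Store is down: keep serving the last cached status, however stale.
                pass
        return self.cached


# Simulated timeline using a controllable clock.
bucket = FakeBucket()
now = [0.0]
frontend = StatusFrontend(bucket, ttl=60.0, clock=lambda: now[0])

publish_status(bucket, {"S3": "ok"})
print(frontend.current_status())  # {'S3': 'ok'}

bucket.available = False  # the store the status page depends on goes down
now[0] = 120.0
try:
    publish_status(bucket, {"S3": "error"})
except ConnectionError:
    pass  # the backend cannot record the outage either
print(frontend.current_status())  # {'S3': 'ok'}, stale but still shown as healthy
```

The cache keeps the page up, but it also hides the failure; that trade-off only goes away if status data lives somewhere that does not share fate with the thing being monitored.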

It's where they store the error icon.
So the "working" icon works, but the "not working" doesn't? I'm not sure that's right.
Tragic comedy gold