by seabass 819 days ago
Whenever there's an incident like this, it seems like the status pages take a while to reflect the problem. Why is that? The part of the status page that can be manually updated is now correct, but most of the automated checks still show 100% uptime, or only "degraded performance" in some cases, even though the services are fully offline.

If I were implementing a status page, it might look something like a ping to some url from various regions. Assuming that's what these status pages are doing, why do they often say "All systems operational" until well into the downtime? Frustrating that I have to confirm on HN before I can know for sure something isn't just down for me.
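
To make the idea concrete, here is a rough sketch of what I have in mind. It is not how any real status page works as far as I know. The names `http_probe` and `summarize`, the region labels, and the three status strings are all made up for illustration, and I'm assuming each region runs the probe itself and reports a simple up/down result.

```python
import http.client
import urllib.request
from typing import Dict


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical check: True if the URL answers with 2xx/3xx in time.

    In a real setup this would run separately inside each region.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def summarize(results: Dict[str, bool]) -> str:
    """Collapse per-region up/down results into one headline status."""
    if not results:
        return "unknown"  # no probe has reported yet
    up = sum(results.values())
    if up == len(results):
        return "operational"
    if up == 0:
        return "major outage"
    return "degraded performance"


# Example with hand-written results instead of live probes:
print(summarize({"us-east": False, "eu-west": False, "ap-south": False}))
```

With every region reporting down, this prints "major outage" rather than "degraded performance", which is the behavior I would expect from an automated check during a full outage.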

2 comments

(Render CEO) As an engineer myself, I understand your frustration. Updating the status page automatically is a hard problem when the systems in question are distributed and complex. Perhaps we could post a generic 'something is wrong' message using the automated checks we have in place, and go from there.
For me it is the speed of the update. I posted an outage on our own status page a good 15 minutes before they did on theirs, and I had to email them before I got an ACK.