by saghm 1239 days ago
Someone has to "approve" the status pages showing what's actually happening? From a customer perspective, it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues. It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception.
4 comments

> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one

That's not the goal.

> It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception

That's the goal. The "status page" is considered the source of truth for most of the big contracts. If status-page=OK then your contract with them isn't violated. So changing the status page is a big deal, with real financial implications. The status page isn't a view into the SRE's tickets; it's a declaration that the service isn't being provided.

Utter rubbish. Major contracts have account managers and it all gets hashed out 1-1.
Don't know why this was downvoted. We've definitely been able to provide proof of an outage when the status page showed otherwise and get a refund in the form of server credits by contacting them directly. That has worked with all three big vendors: AWS, Azure, and GCP.
Agree here as well. It's usually not that hard to show, based on the many, many metrics Azure resources emit, that their SLA was breached.

What might be happening is that there is fine print you have to read and be in compliance with in order to be eligible for the SLA.

For example, look at all the conditions which have to be met for a breach of VM SLA in Azure:

https://azure.microsoft.com/en-us/support/legal/sla/virtual-...

Hidden in the SLA details are typically hints on how you can become more resilient in the cloud. So it pays to read the SLA details and really deeply understand what they are telling you.
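
To make the "prove it from your own metrics" point concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the `Outage` record, the function names, and the 99.9% threshold are hypothetical and not taken from any vendor's SLA, which will have its own definition of downtime and its own eligibility conditions. It also assumes the outage windows you recorded don't overlap.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class Outage:
    """One outage window from your own monitoring (hypothetical record)."""
    start_minute: int  # minutes since the start of the billing month
    end_minute: int    # exclusive end of the outage window


def availability_percent(outages, days_in_month):
    """Percentage of the month the service was up, per your own metrics.

    Assumes the outage windows do not overlap.
    """
    total_minutes = days_in_month * MINUTES_PER_DAY
    down_minutes = sum(o.end_minute - o.start_minute for o in outages)
    return 100.0 * (total_minutes - down_minutes) / total_minutes


def sla_breached(outages, days_in_month, threshold_percent):
    """True if measured availability fell below the assumed SLA threshold."""
    return availability_percent(outages, days_in_month) < threshold_percent


if __name__ == "__main__":
    # Hypothetical month: one 60-minute outage, assumed 99.9% threshold.
    outages = [Outage(start_minute=1000, end_minute=1060)]
    print(availability_percent(outages, days_in_month=30))
    print(sla_breached(outages, days_in_month=30, threshold_percent=99.9))
```

In a 30-day month, 99.9% leaves you about 43 minutes of allowed downtime, so a single 60-minute outage (roughly 99.86% availability) would fall below that assumed threshold. Whether it counts as a breach under the actual contract still depends on the fine print.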

Exec approval for showing major outages on the status dashboard is pretty much standard practice across large companies. The main differentiator is whether it’s approved within five minutes or two hours.
> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues.

I disagree. What if you're having issues and the status page is incorrectly reporting an incident? It would be easy to waste a load of time waiting for the status page to sort itself out, only to find out you've still got an issue.

You can't approve a fact.
As others noted, the so-called "status" pages of big service providers don't serve to reflect reality but to shape it. For actual status you need to consult independent monitoring services.
well.... if that fact can be delayed by just a tiny bit... that's enough