by oooyay 1001 days ago
No, not really. SLAs are calculated on a per customer basis and generally have a legal definition in contracts if they're actual, functioning SLAs.

The status page's purpose is generally to head off a flood of customer-reported issues. This is why you'll usually see issues that affect a broader subset of users on that page.


> No, not really. SLAs are calculated on a per customer basis and generally have a legal definition in contracts if they're actual, functioning SLAs.

And how can I as a customer calculate this? We're not going to sue each time there's a breach of SLA to get the real data. Whatever the status page says will trigger customers to decide if they should claim SLA credits. A lower number (from a delayed update of the status page) will skip payouts or reduce them.

> The status page's purpose is generally to head off a flood of customer-reported issues. This is why you'll usually see issues that affect a broader subset of users on that page.

That's what you assume and that's what it's supposed to be. It's long been abused otherwise. Amazon, for example, will require explicit approval to update the page. They and others have famously delayed updating the status page as late as they can get away with, often attempting to not even call it an outage. It will say something like "increased error rates".

Five nines of availability works out to about 5 minutes of downtime per year (roughly 5.26), and you can calculate up and down from there. If you don't want to do the conversion from percentage to minutes, there are lots of calculators like this one: https://uptime.is/five-nines
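If you'd rather not rely on a calculator, the conversion is short enough to sketch in Python. This is only an illustration: the function name is made up, and it assumes a 365-day year unless you pass a different period. Your contract may measure over a month or a billing cycle instead.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the allowed downtime in minutes for an availability target.

    availability_pct: target such as 99.999 (five nines).
    period_days: measurement window; 365 is an assumption, check your contract.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the window you are allowed to be down.
    return total_minutes * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        yearly = downtime_budget_minutes(pct)
        monthly = downtime_budget_minutes(pct, period_days=30)
        print(f"{pct}%: {yearly:.2f} min/year, {monthly:.2f} min per 30 days")
```

Each extra nine cuts the budget by a factor of ten, which is the "calculate up and down from there" part.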

I wasn't assuming what status pages are used for; I was speaking from my experience working in reliability engineering. I can't speak to Amazon's practices as I've not worked there, but when I've seen this happen it's because we struggled to identify customer impact. The systems you're talking about are vast, and a single application, or even a subset of applications, reporting errors doesn't mean there's going to be customer impact. That's why I mentioned it usually takes a human who knows that system and its upstreams to know if there'll be customer impact from a particular error.

I'd encourage you to read the wording of an SLA in a contract. They're often very specific in terms of time and the features they cover. "Increased error rates" tells me you'll probably run into retry scenarios, which, depending on your contract, may not actually affect an SLA. Error rates are generally an SLO or an SLI, which are not contractually actionable.

> And how can I as a customer calculate this?

Either your shit works, or it doesn’t. You do monitor, don’t you?
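As a rough sketch of what that monitoring could give you, assume you run your own probe against the vendor at a fixed interval and log (timestamp, ok) pairs. The names and the one-minute interval below are hypothetical, and a real SLA will define downtime its own way (minimum duration, covered features, retries), so treat the result as your own evidence rather than the contractual number.

```python
from datetime import datetime, timedelta


def observed_availability_pct(probes):
    """Percentage of probes that succeeded. probes is a list of (timestamp, ok)."""
    if not probes:
        raise ValueError("no probe data")
    ok_count = sum(1 for _, ok in probes if ok)
    return 100.0 * ok_count / len(probes)


def observed_downtime(probes, interval=timedelta(minutes=1)):
    """Estimate downtime by counting each failed probe as one full interval."""
    failed = sum(1 for _, ok in probes if not ok)
    return failed * interval


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Made-up sample data: 1000 one-minute probes, 5 of which failed.
    probes = [(start + timedelta(minutes=i), not (100 <= i < 105)) for i in range(1000)]
    print(f"availability: {observed_availability_pct(probes):.3f}%")
    print(f"downtime: {observed_downtime(probes)}")
```

Keeping the raw probe log matters more than the summary number, since that is what you would put next to the vendor's figures.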

> Either your shit works, or it doesn’t. You do monitor, don’t you?

That then becomes a he-said-she-said problem with the vendor you're claiming against. Does everyone have time for it? You will submit the SLA credit claim, and chances are, unless it's WAY off, you'll accept the vendor's nerfed version and move on. Something is better than nothing.