by foobiekr 85 days ago
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it: "Does it respond to the API at all, even with errors? Then it's up!" and so on. If you spend a lot of time analyzing browser traces, you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But the service still counts as "up" even if a shard is completely down.
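
To make the gap concrete, here is a rough sketch of two ways to score the same request log. Everything in it is made up for illustration: the function names, the log format (a list of HTTP status codes, `None` for no response), and the traffic numbers are assumptions, not anyone's real status-page logic.

```python
from typing import List, Optional

# Each entry is the HTTP status of one request; None means no response at all.
StatusLog = List[Optional[int]]

def liveness_availability(log: StatusLog) -> float:
    """Fraction of requests that got any response, errors included."""
    return sum(1 for status in log if status is not None) / len(log)

def success_availability(log: StatusLog) -> float:
    """Fraction of requests that got a non-5xx response."""
    return sum(1 for status in log if status is not None and status < 500) / len(log)

# Hypothetical traffic: one shard out of ten is down and answers with 500s.
log: StatusLog = [200] * 900 + [500] * 100

print(liveness_availability(log))  # 1.0
print(success_availability(log))   # 0.9
```

Same traffic, same outage: the "does it respond at all" metric reports 100%, while the success-based one reports 90%.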

The five nines that tech people usually talk about is a fiction. The only place where the measure is really real is networking, specifically service provider networking; elsewhere it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways they are basically worthless for almost everyone in the space.
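
For a sense of how tight the claim is, the downtime budget is simple arithmetic. A minimal sketch, assuming a 365-day year and a hypothetical helper name:

```python
def downtime_budget_minutes(nines: int, period_days: float = 365.0) -> float:
    """Allowed downtime in minutes for an availability of N nines over the period."""
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return period_days * 24 * 60 * unavailability

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which is why a single partial outage that actually gets counted would blow the budget.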

See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you a case ID. But that's not how they describe it.

1 comment

That's the rub.

Once you dig into the details, what does it mean to have 5 9s? Some systems have a huge surface area of calls and views. If the main web page is down but the entire backend API is still responding fine, is that a 'down'? Well, sorta. Or what if one misc API that some users only call during onboarding is down, does that count? Well, technically yes.

It depends on your users, which paths they actually use, and what the common path is.

Then add in response times to those down items. Those are usually made up too.
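
A rough sketch of why the answer depends on how you slice it. The endpoint names and request counts here are invented, and both scoring functions are just two possible definitions, not a standard:

```python
from typing import Dict, Tuple

# endpoint -> (total requests, successful requests); all numbers are hypothetical.
EndpointStats = Dict[str, Tuple[int, int]]

def traffic_weighted_availability(stats: EndpointStats) -> float:
    """Successful requests over all requests, so busy endpoints dominate."""
    total = sum(t for t, _ in stats.values())
    ok = sum(s for _, s in stats.values())
    return ok / total

def worst_endpoint_availability(stats: EndpointStats) -> float:
    """Availability of the single worst endpoint, however little traffic it gets."""
    return min(s / t for t, s in stats.values())

stats: EndpointStats = {
    "/home": (100_000, 100_000),
    "/api/orders": (50_000, 50_000),
    "/api/onboarding": (100, 0),  # fully down, but rarely called
}

print(f"traffic-weighted: {traffic_weighted_availability(stats):.4%}")
print(f"worst endpoint:   {worst_endpoint_availability(stats):.4%}")
```

With these numbers the traffic-weighted figure is about 99.93% while the onboarding endpoint sits at 0%. Which one belongs on the status page is exactly the judgment call above.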