Hacker News
by munchbunny 2539 days ago
I'd agree if that were actually true, but it's not.

With large enough services, there is always some acceptable level of errors due to 0.001%-probability events. An outage usually doesn't mean everything is down, but even 0.1% of jobs failing ends up affecting a lot of users.

Even 10% of jobs failing still isn't "down"; it's "partly down", even if you have to issue credits for SLA violations and publish a public postmortem later.
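
To put rough numbers on those percentages, here is a quick back-of-the-envelope sketch. The daily job volume is a made-up figure chosen only for illustration, and `failed_jobs` is a hypothetical helper, not anything from a real service.

```python
def failed_jobs(total_jobs: int, failure_rate_percent: float) -> int:
    """Expected number of failed jobs at a given failure rate (in percent)."""
    return round(total_jobs * failure_rate_percent / 100)


# Hypothetical volume, assumed for illustration only.
DAILY_JOBS = 50_000_000

# The three failure rates mentioned above: background noise,
# a partial outage, and a severe partial outage.
for rate in (0.001, 0.1, 10):
    count = failed_jobs(DAILY_JOBS, rate)
    print(f"{rate}% of {DAILY_JOBS:,} jobs -> {count:,} failures")
```

At that assumed volume, 0.1% already works out to 50,000 failed jobs a day, which is plenty of unhappy users even though the service as a whole is nowhere near "down".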