Hacker News
by severusdd 378 days ago
The 92% stat is really interesting! It’s rarely the spectacular crash that knocks a cluster over. Instead, a “harmless” retry leaks state until everything breaks at 2 a.m. on some fateful Friday. Evidently, we should budget more engineering hours for mediocre, silent failures than for outright disasters. That’s where the bodies are buried.
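Here is a minimal Python sketch of the “harmless retry leaks state” failure mode. Everything in it is hypothetical. `BoundedPool`, `flaky_call`, `leaky_retry`, `safe_retry` and `run` are illustrative names, and the pool stands in for any finite resource, such as connections, file handles or locks. Every individual call still succeeds after retrying. The leaky version, however, loses one slot on each failed attempt until the pool runs dry.

```python
import threading


class BoundedPool:
    """Hypothetical fixed-size resource pool (stand-in for DB connections etc.)."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self) -> None:
        # Non-blocking: fail immediately when every slot is taken.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("pool exhausted")

    def release(self) -> None:
        self._slots.release()


def flaky_call(attempt: int) -> str:
    # Hypothetical downstream call: fails on attempts 0 and 1, succeeds on 2.
    if attempt < 2:
        raise TimeoutError("transient failure")
    return "ok"


def leaky_retry(pool: BoundedPool, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        pool.acquire()
        try:
            result = flaky_call(attempt)
        except TimeoutError:
            continue  # BUG: the slot acquired above is never released
        pool.release()  # only reached on success
        return result
    raise RuntimeError("gave up")


def safe_retry(pool: BoundedPool, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        pool.acquire()
        try:
            return flaky_call(attempt)
        except TimeoutError:
            continue
        finally:
            pool.release()  # runs on success and on failure alike
    raise RuntimeError("gave up")


def run(retry_fn, calls: int, pool_size: int = 4):
    """Make `calls` requests against a fresh pool; report how many completed."""
    pool = BoundedPool(pool_size)
    completed = 0
    try:
        for _ in range(calls):
            retry_fn(pool)
            completed += 1
    except RuntimeError as exc:
        return completed, str(exc)
    return completed, None


print(run(leaky_retry, 100))  # (1, 'pool exhausted')
print(run(safe_retry, 100))   # (100, None)
```

With a pool of 4, the leaky wrapper leaks two slots on the first call. It dies partway through the second call even though the downstream service recovers every time. The fix is a single `finally` that releases the slot on every path.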

Or survivorship bias: the major issues that have been addressed don’t cause problems precisely because they were addressed, while some of the minor issues that go unaddressed do, at random, turn into major incidents.