Hacker News
by jimrandomh 4020 days ago
Because it's hard to make an automatic monitoring system that reliably distinguishes between "a failure occurred but everything is fine" and "a failure occurred and now everything is on fire".