
So, I'm making a somewhat subtle claim: you should absolutely be elbow-deep in your systems, and you should understand them well enough to build these sorts of proactive alerts, but you shouldn't rely on people being elbow-deep to notice problems in real time.

If you're ever at the point where you catch a problem and automated monitoring didn't, that's a bug in automated monitoring. If you are really good at finding new bugs in automated monitoring and more things to monitor because you're spending your time getting a sense of how the system behaves, that's fantastic, keep doing that. (That is one of the good reasons for dashboards IMO - a bunch of data to look at when you've already realized something's wrong. Just don't use dashboards to make the decision that something must be wrong.) If you don't improve your automated monitoring and you're worried things will start failing without humans watching dashboards, then you're not solving your existing bugs.
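
A rough sketch of what "fix the bug in automated monitoring" could look like once a human has spotted something a dashboard showed but no alert caught. Everything here is hypothetical (`AlertRule`, `error_rate_too_high`, `evaluate`, and the 5% threshold are made-up names and numbers, not from any real monitoring tool); the point is only that the thing you noticed by eye becomes a rule a machine checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AlertRule:
    """Hypothetical rule: alert when `condition` holds for the recent samples."""
    name: str
    condition: Callable[[Sequence[float]], bool]


def error_rate_too_high(samples: Sequence[float], threshold: float = 0.05) -> bool:
    # Assumed policy: fire only if every recent sample exceeds the threshold,
    # so a single noisy data point does not page anyone.
    return len(samples) > 0 and all(s > threshold for s in samples)


def evaluate(rules: Sequence[AlertRule], samples: Sequence[float]) -> List[str]:
    # Return the names of the rules that should alert for these samples.
    return [rule.name for rule in rules if rule.condition(samples)]


# The pattern a human used to spot on a dashboard, written down as a rule.
rules = [AlertRule("error-rate-high", error_rate_too_high)]
```

Calling `evaluate(rules, recent_error_rates)` on a schedule stands in for the human watching the graph; the dashboard is still there for digging in after the alert fires.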