by pjmorris
2296 days ago
> Similarly, if a human found an issue and an alert didn't trip, I'd count that as a bug/missing feature in the monitoring. The way I took the GP's point was that humans can find things that haven't been automated yet, while automation can't (at least not yet, and I'd argue it'll take AGI for that).

What you should do is rely on automation to detect problems and alert people. Then, in postmortems, look at the graphs and have humans say things like "Hey, this queue kept steadily climbing for three hours before the outage" or "We would have noticed it in this metric, but it's so noisy that we can't alert on it." Then you can write more automation (or focus on whatever prerequisite dev work that requires).
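As a rough illustration of turning a postmortem finding like "this queue kept steadily climbing" into a new alert rule, here is a minimal Python sketch. Everything in it is my own assumption rather than any particular monitoring system's API: the function names are hypothetical, and I picked an exponentially weighted moving average (EWMA) to tame a noisy metric. The rule fires only if the smoothed queue depth never dropped over the last `window` samples and rose by at least `min_increase` overall. In a real setup you'd express the same idea in your monitoring stack's query language instead.

```python
from __future__ import annotations

from typing import Sequence


def ewma(values: Sequence[float], alpha: float) -> list[float]:
    """Exponentially weighted moving average (hypothetical helper).

    Smaller alpha means heavier smoothing, which helps with noisy metrics.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: list[float] = []
    for v in values:
        # The first sample seeds the average; later samples blend into it.
        smoothed.append(v if not smoothed else alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def queue_climbing(
    depths: Sequence[float], window: int, min_increase: float, alpha: float = 0.3
) -> bool:
    """Hypothetical alert rule: True if the smoothed queue depth never dropped
    over the last `window` samples and rose by at least `min_increase`."""
    if window < 2 or len(depths) < window:
        return False  # Not enough data to judge a trend.
    recent = ewma(depths, alpha)[-window:]
    never_dropped = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_dropped and recent[-1] - recent[0] >= min_increase


if __name__ == "__main__":
    steady_climb = list(range(0, 100, 5))           # e.g. one sample per interval
    noisy_flat = [10, 12, 9, 11, 10, 12, 9, 11, 10, 12] * 2
    print(queue_climbing(steady_climb, window=10, min_increase=20))
    print(queue_climbing(noisy_flat, window=10, min_increase=20))
```

The window and thresholds are exactly the sort of thing you'd tune based on what the graphs showed in the postmortem.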