by parentheses 1991 days ago
An on-call rotation that is too eventful is a bug (in the architecture, infrastructure, or code). The solution is to propose that every page that wakes someone up be treated as something that must be prevented going forward.

The usual incident review and postmortem process can be applied. If pages happen so often that reviewing every one isn't feasible, start by applying the process to a subset of them.

Firefighting wastes talented engineers' time and drives good people to leave.