

by plasma 1992 days ago

Fundamentally, I’d suggest you share this pain with your team, including any product managers and management/business development teams.

Change your thinking and approach.

You can do this by culturally re-prioritising the development team’s workload so that the root causes of any outage or regularly firing alert are treated as urgent.

The work needed to fix a root cause gets to kick something else out of the current sprint so it can be attended to immediately.

The dev/product team should fundamentally agree that alerts should be rare, not regular.

Instead of just tweaking alarms and feeling beaten down by the regular issues, change your thinking to tackle the root causes and fix them, just like any bug or new feature.

You’ll become excited that you’re solving the issues.

By having this shared understanding in the dev team to always resolve the root cause of outages, even when that means architecture restructures and rebuilds of components that take weeks or months, you’ll reduce these incidents dramatically.

Finally, by doing this, you share the pain with everyone else: product managers and business leads don’t get their features or other improvements as fast, so they now see what you deal with. They’ll ask why things appear to have slowed down, and you can now say you need more resources.