You can't prove that "this work caused us to not get paged" versus "that work is unnecessary and you wouldn't have been paged regardless".
Even when you can, you can't prove the impact. As a real example, our team has extensive presubmit infrastructure to catch and block some classes of configuration error that have caused customer data corruption in the past. There have been CLs which were caught by those presubmits and meant that we didn't have outages, but there's no dollar amount tied to an outage that didn't exist.
Meanwhile, team X did something similar that caused data corruption, had N customers affected for such a period of time, scrambled to root cause, roll back, and restore from backups, getting customers back up and online. Look how responsive and great they are!
You can have before-and-after data and track trends. How did you know the issue was widespread in the first place? You must have some proof somewhere.
The impact is how many outages you prevented overall. If you only prevented one outage, then maybe it's not that meaningful.
As for your last paragraph, you're right that this happens in the short term. In the long term, those teams get reputations for being a shit show: there will be high turnover, good engineers won't transfer in, people's competencies start to get questioned, other teams will avoid working with that team and develop their own solutions, and people higher up will start to look at what's going on.
> those teams get reputations for being a shit show,
Reputations with who? The VPs who rotate in and out every few years (if you're lucky enough to go a few years between reorgs) for a new title and salary bump?
> there will be high turnover, good engineers won't transfer in,
On the contrary, many people want to work on the team that gets visibility, where people can actually get promoted, rather than on one that has to constantly justify its existence.