Yes, that’s exactly right. The way I think of it, MTTR is easier to measure and manage as a manager. MTTR is all about “operational excellence”. Basically, when shit hits the fan, how good are we at figuring out what caused it and how to fix it? That’s a muscle you can train. The script goes:

- What alerts are we missing that could have helped us catch that earlier?

- What dashboards could we have had to help diagnose the issue quicker?

- What Ops tools could we have had to help mitigate such an issue quicker?

- What extra logging/metrics/telemetry could we add to help us catch this quicker?

- What “safe deployment practices” could we have employed to avoid/improve this?

- What processes could we enforce to facilitate all of that?

Rinse and repeat that a few hundred or a few thousand times while monitoring the MTTR KPI and you will see that number improve. Most likely through your team “gaming it”.

MTBF is much, much trickier to measure or “manage out”. It’s about “excellence in engineering”, which is neither measurable nor controllable. You want a random feature X. Your team tells you that’s really not how the system works, and they want a few months to make the change slowly while observing the system. But you don’t want just X, you want X, Y, Z, W, V, Q, A, B, C, D, all the way through AAZZW12. So you tell the team to go fuck itself.
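
For concreteness, here is a rough sketch of how the two numbers are typically computed from an incident log. The incident records are made up, and the function names are just for illustration. MTBF here is taken as the mean gap between one recovery and the next failure, which is one common convention, not the only one.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (started, resolved) pairs, invented for illustration.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 11, 11, 0), datetime(2024, 1, 11, 11, 30)),
    (datetime(2024, 1, 21, 11, 30), datetime(2024, 1, 21, 12, 0)),
]


def mean(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mttr(incidents):
    """Mean time to recovery: average of (resolved - started)."""
    return mean([resolved - started for started, resolved in incidents])


def mtbf(incidents):
    """Mean time between failures: average gap from one recovery
    to the start of the next incident (assumed convention)."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return mean(gaps)


print("MTTR:", mttr(incidents))
print("MTBF:", mtbf(incidents))
```

Note that MTTR only needs the incidents themselves, while MTBF only gets a data point each time something breaks again, so it moves much more slowly.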