by pbronez 1041 days ago

Yeah this stood out to me as well.

I think the reason for this is that observed production failure is _certain_. Hard fails are undeniable. They obviously need to be addressed and obviously deserve resources. The amount of deserved resources can be clearly calculated by projecting the concrete, observed costs of the failure forward in time.

Before prod goes down, there is much more uncertainty:

- even an expert engineering assessment has some level of uncertainty

- engineering may not have a full appreciation for the business context of the work, and might over-weight technical issues relative to other concerns

- if the engineers are contractors, or otherwise organizationally distant from the experience owners, that inserts a trust gap which further increases uncertainty

- the business owner’s projections are themselves uncertain. Is the expected launch volume really that high, or is it aspirational?

- the costs of failure are uncertain too. If the system goes down, how hard will it go down? What will that actually cost in lost revenue? Fuzzier stuff like brand reputation is even harder to quantify.

Meanwhile the costs of paying the contracted development team another 2 months on the same project are quite concrete. The team already spent significant political capital to force a change on an incumbent team. Now they’re saying they want more money because it still doesn’t work??

The big open question is: what was the cost of the failed launch? How long did it take to get the system back up and running at scale? What did it cost in terms of user loyalty? How does that compare to the concrete cost of holding launch until the auth system was upgraded?

Different people will answer those questions in different ways. What matters is how the customer answers those questions, whether their bosses believe that answer, and their bosses' judgment of the overall situation.
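
To make the comparison concrete, here is a rough sketch of the expected-value arithmetic behind it. Every name and number is hypothetical and made up for illustration; none of it comes from the actual launch. The point is that the delay cost is a plain multiplication, while the failure cost depends on a probability and an impact that nobody can pin down in advance.

```python
def failure_impact(outage_hours, revenue_per_hour, reputation_cost=0.0):
    """Total cost if the launch fails: lost revenue plus fuzzier reputation damage."""
    return outage_hours * revenue_per_hour + reputation_cost


def expected_failure_cost(p_fail, impact):
    """Expected cost of launching now: probability of failure times its impact."""
    return p_fail * impact


def delay_cost(months, team_cost_per_month):
    """Concrete cost of holding launch: keep paying the team for extra months."""
    return months * team_cost_per_month


def break_even_probability(delay, impact):
    """Failure probability above which delaying is cheaper in expectation."""
    return delay / impact


# Hypothetical inputs, chosen only to illustrate the shape of the argument.
impact = failure_impact(outage_hours=48, revenue_per_hour=10_000,
                        reputation_cost=200_000)
delay = delay_cost(months=2, team_cost_per_month=150_000)

# The same delay cost looks cheap or expensive depending on the assumed risk.
for p_fail in (0.1, 0.3, 0.6):
    launch_now = expected_failure_cost(p_fail, impact)
    choice = "delay" if delay < launch_now else "launch now"
    print(f"p_fail={p_fail:.1f}: expected failure cost {launch_now:,.0f} "
          f"vs delay cost {delay:,.0f} -> {choice}")

print(f"break-even failure probability: {break_even_probability(delay, impact):.2f}")
```

The delay cost is the only input everyone agrees on. Shift the assumed failure probability or the reputation cost a little and the recommendation flips, which is why the argument is so hard to win before prod actually goes down.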