by CraigJPerry 1173 days ago

> 3. Theory of Constraints: A system is only as strong as its weakest point. Focus on the bottleneck. Counterintuitively, if you break down the entire system and optimize each component individually, you’ll lower the effectiveness of the system. Optimize the entire system instead.

This underpins a lot of what we call DevOps (i mean the actually useful interpretation of DevOps, not all the shit that gets a DevOps label in an attempt to sell something).

Despite working with this idea every day for over a decade, the idea itself still blows my mind. Despite being theoretically quite succinct, it has so much practical depth that i struggle to see myself getting bored of applying the learnings from it.
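To make the "focus on the bottleneck" part concrete, here is a minimal sketch, assuming a simple serial pipeline where each stage has a fixed weekly capacity. The stage names and numbers are made up for illustration and do not come from the story below. It only shows the weaker half of the claim: speeding up a non-bottleneck stage leaves the system's output unchanged, while the same effort spent on the bottleneck pays off.

```python
def bottleneck(stages):
    """Return the name of the stage with the lowest capacity."""
    return min(stages, key=stages.get)


def throughput(stages):
    """A serial pipeline can only move as fast as its slowest stage."""
    return min(stages.values())


# Hypothetical capacities in changes per week (illustrative numbers only).
pipeline = {"develop": 20, "review": 4, "test": 12, "release": 10}

baseline = throughput(pipeline)

# Local optimisation: double a stage that is not the bottleneck.
faster_develop = {**pipeline, "develop": 40}

# Targeted optimisation: double only the bottleneck stage.
faster_review = {**pipeline, "review": 8}

print(bottleneck(pipeline), baseline)
print(throughput(faster_develop))
print(throughput(faster_review))
```

In a real system the non-bottleneck speed-up can be worse than neutral, since it piles up work in front of the constrained stage, but this sketch does not model queues.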

Anecdote: i landed in a role in a part of a big org that was on fire, but the fire wasn’t due to stupidity. The team was huge with genuinely no lemons on it (i later found out this was no accident - the head had been given permission to cherry pick staff from across the org and he took a lot of flak for causing brain drain in other parts of the org). They had a software component that everyone relied on in production but technically no one really owned. Everyone was maxed out. Growth wasn’t the problem; an externally driven change in how the business worked was. The pace was non-negotiable.

This component “worked” as far as we could tell. Given the volumes, and the fact that some theoretical failure modes would have been hard to detect in practice at that time, it was not possible to be confident that it was working fully correctly, but it was at least mostly working.

The problem: no one could release changes to this component reliably but changes were often needed. Over 80% of releases were rolled back. On average it took 2-point-something releases to successfully get a change out to this component that didn’t need to be rolled back.

Lots of optimisations had been applied to this component. This was not a stupid team and it did not suffer this pain willingly. There were software optimisations applied - mostly tools to abstract changes so they were simpler to deal with. For example, one source of complexity was a bunch of rules that had to be dealt with, but these could be handled in software, allowing the human to mostly just specify the desired behaviours. That improved the situation, but only slightly. The continuously changing landscape also meant this tooling itself became a moving target and a source of bugs. There were special review processes for changes: 3 experts in this huge org reviewed everyone else's changes. This review process was excruciating to perform and involved examining a gargantuan model representation in Excel.

Still the failed releases ticked up. No other part of the system suffered in this way.

The popular thinking at the time was that this system just needed an owner. Of course no one wanted that thankless poisoned chalice.

Applying ToC to this resulted in a system that needed no owner long term. The tools that had been created were all retired, and the review board was disbanded too.

The result was that newbies to the team pairing on a deliverable would be given responsibility to change that component as a way of flexing their solo skills.