| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sourceless 584 days ago

I think unfortunately the conclusion here is a bit backwards; de-risking deployments by improving testing and organisational properties is important, but is not the only approach that works.

The author notes that there appears to be a fixed number of changes per deployment and that it is hard to increase - I think the 'Reversie Thinkie' here (as the author puts it) is actually to decrease the number of changes per deployment.

The reason those meetings exist is because of risk! The more changes in a deployment, the higher the risk that one of them is going to introduce a bug or operational issue. By deploying small changes often, you get deliver value much sooner and fail smaller.

Combine this with techniques such as canarying and gradual rollout, and you enter a world where deployments are no longer flipping a switch and either breaking or not breaking - you get to turn outages into degradations.

This approach is corroborated by the DORA research[0], and covered well in Accelerate[1]. It also features centrally in The Phoenix Project[2] and its spiritual ancestor, The Goal[3].

[0] https://dora.dev/

[1] https://www.amazon.co.uk/Accelerate-Software-Performing-Tech...

[2] https://www.amazon.co.uk/Phoenix-Project-Helping-Business-An...

[3] https://www.amazon.co.uk/Goal-Process-Ongoing-Improvement/dp...

7 comments

motorest 584 days ago

> The reason those meetings exist is because of risk! The more changes in a deployment, the higher the risk that one of them is going to introduce a bug or operational issue.

Having worked on projects that were perfectly full CD and also projects that had biweekly releases with meetings with release engineers, I can state with full confidence that risk management is correlated but an indirect and secondary factor.

The main factor is quite clearly how much time and resources an organization invests in automated testing. If an organization has the misfortune of having test engineers who lack the technical background to do automation, they risk never breaking free of these meetings.

The reason why organizations need release meetings is that they lack the infrastructure to test deployments before and after rollouts, and they lack the infrastructure to roll back changes that fail once deployed. So they make up this lack of investment by adding all these ad-hoc manual checks to compensate for lack of automated checks. If QA teams lack any technical skills, they will push for manual processes as self-preservation.

To make matters worse, there is also the propensity to pretend that having to go through these meetings is a sign of excellence and best practices, because if you're paid to mitigate a problem obviously you have absolutely no incentive to fix it. If a bug leaks into production, that's a problem introduced by the developer that wasn't caught by QAs because reasons. If the organization has automated tests, it's even hard to not catch it at the PR level.

Meetings exist not because of risk, but because organizations employ a subset of roles that require risk to justify their existence and lack skills to mitigate it. If a team organizes it's efforts to add the bare minimum checks to verify a change runs and works once deployed, and can automatically roll back if it doesn't, you do not need meetings anymore.