"configuring a Linux base image for a specific server role" or "setting up a complex cloud environment from scratch"
^^^^^^^
I don't know if you have experience working on this kind of thing, but these processes have a single desired outcome and many undesired ones. They can break in many different ways, and they break more often than you'd like.

One simple case related to setting up a Linux instance is when you want the "latest and greatest" Python libraries without even checking whether the stable versions shipped by an LTS distribution would actually suit you. You end up with a huge list of packages from pip, a list that sometimes breaks (even with pinned versions) and where security updates do not guarantee backward compatibility. One day it works, the next day it doesn't. Your process has more entropy than necessary and is less reproducible than it should be.

If instead of a server instance you have a Docker image that is rebuilt in a CI/CD pipeline, you have the same problem, but now it blocks an entire team that expects CI/CD to work all the time. And this is just one point of failure among many that can coexist: a "corporate" transparent caching proxy, an unreliable DNS server, a security appliance that interferes with downloads, etc.

As for rebuilding cloud environments, you really have to go through it to see how entropy scales with complexity. I could write a couple of pages about it, and about how people are actually building cloud snowflakes. Here is a short take: https://logical.li/blog/devops/
And when you re-run the same process with newer dependencies, there are multiple 'desirable' states you could end up in whenever the latest available dependencies turn out not to be compatible.
I just don't follow: how is this insight meant to help you evaluate how to get out of this mess?