by impoppy 1251 days ago
My complaint is that there shouldn't be unknown or uncertain states in the first place. Infrastructure should be a finite state machine, not infinite. Failure in transition from state A to state B should result in rolling back to state A, not arbitrary state X.

Sometimes you cannot roll back. The peril of infrastructure is that it is an imperfect, living state machine. Terraform is a compromise between runbooks and deterministic definitions. With some operations you are committed to the change and will need to figure out the exceptions on the other side of the apply.

(infra engineer in a previous life when Terraform was first released)

What I got from this thread is that Terraform was created for this exact reason: to be able to work in a mysterious state that just happened to be there. Therefore Terraform normalizes the existence of unknown, unpredictable environments (which is already absolutely normal in the real world, as stuff is never exclusively black or white). But, on the other hand, isn't making stuff extremely predictable part of our job? Doesn't that contradict the goals we strive for when creating software?
I very much agree! With that said, I would argue that Terraform and other IaC tools make infra more predictable but not extremely predictable. The predictability is a function of the consistency, complexity, and failure modes of an execution context. IaC brings order to chaos, but you will still have some annoying or white-knuckle chaos at times. Understanding that is key to effectively wielding the tools. My thesis is that infrastructure will likely never be as deterministic as code due to its nature, and if you mistakenly treat them as equals, you’re gonna have a bad time.
Implementations across cloud providers are going to be different, and I don't know how AWS vs GCP vs Azure is handling failure, so now it's your responsibility.

Now the problem has grown from just writing a few lines of bash script to "create a script that can handle failure and revert to a known state", which is a more complex problem than just creating a resource. And now multiply this across all the different resources (EC2, AKS, RDS, Security Groups ...) and keep up with the API.
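
To make that concrete, here is a rough sketch of what "handle failure and revert to a known state" might look like for a single sequence of steps. Everything in it is hypothetical: `apply_with_rollback`, the step pairs, and the fake `state` list are made up for illustration and are not any provider's API. It also assumes every step has an undo that always succeeds, which is exactly the assumption that tends to break in practice.

```python
from typing import Callable, List, Tuple

# A step is a hypothetical (apply, undo) pair; real provider calls would go here.
Step = Tuple[Callable[[], None], Callable[[], None]]


def apply_with_rollback(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse order."""
    done: List[Callable[[], None]] = []
    for apply, undo in steps:
        try:
            apply()
        except Exception:
            for undo_done in reversed(done):
                undo_done()  # assumes undo never fails, which real APIs do not guarantee
            return False
        done.append(undo)
    return True


# Toy usage: the second step fails, so the first one is reverted.
state: List[str] = []


def fail() -> None:
    raise RuntimeError("simulated provider error")


ok = apply_with_rollback([
    (lambda: state.append("security-group"), lambda: state.remove("security-group")),
    (fail, lambda: None),
])
```

Even this toy version needs a correct undo per step, and you would have to write and maintain one for every resource type and every provider.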

And if somebody joins your team, and wants to contribute to the solutions, they're going to have to understand the codebase.

If you could bring that up with cloud providers that would be great ;-)

The reality is infrastructure is commonly in unknown states, whether we like it or not.