by stackskipton 590 days ago
As an SRE who formerly dealt with Pulumi, "Hey, devs can use code to deploy infrastructure" is not the great idea you think it is. I've seen some really ugly conditional behavior where I'm like "Is this or is this not going to run? I honestly can't tell."
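
To make that concrete, here is a minimal sketch of the kind of conditional logic that is hard to reason about. It is plain Python standing in for an imperative infrastructure program, not the real Pulumi SDK; `declare_resources`, the flag names, and the resource names are all hypothetical.

```python
# Hypothetical sketch: plain Python standing in for an imperative IaC program.
# No real Pulumi API is used; every name and flag here is made up.

def declare_resources(env: str, flags: dict) -> list[str]:
    """Return the names of the resources this 'program' would declare."""
    resources = ["network"]

    # Two different ways to end up with a replica.
    if env == "prod" or flags.get("force_ha"):
        resources.append("replica-db")

    # A negated compound condition: easy to misread during an incident.
    if flags.get("legacy") and not (env == "prod" and flags.get("migrated")):
        resources.append("legacy-queue")

    # A loop with a skip rule hidden inside it.
    for region in flags.get("regions", []):
        if region != flags.get("skip_region"):
            resources.append(f"cache-{region}")

    return resources


if __name__ == "__main__":
    flags = {"legacy": True, "migrated": True,
             "regions": ["eu", "us"], "skip_region": "us"}
    print(declare_resources("prod", flags))
```

You have to trace three branches and one loop to know that this run declares a network, a replica, and a single cache, and that the legacy queue is silently dropped.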

We had so much conflict with the ops team over their choice of Terraform. The three colors of variable thing is just fucking bonkers. Getting tests wrapped around it that actually did what we thought they meant was a giant pain in the ass.

I won't go as far as to say we burned bridges arguing back and forth about it but they were definitely significantly singed.

Config files simply don't work until they do. And if it's your job to stare at them for hours and hours a day then maybe that's okay with you, but if you expect other people to 'just learn' it you're an idiot or an asshole. Or both. Ain't nobody got time for magic incantations.

I also think it should tell you you're on the wrong path when your app is named after a verb and the data it deals with is all declarative.

Ever thought that "Ops" needs a different mindset than the devs are used to?
And that’s why we don’t delegate that work to devs.
Honestly, the culture/org structure is a way bigger problem in this story than any proper noun tool.

If you’re ignoring guidance and patterns and getting mad reinventing the wheel, that’s on dev. If “ops” mandates tooling and doesn’t have any skin in the game, that’s on them. And both problems are on your leadership.

If y’all just hate each other and don’t listen or participate, then you can’t be successful. It is ironic that this is the pattern that the devops movement landed us in.

Honestly curious, I've been writing terraform for a while but I have never heard of "The three colors of variable thing". Could you expand on that?
They mean var vs local vs from-a-resource. There are some places you can’t use some types of variables. It can be annoying but it’s not really a huge problem if you design your approach with that in mind.

The worst part is that the Terraform team at HashiCorp often excuses not fixing these design issues as “safety measures”, which isn’t entirely untrue, but when over half of your users want something, sometimes you should get over yourself.

For what it’s worth, OpenTofu is fixing many of these sorts of things that cause people pain.

But my advice is to learn to use the tool. Terraform has such great benefits (in the right use cases). If you’re struggling, either you are missing something or you chose the wrong tool for your particular job. Either way, don’t gripe that this specialized tool for infra management doesn’t work exactly like every other general purpose programming language.

That makes sense I guess, I just never considered locals or data resources as variables.
Same, locals are in my head like consts. You define it and it stays that way. A shortcut for a repeated value.

Data resources are you requesting a dynamic value from your environment.

Variables are dynamic values that a user can change.
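
A rough way to picture the three kinds of values, sketched in plain Python rather than HCL. This is only an analogy under stated assumptions: `Inputs`, `locals_for`, `lookup_latest_image`, and `render` are hypothetical names, and nothing here is Terraform's actual API.

```python
# Hypothetical model of the three kinds of values discussed above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Inputs:
    """Like input variables: the user can change these on every run."""
    env: str
    instance_count: int


def locals_for(inputs: Inputs) -> dict:
    """Like locals: named expressions derived once, then treated as consts."""
    return {"name_prefix": f"app-{inputs.env}"}


def lookup_latest_image(environment: dict) -> str:
    """Like a data resource: a value read from the outside environment."""
    return environment["latest_image"]


def render(inputs: Inputs, environment: dict) -> list[str]:
    """Combine all three kinds of values into concrete instance names."""
    loc = locals_for(inputs)
    image = lookup_latest_image(environment)
    return [f"{loc['name_prefix']}-{i}:{image}"
            for i in range(inputs.instance_count)]


if __name__ == "__main__":
    print(render(Inputs(env="dev", instance_count=2),
                 {"latest_image": "img-123"}))
```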

That’s only the case if you spend all day rerunning deployments. If your task is more frequently to transition the cluster config from A -> B then the distinction blurs and you go from a 10:1 delta ratio of the different classes of state to maybe 3:2, at which point it feels like splitting hairs.

Especially if the locals vary between prod and pre-prod, and worse if dev sandboxes end up with per-user instances, which for us was mercifully only needed for people working on the TF scripts, so we could run our tests locally.

So would you go opentofu or pulumi or Sir Not Appearing in This Film?
Seconded, as someone that really does developer / operations, depending on the project assignment, I have learned the hard way that infrastructure configuration code should be as declarative as possible.

Sure, "use code to deploy infrastructure" sounds great, and that is why we get stuff like Ant, Gradle, Pulumi, Jenkins Groovy scripts, .NET Aspire, ... until someone has to debug spaghetti code on a broken deployment.

On the flip side, declarative DSL stuff is obfuscated magic that you can't step through or dive into.

A DSL like SQL involves one basic substrate (data organized in tables) that you can compile in your head. But declarative infra as code involves a thousand different things across a dozen different clouds.

Declarative will hold off spaghetti for... a bit. But it devolves to spaghetti as well (think fine-grained ACLs, or places where order of operations, which the DSL does not specify and is magically resolved, becomes ambiguous).
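
For a sense of how that order gets "magically resolved", here is a minimal sketch using a topological sort over a dependency graph. It is not how any particular tool is implemented; the resource names and the `apply_order` helper are hypothetical, and only Python's standard `graphlib` module is used.

```python
# Minimal sketch: deriving an apply order from declared dependencies.
from graphlib import TopologicalSorter

# resource -> set of resources it depends on (hypothetical names)
DEPS = {
    "subnet": {"vpc"},
    "acl": {"vpc"},
    "instance": {"subnet", "acl"},
}


def apply_order(graph: dict) -> list[str]:
    """Return one valid order in which the resources could be created."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(apply_order(DEPS))
```

The graph guarantees that `vpc` comes first and `instance` last, but it says nothing about whether `subnet` or `acl` goes first. If that order matters in practice, the config never stated it, which is the ambiguity described above.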

And if you need to go off the reservation (DSL support doesn't exist or is immature for rapidly evolving platforms, or you need some custom postprocess steps) then you are... what?

Probably writing code and scripts to autoinvoke on the new node, phone home to a central.... Yup that's code.

Finally, declarative code has an implicit execution loop. But for something like IaC, that is a very complicated execution loop that isn't well documented. And some committed changes to declarative code may trigger a destructive pass followed by a possibly broken constructive phase.
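
A toy version of that implicit loop, to show how a small edit can turn into destroy-then-create. This is a sketch under stated assumptions, not any real tool's planner: `plan`, the `force_new` set, and the resource attributes are all hypothetical.

```python
# Toy planner: diff current state against desired state and emit steps.
# Attributes listed in force_new cannot be changed in place, so changing
# one of them means destroying the resource and creating it again.

def plan(current: dict, desired: dict, force_new: set) -> list[tuple[str, str]]:
    steps = []

    # Anything no longer declared gets destroyed.
    for name in sorted(current.keys() - desired.keys()):
        steps.append(("destroy", name))

    for name in sorted(desired):
        if name not in current:
            steps.append(("create", name))
            continue
        old, new = current[name], desired[name]
        changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
        if not changed:
            continue
        if changed & force_new:
            # Destructive pass first; if the create fails, the resource is gone.
            steps.append(("destroy", name))
            steps.append(("create", name))
        else:
            steps.append(("update", name))
    return steps


if __name__ == "__main__":
    current = {"db": {"size": "small", "engine": "pg13"}, "old": {}}
    desired = {"db": {"size": "small", "engine": "pg14"}, "web": {"size": "small"}}
    for step in plan(current, desired, force_new={"engine"}):
        print(step)
```

A one-line change to `engine` looks harmless in the diff of the config, yet in this model it destroys `db` before anything is recreated.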

It's a tough problem.