| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by throwaway823882 1927 days ago

You are incorrect. edit according to your wording, you are sometimes correct. But terraform plan often can't show values that it can't predict ("Values known after apply"), so you often can't predict what apply will do. And you can never know for sure if the operations will succeed.

First of all, what does terraform apply do? It runs API calls. Those API calls may or may not work, based on criteria outside of the control of Terraform. It may be supposed to work, but that doesn't mean it will. Use AWS enough and you will run into this weekly.

Second, Terraform has semantics to do things like "create before delete" or "delete before create". This behavior is controlled by the provider, per-resource. But Terraform has no earthly fucking clue which it's supposed to do, only the provider knows. And the provider does not check before terraform apply runs whether an operation will conflict with a resource, even if Terraform is managing that resource.

So you run terraform plan -out=terraform.plan, and the whole thing output seems fine! So you terraform apply terraform.plan, and Terraform happily performs an operation which is literally impossible, and then fails. Then you go find the code that has been working just fine for 2 years and change it to fix the lifecycle for this particular scenario, and then you can terraform apply correctly.

Thirdly, a lot of terraform plan runs show output with unknown values, because the values won't be known until after you run terraform apply. But those values can conflict with existing resources with the same values. Terraform makes no attempt to verify if conflicting resources exist before apply. Therefore, your terraform apply can fail unpredictably (to be correct: you could predict it if Terraform were designed to actually look for conflicting resources before applying).

There's about a hundred of these edge cases that I run into every day. I already perform all operations in a non-production environment, but as we all know, the environments naturally drift (because cloud infrastructure is mutable infrastructure) so there are times production breaks in a way that didn't in non-production.

1 comments

jen20 1927 days ago

> edit actually, according to your wording, you are correct.

That’s uhh... why I chose that wording.

> Use AWS enough and you will run into this weekly.

I _really_ wish you understood how amusing this exhortation is.

I’m under no illusion about the reliability of AWS APIs - much less Azure or any of the other major public clouds.

The easiest way to reduce drift is to prevent read/write access via anything except a designated service account for Terraform. Per one of your earlier replies about a coworker having upgraded TF underneath you, it sounds like you aren’t doing that.

I’m with you on some aspects of your rant - specifically precious resource consumption such as names - for what it’s worth (and am a former core maintainer of Terraform, and still work with that codebase daily elsewhere), but if you’re going to rant about a project whose maintainers (or former maintainers) may be listening, you’d better be technically correct.

I’d suggest issues (yes I know there are a lot of them, I don’t know how they get triaged anymore) or mailing list posts, or <gasp> contributions as a more helpful way to engage rather than complaining on an internal Slack channel.

link

throwaway823882 1927 days ago

Been there. If Mitchell or anyone else on the core team doesn't want a feature to work a certain way, or feels that a bug isn't a bug, or maybe they want to wait until the next big release to look at something, tough luck for us. I suppose I have to become a Terraform developer in my spare time, and then I only need to spend years fighting the corporation that owns it over how to design it to actually function the way humans need it to.

Terraform doesn't look for existing resources. It doesn't properly plan for new values. It doesn't have a convention to define resources that can or can't be used idempotently. It doesn't support modern deployment methods. It doesn't make a best effort to fix things even when it clearly can detect what the solution or user's intent is. It can't auto import resources, even when it knows that something already exists, and it won't remove something in its own state when you're trying to import it, even though you clearly want to replace what's in state with what you're importing. And it's a monolith, so it's not like we can change small parts to fix our use cases.

There's no amount of GitHub issues or mailing lists that will fix all this crap. Some of it is poor functionality, some of it is buggy functionality, and some of it is just bad design. It needs to be redesigned, and its components separated so they can be replaced or improved as needed, without editing Go code or a DSL.

But mainly, it needs to work the way humans need it to work in production. Not the way a bunch of idealistic devs think is theoretically the best way. "Just don't ever touch an AWS account at all by any means other than Terraform. What's so hard about that?" Nobody who has an actual live product used by customers [for money] could possibly survive this way without hours of outages and weeks of delays. If your tool is designed for an impossible scenario, then it's impossible to use it "properly".

link

jen20 1927 days ago

> Terraform doesn't look for existing resources.

This is 100% at odds with your stated goal of predictability.

> It doesn't make a best effort to fix things even when it clearly can detect what the solution or user's intent is.

As is this.

> It can't auto import resources, even when it knows that something already exists, and it won't remove something in its own state when you're trying to import it, even though you clearly want to replace what's in state with what you're importing.

I’m noticing a pattern with your stated desires being at odds with your stated goals.

FWIW, I manage dozens of accounts which no one touches outside of TF (or other similar tools) and have worked this way since 2016 with basically no issues.

link