Hacker News
by btmcnellis 1496 days ago
I'm shocked that no one has said Terraform yet. It has its own declarative DSL, which some people complain about (because people complain about everything), but it works well for what it's intended to do.

Providers can be created for anything with an API, from the major cloud providers to k8s to anything else.

No agent is required: it just writes state to a file, and then it diffs that file against the actual state every time it runs. (In practice, you'll probably want to put that state in a remote location like an S3 bucket, but that's very easy to do. And if you're the only one using it, you can just save it locally, which is the default behavior.)
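
To make the "write state to a file, then diff it against reality" idea concrete, here is a minimal Python sketch. It is not Terraform's actual implementation or state format: the function names, the JSON layout, and the three drift labels are all assumptions made up for illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the recorded state to a local JSON file (hypothetical format)."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def load_state(path):
    """Read the recorded state; a missing file means nothing is managed yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def diff_state(recorded, actual):
    """Compare recorded state with what actually exists, resource by resource."""
    changes = {}
    for name in sorted(set(recorded) | set(actual)):
        if name not in actual:
            changes[name] = "missing"    # recorded, but gone in reality
        elif name not in recorded:
            changes[name] = "unmanaged"  # exists, but was never recorded
        elif recorded[name] != actual[name]:
            changes[name] = "drifted"    # both exist, attributes differ
    return changes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.json")
        save_state(path, {"bucket": {"acl": "private"}})
        # In a real tool, "actual" would come from querying the provider's API.
        actual = {"bucket": {"acl": "public"}}
        print(diff_state(load_state(path), actual))
```

Swapping the local file for a remote one (the S3 case mentioned above) would only change `save_state` and `load_state`; the diff itself stays the same.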

Depending on your use case for Ansible, it could be a very good fit.


Terraform is great for certain tasks, but even they advise against using it for local execution. Whatever you use it for, you really need a provider, and their module system isn't very intuitive either. Ansible has/had potential, but sucks in a lot of ways too. Unfortunately, as much as I dislike certain aspects, it really is the best generic automation tool available at the moment.
Yeah, if you need to manage individual servers/VMs, it's not a great fit. I've used cloud-init files to configure EC2 instances on startup with things like packages and SSH keys, and that works pretty well if you can treat those servers as if they're immutable. But if you need to get in there and run something, it's not quite a replacement for Ansible.
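
As a rough illustration of the cloud-init approach described above, this Python sketch renders a small `#cloud-config` document that could be passed as instance user data. `packages` and `ssh_authorized_keys` are standard cloud-config keys; the helper name, the package, and the SSH key are placeholders invented for the example.

```python
import json


def render_cloud_config(packages, ssh_keys):
    """Render a minimal #cloud-config document as text.

    Each value is emitted as a double-quoted YAML scalar via json.dumps,
    which avoids YAML quoting surprises for simple strings.
    """
    lines = ["#cloud-config"]
    if packages:
        lines.append("packages:")
        lines.extend(f"  - {json.dumps(p)}" for p in packages)
    if ssh_keys:
        lines.append("ssh_authorized_keys:")
        lines.extend(f"  - {json.dumps(k)}" for k in ssh_keys)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute real package names and a real public key.
    print(render_cloud_config(
        ["nginx"],
        ["ssh-ed25519 AAAAEXAMPLE ops@example.com"],
    ))
```

Because this runs once at first boot, it only fits the "treat the server as immutable" workflow; changing anything later means replacing the instance.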
Ansible is for very small-scale projects. It is not scalable for larger unreliable clusters. You probably need to go to SaltStack if you need any scale.
Yet people use it daily to manage 10k-100k+ servers/devices.

The term "scaling" has very different meanings depending on context and how one product scales is very different from another.

You could set up a context that favors push vs. pull and vice versa. You can also see different products scaling well or not depending on slight variations in context and implementation.

I am highly dubious that it would be possible to manage 100k servers, especially to interact with large numbers of them at a time. The way Tower collects results in a thread pool, assuming success, simply does not work at any scale. I tried and tried. I fixed many bugs and got to about 4000 hosts before changing to another platform.

If almost every server is reliable, I am sure it would work fine. That is not going to happen at scale.
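
For contrast with a collector that assumes success, here is a generic Python sketch of a thread pool that records each host's failure separately instead of letting one bad host derail the run. This is not how Tower or Ansible is implemented; `run_on_hosts` and the `task` callable are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_on_hosts(hosts, task, max_workers=32):
    """Run task(host) on every host and split results into ok / failed.

    task is any callable that raises on failure. Per-host timeouts are
    assumed to be enforced inside task itself (for example, a socket timeout).
    """
    ok, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                ok[host] = future.result()
            except Exception as exc:
                # An unreachable or misbehaving host is recorded, not fatal.
                failed[host] = repr(exc)
    return ok, failed
```

The caller can then retry or report just the `failed` set, which matters once some fraction of hosts is always down.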

Terraform does the job but it's pretty dirty and unreliable. I've had so many cases where a plan looks all great, PR gets approved and merged, and then something happens during the apply causing it to fail because all validation is done in the cloud API, not in the provider code.

It's another tool pretending to be declarative.

Yeah, we had a PR bomb recently in a way that "plan" could have *trivially* caught had it used any of the "Get*" APIs from the cloud provider to ask about the current situ.

I appreciate that "the map is not the terrain," and that "plan" is speculating about a future configuration of the world, but come on -- if "terraform plan" is going to require _live credentials_ to run, and then only use those to enumerate the active regions, what are we even doing here?!
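
The kind of plan-time check being asked for can be sketched in a few lines of Python. The failure mode here (trying to create something that already exists) is an assumption, since the comment does not say what actually went wrong, and `resource_exists` is a hypothetical stand-in for whatever "Get*" call the cloud provider offers.

```python
def preflight_conflicts(planned_creates, resource_exists):
    """Return the planned resource names that already exist remotely.

    resource_exists is a callable (hypothetical here) that asks the cloud
    API whether a resource with the given name is already present.
    """
    conflicts = []
    for name in planned_creates:
        if resource_exists(name):
            conflicts.append(name)
    return conflicts


if __name__ == "__main__":
    # Stand-in for a live lookup; a real check would call the provider's API.
    already_there = {"logs-bucket"}
    print(preflight_conflicts(["logs-bucket", "new-queue"], already_there.__contains__))
```

Running something like this during the plan step would surface the conflict before the PR is merged rather than during the apply.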

Shouldn't a `terraform plan` tell you that? If not, then the actual state of the infra differs from what's in the Terraform state. I've had issues with version changes in the past and needing to update state files and all that malarkey.
No, that's kind of my point. Terraform looks sexy and declarative on the surface but it's really just turning HCL into cloud API calls where the actual logic happens. Once you've got a few hundred lines the wheels start falling off. If it were truly declarative it wouldn't need to store what it knows about the existing infrastructure in a tfstate file.

Tform started off as a cool idea with good principles and over time has morphed into a shitty scripting language for managing multi-cloud infra without clickops.

I'll do you one better: it's turning HCL into *an opaque golang intermediary*[1] of cloud API calls

It's like a game of telephone where every new participant in the chain is one more place to have "let me help you" turn into "what the hell was that?"

1 = and that's not even getting into the tire fire of the providers being maintained by either some Internet rando or an already overloaded team trying to have PRs make it through and out to release. I believe the recent "we're not reviewing PRs anymore, exhausted" was just scoped to the hashicorp/terraform repo specifically, but it could very easily also apply to every code-gen shim that sits between TF and the underlying cloud SDK

You'll find a lot of places use Terraform and a config management tool. Terraform is great for building out cloud infra (not just instances but load balancers, object storage, etc.), but when it comes to maintaining system configuration and application state, it's less optimal.