
by apparentlymart 3597 days ago

At work we use Packer, Terraform and Consul across all of our apps, and little smatterings of other stuff in some places. A little on each:

- Packer: Not my favorite, honestly. I can't argue that it's doing the job, but it feels inflexible and hard to integrate into a coherent workflow. It seems that Hashicorp Atlas can smooth this over in principle; we don't use it, because it didn't seem to fit with our use of Terraform at the time we got started, and we have a semi-home-grown alternative now.

- Terraform: We're using Terraform not only for low-level infrastructure stuff (VPCs, subnets, etc) but also for application deployment. I'd say our success with Terraform was due to a couple things. First: we picked up Terraform at a time when we were in the process of a total infrastructure rework in our org anyway, so we were effectively starting from scratch. Second: I spent a few months using Terraform for toy things and learning what it was good at, what it was less good at, and building a "pattern library" of techniques that had worked out. Once we started applying it to real problems, we just cherry-picked suitable patterns from that library and used them. I expect that Terraform is tougher for someone who already has significant infrastructure deployed and is trying to manage it with Terraform with few changes, since there are definitely approaches that are harder to model in Terraform than others.

- Consul: I really enjoy the simplicity of Consul. Getting a cluster up and running is pretty straightforward. Once you have it running, you get a highly-available, datacenter-aware key/value store and a service registry. We honestly don't use the service registry very much, but we have made extensive use of the consul-template utility in conjunction with Terraform's consul_key_prefix resource to have applications/services announce where their endpoints are for consumption by their clients.
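
To illustrate the endpoint-announcement pattern from the Consul item above: this is a minimal Python sketch of a client reading the keys that something like consul_key_prefix has written. It is not how the setup described here works (that uses consul-template); it only shows the shape of the data. It assumes a Consul agent on the default local address and uses Consul's KV HTTP endpoint with `?recurse`, which returns base64-encoded values. The `services/` prefix and the function names are hypothetical.

```python
import base64
import json
from urllib.request import urlopen


def decode_kv_entries(payload, prefix):
    """Turn a Consul KV `?recurse` response into {relative_key: value}."""
    entries = {}
    for item in json.loads(payload):
        value = item.get("Value")
        if value is None:  # keys stored without a value come back as null
            continue
        name = item["Key"][len(prefix):]  # strip the shared prefix
        entries[name] = base64.b64decode(value).decode("utf-8")
    return entries


def fetch_endpoints(prefix, consul_addr="http://127.0.0.1:8500"):
    """Read every key under `prefix` from a Consul agent (address is an assumption)."""
    # urlopen raises HTTPError if Consul answers 404 (no keys under the prefix).
    with urlopen(f"{consul_addr}/v1/kv/{prefix}?recurse") as resp:
        return decode_kv_entries(resp.read(), prefix)
```

Something like `fetch_endpoints("services/")` would then give a client a plain dict of endpoint names to URLs, which is roughly what consul-template renders into a config file for you.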

We actually decided against using Vagrant because it was "more bulky" than our app developers were willing to tolerate. Instead we continued with our previous solution (running the apps directly on the users' laptops, with a README in each app describing how to get it running), being optimistic that the new Docker for Mac and Docker for Windows would be awesome enough to get the good parts of Vagrant in a lighter package.

Vault showed up a bit late for our "architecture remix" so we solved our Vault-ish problems in other ways. I like its design in theory, and would probably give it a try if the opportunity arose.

Similar story with Nomad: too late for us, and we'd gone down an alternative path before it showed up. Can't really speak to it, since I only dabbled with it very briefly.

I'm sad but honestly not surprised to see Otto phased out. I was initially excited when it was announced last year but I could never really figure out how to get it to behave in the way I expected... I always felt like I was fighting it, and doing things in a way it didn't expect. I think there's room for the Hashicorp family of tools to "tessellate better", but Otto seemed like a very coarse, heavy solution -- essentially wrapping and templating the complex tools underneath -- where I was more hoping for the tools themselves to grow features to close the gaps.

This turned into a bit of a rant, so I'll stop. :D

1 comments

apparentlymart 3597 days ago

Realized I missed a key point on Terraform:

I advise anyone using Terraform in production to wrap it up in some sort of automation. Hashicorp would of course like you to use Atlas :D but you can get a long way with CI/automation tools like Jenkins, Rundeck, ...

We have a wrapper script which:

- configures the remote state in a predictable way (setting up remote state properly is one of the more fiddly parts of Terraform usage)
- takes a snapshot of the current state
- runs "terraform plan" to produce a plan file
- takes a snapshot of the current state, which has now been refreshed by Terraform
- pauses here and waits for human approval of the plan
- takes a snapshot of the current state one more time, even though it's usually just another copy of the last state we snapshotted
- runs "terraform apply" to apply the plan created earlier
- takes a snapshot of the final state
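
The steps above can be sketched in Python roughly as follows. This is not the actual wrapper script, just a minimal illustration under some assumptions: it uses the current Terraform CLI (`terraform init` to configure the backend and `terraform state pull` to dump state, which may differ from the commands available when this was written), the function names, snapshot labels and plan file name are made up, and the deploy locking is left out entirely. Also note that newer Terraform versions may not write refreshed state back during `plan`, so the second snapshot can be identical to the first.

```python
import subprocess
from pathlib import Path


def run_terraform(args):
    """Run a terraform subcommand and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["terraform", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def deploy(snapshot_dir, approve, run=run_terraform, plan_file="deploy.tfplan"):
    """Plan, wait for approval, then apply, snapshotting state around each step.

    `approve` is a callable that receives the plan output and returns a bool.
    Returns True if the plan was applied, False if it was rejected.
    """
    snapshots = Path(snapshot_dir)
    snapshots.mkdir(parents=True, exist_ok=True)

    def snapshot(label):
        # `terraform state pull` prints the current state to stdout.
        (snapshots / f"{label}.tfstate").write_text(run(["state", "pull"]))

    run(["init", "-input=false"])  # configure remote state predictably
    snapshot("1-before-plan")
    plan_output = run(["plan", "-input=false", f"-out={plan_file}"])
    snapshot("2-after-plan")
    if not approve(plan_output):  # human approval gate
        return False
    snapshot("3-before-apply")
    run(["apply", "-input=false", plan_file])  # apply exactly the saved plan
    snapshot("4-after-apply")
    return True
```

For interactive use, the approval step could be as simple as `deploy("snapshots", lambda plan: input(plan + "\nApply? [y/N] ").strip().lower() == "y")`; in Jenkins or Rundeck the pause would more naturally be the tool's own approval step. Passing `run` as a parameter is only there so the sequence can be exercised without a real Terraform install.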

All that state-snapshotting is an insurance policy against Terraform getting itself confused. There are definitely some gotchas in this area[1] but honestly we've only actually made use of these zealous state snapshots on two separate occasions, and they were both on our pre-production staging environment (which we deploy to more carelessly, as a dry run for production) rather than our production environment.

I have thought about open sourcing that wrapper script but sadly it has some assumptions about our environment built into it (e.g. locking using a specific service in our world, so that two deploys can't run concurrently) and I've not had the time to scrub them out and generalize it.

[1] https://gist.github.com/apparentlymart/657885e730d1e5abc6ea

roman_sf 3597 days ago

Just configure remote state in a versioned S3 bucket; that's usually enough insurance.

jacques_chester 3597 days ago

I'd rather use BOSH, which has an explicit compare-and-repair model.

On the other hand, Terraform is much easier to get started with and much less opinionated.

Disclosure: I work for Pivotal, we donate the majority of engineering on BOSH.