by oneplane 1073 days ago
I'm surprised nobody has mentioned Atlantis yet. Running bare Terraform in CI is a bad idea (to the same extent that running an 'expect' script to drive an interactive tool is a bad idea), and when you consider the impact it can have (both on the resources it manages and in terms of privilege escalation), it should be out-of-band anyway.

Atlantis was a great tool back in the day and still works well in most scenarios. Its main issue is that it also takes on running the jobs itself (that is, the Terraform binary runs on the same VM as Atlantis). This makes it similar to Jenkins and other first-generation CI systems.

Companies that use Atlantis at scale (e.g. Lyft) felt the need to fork it and use a scalable compute backend instead, e.g. Temporal. At that point you've basically built a DIY in-house CI.

Our view is that it's best to keep these concerns separate. The CI part (compute, jobs, logs, etc.) is a solved problem. What remains unsolved for Terraform is the state-aware logic that decides when and how to run those jobs. It's really all about the orchestrator.
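To make "state-aware logic for when and how to run jobs" a little more concrete, here is a minimal Python sketch of one piece of such an orchestrator. It assumes a hypothetical repository layout where each environment is a Terraform root module directory that may source local modules, plus a hypothetical per-environment lock table keyed by pull-request number (similar in spirit to Atlantis's project locks). `Environment`, `affected_environments` and `plan_jobs` are illustrative names, not part of Atlantis or any other product; actually running `terraform plan` is left to whatever CI system you already have.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Environment:
    """Hypothetical model of one Terraform root module ("environment")."""
    name: str
    root_dir: str
    # Local module directories this root module sources (assumed known up front).
    module_dirs: list[str] = field(default_factory=list)


def _is_under(path: str, directory: str) -> bool:
    """True if `path` is `directory` itself or sits somewhere inside it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def affected_environments(changed_files: list[str], envs: list[Environment]) -> list[str]:
    """Names of environments whose root dir or any local module dir was touched."""
    return [
        env.name
        for env in envs
        if any(_is_under(f, d) for f in changed_files for d in [env.root_dir, *env.module_dirs])
    ]


def plan_jobs(
    changed_files: list[str],
    envs: list[Environment],
    locks: dict[str, int],
    pr_number: int,
) -> tuple[list[str], list[str]]:
    """Split affected environments into (runnable, blocked).

    `locks` maps an environment name to the PR currently holding it
    (a hypothetical lock table). An environment locked by a different PR is
    blocked, so two PRs never plan/apply against the same state concurrently.
    """
    runnable, blocked = [], []
    for name in affected_environments(changed_files, envs):
        holder = locks.get(name)
        if holder is None or holder == pr_number:
            runnable.append(name)
        else:
            blocked.append(name)
    return runnable, blocked


if __name__ == "__main__":
    envs = [
        Environment("prod-network", "envs/prod/network", ["modules/vpc"]),
        Environment("staging-network", "envs/staging/network", ["modules/vpc"]),
        Environment("prod-dns", "envs/prod/dns"),
    ]
    # A change to a shared module fans out to every environment using it;
    # staging-network is held by PR 41, so PR 42 can only plan prod-network.
    print(plan_jobs(["modules/vpc/main.tf"], envs, {"staging-network": 41}, 42))
    # → (['prod-network'], ['staging-network'])
```

Matching on path components rather than string prefixes keeps `envs/prod` from accidentally matching `envs/production`. A real orchestrator would also need remote-module tracking, ordering between dependent environments and plan/apply gating, none of which this sketch covers.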

I think in my case (and almost everyone else's) we'll never reach Lyft scale, but with about 100 AWS accounts (and a bunch of Cloudflare, on-prem [compute, networking], GitHub and other providers) and 300 Terraform environments, we haven't run into the problem you described yet.

To us, CI is about integration while Terraform is about reconciliation. Technically both could be categorised as 'jobs', but by that metric a CD event is also just a job, and so are an RDBMS migration and adding or removing products from inventory. We don't call those jobs, because their specialisation warrants specialised handling. To be fair, we aren't based in the US, so perhaps this is a localised thing.
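The integration-versus-reconciliation distinction can be illustrated with a toy model: a reconciler compares desired state with recorded state and emits only the changes needed to converge, so running it again on a converged system produces nothing. Below is a simplified Python sketch under that assumption (each resource is a flat attribute dict keyed by an address such as `aws_s3_bucket.logs`, chosen for illustration). It is not how Terraform actually builds its plan, which also involves providers, refresh and a dependency graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str   # "create", "update" or "delete"
    address: str  # resource address, e.g. "aws_s3_bucket.logs" (illustrative)


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[Change]:
    """Diff desired vs. recorded resources into the changes needed to converge.

    Toy model: `desired` stands in for the configuration, `actual` for what the
    state currently records; each resource is a flat dict of attributes.
    """
    changes = []
    for address in sorted(desired.keys() | actual.keys()):
        if address not in actual:
            changes.append(Change("create", address))
        elif address not in desired:
            changes.append(Change("delete", address))
        elif desired[address] != actual[address]:
            changes.append(Change("update", address))
    return changes


if __name__ == "__main__":
    desired = {"aws_s3_bucket.logs": {"versioning": True}, "aws_iam_role.ci": {"name": "ci"}}
    actual = {"aws_s3_bucket.logs": {"versioning": False}, "aws_sqs_queue.old": {}}
    for change in reconcile(desired, actual):
        print(change.action, change.address)
    # create aws_iam_role.ci
    # update aws_s3_bucket.logs
    # delete aws_sqs_queue.old
    print(reconcile(desired, desired))  # → [] once converged
```

The empty result on a converged system is what makes a reconciliation run safe to repeat, a property a generic CI job does not guarantee.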