Hacker News new | ask | show | jobs
by schmooser 2002 days ago
I’m building data pipelines in AWS (s3/sqs/dynamo/api gw/lambda/batch) + Snowflake.

Earlier this year I tried to use Terraform for everything, using principle “everything is a resource” (everything in my case is AWS, Datadog and Snowflake), so adopted “terraform apply” as universal deployment interface. Like if we need a ECR task and a Docker image, build the image from within Terraform (using null_resource which runs “docker build”). This approach works for everything but Lambda as Terraform requires a pointer to source code bundle at the plan stage. After unsuccessful fights I gave up for Lambda, so I build bundles prior to “terraform apply” (using “make build”, where build target does its magic of zipping either Go binary or Babashka Clojure sources).

That approach scales well for already two dozens of Lambdas and counting. Ping me if you want more details.

——

I disagree with this tutorial about tendency to use Terraform modules per AWS service, hiding well-documented AWS resources behind the facade of module with custom parameters with long names.

3 comments

I completely agree with your point about modules.

My rule of thumb is: only use a module if you want to enforce a very opinionated way to create resources across multiple projects. Most of the times you don't want that.

Regarding lambda, I also run a "make build" before "terraform apply". After all terraform is not a build tool. But I do make the zip with data.archive_file so that I can pass the output_base64sha256 straight to the lambda resource.

It really depends on your cloud; I recently worked with an openstack setup where I needed to create some 6 resources per compute instance, with ~200 instances that’s a whole lot of resource blocks to manage — hiding al that in a module abd passing in a dict of machines to create was imo the right choice..
I use a domain driven design (ddd) approach with terraform where different business domains and subdomains are isolated by modules. For example, if an application handles photo uploads and image processing, that would be in its own module.
I use modules to import serverless deployments as "data" blocks.
I don't fully understand your use case, but you can zip/package your lambdas using a null_resource TF resource too?
I tried it, doesn’t work - Lambda resource needs hash sum to compare against previous deployment to trigger updates of the source bundle, and Terraform needs a file to be present during plan stage. With null_resource the file is created after plan, during apply. To workaround this I tried to provide something else to bundle hash sum (like hash sum of source files, not bundle), but the value that TF keeps in the state is the one returned from AWS Lambda API, not the one you supply it; so it causes resource update on every apply, this is not what I wanted.

Your comment made me think of trying to skip bothering with Lambda’ hash sums and use custom refresh triggers instead, initiated by null_resource. Will do after holidays.

It can work. I wrote a script in Python to build stable zip files. The main thing is to ensure any timestamps are reset to zero and the files are added in a fixed order. It also filters out any unwanted files and directories to minimise the size of the bundle. I have a wrapper around it that generates the hash and uses pip3 (as I use Python for my lambdas) to download the dependencies and build the directory hierarchy for the lambda/layer, runs the stable zip script, and returns the path of the bundle and the hash.

This didn't take long to write, and reduced the amount of churn we had with our deploys. We had massive problems with one particular set of lambdas due to the sheer amount of code (mostly unavoidable dependencies, but shared, so they could go in a layer), and our deployment times plummeted to practically nothing after I knocked this together.

I'm not sure I can share the code as it's something I wrote for work, but it ought to be simple to recreate from the description above.

Yes. If we use a null_resource that has the hashes of the source code files as a trigger, then in the `local-exec` provisioner of the null_resource, we can run the build. The build can also be run remotely (we use google cloud build) to be independent of the developer's machine architecture and operating system, which is important for native dependencies. Terraform will not re-run the null resource provisioner so long as the source code does not change, there is no need for a reproducible build.
For various reasons (mainly auditing purposes, but it also reduces any incidental infrastructure churn, and makes it easier to guarantee a rollback happened as expected), we need to ensure reproducibility, so it's a bit more important for us that we guarantee the artifacts produced are exactly what we expect.
Instead of using source_code_hash you can push your code to s3 with the hash as the filename and update the lambda to point at the new file

Terraform can manage uploading objects to s3

It also seems a bit strange to have Terraform do the packaging. We do that in CI for most of our lambdas to ensure the test suite runs, linting, etc then it creates a zip at the end and pushes to S3

The only ones Terraform deploys directly are fairly trivial Python API "glue" lambdas

After reading this comment I decided to write a small blog how you could tackle this nevertheless https://smetj.net/deploying_aws_lambdas_with_terraform.html
Alternatively you could use the "archive_file" [0] resource provided by terraform. I use this resource to zip up my lambda source files and then use the hash of the zip file to determine if my application should be redeployed.

[0] https://registry.terraform.io/providers/hashicorp/archive/la...

This works fine only as long as you do not need a step to build or download dependencies, like `npm install` or `pip install`, as part of your run of terraform apply. Otherwise, a more complex solution is necessary, like talideon's above.
> I disagree with this tutorial about tendency to use Terraform modules per AWS service,

seconded