by pst 1489 days ago
I maintain a Terraform provider for Kubernetes. One of the main reasons is that the Terraform state ensures deleted resources get purged.

That is something kubectl is not capable of. The last-applied-configuration annotation does not help with purging, because once the manifest has been deleted on disk, there is no way of knowing what to delete from the server. The unusable apply --prune flag is the best example of this issue.

I think the state mainly exists to know what was created in the past but has since been deleted from the manifests and therefore needs to be purged. The caching/performance argument is rather weak, because Terraform refreshes by default before any operation anyway.
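
To make the purging argument concrete, here is a minimal sketch of the comparison that recorded state enables. The resource names and the `plan` helper are made up for illustration, and a real tool tracks far more than a set of IDs.

```python
# Minimal sketch: why recorded state makes purging possible.
# Resource IDs are plain strings here (hypothetical examples).

def plan(recorded_state: set, declared: set) -> dict:
    """Compare what was applied in the past with what is declared now."""
    return {
        "create": declared - recorded_state,
        "keep": declared & recorded_state,
        # Only knowable because the previous run was recorded somewhere.
        "purge": recorded_state - declared,
    }

previous_run = {"deployment/web", "service/web", "configmap/web-settings"}
# The configmap manifest has since been deleted from disk.
manifests_on_disk = {"deployment/web", "service/web"}

result = plan(previous_run, manifests_on_disk)
print(sorted(result["purge"]))  # ['configmap/web-settings']
```

Without `previous_run`, the manifests on disk alone give no hint that the configmap ever existed.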

3 comments

> I think the state mainly exists to know what was created in the past but has since been deleted from the manifests and therefore needs to be purged. The caching/performance argument is rather weak, because Terraform refreshes by default before any operation anyway.

Beautiful summary.

For resources with flexible tags, one could easily imagine tags like Kubernetes's:

    terraform.io/name
    terraform.io/instance

However, for tag-less resources you have no choice but to store state to map real-world IDs to what is in the config.

I wish Terraform "tried harder" to avoid state when it can be avoided. Perhaps it could introduce some soft state, where deleted resources are refreshed by looking at tags and not state.
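
A rough sketch of that "soft state" idea, assuming the imagined `terraform.io/name` tag from above. This is not something Terraform does; `LiveResource` and `find_orphans` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

NAME_TAG = "terraform.io/name"  # imagined tag key, not a real Terraform feature


@dataclass
class LiveResource:
    resource_id: str
    tags: dict = field(default_factory=dict)


def find_orphans(live, config_name, declared_ids):
    """Find resources tagged as ours that the config no longer declares."""
    owned = [r for r in live if r.tags.get(NAME_TAG) == config_name]
    return [r.resource_id for r in owned if r.resource_id not in declared_ids]


live = [
    LiveResource("bucket-a", {NAME_TAG: "shop"}),
    LiveResource("bucket-b", {NAME_TAG: "shop"}),  # removed from config
    LiveResource("eip-1"),                         # tag-less resource
]

print(find_orphans(live, "shop", {"bucket-a"}))  # ['bucket-b']
```

The tag-less `eip-1` never shows up, whether it is ours or not, which is exactly the case where stored state is still needed.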

Flux, IIRC, uses labels or annotations to do purging. Helm, I'd argue, falls into the state category with the secrets it uses to track releases.

I do everything with Terraform so I'm not super familiar with either of them. But teams are free to choose their poison.

i think kubernetes is not a great example in favor of more client state (like tf) since k8s has uniform resource structure (metadata.*) and first class labeling support. but as you point out kubectl doesn't use labels well (at least imho).

when building https://carvel.dev/kapp (which i think of as "optimized terraform" for k8s) the goal was absolutely to take advantage of those k8s features. we ended up providing two capabilities: direct label (more advanced) and "app name" (more user friendly). from impl standpoint, difference is how much state is maintained.

"kapp deploy -a label:x=y -f ..." allows user to specify label that is applied to all deployed resources and is also used for querying k8s to determine whats out there under given label. invocation is completely stateless since burden of keeping/providing state (in this case the label x=y) is shifted to the user. downside of course is that all apis within k8s need to be iterated over. (side note, fun features like "kapp delete -a label:!x" are free thanks to k8s querying).

"kapp deploy -a my-app -f ..." gives user ability to associate name with uniquely auto-generated label. this case is more stateful than previous but again only label needs to be saved (we use ConfigMap to store that label). if this state is lost, one has to only recover generated label.

imho k8s api structure enables focused tools like kapp to be much much simpler than a more generic tool like terraform. as much as i'd like for terraform to keep less state, i totally appreciate its need to support a lowest common denominator feature set.
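
The two modes described above can be modeled roughly as follows. This is a toy sketch, not kapp's implementation: the cluster is a plain dict, the selector parser only understands `key=value` and `!key`, and `APP_LABEL_KEY` is a made-up label key.

```python
import uuid

APP_LABEL_KEY = "example.dev/app"  # hypothetical key, not kapp's actual label


def matches(labels: dict, selector: str) -> bool:
    """Supports only 'key=value' and '!key' (key must be absent)."""
    if selector.startswith("!"):
        return selector[1:] not in labels
    key, _, value = selector.partition("=")
    return labels.get(key) == value


def discover(cluster: dict, selector: str) -> list:
    """Stateless lookup: every kind has to be listed and filtered."""
    found = []
    for kind, objects in cluster.items():
        for obj in objects:
            if matches(obj["labels"], selector):
                found.append(f"{kind}/{obj['name']}")
    return found


def label_for_app(record: dict, app_name: str) -> str:
    """App-name mode: generate a unique label once and remember only that."""
    if app_name not in record:
        record[app_name] = f"{APP_LABEL_KEY}={uuid.uuid4().hex}"
    return record[app_name]


cluster = {
    "Deployment": [
        {"name": "web", "labels": {"x": "y"}},
        {"name": "other", "labels": {}},
    ],
    "ConfigMap": [
        {"name": "web-cfg", "labels": {"x": "y"}},
        {"name": "old", "labels": {"x": "z"}},
    ],
}

print(discover(cluster, "x=y"))  # ['Deployment/web', 'ConfigMap/web-cfg']
print(discover(cluster, "!x"))   # ['Deployment/other']

record = {}  # stands in for the ConfigMap that stores the generated label
selector = label_for_app(record, "my-app")
```

In the first mode the user carries the selector; in the second the only stored state is the name-to-label mapping in `record`.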

common discussion topics:

* what's the lowest common denominator for apis that need to be supported

* how much state to store client side vs server side (in the api itself e.g. tags or in "assistive service" e.g. s3 api)

* is it enough to just store resource identifiers vs whole resource content (e.g. can resource content be retrieved at a later point; if content is stored, is it sensitive)

* how easy is it to recover from complete state loss

But if you create a ConfigMap to store that label, isn't that state? It may be more lightweight than what Terraform or Helm store, but it's still state.

Was just about to call out kapp, but I see Dmitry is on it. We need kapp for all cloud resources.

Could you elaborate on the poor usability of the --prune flag?

It's not trivial to get the labels correct to avoid collateral deletions. Also, while it makes sense, I and many teams I consulted with found it rather unintuitive that apply --prune with a label selector will also only update resources that have the label, not all resources in the list of resources. Last time I checked, it was also still marked experimental and has been for years.
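
A small model of the behavior described in this last comment, assuming the selector filters both what gets applied and what gets deleted. It is a simplification for illustration, not kubectl's actual code; `apply_with_prune` and all resource names are hypothetical.

```python
def apply_with_prune(manifests: dict, live: dict, selector: tuple):
    """manifests/live map resource name -> labels; selector is (key, value)."""
    key, value = selector

    def selected(labels):
        return labels.get(key) == value

    # Only manifests carrying the label are applied.
    applied = sorted(n for n, l in manifests.items() if selected(l))
    # Manifests without the label are silently left alone.
    skipped = sorted(n for n, l in manifests.items() if not selected(l))
    # Anything live with the label but absent from the manifests is deleted.
    pruned = sorted(n for n, l in live.items() if selected(l) and n not in manifests)
    return applied, skipped, pruned


manifests = {"web": {"app": "shop"}, "worker": {}}  # worker lacks the label
live = {
    "web": {"app": "shop"},
    "old-job": {"app": "shop"},  # ours, removed from the manifests
    "billing": {"app": "shop"},  # another team's, same label by accident
}

applied, skipped, pruned = apply_with_prune(manifests, live, ("app", "shop"))
print(applied, skipped, pruned)
```

Here `worker` is in the manifests but never updated, and `billing` is deleted as collateral because it happens to share the label.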