by NathanKP 2128 days ago
Hi, I'm a developer advocate in the AWS Container organization. This question of how to handle resource destruction is an active issue on the project, and we'd welcome your comments (or those of anyone else) on this GitHub issue: https://github.com/aws/aws-controllers-k8s/issues/82

Our goal is to make this project have "no surprises" and therefore no unexpected destruction of resources. The specifics of how we mark resources as safe to delete instead of retaining them by default are under discussion on that GitHub issue.

2 comments

One way our team has dealt with this question in a highly serverless-oriented and infrastructure-as-code-driven (how's THAT for buzzword soup) environment is to explicitly separate stateful resources from stateless, while exposing reference hooks in a configuration store to cross from one to the other. We've found that doing so _greatly_ reduces the blast radius of mistakes and lets us move more quickly and confidently.

The stateless stacks generally have a lot of development activity going on, and rapidly iterate. This is where most of our code and logic lives. This is where the vast majority of our deployment (and related cloud configuration) activity happens.

All of that thrash is kept away from the stateful stacks - think S3 buckets or DynamoDB tables - where, if THOSE thrash, we potentially get an outage at best, or lose data at worst (backups notwithstanding).

We DO NOT WANT stateless-oriented stacks to own the lifecycle of stateful stacks. They inherently need to be treated differently. Or, at least, the impact of mistakes is different.

The trick comes when you need to tie them together. To do this, we've added CloudFormation hooks and other deployment-time logic that publish ARNs and other connectivity info to our configuration store. The stateless services look up config values either during deployment or at runtime, and that's how they find the details of the stateful resources they need access to.
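To make that concrete, here's a minimal sketch of the publish/lookup pattern. Everything in it is an assumption for illustration: the `ConfigStore` class is an in-memory stand-in for whatever configuration store you actually use, and the key name, function names, and ARN are made up.

```python
class ConfigStore:
    """In-memory stand-in for a real configuration store (hypothetical)."""

    def __init__(self):
        self._values = {}

    def publish(self, key, value):
        # Called by the stateful stack at deployment time.
        self._values[key] = value

    def lookup(self, key):
        # Called by stateless services at deployment time or at runtime.
        try:
            return self._values[key]
        except KeyError:
            raise LookupError(f"no config value published for {key!r}") from None


def deploy_stateful_stack(store):
    # The stateful stack owns the table's lifecycle (creation not shown)
    # and publishes its ARN. The ARN below is a placeholder.
    store.publish(
        "/orders/table_arn",
        "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    )


def deploy_stateless_service(store):
    # The stateless stack never creates or deletes the table.
    # It only resolves a reference to it.
    return {"ORDERS_TABLE_ARN": store.lookup("/orders/table_arn")}


store = ConfigStore()
deploy_stateful_stack(store)
env = deploy_stateless_service(store)
print(env["ORDERS_TABLE_ARN"])
```

The point of the indirection is that tearing down or redeploying the stateless service touches only its own resources. The table is reachable through the store, but nothing on the stateless side can delete it.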

We've poked at toolsets like Amplify that lump everything together and have already been bitten numerous times. We've found that the difference between stateful and stateless resources should not be papered over, but instead emphasized and supported explicitly by tooling.

... all of this being one team's experience over the years, of course.

Very curious to see how this paradigm evolves here!

[edit]… Riffing on this just a little bit further… as I’m thinking about it here, it comes down to abstraction level. In a deployment or resource management domain, a generic “this is a cloud resource” isn’t very useful. What’s way _more_ useful is something like “this is a stateful resource” or “this is a stateless resource”, because that level describes resource behavior more clearly, AND how to interface with or manage those resources.
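A rough sketch of what I mean, with made-up names (`Kind`, `Resource`, `on_stack_delete`, and `plan_teardown` are all hypothetical, not any real tool's API): once the stateful/stateless distinction is part of the resource's type, the tooling can derive the safe behavior instead of relying on everyone to remember it.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STATEFUL = "stateful"
    STATELESS = "stateless"


@dataclass(frozen=True)
class Resource:
    name: str
    kind: Kind


def on_stack_delete(resource):
    """Decide what a (hypothetical) deploy tool does when the stack goes away."""
    # Stateful resources are retained by default; stateless ones can be
    # deleted and recreated freely.
    return "retain" if resource.kind is Kind.STATEFUL else "delete"


def plan_teardown(resources):
    # Map each resource name to the action derived from its kind.
    return {r.name: on_stack_delete(r) for r in resources}


plan = plan_teardown([
    Resource("orders-table", Kind.STATEFUL),
    Resource("orders-api-fn", Kind.STATELESS),
])
print(plan)
```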

There are echoes of code development principles here intentionally - robust cloud infrastructure management mirrors software dev practices as much as infrastructure management ones!

Went a very similar route with Serverless and Terraform. Things we couldn't afford to lose went into Terraform (RDS instances, Kinesis, etc). AWS resources that were more coupled to the individual services and could be deleted and recreated went into the Serverless configs.

A large reason for this was that we couldn't trust CloudFormation or Lambda not to require blasting away and recreating resources. At least at the time, it was not uncommon for a stack to get "stuck" or for a Lambda function to stop having its ENIs properly configured (essentially stuck).

I sincerely hope this works well. CloudFormation already has many issues with unintended changes, visibility into the changes it is about to make, automatic update/deletion of resources, etc. Having another path to provision infrastructure that is safe, consistent, and provides visibility at all phases would be great.