I agree with you. This seems more complex than just having an Auto Scaling group that automatically rotates nodes after a certain amount of time and picks up the new update when the node launches.
I can provide a little background on this. In general, yes, I would recommend that you just use an ASG and roll out a new AMI. However, that approach can be very expensive and time-consuming at truly massive scale (thousands or even tens of thousands of machines).
Bottlerocket is built in part based on our experiences operating AWS Fargate, which obviously has as one of its needs the ability to patch a colossal number of hosts that are running people's containers, without downtime or disrupting their containers. Bottlerocket is designed to ensure that this is both efficient and safe. We aren't the only ones with this need. Many large orgs also have tremendous fleets, and it's unacceptable to cause significant disruption by rotating at the host level.
Another aspect to consider is stateful workloads that are using the local disks. Bottlerocket lets you safely update your host if you are running something like a database or other stateful system where you don't really want to move your data around.
Not everyone will need to use this updating mechanism, but I think it will be very attractive to many of the larger organizations with a lot of infrastructure.
I agree with the confusion. There is the ability to roll out updates in a "wave"[1], but I'm not sure how this is better than a simple rollout strategy in Kubernetes, since a reboot of the node seems inevitable.
It seems to me, not to be combative, that if Fargate can't afford the "noschedule: node is old" overhead and customers of Fargate can't handle their containers restarting on a regular basis, there's something wrong with your management engine or with their design and implementation. Much of the point of containerization is that you can roll containers often and run enough of them that you never have a single point of failure. What part of that assumption is broken that destroying machines regularly doesn't work?
There are any number of reasons to avoid restarting things. Some customers are running code that has a cold start and needs some time to warm up its cache if it restarts. Some customers are running jobs (video rendering, machine learning training, etc) that might take literally days to complete. Interrupting these jobs and causing them to restart wastes the customer time and causes them to lose progress. Other containers may be hosting multiplayer game servers, and forcing them to restart would cause all people logged into the game instance to get disconnected or otherwise dropped from their game.
All of the above are use cases that AWS Fargate is used for. Beyond this, many folks simply don't like it when things happen unexpectedly outside of their control. We have Fargate Spot for workloads that can tolerate interruption, and we discount the price if you choose this launch strategy. However, Fargate on-demand seeks to avoid interrupting your containers. You are in control of when your containers start and stop or autoscale.
This makes a ton of sense and I appreciate the response. I think what people aren't recognizing is that cloud services make you pay for performance, so doing things like relaunching containers which have slow warmup time literally costs extra money. While it's certainly important to design systems such that the containers can be tossed aside easily, that doesn't mean there isn't value in reducing how often that tossing aside occurs.
Any plans to reduce the minimum bill time for Fargate to accommodate short tasks?
With 1-minute minimum billing you have to turn to Lambda for very short tasks, or have a long-running Fargate task consuming work from some message bus.
If you choose Lambda, your containers don't work, so you need to rebuild your runtime with Lambda layers or EBS, or squeeze into the Lambda environment.
If you choose messaging, say SQS fed from a Lambda called by API Gateway, you've complicated your architecture, and your Fargate instance is potentially hanging out billing, idle, and waiting for messages.
Fargate Spot removed the last reason to consider AWS Batch. Short tasks could largely replace Lambda.
This stuff is probably waaaay over my head, but isn't that what SIGTERM was made for? To notify a running process that the host needs to be shut down or restarted, to let the running process finish its current task (current frame encoding / current multiplayer game / current request / ...), and to signal that the state / cache / progress / ... needs to be saved.
The process on the AWS side would then be: send SIGTERM to all workloads. Wait for a [configurable] amount of time (maxed at xx hours) or until all workloads have exited, whichever comes first. Shut down the node. Update the node. Start the node. Restart the workloads.
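To make the sequence above concrete, here is a rough sketch of the "send SIGTERM, wait up to a configurable time, then move on" step. It is only an illustration under some assumptions: workloads are modeled as local child processes, and `drain` and `grace_seconds` are made-up names, not anything Fargate or Bottlerocket actually exposes. The node shutdown, update and restart steps are left out.

```python
import signal
import subprocess
import time


def drain(workloads, grace_seconds):
    """Ask every workload to stop, then wait until all exit or the grace period ends.

    `workloads` is a list of subprocess.Popen objects (an assumption for this
    sketch). Returns the workloads that ignored SIGTERM and had to be killed.
    """
    # Step 1: notify every workload that is still running.
    for proc in workloads:
        if proc.poll() is None:
            proc.send_signal(signal.SIGTERM)

    # Step 2: one shared deadline for the whole node, not one per workload.
    deadline = time.monotonic() + grace_seconds
    stragglers = []
    for proc in workloads:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            # Grace period is over: force the workload down so the node can proceed.
            proc.kill()
            proc.wait()
            stragglers.append(proc)
    return stragglers
```

The shared deadline is the part that matters for the discussion: whatever value is chosen for `grace_seconds` is how long the node stays unpatched while workloads finish.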
Yep you are right about SIGTERM, but let's think back to the original reason why we wanted to update the node: because of a patch, probably a security patch for a CVE?
What is the better option here? Implement a SIGTERM based process that allows the user to block the patch for a critical, possibly zero-day CVE for xx hours, remaining in a vulnerable state the entire time? Or implement a system that just patches the underlying host without interrupting the workloads on the box?
You aren't wrong, what you described is a possibility, but it is not the best possibility.
If there's a CVE vulnerability that is being actively exploited on your network, you should preempt running processes to deal with it, and absolutely must take the boot+nuke approach, because it could already be affecting any host that has not been boot+nuked.
If there's not a CVE, AWS can significantly manage the lifecycle of their machines, and have ~5% of all of their machines "unschedulable" at any one time, waiting for existing processes to complete so that they may use an orderly restart before doing a boot+nuke. An SLA of "Tasks may never run longer than X days" (X = 10-30) allows them to perform orderly restarts.
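A rough sketch of the bookkeeping this would need, purely as an illustration: the `Node` fields, `plan_rotation`, and the default values (a 5% budget and a 30-day age limit, taken from the numbers above) are all hypothetical and do not describe how AWS schedules anything.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    age_days: float
    cordoned: bool = False   # True means "unschedulable": no new tasks land here
    running_tasks: int = 0


def plan_rotation(nodes, max_unschedulable_fraction=0.05, max_age_days=30):
    """Return (nodes to cordon now, cordoned nodes that are empty and safe to restart)."""
    # Never have more than the budgeted share of the fleet unschedulable at once.
    budget = math.floor(len(nodes) * max_unschedulable_fraction)
    already_cordoned = sum(1 for n in nodes if n.cordoned)
    free_slots = max(0, budget - already_cordoned)

    # Oldest nodes first, and only those past the age limit.
    candidates = sorted(
        (n for n in nodes if not n.cordoned and n.age_days >= max_age_days),
        key=lambda n: n.age_days,
        reverse=True,
    )
    to_cordon = candidates[:free_slots]

    # A cordoned node can be restarted once its last task has finished.
    to_restart = [n for n in nodes if n.cordoned and n.running_tasks == 0]
    return to_cordon, to_restart
```

Run periodically, this keeps the unschedulable share of the fleet bounded, but it only works if tasks are guaranteed to end within some maximum runtime, which is exactly the SLA being proposed.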
Nothing is "broken" about it. It's just that when you have tens of thousands of machines that might need an urgent security update, it's very inefficient and costly to destroy all of them at once instead of patching. Destroying machines regularly is not the same thing as frequently destroying all of them at once.