> Building VM images is just another step in the CI/CD pipeline, and patching and deploying a zero day fix becomes "kick off the CI/CD pipeline". You can even do automated unit testing of an image with tools like Inspec.
There is a non-zero cost in maintaining that process, including paying people to know and understand things like Inspec, packer, and Linux troubleshooting. There are also full OS upgrades where assumptions made could be invalidated, along with revising your process accordingly.
> Scaling is an API call to change a "desired instance count" value (if it's not already automated)
That automation is significant complexity. You'll be maintaining whatever health / resource checks are necessary to determine when scaling up is necessary, when scaling down should be done, what initialization / teardown tasks need to be done, etc. You'll also need some kind of health checks / monitoring to ensure this process is operating as it should so that you can detect if there's a problem with it. All of that needs to be known / understood / documented / maintained by someone. And that's only for the stateless part. If you're trying to do with same with a relational database, it only gets tougher.
> and complex problems with any individual instance can be resolved with STONITH (terminating the instance).
Only if the problem is truly non-recurring and only in a single instance. Otherwise, it will be Linux troubleshooting to find out if it's your software, an OS patch, a third party software patch, or some other issue.
> There is a non-zero cost in maintaining that process
Just as there is a non-zero cost associated with maintaining Lambda, API Gateway, and the associated CloudFormation scripts, and finding people who can (and are willing to) maintain them.
> That automation is significant complexity
Most of that complexity is already shouldered by AWS and its competitors. They provide log forwarding, metric dashboards, instance health checks, and simple, complete examples of how to scale on CPU and memory, the two metrics used for scaling in most cases.
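To make the "scale on CPU" point concrete, here is a rough sketch of a proportional scaling rule: size the group so the average metric lands near a target. This is not AWS's actual algorithm; the function name, the default bounds, and the 60% target in the usage note are all assumptions for illustration.

```python
import math


def desired_instance_count(current, metric_value, target_value,
                           min_size=1, max_size=10):
    """Hypothetical helper: pick an instance count that brings the
    average metric (e.g. CPU %) back toward target_value."""
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    # Total load is roughly current * metric_value; divide it by the
    # per-instance target and round up so we never under-provision.
    raw = math.ceil(current * metric_value / target_value)
    # Clamp to the group's configured bounds (assumed defaults).
    return max(min_size, min(max_size, raw))
```

With 4 instances averaging 90% CPU against a 60% target, this asks for 6 instances; at 30% CPU it asks for 2. The result is the number you would hand to the "desired instance count" API call.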
As for OS upgrades, yes, those can require a bit more expertise. That said, they occur every two to four years, and for the past few OS upgrades I've had to handle, the pain was limited to converting sysvinit scripts to upstart jobs, and later to systemd unit files (neither conversion was strictly required, as an aside, since both upstart and systemd support sysvinit scripts out of the box).
> If you're trying to do with same with a relational database, it only gets tougher.
You mean RDS? Databases need to be maintained no matter how the application is run. As a quick personal anecdote: there's a world of hurt waiting unless you hire someone who knows how to manage and tune databases, no matter who runs the infrastructure.
> Otherwise, it will be Linux troubleshooting to find out if it's your software, an OS patch, a third party software patch, or some other issue.
How is this different? Linux troubleshooting skills won't help to identify if it's third party software or your software - and those pains don't go away magically with Lambda. In the exceptionally rare case that it is the OS, it will be fixable by kicking off your CI/CD pipeline.
A small tip: Like compilers, the problem isn't the OS. Even when you think it's the OS, it's not. It's your software. OOM killer taking out processes? Those processes are leaking memory. Running out of disk space? Clean up the logs. Cron is misbehaving? Fix the typo. It's also worth mentioning that all of those problems are at least temporarily resolved by STONITH; enough to give time to fix the application.
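The STONITH approach can be sketched as a tiny policy: terminate any instance that has failed several health checks in a row and let the scaling group replace it. The function name, the input shape (instance ID mapped to consecutive failed checks), and the threshold of 3 are assumptions for illustration, not any provider's API.

```python
def instances_to_terminate(failure_counts, threshold=3):
    """Hypothetical policy: return the IDs of instances with at least
    `threshold` consecutive failed health checks, in sorted order."""
    return sorted(
        instance_id
        for instance_id, failures in failure_counts.items()
        if failures >= threshold
    )
```

For example, `instances_to_terminate({"i-a": 0, "i-b": 3, "i-c": 5})` selects `i-b` and `i-c`. That buys time, but if the same IDs keep getting replaced, the bug is in the application.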