by falcolas 2872 days ago
> But removing the necessity of managing scaling, individual instances, and OS / software patching is significant.

With AWS (and other cloud provider) APIs and tools like Packer, these difficulties are vastly overstated.

Building VM images is just another step in the CI/CD pipeline, and patching and deploying a zero day fix becomes "kick off the CI/CD pipeline". You can even do automated unit testing of an image with tools like Inspec.

Scaling is an API call to change a "desired instance count" value (if it's not already automated), and complex problems with any individual instance can be resolved with STONITH (terminating the instance).
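For anyone who hasn't run this setup: the "desired count plus STONITH" model boils down to a reconcile loop. Below is a toy, in-memory sketch of that idea. The `Group` class and its methods are hypothetical and are not an AWS API. In AWS, changing the count corresponds to the Auto Scaling `SetDesiredCapacity` action, and the group itself launches replacements for terminated instances.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Group:
    """Hypothetical in-memory stand-in for an auto scaling group."""
    desired: int
    healthy: dict[str, bool] = field(default_factory=dict)  # instance id -> passing health check?
    _ids: count = field(default_factory=count)

    def launch(self) -> str:
        # A new instance boots from the pre-baked image, so it starts out healthy.
        iid = f"i-{next(self._ids):04d}"
        self.healthy[iid] = True
        return iid

    def reconcile(self) -> None:
        # STONITH: terminate anything failing its health check instead of debugging it in place.
        for iid, ok in list(self.healthy.items()):
            if not ok:
                del self.healthy[iid]
        # Converge on the desired count by launching replacements...
        while len(self.healthy) < self.desired:
            self.launch()
        # ...or trimming extras (oldest first, an arbitrary choice for this sketch).
        while len(self.healthy) > self.desired:
            self.healthy.pop(next(iter(self.healthy)))
```

Scaling is then just `g.desired = 5` followed by `g.reconcile()`. Marking an instance unhealthy and reconciling replaces it with a fresh one without anyone logging in.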


> Building VM images is just another step in the CI/CD pipeline, and patching and deploying a zero day fix becomes "kick off the CI/CD pipeline". You can even do automated unit testing of an image with tools like Inspec.

There is a non-zero cost in maintaining that process, including paying people to know and understand things like Inspec, packer, and Linux troubleshooting. There are also full OS upgrades where assumptions made could be invalidated, along with revising your process accordingly.

> Scaling is an API call to change a "desired instance count" value (if it's not already automated)

That automation is significant complexity. You'll be maintaining whatever health / resource checks are necessary to determine when scaling up is necessary, when scaling down should be done, what initialization / teardown tasks need to be done, etc. You'll also need some kind of health checks / monitoring to ensure this process is operating as it should so that you can detect if there's a problem with it. All of that needs to be known / understood / documented / maintained by someone.
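To make the "when to scale up, when to scale down" part concrete, here is a sketch of the kind of decision rule someone ends up owning. The 70%/30% CPU thresholds, the one-instance step, the 2 to 10 instance bounds, and the 300-second cooldown are all illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Illustrative values only; real ones come from load testing and observation.
    scale_up_above: float = 70.0    # average CPU %
    scale_down_below: float = 30.0  # the gap between thresholds provides hysteresis
    min_instances: int = 2
    max_instances: int = 10
    cooldown_s: float = 300.0       # let new instances warm up before reacting again
    last_change_at: float = float("-inf")

    def decide(self, current: int, avg_cpu: float, now: float) -> int:
        """Return the new desired instance count for the observed average CPU."""
        if now - self.last_change_at < self.cooldown_s:
            return current  # still settling after the previous change
        if avg_cpu > self.scale_up_above:
            target = current + 1
        elif avg_cpu < self.scale_down_below:
            target = current - 1
        else:
            return current  # inside the band: do nothing
        # Clamp to the configured bounds.
        target = max(self.min_instances, min(self.max_instances, target))
        if target != current:
            self.last_change_at = now
        return target
```

Every constant here is a decision someone has to make, monitor, and revisit as the application's load profile changes.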

And that's only for the stateless part. If you're trying to do the same with a relational database, it only gets tougher.

> and complex problems with any individual instance can be resolved with STONITH (terminating the instance).

Only if the problem is truly non-recurring and only in a single instance. Otherwise, it will be Linux troubleshooting to find out if it's your software, an OS patch, a third party software patch, or some other issue.

TL;DR: Both paths require knowledge beyond how to write a web application; using Lambda doesn't absolve you of having to learn about or hire someone to manage your infrastructure.

> There is a non-zero cost in maintaining that process

Just as there is a non-zero cost associated with maintaining Lambda, API Gateway, and the associated CloudFormation scripts, and finding people who can (and are willing to) maintain them.

> That automation is significant complexity

99% of that complexity is already shouldered by AWS and their ilk. They implement log forwarding, metric dashboards, instance health checks, and simple (complete) examples of how to scale based on CPU and memory - the two metrics used for scaling in most cases.
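A sketch of the proportional rule behind "scale on CPU and memory" illustrates how little code the scaling math itself needs. This is the formula Kubernetes documents for its Horizontal Pod Autoscaler, `desired = ceil(current * metric / target)`. AWS target tracking policies pursue the same goal of holding a metric near a target value, but their exact algorithm is not reproduced here. The bounds of 1 and 20 instances are assumptions.

```python
import math


def proportional_desired(current: int, metric: float, target: float,
                         min_n: int = 1, max_n: int = 20) -> int:
    """Instance count that should bring a per-instance metric (e.g. average CPU %)
    back to `target`, assuming load spreads evenly across instances."""
    if current < 1 or target <= 0:
        raise ValueError("need at least one instance and a positive target")
    desired = math.ceil(current * metric / target)
    return max(min_n, min(max_n, desired))
```

For example, 4 instances at 90% CPU with a 60% target gives 6. When scaling on both CPU and memory, one common choice is to compute this per metric and take the larger result.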

As for OS upgrades, yes, those can require a bit more expertise. That said, they occur every two to four years, and for the past few OS upgrades I've had to handle, the pain was limited to converting sysvinit scripts to upstart scripts, and then to systemd unit files (none of which was strictly required, as an aside, since both upstart and systemd support sysvinit scripts natively).

> If you're trying to do the same with a relational database, it only gets tougher.

You mean RDS? Databases need to be maintained no matter how the application is run. For a quick personal anecdote, there's a world of hurt waiting unless someone is hired who knows how to manage and tune databases, no matter who runs the infrastructure.

> Otherwise, it will be Linux troubleshooting to find out if it's your software, an OS patch, a third party software patch, or some other issue.

How is this different? Linux troubleshooting skills won't help to identify if it's third party software or your software - and those pains don't go away magically with Lambda. In the exceptionally rare case that it is the OS, it will be fixable by kicking off your CI/CD pipeline.

A small tip: as with compilers, the problem isn't the OS. Even when you think it's the OS, it's not. It's your software. OOM killer taking out processes? Those processes are leaking memory. Running out of disk space? Clean up the logs. Cron is misbehaving? Fix the typo. It's also worth mentioning that all of those problems are at least temporarily resolved by STONITH, which buys enough time to fix the application.

> Just as there is a non-zero cost associated with maintaining Lambda, API Gateway, and the associated CloudFormation scripts, and finding people who can (and are willing to) maintain them.

This is mostly true, although "maintenance" for those things is minimal. There are fewer moving parts you are responsible for maintaining, and the ones requiring ongoing changes (OS and software management) don't exist. To maintain an existing web application, you are on the hook to potentially ship updated libraries (but not runtimes) in your functions, and to pay your AWS bill. This is like a half-step above no maintenance at all.

If I build a web application for a client that I deploy using a modern serverless architecture, it will require virtually no hands-on maintenance from me for... years? If I build a web application with a more traditional stack, I will definitely need to charge some amount for maintenance, because it's not feasible to ignore patching or to assume patching won't break all the automation I'd have around scripting, health monitoring, deployment, and everything else.

That's a significant difference.

> 99% of that complexity is already shouldered by AWS and their ilk. They implement log forwarding, metric dashboards, instance health checks, and simple (complete) examples of how to scale based on CPU and memory - the two metrics used for scaling in most cases.

At what number do I scale up? At what number do I scale down? How do I detect when there's a problem with the instances coming up? I'm familiar with AWS, and they certainly help with those things, but it's still on you to have the log forwarding agent running on your box, to set up the dashboard, to ensure you have the separate agent running on your box to forward memory usage metrics, and to ensure you're not doing anything that will break the automatic minor version upgrades for your AMI (or to manage your own, if you're not using Elastic Beanstalk or don't use that feature).

It's a whole lot better than doing it without those AWS services, but it's a significant step away from what you get with a serverless architecture.

> You mean RDS? Databases need to be maintained no matter how the application is run. For a quick personal anecdote, there's a world of hurt waiting unless someone is hired who knows how to manage and tune databases, no matter who runs the infrastructure.

If you use RDS, sure. If you're using DynamoDB or (soon) Aurora Serverless, it doesn't require nearly as much tuning or babysitting.

> How is this different? Linux troubleshooting skills won't help to identify if it's third party software or your software - and those pains don't go away magically with Lambda.

Sure they can. Linux troubleshooting skills would tell you if an updated third party tool is now leaking memory, for example. And those problems often do go away with Lambda, because your functions run on the scale of seconds or minutes instead of hours, days, or more. Every function is effectively terminated every few hours at most. You could have problems with your libraries, but that's a much more limited troubleshooting scope than an entire VM.