by mjb 1518 days ago
It's not entirely accurate that Lambda pulls container images from ECR at start-up time. Here's me talking about what happens behind the scenes (which, in the real world, often makes things orders of magnitude faster than a full container pull): https://www.youtube.com/watch?v=A-7j0QlGwFk

But your broader point is correct. Cold starts are a challenge, but they're one that the team is constantly working on and improving. You can also help reduce cold-start time by picking languages without heavy VMs (Go, Rust, etc.), by reducing work done in 'static' code, and by minimizing the size of your container image. All those things will get less important over time, but they can all have a huge impact on cold starts now.
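
One way to cut the work done in 'static' code is to defer expensive setup until a request actually needs it. Below is a minimal Python sketch of that idea, not a recommendation for every workload: `_build_expensive_client`, `get_client`, and the `/health` path are hypothetical names made up for illustration, and the `time.sleep` call only stands in for real setup cost.

```python
import functools
import time


def _build_expensive_client():
    # Hypothetical stand-in for slow setup (SDK client, config parsing, model load).
    time.sleep(0.01)
    return {"ready": True}


@functools.lru_cache(maxsize=None)
def get_client():
    # Built on first use, then reused by later invocations in the same sandbox.
    return _build_expensive_client()


def handler(event, context):
    if event.get("path") == "/health":
        # The cheap path never pays for the expensive client.
        return {"statusCode": 200, "body": "ok"}
    client = get_client()
    return {"statusCode": 200, "body": str(client["ready"])}
```

The trade-off is that the first request needing the client pays the setup cost instead of the init phase, so this mostly helps when some code paths never touch the expensive dependency.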

Another option is Lambda Provisioned concurrency, which allows you to pay a small amount to control how many sandboxes Lambda keeps warm on your behalf: https://docs.aws.amazon.com/lambda/latest/dg/provisioned-con...


Pardon the ignorance, but is the state of lambda containers considered to be single-threaded? Or can they serve requests in parallel?

If I had a Spring Java (well, Kotlin) app that processes stuff off SQS (large startup time but potentially very high parallelism), would you recommend running ECS containers and scaling them up based on SQS back-pressure? Or would you package them up as Lambdas with provisioned capacity? Throughput will be fairly consistent (never zero) and occasionally bursty.

I would not use Spring, or Java for that matter, for lambdas, speaking from experience.

"Lambda containers" is a bit of a misnomer, as you can have multiple instances of a function run on the same container, it's just that initial startup time once the container shuts down that is slow (which can be somewhat avoided by a "warming" function set to trigger on a cron).

I would definitely go with containers if your intention is to use Spring. ECS containers can autoscale just the same as lambdas.

There's some work being done to package Java code to run more efficiently in serverless computing environments, but IIRC, it's not there yet.

Thanks! I wasn't planning it, but can't hurt to ask.

When I looked, the Lambda API seemed uncomplicated to implement (I saw an example somewhere), and it felt like you could just write a few controllers and gain the ability to run a subset of functionality in Lambda, especially if your app could be kept warm.

(to your cron comment, I thought that the reserved capacity would mean the container would be forcibly kept warm?)

Provisioned concurrency is nice, but can get pricey, especially in an autoscaling scenario. It moves you from a pay-per-usage situation to an hourly fee + usage model. I would wait until your requirements show you absolutely need it. For most use cases, you will either have enough traffic to keep the lambda warm, or can incur the cost of the cold start. Warming functions did the trick for us. If you think about it, provisioned concurrency is paying for managed warmers.
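
For anyone who hasn't seen a warmer, here is a rough Python sketch of the handler side, assuming a scheduled rule that invokes the function with a marker payload. The `warmer` field and `do_real_work` are hypothetical names, not part of any Lambda API; the point is only that warm-up pings return before touching real work.

```python
def do_real_work(event):
    # Hypothetical business logic for real requests.
    return {"greeting": "hello " + event.get("name", "world")}


def handler(event, context):
    # "warmer" is a made-up marker that the scheduled ping is assumed to send.
    if isinstance(event, dict) and event.get("warmer"):
        # Return right away: the goal is only to keep this sandbox alive.
        return {"warmed": True}
    return do_real_work(event)
```

As far as I know, a single scheduled ping like this keeps roughly one sandbox warm, so concurrent bursts can still hit cold starts.
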
Spring is one thing, Java is really another. One can use Java without reflection, and then the cold starts are much shorter. Additionally, there's GraalVM, an optimized VM which should be even faster. On top of that, if reflection is not used, these days one can compile Java to a native image, which has none of these problems.
When you say fast though, you really are talking in comparison to other methods of using Java on Lambda. But compared to using something like Go, they are all slow.
Each container handles requests serially. This doesn’t preclude you from spawning multiple threads in Lambda to do background work though.
Serially, but up to ten requests in a single batch

> By default, Lambda polls up to 10 messages in your queue at once and sends that batch to your function.

From https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
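
To make the batch shape concrete, here is a minimal Python sketch of a handler that walks the records of one SQS batch serially. `process` and the JSON payload with an `id` field are hypothetical. The `batchItemFailures` return value only takes effect if partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping, which this sketch assumes.

```python
import json


def process(body):
    # Hypothetical business logic; raises on malformed input.
    payload = json.loads(body)
    return payload["id"]


def handler(event, context):
    failures = []
    # One invocation receives the whole batch and handles it record by record.
    for record in event.get("Records", []):
        try:
            process(record["body"])
        except Exception:
            # Report only the failed message so the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, an exception raised from the handler would make the whole batch eligible for retry, so processing should be idempotent either way.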

I'm not an expert in this area, but have you all considered using CRIU[0] (checkpoint/restore in userspace) for container-based Lambdas, to allow users to snapshot their containers after most of a language's VM (like Python's) has performed its startup? Do you think this would reduce startup times?

0. https://criu.org/Docker

That's a good question!

Accelerating cold starts with checkpoint and restore is a good idea. There's been a lot of research in academia around it, and some progress in industry too. It's one of those things, though, that works really well for specific use-cases or at small scale, but takes a lot of work to generalize and scale up.

One challenge, for instance, is making sure that random number generators (RNGs) never return the same values after cloning (repeated values completely break GCM mode, for example). More details here: https://arxiv.org/abs/2102.12892
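
The problem can be illustrated with a small Python simulation. This is not how Lambda or any snapshot system is implemented: `copy.deepcopy` stands in for restoring two clones from one snapshot, and `random.Random` stands in for a userspace PRNG (it is not a cryptographic generator). `clones_collide` and `reseeded_clones_differ` are names made up for the sketch.

```python
import copy
import os
import random


def clones_collide():
    rng = random.Random(1234)
    rng.random()  # the PRNG has been running for a while before the snapshot
    # Simulate restoring two sandboxes from the same snapshot.
    clone_a = copy.deepcopy(rng)
    clone_b = copy.deepcopy(rng)
    # Each clone draws three 96-bit values, e.g. for use as GCM nonces.
    nonces_a = [clone_a.getrandbits(96) for _ in range(3)]
    nonces_b = [clone_b.getrandbits(96) for _ in range(3)]
    return nonces_a == nonces_b


def reseeded_clones_differ():
    rng = random.Random(1234)
    clone_a = copy.deepcopy(rng)
    clone_b = copy.deepcopy(rng)
    # One possible mitigation: mix in fresh OS entropy after every restore.
    clone_a.seed(os.urandom(32))
    clone_b.seed(os.urandom(32))
    return clone_a.getrandbits(96) != clone_b.getrandbits(96)
```

Here `clones_collide()` returns `True`: both clones emit identical "nonces", and reusing a nonce under the same key is exactly what breaks GCM. Reseeding after restore makes the streams diverge, but every PRNG in the process has to know that a restore happened.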

As for CRIU specifically, it turned out not to be the right fit for Lambda, because Lambda lets you create multiple processes, interact with the OS in various ways, store local state, and other things that CRIU doesn't model in the way we needed. It's cool stuff, though, and likely a good fit for other use-cases.