by daenz 1518 days ago
I'm not an expert in this area, but have you all considered using CRIU[0] (checkpoint/restore in userspace) for container-based Lambdas, to allow users to snapshot their containers after a language's VM (like Python) has performed most of its startup? Do you think this would reduce startup times?

0. https://criu.org/Docker

1 comment

That's a good question!

Accelerating cold starts with checkpoint and restore is a good idea. There's been a lot of research in academia around it, and some progress in industry too. It's one of those things, though, that works really well for specific use-cases or at small scale, but takes a lot of work to generalize and scale up.

For example, one challenge is making sure that random number generators (RNGs) never return the same values after cloning, because two clones restored from the same snapshot would otherwise continue from identical RNG state (and a repeated value used as a nonce completely breaks GCM mode, for example). More details here: https://arxiv.org/abs/2102.12892
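
To make the RNG problem concrete, here is a minimal sketch in Python. It is not how Lambda or CRIU work internally: Python's `random` module is not a cryptographic RNG, and the `restore_clone` helper and the `snapshot` variable are hypothetical stand-ins for a real checkpoint image. It only illustrates that two clones restored from the same saved state produce identical "random" values unless fresh entropy is mixed in after restore.

```python
import os
import random

# Hypothetical stand-in for a process that is snapshotted after startup.
rng = random.Random(1234)   # seeded once during "startup"
rng.random()                # some use of the generator before the snapshot

snapshot = rng.getstate()   # stands in for the checkpoint image


def restore_clone(state, reseed=False):
    """Return a new generator restored from the saved state (illustrative only)."""
    clone = random.Random()
    clone.setstate(state)
    if reseed:
        # Mix in fresh OS entropy after restore so that clones diverge.
        clone.seed(os.urandom(32))
    return clone


# Two clones restored naively from the same snapshot.
a = restore_clone(snapshot)
b = restore_clone(snapshot)
naive_a = a.getrandbits(96)   # 96 bits, the usual GCM nonce size
naive_b = b.getrandbits(96)

# Two clones that reseed from the OS after restore.
c = restore_clone(snapshot, reseed=True)
d = restore_clone(snapshot, reseed=True)
fresh_c = c.getrandbits(96)
fresh_d = d.getrandbits(96)

print(naive_a == naive_b)   # True: both clones continue from identical state
print(fresh_c == fresh_d)   # almost certainly False after reseeding
```

Reseeding after restore is the easy part in a toy like this. The hard part in a real system is that every RNG in every library and runtime inside the snapshot has to learn that a restore happened.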

As for CRIU specifically, it turned out not to be the right fit for Lambda, because Lambda lets you create multiple processes, interact with the OS in various ways, store local state, and other things that CRIU doesn't model in the way we needed. It's cool stuff, though, and likely a good fit for other use-cases.