Anyone know if there have been any improvements to cold start times for Lambdas in a VPC? That was the absolute death knell for us. If you're using Lambdas as a service backend for mobile/web apps, it's extremely common those Lambdas will be talking to a DB, and any decent security practice would require that DB to be in a VPC. Cold starts for Lambdas in a VPC could be on the order of 8-10 seconds: https://medium.freecodecamp.org/lambda-vpc-cold-starts-a-lat...
I just got out of a session at re:Invent where they covered that they're working on improving VPC cold start times by reducing the time it takes to associate an ENI with your Lambda function. The method they're using also reduces the number of IPs needed to 1 per subnet.
We recently had to abandon Lambdas: 10+ second cold starts, and for some reason adding an API Gateway costs you another +200ms on each request (Google it, common issue apparently).
So, 10+ seconds cold start, and 200 + 200-300ms (around 500-600ms avg) calls to the Lambda function. Complete garbage for our application at least (I imagine using it for background processing might not be an issue with latency).
Switched over to EC2, less than 200ms response total, no cold starts.
Agreed! I'm much more concerned with VPC performance - I don't have a single lambda outside of a VPC. Firecracker is extremely cool, and I'm very glad to see the improved perf at the VM level, but that's not my bottleneck.
Thankfully, in my case, I have a very steady flow of data so I don't expect too many cold starts.
One thing though, do your Lambdas need both public and private access? If not, you can place them in a private-only subnet, since the slow part is the ENI for the NAT Gateway.
Cold starts for the VM are only part of the problem. If you're on a JITed runtime, a cold start also means compilation and optimization. It would be nice if they had ways to cache the JITed machine code so they could start back up with it already compiled and optimized.
You can generally resolve it yourself by poking seldom-used functions to keep them hot. But no, they haven’t provided a solution to cold start (unless you consider EC2 or Fargate a solution).
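For anyone who hasn't seen the "poke it to keep it hot" trick, here is a minimal sketch of the function side of it. It assumes a scheduled trigger sends an event containing a `warmup` key; that key name is hypothetical, not something Lambda defines, and the `cold_start` field is only there for illustration.

```python
# Module-level code runs once per container, i.e. only on a cold start.
_invocations = 0


def handler(event, context):
    """Lambda-style handler that short-circuits on a keep-warm ping.

    The "warmup" key is a made-up convention for this sketch; the scheduled
    ping that sends it is assumed to be configured separately.
    """
    global _invocations
    _invocations += 1
    cold = _invocations == 1  # first call in this container

    if isinstance(event, dict) and event.get("warmup"):
        # Return right away so the ping costs as little as possible.
        return {"warmed": True, "cold_start": cold}

    # Real request handling would go here.
    return {"statusCode": 200, "body": "hello", "cold_start": cold}
```

Note that a ping like this only keeps the containers it happens to hit warm; it does nothing for extra containers created when load scales out.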
> You can generally resolve it yourself by poking seldom used functions to keep them hot.
We've tried this and it helps somewhat but when AWS attempts to scale your function based on load, cold starts re-appear. We've moved away from Lambdas where a dependable response time is required.
If you are experiencing cold starts it means that function is not used very often. If it's not used very often that likely means it's not user facing (or something less important like a Terms of Service page). If that's the case, why do you need instant response times?
No, that's not what it means. If you have high concurrent execution, you get 'cold start' every time the underlying service 'scales out' to support more.
The MORE you use lambda concurrently, the more you hit the cold start issue.
Granted, it's just for that one cold start execution per-scale node (and they could probably just optionally pre-fire to warm things in that instance, like with a cache), but it's definitely there horizontally.
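A toy model of the point above, assuming each container serves one request at a time and a new container (one cold start) is created whenever nothing warm is idle. It ignores container expiry and the extra latency of the cold start itself, so treat it as an illustration rather than how Lambda actually schedules.

```python
def count_cold_starts(arrivals, duration):
    """Count cold starts for requests arriving at the given (sorted) times.

    Each request occupies one container for `duration` seconds.
    """
    busy_until = []  # one entry per warm container: when it becomes free
    cold = 0
    for t in arrivals:
        for i, free_at in enumerate(busy_until):
            if free_at <= t:
                busy_until[i] = t + duration  # reuse an idle warm container
                break
        else:
            # Nothing idle: scale out, which means one more cold start.
            busy_until.append(t + duration)
            cold += 1
    return cold


# Steady, non-overlapping traffic only ever pays one cold start...
print(count_cold_starts([0, 1, 2, 3], 0.5))
# ...while the same four requests arriving together pay four.
print(count_cold_starts([0, 0, 0, 0], 0.5))
```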
I really wish they would add an init() callback that is called on cold start but before any traffic is sent to your Lambda. It wouldn't help when there are no Lambdas running, but it could be useful when things are scaling up, especially if you can ask for additional concurrency above the actual concurrency necessary for spikes.
This is along the lines of what the other responses to this comment have said, but https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l... gives a very detailed overview. It's titled "I'm afraid you’re thinking about AWS Lambda cold starts all wrong", because the way you're thinking about cold start times is common (and wrong).
that’s not entirely true. while your warm lambdas can and will take the traffic, if your traffic ramps up, additional lambda instances will be spun up. you will pay cold start prices as they are spinning up. so, even if you have a heavily used lambda fn, depending on the traffic your p99 will still look pretty bad and you will not be able to guarantee that all requests will be processed in x ms or less.
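To put rough numbers on the p99 point: a quick sketch using a nearest-rank percentile, with an assumed 200ms warm response and 10s cold start (figures of the kind quoted elsewhere in this thread, not measurements). Once more than 1% of requests land on a cold container, the p99 is the cold start time.

```python
def percentile(latencies, p):
    """Nearest-rank percentile; p is an integer between 1 and 100."""
    ordered = sorted(latencies)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100), 1-based
    return ordered[max(rank, 1) - 1]


def simulate(total, cold, warm_s=0.2, cold_s=10.0):
    """Latencies (seconds) for `total` requests, `cold` of which hit a cold start."""
    return [cold_s] * cold + [warm_s] * (total - cold)


# 2% cold requests: the median is unaffected, but p99 is a full cold start.
busy = simulate(1000, 20)
print(percentile(busy, 50), percentile(busy, 99))

# 0.5% cold requests: p99 still looks fine, the pain hides above it.
quiet = simulate(1000, 5)
print(percentile(quiet, 50), percentile(quiet, 99))
```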
That's not for in-VPC functions, although if the underlying instance changes with the Firecracker migration, users might see ENI start improvements. Currently your ENI usage is roughly (Concurrent Executions) * (Memory Allocation / 3GB), i.e. about floor(3GB / Memory Allocation) concurrent executions share one ENI. If the 3GB changes, users will see huge gains, as each ENI creation can take around 9s.
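A worked version of the ENI arithmetic, assuming the capacity estimate from AWS's VPC documentation of the time: peak concurrent executions * (memory in GB / 3 GB). Rounding up to whole ENIs is my own assumption, and the function name is made up for this sketch.

```python
import math

HOST_MEMORY_MB = 3 * 1024  # the 3GB figure in the estimate


def estimated_enis(concurrent_executions, memory_mb):
    """Rough ENI count for an in-VPC function under the assumed estimate.

    Rounded up because a fractional ENI cannot exist (an assumption of
    this sketch, not part of the documented formula).
    """
    return math.ceil(concurrent_executions * memory_mb / HOST_MEMORY_MB)


# 100 concurrent executions at 1.5GB each: two executions share an ENI.
print(estimated_enis(100, 1536))
# The same concurrency at 128MB needs far fewer ENIs.
print(estimated_enis(100, 128))
```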
I'm wondering how that's even possible if it includes the time for downloading your code from S3. I.e. normal cold starts (as I understand it) involve fetching the code from S3 to install on a VM. Perhaps they aren't including that time when claiming single-millisecond cold start times?