Hacker News
by djhworld 2906 days ago
I remember a few years ago we tried to implement a scheduled Lambda that needed to download a bunch of files from an S3 prefix, perform some aggregation on the data and then write the result to a database.

Our EC2 prototype of this on one of the m3 class instances could do the work in about 2 minutes, which seemed like a perfect opportunity to port to Lambda.

Even on the top memory instance at the time (1536 MB), the job just couldn't finish, timing out after 5 minutes. The code was multithreaded, to parallelise the downloads, but no matter how much we tweaked this the Lambda would just never complete in time.

As you don't have visibility into the internals, we didn't know whether this was due to CPU constraints (decompressing lots of GZIP streams), network saturation (downloading files from S3) or something else.
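One cheap way to tell those two apart is to time the download and decompress phases separately and log the totals. The following is only a sketch under assumptions, not the code from the job described above: `fetch` is a hypothetical stand-in for an S3 download and is stubbed so the snippet runs on its own, and `profile` is an illustrative name.

```python
from __future__ import annotations

import gzip
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(key: str) -> bytes:
    """Hypothetical stand-in for an S3 download; returns gzip-compressed bytes."""
    return gzip.compress(f"payload for {key}\n".encode() * 1000)


def timed_fetch_and_decompress(key: str) -> tuple[float, float, int]:
    """Return (download_seconds, decompress_seconds, decompressed_size) for one key."""
    t0 = time.perf_counter()
    blob = fetch(key)
    t1 = time.perf_counter()
    data = gzip.decompress(blob)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(data)


def profile(keys: list[str], workers: int = 8) -> dict:
    """Run all keys through a thread pool and sum the time spent in each phase.

    The sums are per-task times added across threads, not wall-clock time,
    so compare the two phases against each other rather than against the timeout.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed_fetch_and_decompress, keys))
    return {
        "download_s": sum(r[0] for r in results),
        "decompress_s": sum(r[1] for r in results),
        "bytes_out": sum(r[2] for r in results),
    }
```

With a real download in place of the stub, a `download_s` that dwarfs `decompress_s` would point at the network, and the reverse at CPU. The stub's own timings are meaningless, since it compresses data inside the "download" step.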

In the end we gave up. We didn't have the time or resources to keep digging, and just concluded that the use case we were trying to fit ran against what Lambda is designed for.

Not saying this is an indictment of Lambda, we use it in lots of places, with a lot of critical path code (ETL Pipelines).

3 comments

We've found Lambda's X-Ray feature to be very helpful wrt finding the source of slowdowns. I know it wasn't available during the project you were writing about, but I wanted to mention it for others.
I thought the use case for things like Lambda was more along the lines of rarely used web requests that you'd save money on by not running a full box. I do remember them being slow too.
Nah, I think the scope is wider than that.

In my case we use lambda to perform ETL based on S3 events, so when a file drops into S3, Lambda is invoked to process it.

That works very well for us and is cheaper than running a box 24x7, as the file drops arrive sporadically throughout the day and Lambda can scale to meet the demand.
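For anyone who hasn't seen the pattern, a handler for S3-triggered ETL can be roughly this small. This is a minimal sketch, not our actual code: `process_object` is a hypothetical placeholder for the real ETL step, and the event shape assumed here is the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`).

```python
from urllib.parse import unquote_plus


def process_object(bucket: str, key: str) -> str:
    """Hypothetical ETL step; a real one would read the object and load it somewhere."""
    return f"s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point for S3 notifications; handles every record in the event."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append(process_object(bucket, key))
    return processed
```

Each file drop produces its own invocation, which is where the scaling comes from: nothing in the handler needs to know how many files arrive per day.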

If your job is easily parallelizable then you can run multiple lambdas in parallel. For the above use case they probably should have kicked off one lambda per prefix or similar.
That's exactly what we were doing. 1 Lambda to download and aggregate all files under a prefix.

The problem was the task just couldn't complete in < 5 minutes.

You have to fan out further then, process each file separately and aggregate the aggregates, using SQS or something else to queue up the processing.
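"Aggregate the aggregates" only works if each per-file result can be merged without going back to the raw data. A rough sketch of the idea, with the queue left out and everything running in one process: `Partial`, `map_file` and `reduce_partials` are hypothetical names, and a running mean stands in for whatever the real aggregation was.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Mergeable per-file result: keep count and total, not the mean itself."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def map_file(values) -> Partial:
    """Map step: what a single per-file worker would emit."""
    values = list(values)
    return Partial(len(values), float(sum(values)))


def reduce_partials(partials) -> Partial:
    """Reduce step: fold any number of partial results into one."""
    return reduce(Partial.merge, partials, Partial())
```

In a real fan-out, each `map_file` call would be its own Lambda invocation and the partials would travel through a queue or a table. Because `merge` is associative, the order in which they arrive doesn't matter.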

Azure's Durable Functions have an advantage here in making extreme fan-out situations easy.

We considered it, but at the time we felt that implementing map/reduce over Lambda would introduce a more complex architecture for such a simple problem.

Maybe the recently introduced SQS->Lambda support might make it a bit cleaner, but in the end we opted for EC2.

This is a fight I'm currently having with our dev team. Situational awareness is a real concern, especially if the code is misbehaving under production load. If a solution doesn't give us situational awareness when things go wrong, I object to that solution.