by jerf
4182 days ago
It probably won't be suitable for map-reduce. Buried in the FAQ is a statement that a "Lambda" function (scare-quoting because darn it, that name has been taken for longer than anybody working on it has been alive and is still in active use... grrrr... I'd like to see their trademark on that denied) can only run for up to a minute, with the initial default set to 3 seconds ("How long can a Lambda function execute?"). It's suitable for flinging off a map-reduce job in response to some event, but I wouldn't try to jam a map-reduce job into those constraints. I mean, sure, it's theoretically possible, but it's really the wrong way to do it. If a task takes even a second or two in Lambda, you're getting perilously close to having less than an order of magnitude of headroom before a hard cutoff, which isn't a great plan in the public cloud. You really ought to aim to get in and out of Lambda much faster than that, and have anything that needs longer run as a process triggered on another, more permanent instance.
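One way to stay well inside that cutoff is to have the function watch its own remaining time and hand leftover work to something longer-lived. Below is a minimal Python sketch under assumptions: `process_with_headroom`, `FakeContext`, and the `safety_ms` margin are hypothetical names for illustration. The only real API it mirrors is the `get_remaining_time_in_millis()` method on the context object of Lambda's Python runtime, which `FakeContext` fakes with a manual clock so the sketch runs locally.

```python
from typing import Callable, Iterable, List, Protocol, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class HasRemainingTime(Protocol):
    def get_remaining_time_in_millis(self) -> int: ...


class FakeContext:
    """Hypothetical local stand-in for the Lambda context object.

    The clock only advances when work code bumps `elapsed_ms`, which keeps
    the sketch deterministic.
    """

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.elapsed_ms = 0

    def get_remaining_time_in_millis(self) -> int:
        return max(0, self.timeout_ms - self.elapsed_ms)


def process_with_headroom(items: Iterable[T],
                          work: Callable[[T], R],
                          context: HasRemainingTime,
                          safety_ms: int) -> Tuple[List[R], List[T]]:
    """Run `work` on items until less than `safety_ms` remains, then stop and
    return the unprocessed items so a longer-lived process can take them."""
    done: List[R] = []
    pending = iter(items)
    for item in pending:
        if context.get_remaining_time_in_millis() < safety_ms:
            # Hand off this item plus everything not yet pulled from the iterator.
            return done, [item, *pending]
        done.append(work(item))
    return done, []
```

Because the time check happens before each item, `safety_ms` has to exceed the worst-case time for a single item; otherwise one slow item can still run past the cutoff.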
I can stream a 100 MB chunk from S3 and map over it concurrently, as it streams, in 10 to 15 seconds. Sixty seconds is more than enough time to process a chunk.
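A rough sketch of "map it concurrently as it streams", assuming the chunk is newline-delimited records: read fixed-size blocks, carry any partial line over to the next block, and hand each batch of complete lines to a thread pool while reading continues. `stream_map` and `count_errors` are hypothetical names. Any object with a `read(n)` method works as the stream; for S3 that would presumably be the `Body` of a boto3 `get_object` response, which is not wired up here.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List


def count_errors(lines: List[bytes]) -> int:
    """Example map step: count the lines that mention ERROR."""
    return sum(1 for line in lines if b"ERROR" in line)


def stream_map(stream: BinaryIO,
               fn: Callable[[List[bytes]], int],
               block_size: int = 1 << 20,
               workers: int = 4) -> int:
    """Read `stream` block by block and map `fn` over complete lines on a
    thread pool while the rest of the stream is still being read."""
    futures = []
    tail = b""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            block = stream.read(block_size)
            if not block:
                break
            data = tail + block
            cut = data.rfind(b"\n")
            if cut == -1:
                # No complete line yet; keep accumulating.
                tail = data
                continue
            # Submit every complete line; carry the partial last line forward.
            futures.append(pool.submit(fn, data[:cut].split(b"\n")))
            tail = data[cut + 1:]
        if tail:
            # Final record without a trailing newline.
            futures.append(pool.submit(fn, [tail]))
        # Reduce step: sum the per-batch counts.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    sample = io.BytesIO(b"ok\nERROR a\nok\nERROR b\nERROR c")
    print(stream_map(sample, count_errors, block_size=5))  # prints 3
```

Threads overlap well with network reads, but CPython's GIL means a CPU-heavy map step would need processes to actually run in parallel.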
The bigger issue is that during the preview, Lambda is limited to 25 concurrent functions.
If Amazon delivers a product where "the same code that works for one request a day also works for a thousand requests a second[1]," then you might be able to analyze hundreds of gigabytes of data in a few seconds, spin up no servers, and only pay for the few seconds that you use.
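The fan-out/fan-in shape behind that idea can be sketched locally: one handler call per chunk, capped at a fixed concurrency, with partial results folded together. In this sketch `fan_out` is a hypothetical helper, a thread pool stands in for Lambda's concurrency limit (the default of 25 mirrors the preview cap), and `handler` is a plain local function; replacing it with a real remote invocation is an assumption left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fan_out(chunk_keys: Iterable[str],
            handler: Callable[[str], T],
            combine: Callable[[T, T], T],
            initial: T,
            concurrency: int = 25) -> T:
    """Call `handler` once per chunk key with at most `concurrency` calls in
    flight, then fold the partial results together with `combine`."""
    result = initial
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, so the fold is deterministic.
        for partial in pool.map(handler, chunk_keys):
            result = combine(result, partial)
    return result


if __name__ == "__main__":
    keys = [f"chunk-{i:04d}" for i in range(5000)]  # hypothetical chunk names
    # Stand-in handler: pretend each chunk yields one matching record.
    print(fan_out(keys, lambda key: 1, lambda a, b: a + b, 0))  # prints 5000
```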
500 GB = 5000 chunks of 100 MB each.
1000 concurrent tasks, each running 10 seconds, could process 500 GB in 50 seconds (5 back-to-back waves of 1000 chunks).
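The arithmetic behind those two lines, as a small sketch: it assumes decimal units (1 GB = 1000 MB) and a uniform time per chunk, so chunks run in full "waves" of `concurrency` tasks. `scan_estimate` is a hypothetical name.

```python
import math
from typing import Tuple


def scan_estimate(total_gb: float, chunk_mb: float,
                  concurrency: int, seconds_per_chunk: float) -> Tuple[int, float]:
    """Return (number of chunks, wall-clock seconds) for a fan-out scan.

    Assumes decimal units (1 GB = 1000 MB) and a uniform time per chunk, so
    the chunks run in ceil(chunks / concurrency) back-to-back waves.
    """
    chunks = math.ceil(total_gb * 1000 / chunk_mb)
    waves = math.ceil(chunks / concurrency)
    return chunks, waves * seconds_per_chunk


if __name__ == "__main__":
    print(scan_estimate(500, 100, 1000, 10))  # prints (5000, 50)
```

Under the same assumptions, the preview's 25-function cap would stretch the 500 GB scan to 200 waves, or about 2000 seconds, and 10 TB at 1000-way concurrency comes to 1000 seconds.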
You would use 5000 Lambda requests out of your free monthly allotment of 1,000,000. You'd also consume 5000 * 0.1 GB * 10 seconds = 5000 GB-seconds of your free monthly allotment of 400,000.
S3 transfer is free within the same region, and S3 requests cost $0.004 per 10,000 GETs, or $0.002 for this query's 5000 GETs.
Even after you exhaust the free Lambda allotment, processing 500 GB would cost $0.000000208 * 100 * 5000 (price per 100 ms * 100-ms units per 10-second task * tasks), or about 10 cents.
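The free-tier and cost figures can be reproduced the same way. This sketch uses the prices quoted in the comment as defaults (2014 figures, not current pricing), bills each invocation in whole 100 ms units, and models only the Lambda compute and S3 GET charges the comment lists. `estimate_query_cost` and `QueryCost` are hypothetical names, and because the quoted per-100 ms price applies to a single memory size, it is a parameter rather than being derived from `memory_mb`.

```python
from typing import NamedTuple


class QueryCost(NamedTuple):
    requests: int        # one Lambda invocation per chunk
    gb_seconds: float    # what the Lambda free tier meters
    compute_usd: float   # Lambda compute once the free tier is exhausted
    s3_get_usd: float    # one S3 GET per chunk


def estimate_query_cost(chunks: int, memory_mb: int, ms_per_chunk: int,
                        usd_per_100ms: float = 0.000000208,
                        usd_per_10k_gets: float = 0.004) -> QueryCost:
    # Each invocation is billed in whole 100 ms units, rounded up.
    units_per_chunk = -(-ms_per_chunk // 100)
    return QueryCost(
        requests=chunks,
        # Decimal units as in the comment: 100 MB counts as 0.1 GB.
        gb_seconds=chunks * memory_mb * ms_per_chunk / 1_000_000,
        compute_usd=chunks * units_per_chunk * usd_per_100ms,
        s3_get_usd=chunks / 10_000 * usd_per_10k_gets,
    )


if __name__ == "__main__":
    print(estimate_query_cost(5000, 100, 10_000))
```

For the 10 TB case (100,000 chunks), the same call gives about $2.08 of compute, in line with the rough $2 figure.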
Scaling this up, querying 10 terabytes would take a little under 20 minutes (1000 seconds) to execute, cost about $2 for the query, and about $300 per month for storage.
For sporadic workloads it might be more responsive and much cheaper than spinning up a fleet of machines for Hadoop or Spark.
[1] http://www.allthingsdistributed.com/2014/11/aws-lambda.html