HN Mirror


by quacker 2745 days ago

> 1. It does not seem impossible to imagine a function that spawns code close to data, be it on a VM with a connected fast SSD drive already populated with data. Also, Lambda-at-edge and Cloudflare workers are already more like "shipping code to data", or "to the customer" in this case.

This would work, of course. But doesn't it defeat at least some of the convenience of a "serverless" architecture if I still need to manage/configure servers with attached (and pre-populated) storage?

> 2. Functions are load-balanced and potentially parallelisable to millions of invocations...

Continuing from point (1), if the code needs to run proximate to data it may be difficult to achieve a huge number of parallel invocations. My parallel capacity is limited by the number of servers available for function execution, which is only those servers with direct/fast access to storage.
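To put rough numbers on that ceiling, here is a minimal sketch. The function name, the per-server slot count, and all figures are hypothetical and chosen only for illustration; nothing here comes from a real platform.

```python
def max_parallel_invocations(requested: int,
                             data_local_servers: int,
                             slots_per_server: int) -> int:
    """Upper bound on concurrent invocations when code must run next to the data.

    Assumption: each server with direct/fast access to the storage can host a
    fixed number of concurrent function slots (slots_per_server).
    """
    return min(requested, data_local_servers * slots_per_server)


# Hypothetical numbers: a million requested invocations, but only 50 servers
# hold the data, each able to run 100 functions at once.
print(max_parallel_invocations(1_000_000, 50, 100))  # 5000
```

However many invocations the platform could schedule in principle, the bound here is set by the data-local servers, not by the size of the general function fleet.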


jonhohle 2745 days ago

> This would work, of course. But doesn't it defeat at least some of the convenience of a "serverless" architecture if I still need to manage/configure servers with attached (and pre-populated) storage?

It might not be you who maintains the server. Internally, Amazon’s DynamoDB equivalent allows code owned by teams to run on data nodes triggered by events (writes, deletes, fetches). That code is run in a sandbox with certain constraints that ensure computation stays local. It’s serverless for the function owners.
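As a rough illustration of the model (not the actual internal system, whose API I'm not describing here), a data node that runs team-owned handlers on local events could be sketched like this. `DataNode`, `on`, and the event names are all hypothetical, and the sandboxing constraints are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# A handler receives the key and the value involved in the event.
Handler = Callable[[str, Optional[object]], None]


class DataNode:
    """Hypothetical storage node that runs registered handlers next to its data."""

    def __init__(self) -> None:
        self._records: Dict[str, object] = {}
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a team-owned function for 'write', 'delete', or 'fetch'."""
        self._handlers[event].append(handler)

    def _fire(self, event: str, key: str, value: Optional[object]) -> None:
        # Handlers run in-process on the node, so the data never leaves it.
        for handler in self._handlers[event]:
            handler(key, value)

    def write(self, key: str, value: object) -> None:
        self._records[key] = value
        self._fire("write", key, value)

    def fetch(self, key: str) -> Optional[object]:
        value = self._records.get(key)
        self._fire("fetch", key, value)
        return value

    def delete(self, key: str) -> None:
        value = self._records.pop(key, None)
        self._fire("delete", key, value)


# The function owner only registers code; the platform team runs the node.
node = DataNode()
seen = []
node.on("write", lambda key, value: seen.append(("write", key)))
node.write("order-1", {"total": 42})
print(seen)  # [('write', 'order-1')]
```

The function owner's entire surface area is the `on(...)` call; placement, partitioning, and capacity stay with whoever operates the nodes.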


callalex 2745 days ago

In my experience, that’s really only true at small scale. Once your dataset or traffic volume gets bigger, you have to get much more hands-on with sharding, keying/affinity, and availability.


jonhohle 2740 days ago

When I left Amazon, this was a single data store with thousands of partitions, hundreds of billions of records, dozens of teams writing functions that ran on it, thousands of data sets, and hundreds of thousands of requests per second being made. Our team had several functions that handled thousands of requests per second. It was a critical piece of infrastructure for, among other things, Amazon retail, Prime, etc.

Sure, there was a team that owned the platform, but that wasn’t us. We were customers akin to AWS customers.
