by stingraycharles 2868 days ago
Not the parent, but I have the same question. I worked in adtech and video analytics before, and now work with social media. It's usually a mix of some REST APIs, which are already very easy to scale and manage without serverless, and long-running backend processes, such as:

* video encoding;

* ETL processes;

* other analytical workloads;

* long-running websocket connections with Twitter/Facebook/etc APIs.

From my perspective, serverless solves the "boring" part of making REST APIs easier to manage, which were already very easy to manage.

How would serverless be applied to, say, a Python script that streams Twitter tweets using websockets?


You would probably use something like a queue[0] that takes in data from the websocket and dishes it out to lambda functions. You might also use something like Kinesis[1] or other alternatives.

[0] https://aws.amazon.com/sqs/ [1] https://aws.amazon.com/kinesis/
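A minimal sketch of the queue idea above, assuming a long-running reader hands each websocket message to a batching step before it reaches the queue. The names `to_sqs_batches` and `forward` are hypothetical, and `send_batch` is a stand-in for whatever client call you actually use. The batch size of 10 reflects the SQS per-request limit for batch sends.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

MAX_BATCH = 10  # SQS accepts at most 10 entries per batch send


def to_sqs_batches(messages: Iterable[dict], size: int = MAX_BATCH) -> Iterator[List[Dict[str, str]]]:
    """Group messages into lists of SQS-style batch entries."""
    batch: List[Dict[str, str]] = []
    for n, msg in enumerate(messages):
        # Each entry needs an Id that is unique within its batch.
        batch.append({"Id": str(n), "MessageBody": json.dumps(msg)})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def forward(messages: Iterable[dict], send_batch: Callable[[List[Dict[str, str]]], None]) -> int:
    """Send every message through `send_batch`; return how many were sent."""
    sent = 0
    for entries in to_sqs_batches(messages):
        send_batch(entries)
        sent += len(entries)
    return sent
```

With boto3, `send_batch` could plausibly be `lambda entries: sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)`, where `queue_url` is a placeholder for your own queue. The reader feeding `messages` still has to run somewhere long-lived, which is the part the question is really about.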

Yes of course, or I could send it into Kafka instead (which makes more sense to me). The point is, what would a serverless process look like that doesn't have a REST API and does this long-term polling of websockets?
Some platforms like AWS Lambda let you set up functions to consume data from a variety of event sources: https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda...
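As a rough illustration of the consuming side, here is a sketch of a function triggered by an SQS event source. It assumes each message body is a JSON-encoded tweet; `process` is a hypothetical placeholder for the real work. The `Records`/`body` layout is the shape SQS uses when it invokes Lambda.

```python
import json


def process(tweet: dict) -> str:
    """Hypothetical per-message work; here it only extracts the text."""
    return tweet.get("text", "")


def handler(event, context):
    """Entry point for an SQS-triggered function.

    SQS delivers a batch under event["Records"]; each record carries the
    message payload as a string in record["body"].
    """
    results = [process(json.loads(r["body"])) for r in event.get("Records", [])]
    return {"processed": len(results), "texts": results}
```

Note that the function only ever sees batches that something else has already put on the queue. It does not hold the connection to the upstream API itself.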
How can I consume the Twitter stream API using that?
It depends on what you're doing with that stream. At its most basic, you would create a nano/micro EC2 instance that just triggers Lambda events on every new tweet. Or you could write a more intricate script that does a lot of pre-processing and then stores the result in RDS or S3, and have each new update to either of those sources kick off a Lambda.
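A sketch of the "small instance that triggers a function per tweet" idea, assuming the stream arrives as newline-delimited JSON with blank keep-alive lines (an assumption about the upstream API, not something confirmed here). `forward_stream` and `invoke` are hypothetical names; `invoke` is injected so the loop can be tried without any cloud client.

```python
import json
from typing import Callable, Iterable


def forward_stream(lines: Iterable[str], invoke: Callable[[bytes], None]) -> int:
    """Read newline-delimited JSON and call `invoke` once per message.

    Blank keep-alive lines and malformed lines are skipped.
    Returns the number of messages forwarded.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # keep-alive
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore partial or garbled lines
        invoke(json.dumps(tweet).encode("utf-8"))
        count += 1
    return count
```

With boto3, `invoke` could wrap the Lambda client's `invoke(FunctionName=..., InvocationType="Event", Payload=payload)` for asynchronous invocation, with the function name as your own placeholder. The loop itself is the long-running part that has to live on the instance.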
Unless the API can stream directly into one of those sources you'd probably need a long-running process, perhaps running on a CaaS like AWS Fargate.

I guess you could argue about where to draw the "serverless" line, at functions or at containers, but Zeit is calling this container service "serverless", so I think Fargate would fall into the same category. I think it would make sense for Zeit to eventually support long-running containers too (it looks like the current max is 30 minutes; I'm not sure how they chose that number).

I don't know this specific API.

Either it works with webhooks, which you could route to a Lambda via API Gateway.

Or the data has to be pulled, in which case you could trigger a Lambda on a schedule via CloudWatch.
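For the pull variant, a sketch of what a scheduled function might look like. Because nothing survives between invocations, it assumes the read position is kept in some external store. All four injected callables (`fetch_since`, `load_cursor`, `save_cursor`, `sink`) are hypothetical stand-ins, not part of any real API.

```python
from typing import Any, Callable, List, Optional, Tuple


def make_handler(
    fetch_since: Callable[[Optional[Any]], Tuple[List[Any], Any]],
    load_cursor: Callable[[], Optional[Any]],
    save_cursor: Callable[[Any], None],
    sink: Callable[[Any], None],
):
    """Build a handler that pulls new items on each scheduled invocation."""

    def handler(event, context):
        cursor = load_cursor()                      # where the last run stopped
        items, new_cursor = fetch_since(cursor)     # pull anything newer
        for item in items:
            sink(item)                              # hand off downstream
        if new_cursor != cursor:
            save_cursor(new_cursor)                 # persist progress for the next run
        return {"fetched": len(items), "cursor": new_cursor}

    return handler
```

This only suits APIs that can be polled. A streaming connection that must stay open does not fit the schedule-and-exit model.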

Serverless is fantastic for ETL and data analysis, especially for workloads that vary in scale (eg cronjobs). Feed data in, get data out with scaling as needed.
But how do you feed data in? Usually, it's some other service on one of the big 3 cloud providers. I'm using Google for my projects these days, so it's a mix of Google PubSub and Dataflow.

I think this is the issue/risk with serverless. You either get locked into one of the big 3, or you end up doing all of the ops work to run your own stateful systems. As some of the people above you said, managing and scaling the stateless HTTP components is not the hard/expensive part of the job.

Can't you use a queue service that's essentially just managed Kafka/ActiveMQ/other-standard-system? I mean, sure, if you wanted to move off the cloud vendor you'd have to run your queues yourself, but if you're programming to the API of a well-known open-source queue system then you're never going to be very locked into a particular vendor.
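One way to picture "programming to a standard API" is a thin interface that application code depends on, with the vendor-specific client hidden behind it. This is only a sketch: `MessageQueue`, `InMemoryQueue`, and `relay` are hypothetical names, and the in-memory class stands in for a real Kafka or ActiveMQ adapter.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Protocol


class MessageQueue(Protocol):
    """The only queue surface the application code is allowed to see."""

    def publish(self, topic: str, payload: bytes) -> None: ...

    def poll(self, topic: str) -> Optional[bytes]: ...


class InMemoryQueue:
    """Stand-in backend; a managed or self-hosted adapter would replace it."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[bytes]] = defaultdict(deque)

    def publish(self, topic: str, payload: bytes) -> None:
        self._topics[topic].append(payload)

    def poll(self, topic: str) -> Optional[bytes]:
        q = self._topics[topic]
        return q.popleft() if q else None


def relay(queue: MessageQueue, source: str, dest: str) -> int:
    """Example application logic: drain `source`, transform, publish to `dest`."""
    moved = 0
    while (msg := queue.poll(source)) is not None:
        queue.publish(dest, msg.upper())
        moved += 1
    return moved
```

Swapping vendors then means writing a new adapter, not touching `relay`. It does not remove the work of operating the queue yourself if you leave the managed service.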
The short answer is yes, you can do that, but it starts to get nuanced rather quickly. The context here is a desire to go “serverless”, and solutions like this only give you serverless for the relatively easy parts of your stack. If your goal is to go “serverless”, I take that to mean the few things listed below.

    1) you don’t have to manage infrastructure
    2) you don’t have to think about infrastructure (what size cluster do I need to buy?)
    3) you pay for what you use at a granular level. (GB stored, queries made, function invocations, etc)
    4) scale to zero (when not in use, you don’t pay for much of anything)

Most things don’t hit all of these points, but typical managed services hit very few of these points. Sure, I can use a managed MySQL, but it only satisfies 1 of the 4 points.
How does one get locked in when it’s a simple function in X language? Seriously, serverless is just an endpoint they provide. You write the code and they handle everything else.
Because the function is the stateless, easy part. To make any non-trivial system in a serverless way, you have to use their proprietary stateful systems. In my case, that's Google PubSub, Google Dataflow, Google Datastore, Spanner, etc. That’s where the lock-in happens.
Right, because serverless is actually just a cover for "de-commoditizing" the cloud services that companies like AWS built to commoditize datacenters. You hit the nail on the head. It's not completely useless: it helps less technical people solve the problems that folks like you and me consider "the easy part", so people will find a use for it.

But the primary utility of serverless is an attempt at solving Amazon's problem of being commoditized by containers.

I’d say something more nuanced. Serverless is increasing commoditization of one layer of the stack at the cost of de-commoditizing a higher layer of the stack. This is what makes it a hard decision to grapple with. You’re getting very real benefits from it, and potentially paying a very real cost sometime down the road when being locked into the proprietary system bites you.
> all of the ops work to run your own stateful systems

Can we please call it "stateless" instead of "serverless"?

Well, there are stateful serverless products like Google Datastore.