by symlock 2867 days ago
I'm not a "serverless hater", but every company I've ever worked with had backend processes that were not tied to HTTP requests. I still keep actual servers around because the HTTP gateway is not the pain point. It's long-running processes, message systems, stream processing, and reporting.

That said, I look forward to the company (or side project) where "serverless" can save me from also assuming the "devops" role.

4 comments

At my last gig, we were using Firebase, Google's acquired-and-increasingly-integrated serverless solution. It was straightforward to have custom GCP instances that integrated with and extended our regular serverless workflows. In that scenario, it meant the compute instances tended to be extremely simple, as they were essentially just glorified event handlers.

Interestingly, as Firebase evolved during our use, nearly all of our external-instance use cases were obsoleted by more powerful native serverless support, esp. around functions.

All of which is the best of both worlds for serverless: an easy escape hatch to custom instances, and an ever-decreasing need for that escape hatch.

Hello, I'm building a serverless platform. Could you please expand your "It's long-running processes, message systems, stream processing, and reporting" bit?
Not parent, but I have the same question; I worked in adtech and video analytics before, now with social media. It's usually a mix of some REST APIs, which already are very easy to scale and manage without using serverless, with long-running backend processes, such as:

* video encoding;

* ETL processes;

* other analytical workloads;

* long-running websocket connections with Twitter/Facebook/etc APIs.

From my perspective, serverless solves the "boring" part of making REST APIs easier to manage, which were already very easy to manage.

How would serverless be applied to, say, a Python script that streams tweets from Twitter using websockets?

You would probably use something like a queue[0] that takes in data from the websocket and dishes it out to lambda functions. You might also use something like Kinesis[1] or other alternatives.

[0] https://aws.amazon.com/sqs/ [1] https://aws.amazon.com/kinesis/
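A minimal sketch of the "queue that takes in data from the websocket" half of that idea, using only the standard library. Everything named here is an assumption for illustration: `forward_stream`, `batched`, and the `send_batch` callback are hypothetical, a plain list stands in for the live socket, and another list stands in for the real queue. Note that in this sketch the reader loop is itself still a long-running process; only the downstream handlers would be functions.

```python
from typing import Callable, Iterable, Iterator, List


def batched(messages: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming stream into lists of at most batch_size messages."""
    batch: List[str] = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def forward_stream(
    messages: Iterable[str],
    send_batch: Callable[[List[str]], None],
    batch_size: int = 10,
) -> int:
    """Forward every message to the queue and return how many were sent."""
    sent = 0
    for batch in batched(messages, batch_size):
        send_batch(batch)  # hypothetical: would call the queue's send API
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Stand-ins: a list instead of a live socket, a list instead of a queue.
    fake_stream = [f"tweet {i}" for i in range(25)]
    fake_queue: List[List[str]] = []
    print(forward_stream(fake_stream, fake_queue.append))
    print([len(batch) for batch in fake_queue])
```

Swapping `fake_queue.append` for a real queue client call is the only part that would tie this to a particular provider.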

Yes of course, or I could send it into Kafka instead (which makes more sense to me). The point is, what would a serverless process look like that doesn’t have a REST API and does this long-term polling of websockets?
Some platforms like Amazon Lambda let you set up functions to consume data from a variety of event sources: https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda...
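For the consuming side, here is a rough sketch of a function triggered by a queue rather than an HTTP request. It assumes the event layout Lambda passes for SQS-triggered invocations (a `Records` list whose entries carry the message text in `body`). The JSON message shape and `process_message` are made up for illustration.

```python
import json
from typing import Any, Dict, List


def process_message(payload: Dict[str, Any]) -> str:
    """Hypothetical per-message work: normalise the text field."""
    return payload["text"].strip().lower()


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point invoked once per batch of queued messages."""
    results: List[str] = []
    for record in event.get("Records", []):
        # Assumption: each message body is a JSON object with a "text" key.
        payload = json.loads(record["body"])
        results.append(process_message(payload))
    return {"processed": len(results), "results": results}


if __name__ == "__main__":
    fake_event = {
        "Records": [
            {"body": json.dumps({"text": "  Hello "})},
            {"body": json.dumps({"text": "WORLD"})},
        ]
    }
    print(handler(fake_event, None))
```

This does not answer how the messages get into the queue in the first place; something still has to hold the streaming connection open and publish to it.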
How can I consume the Twitter stream API using that?
Serverless is fantastic for ETL and data analysis, especially for workloads that vary in scale (e.g. cron jobs). Feed data in, get data out with scaling as needed.
But how do you feed data in? Usually, it's some other service on one of the big 3 cloud providers. I'm using Google for my projects these days, so it's a mix of Google Pub/Sub and Dataflow.

I think this is the issue/risk with serverless. You either get locked into one of the big 3, or you end up doing all of the ops work to run your own stateful systems. As some of the people above you said, managing and scaling the stateless HTTP components is not the hard/expensive part of the job.

Can't you use a queue service that's essentially just managed Kafka/ActiveMQ/other-standard-system? I mean, sure, if you wanted to move off the cloud vendor you'd have to run your queues yourself, but if you're programming to the API of a well-known open-source queue system then you're never going to be very locked into a particular vendor.
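A related way to limit lock-in, sketched below, is to keep application code behind a narrow queue interface so the backend can be swapped. This is a variation on the idea, not the same thing as coding directly against Kafka's or ActiveMQ's API. `MessageQueue`, `InMemoryQueue`, and `drain` are hypothetical names; a real adapter would wrap whichever client library is in use.

```python
from collections import deque
from typing import Deque, List, Optional, Protocol


class MessageQueue(Protocol):
    """The only queue operations the application is allowed to rely on."""

    def publish(self, message: bytes) -> None: ...

    def poll(self) -> Optional[bytes]: ...


class InMemoryQueue:
    """Stand-in backend; a managed or self-hosted queue would replace it."""

    def __init__(self) -> None:
        self._items: Deque[bytes] = deque()

    def publish(self, message: bytes) -> None:
        self._items.append(message)

    def poll(self) -> Optional[bytes]:
        # Return the oldest message, or None when the queue is empty.
        return self._items.popleft() if self._items else None


def drain(queue: MessageQueue) -> List[bytes]:
    """Application code: depends on the interface, not on a vendor."""
    received: List[bytes] = []
    while (message := queue.poll()) is not None:
        received.append(message)
    return received


if __name__ == "__main__":
    q = InMemoryQueue()
    q.publish(b"first")
    q.publish(b"second")
    print(drain(q))
```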
The short answer is yes, you can do that, but it starts to get nuanced rather quickly. The context here is a desire to go “serverless”, and solutions like this only give you serverless for the relatively easy parts of your stack. If your goal is to go “serverless”, I take that to mean the few things listed below.

    1) you don’t have to manage infrastructure
    2) you don’t have to think about infrastructure (what size cluster do I need to buy?)
    3) you pay for what you use at a granular level. (GB stored, queries made, function invocations, etc)
    4) scale to zero (when not in use, you don’t pay for much of anything)

Most things don’t hit all of these points, but typical managed services hit very few of them. Sure, I can use a managed MySQL, but it only satisfies 1 of the 4 points.
How does one get locked in when it’s a simple function in X language? Seriously, serverless is just an endpoint they provide. You write the code and they handle everything else.
Because the function is the stateless, easy part. To make any non-trivial system in a serverless way, you have to use their proprietary stateful systems. In my case: Google Pub/Sub, Google Dataflow, Google Datastore, Spanner, etc. That’s where the lock-in happens.
> all of the ops work to run your own stateful systems

Can we please call it "stateless" instead of "serverless"?

Well, there are stateful serverless products like Google Datastore.
I think all that is still possible in Serverless. I'm not a serverless architect or anything, but that's typically handled by various serverless queues and related event systems.
Sounds expensive.
Don't most serverless calls have a time limit?
Yes, but you typically build your functions to split up the work if necessary. (Create additional queued events)
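A small sketch of that pattern: each invocation handles one bounded chunk and, if work remains, queues a follow-up event for the rest. The names (`Task`, `handle`, `run`, `CHUNK_SIZE`) are hypothetical, a `deque` stands in for the real queue, and doubling a number stands in for the real work.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

CHUNK_SIZE = 3  # hypothetical: items one invocation can finish in time


@dataclass
class Task:
    items: List[int]
    offset: int  # index of the first item not yet processed


def handle(task: Task, queue: Deque[Task], results: List[int]) -> None:
    """One short-lived invocation: do a chunk, then re-queue the rest."""
    chunk = task.items[task.offset : task.offset + CHUNK_SIZE]
    results.extend(item * 2 for item in chunk)  # stand-in for real work
    next_offset = task.offset + len(chunk)
    if next_offset < len(task.items):
        # Work remains: create an additional queued event for it.
        queue.append(Task(task.items, next_offset))


def run(items: List[int]) -> Tuple[List[int], int]:
    """Simulate the platform pulling events until the queue is empty."""
    queue: Deque[Task] = deque([Task(items, 0)])
    results: List[int] = []
    invocations = 0
    while queue:
        handle(queue.popleft(), queue, results)
        invocations += 1
    return results, invocations


if __name__ == "__main__":
    print(run(list(range(8))))
```

With 8 items and a chunk size of 3, the simulated job completes in 3 invocations, none of which does more than 3 items' worth of work.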
Functions that respond to events, where an event is triggered by some sort of message queue (e.g., Lambda + Kinesis streams).
Which usually have very low time restrictions, on the order of a few minutes.
Break your operation into a series of discreet tasks. For 99% of use cases, if you have a discreet task that takes 5+ minutes, there's a problem. In most cases, it can be split up.
"Discrete" rather than "discreet", but yes.