by rawrmaan 2882 days ago
I think a lot of people have tried “serverless” and found it to present more challenges than it solves. How, for example, do you connect to a Postgres database from Lambda/Cloud Functions? As far as I can tell, the answer is: You don’t, you use a different database.

No-worries devops experiences are nothing new. See Heroku.

12 comments

I think the example you give is more because there's a mismatch between technologies, rather than the "fault" of serverless. In serverless, your endpoints become infinitely scalable. This doesn't go well when they're backed by technologies where there is a hard limit on the number of connections, for instance SQL servers or a Redis server. I think therefore that SQL database technologies have to adapt to the serverless paradigm rather than dismissing serverless "because it brings so many other issues". I think AWS has already started that with Amazon Aurora.

That aside, I agree that there are a lot of secondary concerns that are important when running things in production but that aren't available out-of-the-box when you run something on AWS Lambda. I'm thinking about error monitoring, performance monitoring, logging,... All those things need to be set up and that's quite time-consuming.

However, I think that's more due to serverless being relatively new and not as mature as the traditional way of doing things. I don't think it will take long before we'll have the equivalent for serverless of adding `gem 'newrelic_rpm'` to your Gemfile and magically having performance and error monitoring across your app.

> I think therefore that SQL database technologies have to adapt to the serverless paradigm rather than dismissing serverless

Empty statement that means nothing. SQL/RDBMS is backed by computer science and robust engineering examples that make the world spin. Alternatives are usually full of fanfare and false promises.

> I think AWS has already started that with Amazon Aurora.

Just in time when we were talking about fanfare. Spend 10-20 minutes searching on the Internets to see what actual experiences people have with it.

Its 3X write increase? Bollocks. Usually the performance is worse than when you administer your own DB (PostgreSQL/MySQL). You might (or might not) see some read-performance increase, which... well, everyone can scale on reads, so I don't see the point.

I suspect it has other goodies pertaining to administration/provisioning, but performance/scaling is not one of them.

>> I think therefore that SQL database technologies have to adapt to the serverless paradigm rather than dismissing serverless

> Empty statement that means nothing. SQL/RDBMS is backed by computer science and robust engineering examples that make the world spin. Alternatives are usually full of fanfare and false promises.

Traditional relational databases have indeed solved many issues that some newer datastores struggle with. But the flip side is that it is non-trivial to design traditional databases that are not Single Points Of Failure.

Storing data is surprisingly hard in a cloud environment, and involves trade-offs. Reaching a comprehensive solution (fast, HA, consistent, easily recoverable, scalable volume, evolvable schemas...) is hard no matter what technology you pick.

Not being able to handle more than a few connections without connection pooling has nothing to do with using SQL. It is just a different bit of optimization that is needed to support fast transient connections without pooling.
It just seems to me like the "infinite scalability" promise of serverless is only realistic if you have no database. Because inevitably, you'll hit database scaling issues due to query patterns and suboptimal indexes LONG before you'll have a hard time scaling up your fleet of servers because you're getting too many requests.
I'd also like to see the AWS bill once you hit infinite scale. AWS is already pretty expensive on their own...
You connect the same way as you do in a regular app. Which is to say, you open the database connection outside of the request handling method (for example, as a global) and then use it from within the request handling method. When your app wakes up again for another request, the database connection is still open and you just use it.
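
A minimal sketch of the pattern described above, assuming a Python function with the usual `handler(event, context)` entry point. The names `get_connection` and `_connection` are made up for illustration, and `sqlite3` is only a stand-in so the example runs without a server; with Postgres you would swap in your driver's connect call.

```python
import sqlite3

# Module-level state survives between invocations for as long as the
# platform keeps this container warm. It is lost on a cold start.
_connection = None


def get_connection():
    """Open the connection on first use, then hand back the same one."""
    global _connection
    if _connection is None:
        # Stand-in for a real Postgres connect call (assumption).
        _connection = sqlite3.connect(":memory:")
    return _connection


def handler(event, context):
    conn = get_connection()
    value = conn.execute("SELECT 1").fetchone()[0]
    # connection_id is returned only to show that warm calls share a connection.
    return {"value": value, "connection_id": id(conn)}
```

Two calls in the same warm container return the same `connection_id`; a new container starts over with its own connection, which is why every concurrent container still counts against the database's connection limit.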
As far as I know this works, but more as a hack than as a robust, officially supported solution.
How is that a “hack”? You create your DB and you get a connection string to a publicly accessible database or you create it inside a VPC and you configure your lambda to run inside a subnet within your VPC and you configure your security group. This can all be configured within the console.
The main issue with this approach is that running your lambda in a VPC results in painfully slow cold starts, on AWS at least.
IDK why the parent mentioned VPC. It's not necessary.
Without a VPC, how do you not expose your Aurora cluster to the world?

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Auror...

Aurora DB clusters must be created in an Amazon Virtual Private Cloud (VPC). To control which devices and Amazon EC2 instances can open connections to the endpoint and port of the DB instance for Aurora DB clusters in a VPC, you use a VPC security group. These endpoint and port connections can be made using Secure Sockets Layer (SSL). In addition, firewall rules at your company can control whether devices running at your company can open connections to a DB instance. For more information on VPCs, see Amazon Virtual Private Cloud (VPCs) and Amazon RDS.

I may misunderstand your comment.

That said, where I work (MaaS Global) we have a production PostgreSQL database hosted on AWS Relational Database Service (RDS):

https://aws.amazon.com/rds/postgresql/

We connect to the AWS RDS instance in our lambda functions using an ES library called knex.js and some environment variables to store the DB credentials:

https://knexjs.org/
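
For readers not on Node, here is a rough Python sketch of the same idea: credentials come from environment variables rather than from code. The variable names (`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`) and the function name are assumptions for illustration, not anything knex.js or AWS prescribes.

```python
import os


def db_config_from_env(env=os.environ):
    """Build connection settings from environment variables.

    Missing required variables raise KeyError, so a misconfigured
    function fails at startup instead of on the first query.
    """
    return {
        "host": env["DB_HOST"],
        "port": int(env.get("DB_PORT", "5432")),  # 5432 is the Postgres default port
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "dbname": env["DB_NAME"],
    }
```

The resulting dict would then be handed to whatever Postgres driver you use.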

How do you deal with the 10+ second cold start times for Lambda when using it in a VPC? Are you pre-warming your lambda functions? Did you open up your RDS instance to the world so you could connect to it from a public lambda network? I know you had to pull some magic, because I've been down that road.

It's been a problem for years and there's been no sign of a solution. Example article from last month: https://medium.freecodecamp.org/lambda-vpc-cold-starts-a-lat...

These are the sorts of problems that turn people off from using serverless architectures.

You're right that the cold start times are not ideal. But you get a huge free request load per month. Put an uptime pinger on it and keep it warm. Or do what I do and write your functions in golang. My average cold start time is around 4 seconds.

For the DB connection you put the lambda in the same VPC that the RDS exists in. Then you open the connection pool and reuse it if it's active. Not that a new connection is a big overhead over leveraging an established socket.

Wonder where all this misinformation is coming from on lambda DB access issues.

I know uptime pingers are easy and obvious solutions (I use them myself), but every time I have to resort to this sort of hack it reminds me of how immature serverless is.
Here's the problem. Uptime pingers work great if you have a low volume service. You keep 1, 2, or maybe 3 instances of the function warm, and you don't have to deal with cold start times. But there's 2 places that idea falls seriously flat.

1. This doesn't work if you were actually trying to build your API as microservices. You might have 60+ functions, some which call each other, and keeping them all warm is not really a good option.

2. Keeping a minimum number of instances warm fails to account for half the point of using serverless architectures: being able to scale. Sure, if you have little to no traffic, you can keep a couple instances warm and be up, but if your app needs to scale to 5 or 10 or more instances to handle bursts of traffic, the surfers who hit that cold start end up dealing with an extremely bad experience.

More importantly, as Lambda gets more popular, uptime pingers get less and less useful because of tragedy of the commons. The reason for needing cold starts at all is that AWS is rotating out instances to be able to keep up with overall demand with limited resources. If only a few people are sending heartbeats to their instances, their instances stay in rotation because other people's get rotated out instead. If everyone is sending heartbeat requests, some of them will still end up getting rotated out, and therefore everyone will need to increase the frequency of the heartbeat requests to keep their functions warm. It's not a sustainable solution, and I'm baffled that AWS tacitly promotes it as a resolution to the problem they themselves have caused.

It's been years. AWS needs to fix Lambda VPC cold starts.

AWS is fixing it by moving to IAM authentication in the serverless ecosystem, rather than network segmentation. Serverless Aurora will support IAM auth at scale via its HTTP query protocol.

Keeping Lambda functions warm is great until you have 2 or more requests hitting the function simultaneously. They won't queue behind the pre-warmed function, they will spin up additional Lambda containers to serve in parallel. Unless you don't expect to get concurrent requests, there's no effective way to pre-warm Lambda functions.
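
A toy model of that behaviour, as a sketch only. It assumes each container serves one request at a time, that a request arriving while every container is busy starts a new one, and that containers never expire during the run. The function name and parameters are made up for illustration.

```python
def count_cold_starts(arrivals, duration, prewarmed):
    """Count requests that land on a freshly started container.

    arrivals:  request arrival times, in seconds
    duration:  how long each request keeps its container busy
    prewarmed: number of containers kept warm by a pinger
    """
    # free_at[i] is the time at which container i becomes idle again.
    free_at = [0.0] * prewarmed
    cold = 0
    for t in sorted(arrivals):
        idle = [i for i, free in enumerate(free_at) if free <= t]
        if idle:
            free_at[idle[0]] = t + duration
        else:
            # Every container is busy, so the platform starts another one.
            cold += 1
            free_at.append(t + duration)
    return cold
```

With one pre-warmed container, three requests arriving at the same instant give two cold starts, while the same three requests spaced two seconds apart (one-second duration) give none.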

Have any data on the variance on that 4 second average? That sounds very tolerable on its face.
My employer offers FaunaDB with a pay-per-request pricing model. To bypass cold-start lambda issues, I code the app to talk directly to the database. For certain richer functions I might invoke a Lambda, but for basic crud operations the database access control does the trick. And no cold-start issue.

Here's the data model part of my todo app if you want to see queries in the app: https://github.com/fauna/todomvc-fauna-spa/blob/master/src/T...

AWS also has NoSQL cloud solutions, particularly DynamoDB, and maybe SimpleDB if you want to risk building on a someday deprecated service.

Those options work fine, if you were OK with using a NoSQL DB. But what if you wanted to use an actual relational database? For that you pretty much need Lambda in VPC, and it's not really usable because of the cold start issue.

At some point Amazon will release Aurora Serverless[1], giving a serverless option for an on demand relational database. Will that work somehow with Lambda without needing VPC, therefore defeating the cold start issue? What cold start issues will it have itself? I guess we'll wait and see for now.

1. https://aws.amazon.com/rds/aurora/serverless/

> My employer offers FaunaDB with a pay-per-request pricing model.

Tried FaunaDB a few months ago; the latency was beyond 200ms for a simple read, and beyond 600ms for an insert.

Would not recommend it at this point.

We don’t expect you to see that lag. Other users don’t see it or haven’t reported it. What region are you accessing in and how did you generate the result?
> How do you deal with the 10+ second cold start times for Lambda when using it in a VPC?

And I was complaining about 500ms cold start times on Firebase Functions.

I think I'll stop complaining now.

It’s still the early days so there are pain points, but Amazon already announced a solution to this: Serverless Aurora. It’ll be some time still until it’s public and Lambda-friendly though. And MySQL comes before Postgres.
With Google Cloud Functions (they got the name right), you can simply link with a Cloud SQL instance using a special local socket interface provided by Google Cloud[1]. Their documentation provides complete examples as well on how to use global connection pools for MySQL and PostgreSQL.

[1] https://cloud.google.com/functions/docs/sql

Okay, that's a great solution. I actually had no idea that was possible.
> How, for example, do you connect to a Postgres database from Lambda/Cloud Functions?

Exactly the same way you would with an EC2 instance...

Might not be that easy, because Postgres can only take a few hundred connections, which won't work out if you have a few thousand serverless functions. No persistent connection pools.
Well, just put the connection logic outside of the main handler so it's shared between invocations!

Wait, oops, you have state now.

Realistically could you do this with Redis or something similar? Not sure about security implications of this though...
How do you handle database connection pooling?
I fully agree with you. Based on my experience, Lambda is perfect if you only want to perform some relatively standalone task (such as a compute intensive rendering). As soon as it needs to connect to 3rd party entities, it becomes very slow and loses some of its benefits. Connecting Lambda to an AWS DB for example is challenging to say the least. It also takes a couple of tens of milliseconds just to set up the DB socket and connection, which on a normal server can simply remain open and wait for the next request.

Serverless is nice, but the ecosystem of serverless tools is really missing today IMO.

> Connecting Lambda to an AWS DB for example is challenging to say the least.

Why does this keep getting repeated? You get a publicly accessible host and use the same drivers you use on prem or you put both the lambda and the database inside your VPC.

As others have pointed out, connecting to RDS from Lambda is actually pretty trivial by having them run in the same VPC. We actually came up against an issue where our Lambda function needed access to RDS but also to the outside world, which meant some extra hurdles to jump [1], but overall our experience with Lambda has been a positive one.

I don't think we'll be considering a fully serverless architecture anytime soon due to cold-start times, but it's awesome for anything outside of the user request/response loop or internal microservices where response time is perhaps not so much of a problem.

[1] https://aws.amazon.com/premiumsupport/knowledge-center/inter...

Umm, for AWS, you place the Lambda function into a VPC and open a db connection the same way you would from another server?
I think what they are referring to is that Postgres and most other databases were built in the before time, the long long ago when every connection was a process and you limited concurrency of connections in the configuration file. If you have 1000 concurrent calls on Lambda you aren't going to be able to have them all talking to the same database at the same time. You'll run out of connections and the application will crash. Same reason you see this happen to PHP web applications when HN or Slashdot is pointed at them and they say they can't connect to the database. They have hit the concurrency limit. Connection pooling solves this problem but currently requires another layer between the application and the database.
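
To make the pooling point concrete, here is a small in-process sketch of what such a layer does: it caps the number of connections in use and makes extra callers wait instead of opening more. `BoundedPool` and `run_burst` are hypothetical names, and `object()` stands in for a real database connection. A real deployment would put a pooler between the functions and the database rather than inside one process.

```python
import queue
import threading
import time


class BoundedPool:
    """Hands out at most `size` connections; extra callers wait their turn."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)


def run_burst(callers, pool_size):
    """Simulate `callers` concurrent requests; return peak connections in use."""
    pool = BoundedPool(factory=object, size=pool_size)  # object() is a fake connection
    lock = threading.Lock()
    in_use = 0
    peak = 0

    def request():
        nonlocal in_use, peak
        conn = pool.acquire()
        with lock:
            in_use += 1
            peak = max(peak, in_use)
        try:
            time.sleep(0.001)  # pretend to run a query
        finally:
            with lock:
                in_use -= 1
            pool.release(conn)

    threads = [threading.Thread(target=request) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

However many callers arrive, the peak never exceeds `pool_size`; the cost is that callers queue for a connection instead of failing outright.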
What do you mean, how do you connect? The same way you connect on prem. Using the Postgres drivers. You can either have a publicly accessible Postgres instance with a DNS entry (not recommended) or you can run both the DB and the lambda inside of a VPC.
That’s really what’s keeping me from developing stuff with it. I can get the job done with dynamodb but I’m really looking forward to building something with Postgres in a “serverless” context.
imo, the most interesting data point is how in demand serverless architectures are WITH these issues. Imagine how compelling serverless will be once these issues are solved.