by jedberg 3532 days ago
This seems to come up every time serverless comes up. There should probably be some better docs around this.

It's true each function needs its own connection, but in reality:

1) The containers actually stick around for a while and get reused, so if you write the code correctly it only has to establish the connection once per container, not once per invocation.

2) Unless you are doing a lot of traffic you'll probably only realistically have a few containers running your functions so it will only be a few connections.

3) If you end up with enough traffic that it actually becomes a problem, it would have been a problem anyway because you'd be running a lot of servers with persistent connections in a more traditional model.

In other words, the number of connections, and the amount of setup and teardown, is about the same as in a traditional setup, maybe just a little bit more.

Edit: One more thing. Sometimes a counter I hear is "Yeah but every function needs its own connection". I counter that with the contention that even with a traditional setup, a good abstraction means only one or maybe two functions actually talk to the database -- everyone else should be getting their data from that function. Also if you do it that way that one function can do some smart caching (which survives at least a few minutes with serverless).
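
One way the "single function that talks to the database, with caching" idea could look, as a rough sketch. Everything here is hypothetical: `get_data`, `fetch_from_database`, and the 60-second TTL are illustrative choices, and the cache only lives as long as the container does:

```python
import time

CACHE_TTL_SECONDS = 60.0  # assumed value; tune to how stale reads may be
_cache = {}  # key -> (expiry timestamp, value); lives as long as the container


def fetch_from_database(key):
    """Placeholder for the real query; replace with an actual database call."""
    fetch_from_database.calls += 1
    return f"value-for-{key}"


fetch_from_database.calls = 0  # lets us observe how often the database is hit


def get_data(key, now=None):
    """The single data-access function that everything else calls."""
    now = time.monotonic() if now is None else now
    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]  # still fresh, no database round trip
    value = fetch_from_database(key)
    _cache[key] = (now + CACHE_TTL_SECONDS, value)
    return value
```

The `now` parameter exists only so the expiry logic can be exercised without sleeping.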

2 comments

> If you end up with enough traffic that it actually becomes a problem, it would have been a problem anyway because you'd be running a lot of servers with persistent connections in a more traditional model.

I think this is the part I disagree with. DB connection pools are much, much smaller than the total # of functions that touch a database in any reasonably complex application.

Yes, scale is always an issue, but it seems to me that in this serverless world where you have 1 connection per function you run into scale issues a lot (an order of magnitude?) faster than the "traditional" way.

> a good abstraction means only one or maybe two functions actually talks to the database

In a serverless world, does this mean you would run a handful of functions with DB connections, and other functions would proxy db requests through them? I can see that working ok I suppose.

For what it's worth, what we've been doing is building a separate service for talking to the database, which itself maintains a single common database connection pool.

I can't say yet whether that turns out to be a good idea or a poor one. (I'm not the one who designed and built it.) One notable feature of our implementation is that it eliminates all possibility of using transactions -- a design oversight that worries me.

You could build the transaction support into the database service. Then when you need to write multiple things, you put them into a queue as a single work unit, and let your abstraction deal with taking the work unit off the queue and putting it into the database in a single transaction.

This has the added effect of making your system more reliable because you'll be using queues and you have a shorter window when a process can die and hang a db connection that is trying to roll back.
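
A rough sketch of the work-unit idea, under loud assumptions: an in-process `queue.Queue` stands in for a durable queue service, `sqlite3` stands in for the real database, and `submit` and `drain` are hypothetical names. Each queued unit is a list of statements applied in one transaction, so either all of them land or none do:

```python
import queue
import sqlite3

work_queue = queue.Queue()  # stand-in for a durable queue service (assumption)


def submit(statements):
    """Callers enqueue a list of (sql, params) pairs as one unit of work."""
    work_queue.put(list(statements))


def drain(conn):
    """Apply each queued work unit atomically; return how many committed."""
    committed = 0
    while True:
        try:
            unit = work_queue.get_nowait()
        except queue.Empty:
            return committed
        try:
            with conn:  # commits on success, rolls back if anything raises
                for sql, params in unit:
                    conn.execute(sql, params)
            committed += 1
        except sqlite3.Error:
            # The whole unit was rolled back; a real service would
            # retry it or move it to a dead-letter queue.
            pass
```

If the second statement of a unit fails (say, a duplicate primary key), the first statement of that unit is rolled back along with it, while earlier units stay committed.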

You certainly COULD (and if I were designing it I would have either done that or allowed callers to request a transaction in which case a connection is temporarily reserved for that client and a token returned which can be used to continue the transaction). But the people designing it DIDN'T do either. Which is part of why I question their design.

I wouldn't call it proxying the DB requests so much as a division of labor: all the other functions do business logic, and only one or two actually marshal data into and out of the datastore.

So yeah it's kind of like a proxy, but think of a monolithic application. Do you make it so every object in your application talks to the DB, or do you have a DB object, which handles the connection pooling and all of that other stuff? If you have a DB object, that becomes your DB function, and all the other functions talk to it for getting and writing data.
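
To make the "DB object" comparison concrete, here is a small sketch of what such an object might look like. The `Database` class, its `read`/`write` methods, and the pool size are all hypothetical, and `sqlite3` again stands in for a real database client. The point is only that callers never open connections themselves:

```python
import queue
import sqlite3
from contextlib import contextmanager


class Database:
    """Owns a small fixed pool; callers never open connections themselves."""

    def __init__(self, path, pool_size=2):
        self._pool = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    @contextmanager
    def _connection(self):
        conn = self._pool.get()  # blocks when every connection is in use
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always hand the connection back

    def write(self, sql, params=()):
        # The inner "conn" context commits on success and rolls back on error.
        with self._connection() as conn, conn:
            conn.execute(sql, params)

    def read(self, sql, params=()):
        with self._connection() as conn:
            return conn.execute(sql, params).fetchall()
```

In the serverless version of this, the class would sit behind the one function (or service) that the other functions call, so the number of open connections is bounded by `pool_size` rather than by the number of callers.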

Wouldn't it also be reasonable to use a remote API call to store data too?

It's one of the things I have in mind for https://dbhub.io. Thinking it should be a good fit (especially for Serverless apps), but haven't yet written the code to try it out.

Likely to do so in the near future though (weeks, not months).