Hacker News new | ask | show | jobs
by stickfigure 1859 days ago
I'm an enormous fan of Postgres, it's my default go-to RDBMS. But the memory expense of connections is a huge issue and this article doesn't convince me that it's solved.

The machine being used for this benchmark has 96 vCPUs, 192G of RAM, and costs $3k/mo.

My business runs just fine on a 3.75G, 1 vCPU instance. But idle connections eat up a huge amount of RAM and I sometimes find myself hitting the limits when a load spike spins up extra frontend instances.

Sure I could probably setup pgbouncer and some other tools but that's a lot of headache. I'm acutely aware that MySQL (which I dislike because no transactional DDL) does not suffer from this issue. I also don't see this being solved without a major rewrite, which seems unlikely.

So Postgres has at least one very serious fault that makes room in the marketplace. The poor replication story is another.

5 comments

It isn't solved, and no one claimed it to be solved. The scalability improvement is related to how we build MVCC snapshots (i.e. information which transactions are visible to a session). That may reduce the memory usage a bit, but it's more about CPU I think.

As for the per-connection memory usage, the big question is whether there really is a problem (and perhaps if there's a reasonable workaround). It's not quite clear to me why you think the issues in your case are are due to idle connections, but OK.

There are two things to consider:

1) The fixed per-connection memory (tracking state, locks, ..., a couple kBs or so). You'll pay this even for unused connections.

2) Per-process memory (each connection is handled by a separate thread).

It's difficult to significantly reduce (1) because that state would no matter what the architecture is, mostly. Dealing with (2) would probably require abandoning the current architecture (process per connection) and switching to threads. IMO that's unlikely to happen, because:

(a) the process isolation actually a nice thing from the developer perspective (less locking, fewer data races, ...)

(b) processes work quite fine for reasonable number of long-lived connections, and for connection pools address a lot of the other cases

(c) PostgreSQL supports a lot of platforms, some of which may not may not have very good multi-threading support (and supporting both architectures would be quite a burden)

But that's just my assessment, of course.

I wonder if the amount of RAM used by a new process can be reduced. Code and other RO segments are shared anyway, so it's only basically the new heap and various buffers.

Reducing this amount would also run Postgres in more constrained environments.

There are two parts of this - the memory allocated by OS and internally.

At the OS level, we can't really do much, I'm afraid :-( I don't think we're wasting too much memory there, exactly because a lot of the memory is shared between processes. Which also makes it difficult to determine how much memory is actually used by the processes (the sharing makes the various metrics in ps/top are rather tricky to interpret).

As for the internal memory, it's a bit more complicated. We need a little bit of "per process" memory (per-backend entries in various internal data structures, etc.) - a couple dozen/hundred kBs, perhaps. It's hard to give a clear figure, because it depends on max_locks_per_transaction etc. This is unlikely to go away even if we switched to threads, because it's really "per session" state.

But then there are the various caches the processes keep, memory used to run queries etc. Those may be arbitrarily large, of course. The caches (with metadata about relations, indexes etc.) are usually a couple MBs at most, but yes, we might share them between threads and save some of this memory. The price for that would be the need for additional synchronization / locking, etc. The memory used to run queries (i.e. work_mem) is impossible to share between threads, of course.

There's a blog post by Andres Freund with more details: https://www.citusdata.com/blog/2020/10/08/analyzing-connecti...

Overall, there's very little chance PostgreSQL switch to threads (difficulty of such project, various drawbacks, ...). But I do agree having to run a separate connection pool may be cumbersome, etc. There was a proposal to implement integrated connection pool, which would address at least some of those problems, and I wouldn't be surprised if it happened in foreseeable future.

And this right here is why PostgreSQL will never overtake MySQL and its forks. The entire industry is sick of these excuses regarding process-per-client instead of a proper multi-threaded model. There may have been a valid argument for this 15 years ago, but not anymore.

Your definition of "reasonable number of long-lived connections" is anything but reasonable. Then "connection pools address a lot of the other cases", when a connection pool/bouncer is unwanted, unwarranted, and just adds another point of failure that needs to be deployed and maintained.

I disagree, for a number of reasons.

Firstly, it's not the goal of the PostgreSQL project to overtake MySQL or other databases, but to serve the existing/new users. This also means we're investing the development effort in a the highest benefit / effort ratio. Even if switching from process-based to thread-based model improved the per-connection overhead, the amount of work needed is so huge the benefit / effort ratio is so utterly awful no one is going to do it. There are always better ways to invest the time / effort. Especially when there are practical solution / workarounds like connection pools.

Secondly, every architecture has pros/cons, and switching from processes to threads might help in this respect but there are other consequences where the process model is superior (some of which were already mentioned). Focusing on just this particular bit while ignoring the other trade-offs is rather misleading.

And no, the arguments did not really disappear. To some extent this is about the programming model (locking etc.), and that did not really change over time. Also, PostgreSQL supports platforms, some of which may not have particularly great threading support.

I'm not claiming there are no workloads / systems that actually need that many long-lived connections without a connection pool. In my experience it's usually "We don't want to change the app, you have to change the DB!" but fine - then maybe PostgreSQL is not the right match for that application.

> Even if switching from process-based to thread-based model improved the per-connection overhead, the amount of work needed is so huge the benefit / effort ratio is so utterly awful no one is going to do it.

Then other products will emerge and overtake some of PostreSQL's marketshare in the long run. It's already happening in fact. Just like more efficient and easier to configure webservers like nginx and caddy are gaining marketshare over Apache httpd.

I love PostgreSQL and don't want to see it becoming the next Apache httpd, slowly but surely fading. Perhaps FAANGs could fund such refactor.

Perhaps a cheaper solution was to incorporate pgBouncer inside PostgreSQL so it would naturally sit in front of PostreSQL in the default installation without extra configuration.

> Then other products will emerge and overtake some of PostreSQL's marketshare in the long run. It's already happening in fact. Just like more efficient and easier to configure webservers like nginx and caddy are gaining marketshare over Apache httpd.

Maybe, we'll see.

It however assumes the other (thread-based) architecture is somewhat universally better, and I doubt that's how it works. It might help the workloads actually requiring many connections to some extent, but it's also likely to hurt other workloads for which the current architecture works just fine.

But let's assume we decide to do that - such change would be a massive shift in programming paradigm (both internally and for extensions developed by 3rd parties) and would probably require multiple years. That's a huge investment of time/effort, with a lot of complexity, risks and very limited benefits until it's done. I'd bet there'll always be a feature with better cost/benefit ratio.

So reworking the other architecture might actually gain us some users but loose others, and drain insane amount of development resources.

> Perhaps a cheaper solution was to incorporate pgBouncer inside PostgreSQL so it would naturally sit in front of PostreSQL in the default installation without extra configuration.

Yes, I already mentioned that's quite likely to happen. There has already been a patch / project to do exactly that, but it didn't make it into PG14.

> There has already been a patch / project to do exactly that, but it didn't make it into PG14.

That's great to hear!

Setting up pgbouncer is not much headache and for for OLTP workloads, it works great. You can even see it in the graph, that best performance is when number of CPU cores = number of connections. And so will be memory use. :)
You may be right that it's easy to set up, but pgbouncer doesn't help with this problem most of the time. It's a problem that needs to be solved within postgres.

There are three pooling modes:

- Session pooling. Doesn't help with this issue since it doesn't reduce the total number of required connections.

- Transaction pooling / statement pooling. Breaks too many things to be usable. (eg. prepared statements...)

See the table at https://www.pgbouncer.org/features.html for what features cannot be used with transaction pooling.

Personally I don't expect this to be ever improved in PostgreSQL (ie. change from process per connection model to something else), so I design my multi-user apps so that everything works fine with session pooling (quick short sessions/connections to pgbouncer) and connections that wait for NOTIFY get made directly to postgresql server, and are also limited in number.

And it works fine on low-resourced machines that I tend to use for everything.

Switching to a threaded model would be a lot of work, but there is a simpler solution that retains most of the benefits: using a process-per-connection model for active connections only, and allowing a single process to have multiple idle connections.

I follow the mailing list because I'm interested in this exact issue. Konstantin Knizhnik sent a patch implementing a built-in connection pooler in early 2019 that uses a similar approach to what I just described. The work on that has continued to this day, and I'm hopeful that it will eventually be merged.

But how's that different from pgbouncer?

EDIT: I see, it would have session state restore, not just DISCARD like pgbouncer.

I agree - the disparity between the cost of idle connections in Postgres vs MSSQL is hampering our ability to migrate.
Can you elaborate / quantify the memory requirements a bit? I don't have much experience with MSQQL in this respect, so I'm curious how big the difference is.
Sure, SQL Server supports a maximum of 32767 connections each of which use around 128kB. Meaning that if you use the max connections you’ll need 4GB for the connection overhead.

We see no noticeable drop in performance with increased idle connection with our workload.

Why are you migrating out of curiosity? Price reasons?
Yes, we have multiple RDS instances and wish to reduce costs.
Out of curiosity, do you know what causes this?
They mention this in the article. But to sum up, each connection in PG is handled by its own OS process. Postgres behind the scenes is composed by multiple single-threaded applications.

This comes with the advantes for Pg developers (and us!) that they don't need to deal with tons of data races issues, but the trade off is that memory wise, a process takes way more memory than a thread.

Say more about the "poor replication story". I thought replication was pretty good. What's wrong with it?
There's some stuff here with some links you can follow: https://rbranson.medium.com/10-things-i-hate-about-postgresq...