Hacker News new | ask | show | jobs
by mickeyp 1019 days ago
I mean... if you multiplex disparate statements into the same connection and session then, well... yes, that is fraught with an incredible amount of complexity. That stuff's for OLAP, read-only replica servers and so on. High-throughput "hey I just need this one thing."

Your large-scale app probably won't need that by default. You pool connections in pgBouncer (or your application) because they're slow and expensive to set up. If you run a standard N-tier architecture and you have fifty 'backend apps' (say web apps) that talk to a communal PG server, you want pgBouncer to avoid connection churn. If your scale's a lot smaller than that then you don't need pgBouncer and you can get by with your app pooling a handful of conns --- subject to the usual vagaries that come with hand-wavey explanations. Measure, monitor, etc.

It's an interesting and illuminating article, but the take-away is don't do weird statement-based transaction multiplexing unless you know exactly why you want that!

If you're running into "connection limits" on Heroku or whatever... maybe now is the time to stop letting other people manage your DB and just run it yourself? You're clearly big enough to run up against that.

Or, maybe, release your session back to the pool (app/bouncer/whatever) instead of sitting on it. For OLTP stuff like 99% of all web apps, that's key. With 100 connections and a p99 request-response cycle with DB hits being <50ms, you can handle a lot of traffic.

3 comments

I've been running pgBouncer in large production systems for years (~10k connections to pgbouncer per DB, and 200-500 active connections to postgres). We have so many connections because microservices breed like rabbits in spring once developers make the first one, but I could rant about that in a different post.

We use transaction level sharing. Practically, this means we occasionally see problems when some per-connection state "leaks" from one client to another when someone issues a SQL statement that affects global connection state, and it affects the query of a subsequent client inheriting that state. It's annoying to track down, but given the understanding of behavior, developers generally know how to limit their queries at this point. Some queries aren't appropriate for going through pgbouncer, like cursor based queries, so we just connect directly to the DB for the rare cases where this is needed.

Why so many connections? Say you make a Go based service, which launches one goroutine per request, and your API handlers talk to the DB - the way the sql.dB connection pooling works in Go is that it'll grow its own pool to be large enough to satisfy the working parallelism, and it doesn't yield them for a while. Similar things happen in Java, Scala, etc, and with dozens of services replicated across multiple failure domains, you get a lot of connections.

It's a great tool. It allows you to provision smaller databases and save cost, at the cost of some complexity.

> microservices breed like rabbits in spring once developers make the first one

microservices talking to the same db... thats not microservices thats a disaster. you basically combine the negatives of the microservice world with the negatives of the monolith - tight coupling.

they can have their own databases, but still be on the same Postgres instance (aka cluster in Postgres parlance).
Databases are there to share data and provide transactional guarantees and even locking. Your data often must be tightly coupled like this, and most databases designed with this in mind and provide benefits when doing so. It doesn't mean your apps need to be, and there are still plenty of benefits in deployment and operations to be had with microservices. Silo the data when it makes sense, but force the issue you end up with a different problem trying to reimplement the benefits of a database in the app layer or with a fault tolerant, guaranteed delivery messaging system (itself a database under the hood).
> maybe now is the time to stop letting other people manage your DB and just run it yourself

> With 100 connections and a p99 request-response cycle with DB hits being <50ms, you can handle a lot of traffic.

You would be shocked and appalled at how little many devs know about DBs, or how long something _should_ take. I've had to explain that triple-digit msec latency on a simple SELECT is in fact not normal, and they should fix it. And that's just using it, not running it. There are a massive number of orgs that don't want to have to ever think about things like backup strategies or patching, and so they happily fork money over to a cloud provider to do it for them.

> If you're running into "connection limits" on Heroku or whatever... maybe now is the time to stop letting other people manage your DB and just run it yourself? You're clearly big enough to run up against that.

This is top advice actually. I would usually always suggest this! I've felt a lot other services too that can run better on the local/server deployment.