Hacker News new | ask | show | jobs
by ioltas 2663 days ago
There is a range of interesting problems here for Postgres. One is related to the costly way snapshots are built which needs to scan all entries in PGPROC to get back a list of transaction IDs. So even idle connections come at a cost. There have been proposed patches to improve the scalability in this area, one being called CSN (Commit Sequence Number) which I think is rather promising: https://www.postgresql.org/message-id/CA+CSw_tEpJ=md1zgxPkjH...
2 comments

Interesting, but surely snapshots are only needed for connections with active transactions?

My understanding was that the difficulty here was Postgres's currently very simple forked-process-per-connection model. In order for idle connections to take minimal resource, connections can no longer be mapped 1:1 with processes. Instead, active connections will have to be balanced across a process or thread pool, whilst idle connections' states will have to be stored in a master process.

Not only that, but it must be possible to transmit a full connection state between processes, and also to queue-up idle connections transitioning to the active state if there is a sudden burst exceeding the size of the process pool.

> Interesting, but surely snapshots are only needed for connections with active transactions?

Right, but the problem is that the cost to compute them essentially scales with the number of active connections (and to some degree with the number of allowed connections). Which means if your system actually does things, all those idle connections cost. We can, and should, make the constant factor for computing this considerably smaller. But obviously larger architectural issues need to be addressed too, at some point..

Edit: expand.

FWIW, I don't think CSN actually gets us that far, because the data structures to map xids to CSNs aren't that cheap to maintain either (which is needed for MVCC visibility tests).