|
|
|
|
|
by andrewstuart
1859 days ago
|
|
It would be nice to hear how much of problem XID wraparound is in Postgres 14 - do the fixes below address it entirely or just make it less of a problem? I see no mention of addressing transaction id wraparound, but these are in the release notes: Cause vacuum operations to be aggressive if the table is near xid or multixact wraparound (Masahiko Sawada, Peter Geoghegan) This is controlled by vacuum_failsafe_age and vacuum_multixact_failsafe_age. Increase warning time and hard limit before transaction id and multi-transaction wraparound (Noah Misch) This should reduce the possibility of failures that occur without having issued warnings about wraparound. https://www.postgresql.org/docs/14/release-14.html |
|
Clearly it doesn't eliminate the possibility of wraparound failure entirely. Say for example you had a leaked replication slot that blocks cleanup by VACUUM for days or months. It'll also block freezing completely, and so a wraparound failure (where the system won't accept writes) becomes almost inevitable. This is a scenario where the failsafe mechanism won't make any difference at all, since it's just as inevitable (in the absence of DBA intervention).
A more interesting question is how much of a reduction in risk there is if you make certain modest assumptions about the running system, such as assuming that VACUUM can freeze the tuples that need to be frozen to avert wraparound. Then it becomes a question of VACUUM keeping up with the ongoing consumption of XIDs by the system -- the ability of VACUUM to freeze tuples and advance the relfrozenxid for the "oldest" table before XID consumption makes the relfrozenxid dangerously far in the past. It's very hard to model that and make any generalizations, but I believe in practice that the failsafe makes a huge difference, because it stops VACUUM from performing further index vacuuming.
In cases at real risk of wraparound failure, the risk tends to come from the variability in how long index vacuuming takes -- index vacuuming has a pretty non-linear cost, whereas all the other overheads are much more linear and therefore much more predictable. Having the ability to just drop those steps if and only if the situation visibly starts to get out of hand is therefore something I expect to be very useful in practice. Though it's hard to prove it.
Long term, the way to fix this is to come up with a design that doesn't need to freeze at all. But that's much harder.