Hacker News new | ask | show | jobs
by hyper_reality 1718 days ago
PostgreSQL is one of the most powerful and reliable pieces of software I've seen run at large scale, major kudos to all the maintainers for the improvements that keep being added.

> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound

Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and it's critical to the MVCC transaction semantics - a transaction with a higher ID should not be visible to a transaction with a lower ID. If the value hits 2 billion and wraps around, disaster strikes as past transactions now appear to be in the future. If PG detects it is reaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.

Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...

It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and as with every PG feature highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/)

5 comments

> Also it is possible but highly inadvisable to disable auto-vacuums.

When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.

Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.

Everytime I've seen people having "vacuuming too expensive" problems, the solution was "more vacuum"!
"vacuum during the times when I want instead of randomly at peak traffic"
As it turns out, yes!
> Each time a transaction modifies rows in a PG database, it increments the transaction ID counter.

It's a bit more subtle than that: each transaction that modifies, deletes or locks rows will update the txID counter. Row updates don't get their own txID assigned.

> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound

When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.

What is wrong with using a 64 bit, or even 128 bit transaction id?
It would increase disk usage by a significant amount, since transaction IDs appear twice in tuple headers (xmin/xmax). Essentially they are overhead on every database row. This submission has a discussion on it: https://news.ycombinator.com/item?id=19082944
I wonder how does MS SQL work differently.
By default MS SQL uses pessimistic locking, depending on isolation levels. There's only one version of the data on disk and the isolation level of a transaction determines what happens - for example if a transaction in SERIALIZABLE reads a row SQL Server takes a shared read lock on that row preventing any other transaction from writing to it.

MS SQL also has snapshot (and read committed snapshot) isolation levels. These are much more like the Postgresql isolation levels - in fact Postgres only has two 'real' isolation levels, read committed and serializable, you get upgraded to the next higher level as permitted in the spec.

In snapshot isolation instead of taking a lock SQL Server copies the row to a table TempDB when it would be overwritten, additionally it adds a 14-byte row version to each row written. There's a lot of detail here: https://docs.microsoft.com/en-us/sql/relational-databases/sq...

This is also why MS SQL maintains a clustered index on the main table - the main table only contains the latest globally-visible version, so it can be sorted. Postgres stores all versions (until vacuum removes dead rows), so the main table is a heap, only indexes are sorted.

It has a similar concept if you need MVCC (with InnoDB). It also has a concept of transaction IDs. And also need to clean them up (purge). They will both have table bloat if not done.

Since details matter, there's a post that explains it far better than I could:

https://www.enterprisedb.com/blog/mysql-vs-postgresql-part-2...

I asked about MS not My though.
Yeah, making this not be optional is the issue. Only some situations don't get by with 32 bit txids, but imposing the cost on everyone would be bad.

Oh, and C codebases make such changes far harder than more rigid newtyping in e.g. Rust, which is why I assume no one made the necessary patches yet.

I believe mostly how much code needs to be changed, which they are working towards slowly, but there is more overhead (memory / disk) associated with those larger data types which are used everywhere:

https://wiki.postgresql.org/wiki/FullTransactionId

SRE?
Thank you for this explanation!