Hacker News new | ask | show | jobs
Improved vacuum in PostgreSQL 13 (amitkapila16.blogspot.com)
185 points by amitkapila 2230 days ago
6 comments

Among modern successful RDBMSes employing MVCC only PosgreSQL (correct me if I'm wrong) stores stale row versions on the page. Other vendors store them in some physically separate place (tempdb in MSSQL, UNDO tablespace in Oracle, etc.) This reduces both regular read (scans) and write (page splits) I/O and eliminates the need to vacuum (more I/O).

Of course this approach requires some maintenance of that separate place. But such maintenance doesn't compete for I/O with regular workload and doesn't hurt performance.

So after decades of progress and improvement it looks like this latter approach turned out to be more efficient and is winning historically.

Actually MSSQL still has a vacuum system under the hood ("ghost record cleanup task") that is needed even if you have disabled all MVCC features. It is not as critical as in PostgreSQL, since there will be fewer rows to clean up.

It runs automatically in the background, with rather conservative settings that usually prevent it from having much perf impact.

The system also ties into MVCC because deleted rows in the main table cannot be removed while any query can still access some other version of that row. The same is true when changing any key column in an index. In those cases, the update is a delete and insert within that index. And the deleted row must stick around if there is any transaction that could read it.

So while SQL Server does not get as many rows that needs to be vacuumed as PostgreSQL does, just storing the versions separately certainly does not completely eliminate the need for after the fact cleanup.

This is exactly what MySQL (and I think Oracle) does. The final delete of a row remains tombstoned in the main table until a background process determines it is safe to remove based on transaction visibility (MySQL calls this purge).

It seems the important difference is whether the table needs to be scanned to find dead rows. In Postgres it seems the answer is yes. In MySQL it keeps this info as a separate scratchpad to quickly find the rows to remove once they are allowed to be.

Both approaches have advantages and disadvantages. For example, in MySQL, if you have a very long running transaction such that it can't purge rows for a long time, you can actually overflow the rollback segment onto disk and purging slows down dramatically. You might even need to pause writes to get it to recover, although I haven't ever seen it get that bad. With a table scan approach it degrades more gracefully in that scenario.

One annoying wrinkle in MySQL's purge implementation: it won't progress to any old row versions that are newer than the oldest transaction, even if there are no transactions that could ever possibly access those rows. [1]

As you noted, this can get painful since purge effectively gets blocked -- if you have a high-volume OLTP workload with heavy emphasis on quick small UPDATEs / DELETEs, and then introduce a single very long-running read transaction (such as a logical dump/backup), purge can't make progress. But this is just due to the current implementation. In theory a smarter system could differentiate between row versions actually needed by old transactions vs ones that are inaccessible to all transactions.

[1] https://bugs.mysql.com/bug.php?id=74919

You can read how different databases provide MVCC at my other blog: https://amitkapila16.blogspot.com/2015/03/different-approach...
As far as I understand, the trade-off of storing stale rows in a separate place is that you need to bring them back if the transaction rolls back, making rollbacks more expensive.

Also, you need to copy entire row before updating it or deleting it, not sure how significant this cost is in practice.

Yes, but expansive rollbacks are acceptable as they are very rare in production.
This is somewhat confused/mistaken.

There are several successful databases that use in page MVCC storage. I'd suggest going through Andy Pavlo's slide decks for a summary: https://15721.courses.cs.cmu.edu/spring2020/, in particular slide 33 from lecture 3 has a summary. Also there are papers wheree his group directly benchmarks the various design alternatives within their test database, Peloton.

The TL;DR, is there's a lot of details to how this stuff works, many tradeoffs involved, and it's by no means whatsoever clear that Postgres has somehow made the wrong choice.

Thank you for the link, I'm really interested in this topic. However I can't figure out how to access those slides. Do you have to be enrolled?
Yeah, he does two big database classes, and the last couple years of each are online with slides, youtube of the lectures, etc. Just click on the schedule for each class. They're amazing resources for learning the state of the art.
Today I just woke up to find yet another "ORA-01555: snapshot too old" on something I left running overnight.

Honestly, performance isn't as important as the capacity of getting an atomic snapshot of your data with any reliability.

Avoiding ORA-01555 is a matter of properly sizing your undo.
And a matter of not getting any task that is slower than what you expected.
I just tried postgres yesterday for the first time after years of being on mysql only.

I immediately got some warnings in pgadmin that some tables need to be vacuumed. Is this really necessary? Can i just vacuum everything at night on a cronjob?

In general you don't need to vacuum manually, this happens in the background automatically. If you have a lot of activity on the DB, or fit into certain corner cases you have to tune the autovacuum parameters. The default is generally getting better with each Postgres release.

And if you want to play with the Autovacuum settings, you really should read up on this in detail. There are some counter intuitive aspects to it, e.g. in many cases if you have problems with autovacuuming you should vacuum more, not less.

One exception is after you restore a backup, you might want to run "VACUUM ANALYZE" so that you get some good statistics immediately. That also will happen in the background later, but until then you can get weird query plans.

Normally you only need to adjust autovacuum settings per table if your tables have a very large number of rows:

https://lob.com/blog/supercharge-your-postgresql-performance...

Regarding the last point, I wonder if that will be necessary after the changes in the OP. Example 2 might cover this.
Is auto-vacuum enabled by default?
Just ANALYZE also works
I think that's a false warning from pgAdmin.

In general, if you don't have extremely high UPDATE rates (thousands per second) you don't really need to care about vacuum these days (assuming you correctly end all your transactions and don't have any sessions in "idle in transaction" for a long time)

Adding to that comment: You can tweak the times where automatic VACUUMing happens to a certain degree,

https://www.postgresql.org/docs/12/runtime-config-autovacuum...

> Can i just vacuum everything at night on a cronjob?

It might not be technically pure, but you absolutely can do this, and if your load is really asymmetrical and much quieter at night, then it may well make sense. It may also be unnecessary, as there is an auto-vacuum feature.

You don't generally ever need to manually trigger them, they happen automatically when needed
If you do stuff like drop columns, you totally need to do a vacuum full or you are never getting back disk space...
Actually, you should avoid using vacuum full at all. What that operation does is creating a new table with contents of an old table then deleting the old one. This operation is also requesting an exclusive lock, so any write query will cause lockup.

Another problem is that vacuum full compacts the database, it reduces database size on disk, but doesn't mean it makes database faster, often it might actually reduce performance.

Vacuum full might be useful in scenarios where you want to reclaim space, because autovacuum was not tuned correctly and tables get out of hand, if you need to run vacuum full frequently you should tune autovacuum to run more frequently.

Getting back space is real important. Otherwise you have runaway table growth that never ends.

Not saying do it all the time, but it is important to use surgically. You can easily calculate wasted table space

https://www.citusdata.com/blog/2017/10/20/monitoring-your-bl...

When it gets to be too big, it's a good time to do a vacuum full. I normally do it a few times a year on a few giant tables. Yes it locks up a table for a few minutes, but it is a big win and totally worth it.

I'm not denying that VACUUM FULL is useful when things are out of hand. What I'm trying to say is that you probably don't have autovacuum tuned to run often enough (this is also what your link suggests)
Does it affect performance? I might not want VACUUM to trigger during high workload.
Yes it does, it makes things faster ;)

A common misconception about (auto) vacuum is, that you need to turn it off, to avoid performance problems.

The truth is: you need to make it more aggressive (= run it more frequently) if you have problems with the (additional) I/O load it generates.

Obviously I meant to ask if performance is affected during VACUUM.

@pg-gadfly's link answered it for me:

> VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.

Since a vacuum operation uses processor cycles, it's impossible to say no.

Their wiki is pretty amazing source for postgres's operation: https://www.postgresql.org/docs/12/routine-vacuuming.html

I think you can check n_dead_tup, n_mod_since_analyze from pg_stat_all_tables, and see if the numbers are high, then it would be a good idea to perform vacuum. I found some query on the net which might help you. See https://gist.github.com/lesovsky/b0e4033380b0992789118ab35c4... I think you might need to change this depending on the version of Postgres.
Maybe a silly question but what is the point of an insert-only table. Surely it must be read/used in some capacity ? Do they mean "insert heavy"?
Means not updated. Say Www visitor logs. No reason to ever update a previous log. What happened happened.
.
Selects would still work. But non insert DML would be disallowed I assume (delete, update).
Data is only inserted, not changed or deleted.
.
You seem to be missing that data can still be read after it is insterted, provided it is not updated. So think if more like WORM rather than write-only.
for compliance. you want this behavior to be enforced at the database level so that any buggy code does not edit a previous row.
Some argue that having a separate vacuum process is a bad idea. That the right time to clean up a no longer transaction-tied version is the next time someone visits the row in a query. That way there are no extra database accesses to each row beyond normal use.
I see several pretty big issues with that. It would leave empty rows and pages since there is no guarantee that there will ever be a scan which hits a particular page which contains deleted rows. Additionally if you recently have deleted a lot of rows the next sequential scan could become very exprnsive since suddenly a read query needs to dirty a lot of pages, maybe more pages than the number of available buffers forcing our read query to have to wait on write io.

What PostgreSQL has though is micro vacuum where PostgreSQL tries to remove any no longer visible rows when it already has to modify a page. But it only does this as long as no index updates are necessary.

MSSQL does this with a two pronged approach.

If a dead record is encountered in a scan, it is queued up to cleaned up by a background task, that runs every 5 seconds. This typically happens soon enough that the record is still cached, so no extra read I/O is needed, and in any case the location of the record is already known, so no extra scans are needed.

To handle other records, it maintains a bitmap of pages with ghost records. Each time it is awakened, it cleans up the queue, and then does up to 10 pages indicated in the bitmap. Then it goes back to sleep, and only wakes up again 5 seconds later.

This super slow cleanup of dead records that have not been recently scanned helps to avoid overloading the system, but does ensure cleanup will eventually occur, even if the table never gets scanned.

Pedantic note: (Technically it is a bytemap, as the bit is part of a larger structure, that holds other flags about a page.)

You probably understand it better than I do but FWIW here is what I was basing my comment https://sourceforge.net/p/firebird/mailman/message/17015961/ and in particular Jim Starkey's post
Your last sentence is not very clear to me. "But it only does this as long as no index updates are necessary.". Do you mean to say it does only for HOT updates? If so, I don't think that is the case, it can do it during a scan of the heap.
So I'm running a single Postgres 10 instance for a small project that probably hits about 200 queries a day at most.

I haven't upgraded but would it actually be worth the effort?

I think it's worth the effort just to make sure you keep familiar with the process for upgrading (including making appropriate backups). It's pretty quick and easy to upgrade and makes sure you're on top of deployment for your project.
No real advantage especially with a mostly idle system.

But it's a good idea to keep up with the versions anyway. The upgrade processes isn't really complicated and if you can afford a downtime not really challenging.

In my experience it's typically easier to migrate from version x to x+1, rather than waiting a long time and then migrate from x to x+5 (or similar big jumps).

But regardless of the strategy for major version upgrades (10 -> 11 -> 12) you should at least keep up with the minor updates (10.1 -> 10.12)

Even if you don't want to upgrade to major version as you don't need additional features, it is advisable to upgrade to latest minor version of PG-10. See docs https://www.postgresql.org/docs/release/ for the latest releases and what is fixed in each minor release.
Well what are the issues you are currently having with Postgres 10? It doesn't sound like there is any compelling reason.
Yeah. That's the question haha.

I've got pgBackrest set up for backups and pgBouncer for connection pooling set up so it's a case of wondering if there is some benefit to the newer versions for someone using it "casually"

Since it's require quite a bit of reconfig and testing.

Curious why you bothered with pgBouncer for 200 queries per day? Just future proofing?

EDIT: forgot, pgBouncer also allows you to do hitless restart/upgrade, which is useful at any query volume I suppose ;)

Indeed.

Also the server the DB is on is RAM constrained. And Postgres making a process per connection brings it right down versus using pgBouncer to pool connections.

It seems complicated but it's actually quite simple.

I suggest an improvement, to be able to vacuum without having to have the free space of the table. Example to vacuum a 1Gb table you must have a 1Gb free
Is that always the case? I'd expect that only to be necessary with a complete table rewrite, i.e. VACUUM FULL, which is not what (auto)VACUUM does by default.
Even with a full table rewrite, why do you need to rewrite to RAM?
The "you must have a 1Gb free" refers to disk space, not main-memory (RAM).
That is only needed for VACUUM FULL, not for (auto) vacuum.
But sometimes VACUUM FULL is necessary for tables as fragmentation will otherwise never be resolved. A common scenario are tables that store some kind of time series events that are updated while active. Older partitions have no write activity but remain heavily fragmented unless VACUUM FULL is issued. For that you need space at least as big as the table. I don't really understand why as page rewrites could happen partly in place.

Overall I love PostgreSQL but the vacuum/MVCC system of MSSQL and Oracle is much better suited for heavy writes, PostgreSQL needs a lot of tweaking in table structures to handle those.

I totally agree. Postgres' vacuum is definitely its weakest point.

I hoped to see the "pluggable storage" show up in Postgres 13 (including zheap which uses UNDO logs) but apparently that didn't make it.