Among modern successful RDBMSes employing MVCC only PosgreSQL (correct me if I'm wrong) stores stale row versions on the page. Other vendors store them in some physically separate place (tempdb in MSSQL, UNDO tablespace in Oracle, etc.) This reduces both regular read (scans) and write (page splits) I/O and eliminates the need to vacuum (more I/O).
Of course this approach requires some maintenance of that separate place. But such maintenance doesn't compete for I/O with regular workload and doesn't hurt performance.
So after decades of progress and improvement it looks like this latter approach turned out to be more efficient and is winning historically.
Actually MSSQL still has a vacuum system under the hood ("ghost record cleanup task") that is needed even if you have disabled all MVCC features. It is not as critical as in PostgreSQL, since there will be fewer rows to clean up.
It runs automatically in the background, with rather conservative settings that usually prevent it from having much perf impact.
The system also ties into MVCC because deleted rows in the main table cannot be removed while any query can still access some other version of that row. The same is true when changing any key column in an index. In those cases, the update is a delete and insert within that index. And the deleted row must stick around if there is any transaction that could read it.
So while SQL Server does not get as many rows that needs to be vacuumed as PostgreSQL does, just storing the versions separately certainly does not completely eliminate the need for after the fact cleanup.
This is exactly what MySQL (and I think Oracle) does. The final delete of a row remains tombstoned in the main table until a background process determines it is safe to remove based on transaction visibility (MySQL calls this purge).
It seems the important difference is whether the table needs to be scanned to find dead rows. In Postgres it seems the answer is yes. In MySQL it keeps this info as a separate scratchpad to quickly find the rows to remove once they are allowed to be.
Both approaches have advantages and disadvantages. For example, in MySQL, if you have a very long running transaction such that it can't purge rows for a long time, you can actually overflow the rollback segment onto disk and purging slows down dramatically. You might even need to pause writes to get it to recover, although I haven't ever seen it get that bad. With a table scan approach it degrades more gracefully in that scenario.
One annoying wrinkle in MySQL's purge implementation: it won't progress to any old row versions that are newer than the oldest transaction, even if there are no transactions that could ever possibly access those rows. [1]
As you noted, this can get painful since purge effectively gets blocked -- if you have a high-volume OLTP workload with heavy emphasis on quick small UPDATEs / DELETEs, and then introduce a single very long-running read transaction (such as a logical dump/backup), purge can't make progress. But this is just due to the current implementation. In theory a smarter system could differentiate between row versions actually needed by old transactions vs ones that are inaccessible to all transactions.
There is work to implement a similar approach for PostgreSQL, called zheap. One of the persons behind that is the one who wrote and submitted to HN this article.
As far as I understand, the trade-off of storing stale rows in a separate place is that you need to bring them back if the transaction rolls back, making rollbacks more expensive.
Also, you need to copy entire row before updating it or deleting it, not sure how significant this cost is in practice.
There are several successful databases that use in page MVCC storage. I'd suggest going through Andy Pavlo's slide decks for a summary: https://15721.courses.cs.cmu.edu/spring2020/, in particular slide 33 from lecture 3 has a summary. Also there are papers wheree his group directly benchmarks the various design alternatives within their test database, Peloton.
The TL;DR, is there's a lot of details to how this stuff works, many tradeoffs involved, and it's by no means whatsoever clear that Postgres has somehow made the wrong choice.
Yeah, he does two big database classes, and the last couple years of each are online with slides, youtube of the lectures, etc. Just click on the schedule for each class. They're amazing resources for learning the state of the art.
I just tried postgres yesterday for the first time after years of being on mysql only.
I immediately got some warnings in pgadmin that some tables need to be vacuumed. Is this really necessary? Can i just vacuum everything at night on a cronjob?
In general you don't need to vacuum manually, this happens in the background automatically. If you have a lot of activity on the DB, or fit into certain corner cases you have to tune the autovacuum parameters. The default is generally getting better with each Postgres release.
And if you want to play with the Autovacuum settings, you really should read up on this in detail. There are some counter intuitive aspects to it, e.g. in many cases if you have problems with autovacuuming you should vacuum more, not less.
One exception is after you restore a backup, you might want to run "VACUUM ANALYZE" so that you get some good statistics immediately. That also will happen in the background later, but until then you can get weird query plans.
In general, if you don't have extremely high UPDATE rates (thousands per second) you don't really need to care about vacuum these days (assuming you correctly end all your transactions and don't have any sessions in "idle in transaction" for a long time)
> Can i just vacuum everything at night on a cronjob?
It might not be technically pure, but you absolutely can do this, and if your load is really asymmetrical and much quieter at night, then it may well make sense. It may also be unnecessary, as there is an auto-vacuum feature.
Actually, you should avoid using vacuum full at all. What that operation does is creating a new table with contents of an old table then deleting the old one. This operation is also requesting an exclusive lock, so any write query will cause lockup.
Another problem is that vacuum full compacts the database, it reduces database size on disk, but doesn't mean it makes database faster, often it might actually reduce performance.
Vacuum full might be useful in scenarios where you want to reclaim space, because autovacuum was not tuned correctly and tables get out of hand, if you need to run vacuum full frequently you should tune autovacuum to run more frequently.
When it gets to be too big, it's a good time to do a vacuum full. I normally do it a few times a year on a few giant tables. Yes it locks up a table for a few minutes, but it is a big win and totally worth it.
I'm not denying that VACUUM FULL is useful when things are out of hand. What I'm trying to say is that you probably don't have autovacuum tuned to run often enough (this is also what your link suggests)
I think you can check n_dead_tup, n_mod_since_analyze from pg_stat_all_tables, and see if the numbers are high, then it would be a good idea to perform vacuum. I found some query on the net which might help you. See https://gist.github.com/lesovsky/b0e4033380b0992789118ab35c4...
I think you might need to change this depending on the version of Postgres.
You seem to be missing that data can still be read after it is insterted, provided it is not updated. So think if more like WORM rather than write-only.
Some argue that having a separate vacuum process is a bad idea. That the right time to clean up a no longer transaction-tied version is the next time someone visits the row in a query. That way there are no extra database accesses to each row beyond normal use.
I see several pretty big issues with that. It would leave empty rows and pages since there is no guarantee that there will ever be a scan which hits a particular page which contains deleted rows. Additionally if you recently have deleted a lot of rows the next sequential scan could become very exprnsive since suddenly a read query needs to dirty a lot of pages, maybe more pages than the number of available buffers forcing our read query to have to wait on write io.
What PostgreSQL has though is micro vacuum where PostgreSQL tries to remove any no longer visible rows when it already has to modify a page. But it only does this as long as no index updates are necessary.
If a dead record is encountered in a scan, it is queued up to cleaned up by a background task, that runs every 5 seconds. This typically happens soon enough that the record is still cached, so no extra read I/O is needed, and in any case the location of the record is already known, so no extra scans are needed.
To handle other records, it maintains a bitmap of pages with ghost records. Each time it is awakened, it cleans up the queue, and then does up to 10 pages indicated in the bitmap. Then it goes back to sleep, and only wakes up again 5 seconds later.
This super slow cleanup of dead records that have not been recently scanned helps to avoid overloading the system, but does ensure cleanup will eventually occur, even if the table never gets scanned.
Pedantic note: (Technically it is a bytemap, as the bit is part of a larger structure, that holds other flags about a page.)
Your last sentence is not very clear to me. "But it only does this as long as no index updates are necessary.". Do you mean to say it does only for HOT updates? If so, I don't think that is the case, it can do it during a scan of the heap.
I think it's worth the effort just to make sure you keep familiar with the process for upgrading (including making appropriate backups). It's pretty quick and easy to upgrade and makes sure you're on top of deployment for your project.
No real advantage especially with a mostly idle system.
But it's a good idea to keep up with the versions anyway. The upgrade processes isn't really complicated and if you can afford a downtime not really challenging.
In my experience it's typically easier to migrate from version x to x+1, rather than waiting a long time and then migrate from x to x+5 (or similar big jumps).
But regardless of the strategy for major version upgrades (10 -> 11 -> 12) you should at least keep up with the minor updates (10.1 -> 10.12)
Even if you don't want to upgrade to major version as you don't need additional features, it is advisable to upgrade to latest minor version of PG-10. See docs https://www.postgresql.org/docs/release/ for the latest releases and what is fixed in each minor release.
I've got pgBackrest set up for backups and pgBouncer for connection pooling set up so it's a case of wondering if there is some benefit to the newer versions for someone using it "casually"
Since it's require quite a bit of reconfig and testing.
Also the server the DB is on is RAM constrained. And Postgres making a process per connection brings it right down versus using pgBouncer to pool connections.
It seems complicated but it's actually quite simple.
I suggest an improvement, to be able to vacuum without having to have the free space of the table.
Example to vacuum a 1Gb table you must have a 1Gb free
Is that always the case? I'd expect that only to be necessary with a complete table rewrite, i.e. VACUUM FULL, which is not what (auto)VACUUM does by default.
But sometimes VACUUM FULL is necessary for tables as fragmentation will otherwise never be resolved. A common scenario are tables that store some kind of time series events that are updated while active. Older partitions have no write activity but remain heavily fragmented unless VACUUM FULL is issued. For that you need space at least as big as the table. I don't really understand why as page rewrites could happen partly in place.
Overall I love PostgreSQL but the vacuum/MVCC system of MSSQL and Oracle is much better suited for heavy writes, PostgreSQL needs a lot of tweaking in table structures to handle those.
Of course this approach requires some maintenance of that separate place. But such maintenance doesn't compete for I/O with regular workload and doesn't hurt performance.
So after decades of progress and improvement it looks like this latter approach turned out to be more efficient and is winning historically.