Hacker News new | ask | show | jobs
by stingraycharles 2075 days ago
Thanks for this. So I’m fairly familiar how MVCC relates to CoW, and the way that I read this is that it’s kind of the inverse of CoW: it’s an in-place update and a copy of the old data.

I can completely imagine the corner cases this ends up touching, as there must be an insane amount of logic in Postgres that assumes immutability of data on disk: if I read data block X for transaction version 1234, it will always remain the same.

It’s a very interesting approach though. Once all the corner cases are solved and delt with, I wonder: are there any reasons not to choose zheap over the standard storage engine?

3 comments

> I can completely imagine the corner cases this ends up touching, as there must be an insane amount of logic in Postgres that assumes immutability of data on disk: if I read data block X for transaction version 1234, it will always remain the same.

Most of the code having such low level assumptions has been centralized for PG12, as part of the introduction of pluggable table storage. Also note that that assumption as stated isn't actually true - we don't expect on-disk data to be immutable. There is on-access "pruning" of dead rows, there's VACUUM, there's new row versions on the same page etc.

There are probably some cases where traditional heap would be better that (ideal, finished) zheap. Some ideas: Crash recovery is faster without an UNDO phase. Accessing data that has been modified many times since you took your snapshot may be slower if you have to walk back in time through multiple hops to find the right version of each tuple (compared to the regular heap, which also has to step over lots of dead tuples in that case but can do so sequentially). There may be space management problems when there are too many active transactions interested in a page. There may be microscopic overheads that show up in some workloads that don't benefit from updating in place, perhaps because they are insert/delete-only.

I guess both options would be good to have, and it's interesting that recent SQL Server can do it both ways at the user's option (see "Accelerated Database Recovery", something that sounds conceptually a bit like PostgreSQL vacuum instead of ARIES style rollback where you have to undo changes).

the cow approach likely plays better with flash based storage since rewrites happen less often. so if you've got a lot of otherwise extra storage then maybe it's a good idea. though if you run on top of something like f2fs maybe it'll be better in that respect than pg does on it's own.
IIRC one issue with the current design is that as new rows are written in a different place, all indexes on that table have to be updated as well, leading to write amplification. If you do an update in place, you don't need to modify indices (but now that I think of it, what about concurrently running transactions that might want to read the old values using the old indices...?)

Also during the vacuum there's a lot of writes marking rows as available.

In theory, yes, but PostgreSQL's current storage is not optimized for flash.