by abhinav22 2075 days ago
Thanks for the explanation. In layman's terms, how does the benefit of separating the old rows from the database compare with the cost of accessing them from a separate file?

Just curious why it's being done now, as it sounds like a major design decision that would have been considered a long time ago.

3 comments

Postgres 12 introduced the pluggable storage API. It allows multiple storage engines to be used side by side, and Zheap is one of the new ones.

IIRC, one of the big reasons for implementing pluggable storage was for Zheap.

FWIW, the existing implementation has worked really well for most use cases.

It's a complex tradeoff: which scheme is better depends on the specific workload, and there's no clear winner. That's why it's being introduced as a pluggable choice rather than a wholesale change.

Andy Pavlo's class has a lot of good general information on the topic: https://15721.courses.cs.cmu.edu/spring2020/slides/03-mvcc1....

To put it simply (very simply), most of the time you don't need old rows (rollbacks are rare, crashes are rare, etc.). If you store old rows (undo) in a separate place, you don't have to read (and skip) them all the time while reading actual data. That's the main benefit.
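
Here's a toy Python sketch of that idea, if it helps. It is not how Postgres or Zheap actually lay out pages; the names InlineHeap, UndoHeap, and compare are made up for illustration, and it ignores visibility rules, vacuum, and concurrency entirely. It only counts how many row versions a full scan has to touch in each scheme.

```python
from dataclasses import dataclass, field


@dataclass
class InlineHeap:
    """Hypothetical toy table that keeps every row version in the main table."""
    versions: list = field(default_factory=list)  # (key, value, is_live)

    def upsert(self, key, value):
        # Mark the previous live version dead, but leave it in the table.
        for i, (k, v, live) in enumerate(self.versions):
            if k == key and live:
                self.versions[i] = (k, v, False)
        self.versions.append((key, value, True))

    def scan(self):
        visited = 0
        rows = {}
        for k, v, live in self.versions:
            visited += 1  # dead versions still have to be read and skipped
            if live:
                rows[k] = v
        return rows, visited


@dataclass
class UndoHeap:
    """Hypothetical toy table that updates in place and keeps old versions in an undo log."""
    rows: dict = field(default_factory=dict)
    undo_log: list = field(default_factory=list)  # (key, previous_value)

    def upsert(self, key, value):
        if key in self.rows:
            # The old version goes to a separate place, not the main table.
            self.undo_log.append((key, self.rows[key]))
        self.rows[key] = value

    def scan(self):
        # Only current versions live here, so nothing has to be skipped.
        return dict(self.rows), len(self.rows)

    def rollback_last(self):
        # The rare path: fetch the old version back from the undo log.
        # (Undoing a fresh insert is not modelled in this sketch.)
        key, old_value = self.undo_log.pop()
        self.rows[key] = old_value


def compare(keys, updates_per_key):
    """Apply the same writes to both toy tables and return both scan results."""
    inline, undo = InlineHeap(), UndoHeap()
    for key in keys:
        for version in range(updates_per_key + 1):  # one insert, then updates
            inline.upsert(key, version)
            undo.upsert(key, version)
    return inline.scan(), undo.scan(), undo


if __name__ == "__main__":
    (rows_i, visited_i), (rows_u, visited_u), undo = compare(["a", "b", "c"], 4)
    print("same result:", rows_i == rows_u)
    print("versions read, old rows inline:", visited_i)
    print("versions read, old rows in undo:", visited_u)
    print("undo log entries:", len(undo.undo_log))
```

Both scans return the same rows, but the inline one reads every dead version along the way, while the undo one only pays for old versions when something actually rolls back. The flip side is that anything needing an old version has to go to the undo log for it, which is the cost the original question asks about.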