Hacker News new | ask | show | jobs
by ysleepy 2118 days ago
Nice!

I implemented pretty much the same trade off set in an authenticated storage system.

single writer, radix merkle tree, persistent storage, hashed keys, proofs.

I guess it is a local maxima within that trade off space.

I like how the time travelling/history is always touted as a feature (which it is), but it really just means the garbage collector/pruning part of the transaction engine is missing. Postgres and other mvcc systems could all be doing this, but they don't. The hard part of the feature is being able to turn it off.

I'll probably have a look around later, the diffing looks interesting, not sure yet if it's done using the merkle tree (likely) or some commit walking algorithm.

1 comments

> I like how the time travelling/history is always touted as a feature (which it is), but it really just means the garbage collector/pruning part of the transaction engine is missing. Postgres and other mvcc systems could all be doing this, but they don't.

Postgres actually did tout it as a feature in "THE IMPLEMENTATION OF POSTGRES" by Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama[1] search for "time travel" in the PDF. I added the relevant quotes below for easier access ;)

This was back when PostgreSQL had the postquel language (before SQL was added) there was special syntax to access data at specific points in time:

> The second benefit of a no-overwrite storage manager is the possibility of time travel. As noted earlier, a user can ask a historical query and POSTGRES will automatically return information from the record valid at the correct time.

Quoting the paper again:

> For example to find the salary of Sam at time T one would query:

    retrieve (EMP.salary)
    using EMP [T]
    where EMP.name = "Sam"

> POSTGRES will automatically find the version of Sam’s record valid at the correct time and get the appropriate salary.

[1] - https://dsf.berkeley.edu/papers/ERL-M90-34.pdf

Is this still possible with Postgres?
Yes and no or to be precise - to a certain degree but not through an exposed language feature.

PostgreSQL still does copy-on-write so the old versions of the row exist and are present in storage. However now there is an autovacuum process going over the records regularly marking those no longer seen by any transactions as re-usable so eventually the old records would get overwritten.

You can get at the older versions of the rows directly on disk or perhaps it would be possible to get the db to return such older versions of the rows. It seems that by default even trying to get at them with `ctid` is not possible so that may require hacking PostgreSQL itself or using some extension which seem to actually exist[1].

[1] - https://github.com/omniti-labs/pgtreats/tree/master/contrib/...