| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by saurik 5084 days ago

The reason I claim this would be simple is that PostgreSQL is almost already doing this. The way the data is stored on disk, every row has two transactions identifiers, xmin and xmax, which represent the transaction when that row was inserted the the transaction that row was deleted; rows, meanwhile, are never updated in place, so the old data stays around until it is deleted by a vacuum.

To demonstrate more tangibly how this works, I just connected to my database server (running PostgreSQL 9.1), created a table and added a row. I did so inside of a transaction, and printed the transaction identifier. I then queried the data in the table from a new transaction, showing that the xmin is set to the identifier of the transaction that added the row.

Connection 1:

    demo=> create table q (data int);
    CREATE TABLE
    demo=> begin; select txid_current();
    BEGIN
    189028
    demo=> insert into q (data) values (0); commit;
    INSERT 0 1
    COMMIT
    demo=> begin; select xmin, xmax, data from q;
    BEGIN
    189028|0|0

Now, while this new transaction is still open, from a second connection, I'm going to create a new transaction in which I am going to update this row to set the value it is storing to 1 from 0, and then commit. In the first connection, as we are still in a "snapshot" (I put this term in quotes, as MVCC is obviously not copying the entire database when a transaction begins) from a transaction started before that update, we will not see the update happen, but the hidden xmax column (which stores the transaction in which the row is deleted) will be updated.

Connection 2:

    demo=> begin; select txid_current();
    BEGIN
    189029
    demo=> update q set data = 1; commit;
    UPDATE 1
    COMMIT
    demo=> select xmin, xmax, data from q;
    189029|0|1

Connection 1:

    demo=> select xmin, xmax, data from q;
    189028|189029|0

As you can see, the data that the other transaction was referencing has not been destroyed: the old row (the one with the value 0) is still there, but the xmax column has been updated to indicate that this column no longer exists for transactions that began after 189029 committed. However, at the same time, the new row (with the value 1) also exists, with an xmin of 189029: transactions that begin after 189029 committed will see that row instead. No data was destroyed: and this data is persisted this way to disk (it isn't just stored in memory).

My contention then is that it should be a fairly simple matter to take a transaction and backdate when it began. As far as I know, there is no reason that this would cause any serious problems as long as a) it was done before the transaction updated or inserted any data, b) there have been no vacuums during the backdated period, c) HOT (heap-only tuple) updates are disabled (in essence, this is an optimization designed to do online vacuuming), and maybe d) the new transaction is read only (although I am fairly confident this would not be a requirement).

For a more complete implementation, one would then want to be able to build transactions (probably read-only ones; I imagine this would cause serious problems if used from a writable transaction, and that really isn't required) that "saw all data as if all data in the database was alive", which I also believe would be a pretty simple hack: you just take the code that filters dead rows from being visible based on these comparisons and add a transaction feature that lets you turn them off. You could then use the already-implemented xmin and xmax columns to do your historical lookups.

P.S. BTW, if you want to try that demo at home, to get that behavior you need to use the "repeatable read" isolation level, which uses the start of the transaction as the boundary as opposed to the start of the query. This is not the default; you might then wonder if it is because it is expensive and requires a lot more coordination, and as far as I know the answer is "no". In both cases, all of the data is stored and is tagged with the transaction identifiers: the difference is only in what is considered the reference time to use for "which of the rows is alive".

However, it does mean that a transaction that attempts to update a value that has been changed from another transaction will fail, even if the updating transaction had not previously read the state of the value; as most reasonable usages of a database actually work fine with the relaxed semantics that "data truly committed before the query executes" provides (as that still wouldn't allow data you update to be concurrently and conflictingly updated by someone else: their update would block) and those semantics are not subject to "this transaction is impossible" errors.

Both Connections (setup):

    demo=> set session characteristics as transaction isolation level repeatable read;
    SET

2 comments

krosaen 5084 days ago

Thanks for taking the time to elaborate, very interesting. I wonder if the sql db vendors or open source projects will take the next step to make querying against a transaction ID possible given the underlying implementation details bring it pretty close.

I also see Rich has made some interesting points elsewhere in this thread about consistent views being available outside of transactions and without need for coordination (within datomic) - seems more appropriate to comment directly there though.

Overall I think it's important to understand these nuances, and not view datomic as some revolutionary leap, even if I am excited about the project. I appreciate your insight into the power already within sql db engines.

link

saurik 5084 days ago

I am not certain whether your response comes from a reading of my comment before or after the paragraph I added that started with "for a more complete implementation", but if it was from before I encourage you to read that section: the ability to do the query is pretty much already there due to the xmin/xmax fields that PostgreSQL is already reifying.

link

saurik 5084 days ago

(edit: Apparently, nearly an hour ago, jeltz pointed out that PostgreSQL 9.2 actually has implemented nearly this identical functionality through the usage of exported snapshots, so I recommend people go read that comment and the linked documentation. However, my comment is still an example of the functionality working.)

(edit: Ah, but the feature as implemented actually saves a file to disk and thereby has a lot of server-side state: the way I've gone ahead and implemented it does not have this complexity; I simply take a single integer and store nothing on the server.)

http://news.ycombinator.com/item?id=4448472

> I wonder if the sql db vendors or open source projects will take the next step to make querying against a transaction ID possible given the underlying implementation details bring it pretty close.

For the hell of it, I just went ahead and implemented the "backdate a transaction" feature; I didn't solve the vacuum guarantees problem, however: I only made it so that a transaction can be backdated to another point in time.

To demonstrate, I will start with a very similar sequence of events to before. However, I am going to instead use txid_current_snapshot(), which returns the range (and an exception set that will be unused for this example) of transaction identifiers that are valid.

Connection 1:

    demo=# create table q (data int);
    CREATE TABLE
    demo=# begin; select txid_current_snapshot();
    BEGIN
    710:710:
    demo=# insert into q (data) values (0); commit;
    INSERT 0 1
    COMMIT
    demo=# begin; select txid_current_snapshot();
    BEGIN
    711:711:
    demo=# select xmin, xmax, data from q;
    710|0|0

Connection 2:

    demo=# begin; select txid_current_snapshot();
    BEGIN
    711:711:
    demo=# update q set data = 1; commit;
    UPDATE 1
    COMMIT
    demo=# select xmin, xmax, data from q;
    711|0|1

Connection 1:

    demo=# select xmin, xmax, data from q;
    710|711|0
    demo=# begin; select txid_current_snapshot();
    BEGIN
    712:712:
    demo=# select xmin, xmax, data from q;
    711|0|1

So far, this is the same scenario as before: I have two connections that are seeing different visibility to the same data, based on these snapshots. Now, however, I'd like to "go back in time": I want our first connection to be able to use the same basis for its consistency that we were using in the previous transaction.

Connection 1:

    demo=# set snapshot_txid = 711;
    SET
    demo=# select txid_current_snapshot();
    711:711:
    demo=# select xmin, xmax, data from q;
    710|711|0

This new variable, snapshot_txid, is something I created: it gets the current transaction's active snapshot and modifies it to be a range from that transaction id to that same id (I think a better version of this would take the exact same string value that is returned by txid_current_snapshot()).

From that previous basis, the row with the value 0 is visible, not the row with the value 1. I can, of course, go back to the future snapshot if I wish, in order to view the new row. (I am not yet certain what this will do to things writing to the database; this might actually be sufficient, however I feel like I might need to either mess with more things or mark the transaction read-only.)

Connection 1:

    demo=# set snapshot_txid = 712;
    SET
    demo=# select txid_current_snapshot();
    712:712:
    demo=# select xmin, xmax, data from q;
    711|0|1

link

campnic 5084 days ago

A couple questions about your example:

I noticed that the xmin changed from Connection 1 from 189018 -> 189028. Is that just a typo?

Is the concept of transaction in this regard a 'state of the entire db'? If a transaction included multiple modifications would they all get the same xmax? If so, I see this as a difference between the presentation and your example. The transaction is a modification to the entirety of the db and is a state of the db. In Hickey's presentation, he very clearly says that the expectation is that the transaction component of the datom is specific to an individual datom.

Since its been a while since I've worked on DBs, and even then I didn't know much, your demo has helped put it in perspective.

link

saurik 5084 days ago

Thank you so much for noticing that error. What happened is that I started doing it, and then made a mistake; I then redid the entire flow, but forgot I needed to re-copy/paste the first half as the transaction identifier would have changed. I have updated my comment to fix this.

Yes: the transaction is the state of the entire database. However, you can make your transactions as fine-grained as you wish: the reason to not do so, however, is that you are likely to end up with scenarios where you want to atomically roll-back many changes that other people using the database should not see in the case of a failure. You certainly then will at least want the ability to make multiple changes at once.

The tradeoff in doing so, however, is that you will need to make a new transaction, which will have a different basis. I agree: if you then have a need to be able to make tiny changes to individual items one at a time that need to be from a shared consistency basis and yet have no need to be atomic with respect to each other (which is the part I am going to be quite surprised by), then yes: you need the history query function to implement this.

I would be fascinated by a better understanding of that use case. Does he go into an explicit example of why that would be required later in the talk? (If so, I could try to translate that use case as best I can into a "traditional database" to see whether you really need that feature; if you do, it might be valuable to try to get something related to this design into PostgreSQL: I am starting to get a better understanding of the corner cases as I think more about it, and think I can come up with a proposal that wouldn't make this sound insane to the developers.)

link