Hacker News new | ask | show | jobs
by rmunn 123 days ago
I like to think of transactions, in an MVCC system like Postgres, as being like snapshots in copy-on-write filesystems like btrfs or zfs. When you BEGIN a transaction, the DB takes a snapshot of your data, so now there are two versions of the data, the snapshot (visible to everyone else) and the "private" version visible only to your transaction. Then as you run UPDATEs, the new data is written to the private copy, but everyone else continues to work with the snapshot. (And might be creating their own private copies for other transactions).

If you do a ROLLBACK, then your private copy of the data is discarded, and its changes never make it into the official copy. But if you do a COMMIT, then your private snapshot is made public and is the new, official, copy for everyone else to read from. (Except those who started a transaction before you ran COMMIT: they made their private copies from the older snapshot and don't have a copy of your changes).

This is probably obvious to nearly everyone here, but I figured I'd write it anyway. You never know who might read an analogy like this and have that lightbulb moment where it suddenly makes sense.

P.S. Another analogy would be Git branches, but I'll write that in a different comment.

1 comments

Not quite. Databases use both branching and locking. Two transactions that conflict can cause one thread to block, rather than rolling back.

SELECT followed by an update is the most usual case for a block. (I have to code one today, and I want to see if I can rewrite it as one MySQL statement.)

Yes, the analogy isn't perfect. I didn't want to get into all the subtleties in an introductory analogy, but I should probably have mentioned blocking. Too late for me to edit my post with your correction, though.