| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by magicalhippo 2075 days ago
	> replace the old row with the new one and write a copy of the old row to a separate file Does it do this in the reverse order? If not, how does it prevent dataloss in case the main table update succeeds but writing to the "old copy" table fails?

2 comments

macdice 2075 days ago

Well, conceptually it happens atomically. The change is described in a single WAL (write ahead log) record, and data in the buffers representing the table and undo log isn't allowed to touch the disk until the WAL is on disk, so in any crash/restart scenario that leaves the job half done, we'll simply do it again when we replay all changes since the last checkpoint. A related question is: what happens to the uncommitted change after that? The system recognises that the change belongs to an uncommitted transaction, and rolls it back, which in this case means that we'll copy the old copy back into the main table before allowing any user to access it, and free up the space in the undo log. This is approximately how most other RDBMSes work.

link

magicalhippo 2075 days ago

Ah of course, how silly of me to forget about the log. Thank you, that makes perfect sense.

link

jasonwatkinspdx 2075 days ago

It's more complex than the above. The key constraint is actually that the transaction is not allowed to return as committed until the WAL is stable on disk. Whether the writes it did to various pages are allowed before that point or held until after is an implementation choice, and different databases make different choices. It gets complex fast, but the short summary is that the recovery protocols understand how to reconstruct the state after the fact, redo what needs to be redone, and undo what needs to be undone.

link

malisper 2075 days ago

> Does it do this in the reverse order?

I would guess so, but I haven't looked up the implementation so I'm not sure. There's a ton of race conditions that can come up depending on the exact order you write things so I'm sure the actual implementation is pretty messy.

link