by fulafel 1830 days ago
This is quite common in traditional DBs too. E.g. PostgreSQL has its write-ahead log. Both LMDB and PostgreSQL then occasionally need to do some kind of compaction, checkpoint or garbage collection (whatever it's called in the various systems), in which the write-only log is reset and any live data in it is imported into the main db data.

I only have a cursory knowledge of LMDB (from listening to a podcast while biking). Anyway, LMDB has neither a transaction log nor a write-ahead log. Nothing is overwritten during an update. Data page updates are copy-on-write and b+tree index updates are append-only. The update of the b+tree pages is performed from the bottom of the tree up to the root, linking newly appended pages to higher-level pages. The transaction is committed when the new root page is appended. When there's a crash, the incomplete appended index pages have not been linked up to the root page yet and are not reachable from the previous valid root page. They can just be thrown away. Recovery just means searching for the last valid root index page. There's no need for a WAL or for undo/redo of a transaction log.
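
To make the bottom-up copy-on-write idea concrete, here is a minimal sketch. It is not LMDB's code: a plain binary search tree stands in for the b+tree, and the names (Node, put, get) are made up for illustration. An update copies only the nodes on the path from the changed leaf up to the root and returns a new root, so the old root still sees the old data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; a stand-in for a b+tree page (assumption)."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def put(node: Optional[Node], key: int, value: str) -> Node:
    """Return a new root, copying only the nodes on the path to `key`."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, put(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, put(node.right, key, value))
    return Node(key, value, node.left, node.right)


def get(node: Optional[Node], key: int) -> Optional[str]:
    """Look up `key` starting from the given root."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


old_root = None
for k, v in [(2, "b"), (1, "a"), (3, "c")]:
    old_root = put(old_root, k, v)

# Updating key 3 builds a new path; nothing reachable from old_root changes.
new_root = put(old_root, 3, "C")
```

If the process died before new_root was published anywhere, the copied nodes would simply be unreachable garbage, which is the crash-safety property described above.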

Deleted pages and obsolete pages are actively put back into a free list (tracked by another b+tree), which will be reused for new page allocation. This avoids the long garbage collection phase to walk all the live pages for compaction (no vacuum is needed).
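
A rough sketch of the free-list idea, under loud assumptions: a Python list stands in for the b+tree that tracks free pages, the PageAllocator name and its methods are hypothetical, and the question of when a freed page is actually safe to reuse (readers still holding old snapshots) is ignored entirely.

```python
class PageAllocator:
    """Toy page allocator: reuse freed page numbers before growing the file."""

    def __init__(self) -> None:
        self.next_page = 0      # first page number never handed out so far
        self.free_pages = []    # stand-in for the free-list b+tree (assumption)

    def allocate(self) -> int:
        """Prefer a previously freed page; otherwise extend the file."""
        if self.free_pages:
            return self.free_pages.pop()
        page = self.next_page
        self.next_page += 1
        return page

    def release(self, page: int) -> None:
        """Record an obsolete page so a later allocation can reuse it."""
        self.free_pages.append(page)


alloc = PageAllocator()
a, b, c = alloc.allocate(), alloc.allocate(), alloc.allocate()
alloc.release(b)            # page b became obsolete after a copy-on-write update
reused = alloc.allocate()   # comes from the free list
fresh = alloc.allocate()    # free list is empty again, so the file grows
```

Because obsolete pages are recorded at the moment they are replaced, nothing ever has to walk the live pages to find the dead ones.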

No. LMDB is copy-on-write, with double buffering/shadow pages for the root page updates. No searching for the last valid root page.
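
A minimal sketch of the double-buffered root idea, not LMDB's actual on-disk format: two meta slots each hold a transaction id and a root page number, a commit writes the slot the current meta does not occupy, and opening the database just picks the valid slot with the higher transaction id. The Meta and MetaPages names and the `valid` flag (standing in for whatever detects a torn write) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Meta:
    txn_id: int
    root_page: int
    valid: bool = True  # assumption: stands in for detecting an incomplete write


class MetaPages:
    """Two fixed meta slots; commits alternate between them."""

    def __init__(self, initial_root: int) -> None:
        self.slots: List[Optional[Meta]] = [Meta(0, initial_root), None]

    def current(self) -> Meta:
        """Pick the valid slot with the highest transaction id (no searching)."""
        candidates = [m for m in self.slots if m is not None and m.valid]
        return max(candidates, key=lambda m: m.txn_id)

    def commit(self, new_root: int, torn: bool = False) -> None:
        """Write the new root into the slot not holding the current meta."""
        txn = self.current().txn_id + 1
        self.slots[txn % 2] = Meta(txn, new_root, valid=not torn)


metas = MetaPages(initial_root=0)
metas.commit(new_root=10)             # txn 1 goes to slot 1
metas.commit(new_root=20)             # txn 2 goes to slot 0
metas.commit(new_root=30, torn=True)  # simulated crash while writing txn 3
recovered = metas.current()           # falls back to txn 2
```

Only two fixed locations ever need to be checked on startup, which is why no scan for the last valid root is required.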

Looks like you have the other details right.

Thanks for the explanation. Clever stuff, LMDB is taking advantage of not having to support multiple writers here.