| HN Mirror

I've built an append-only scheme where the on-disk format consists of 1 gigantic splay tree with updates being written as a modified sub-tree. The last element you write to disk is always the (latest) root node. Append-only implicitly solves most disk fragmentation concerns. It also has GC capabilities by having the physical log divided into multiple files. The actual "cleanup" is simply taking an old file and feeding its items into the storage engine again. The heuristics for determining when to do this are based upon statistics collected at insert/update time.

Assuming the root node knows how to find the various things you are looking for (i.e. physical offsets to child nodes), this is how you can address the storage.

The trickiest part is finding the latest root node in adverse scenarios (i.e. plug pulled/partial write to disk). You can develop some 2-tier system where a 32-bit magic cookie is scanned for in blocks from back to front, and once it is encountered the relevant offsets are applied and the candidate attempts deserialization+checksum. No partial writes can be recovered in this scheme and wind up as wasted bytes in the log.

At some point I had intended to use this for a work project, but then SQLite came in and ruined my little party.