Hacker News new | ask | show | jobs
by vitiral 1219 days ago
How are the keys indexed and looked up? How does it handle disk fragmentation? This has always seemed to me to be one of the harder problems with databases.
3 comments

I've built an append-only scheme where the on-disk format consists of 1 gigantic splay tree with updates being written as a modified sub-tree. The last element you write to disk is always the (latest) root node. Append-only implicitly solves most disk fragmentation concerns. It also has GC capabilities by having the physical log divided into multiple files. The actual "cleanup" is simply taking an old file and feeding its items into the storage engine again. The heuristics for determining when to do this are based upon statistics collected at insert/update time.

Assuming the root node knows how to find the various things you are looking for (i.e. physical offsets to child nodes), this is how you can address the storage.

The trickiest part is finding the latest root node in adverse scenarios (i.e. plug pulled/partial write to disk). You can develop some 2-tier system where a 32-bit magic cookie is scanned for in blocks from back to front, and once it is encountered the relevant offsets are applied and the candidate attempts deserialization+checksum. No partial writes can be recovered in this scheme and wind up as wasted bytes in the log.

At some point I had intended to use this for a work project, but then SQLite came in and ruined my little party.

If you want to read about an option that doesn't suffer from fragmentation (no pedantic replies please), check out LSM Trees which are what RocksDB uses.

Designing Data-Intensive Applications has a VERY clear and understandable chapter about them. Once you've finished reading it, you'll have the tools to whip together a toy implementation for something like this.

Thanks!
The hardest problem with the databases seems to be how it handles a sudden power down event. Most implementations don't pass that test.