| Hi, I work at Ripple and we've just gotten done putting some
finishing touches on a new open source key/value database
written in C++. It's called NuDB, and it's got these features:

* Header-only, C++11
* For SSDs or high-IOPS devices
* Low memory footprint, zero caches!
* Performance independent of growth!
* Insert-only (no update or delete)
* Data size up to 2^32-1
* Database size up to 2^64-1
* Fault tolerant (uses a rollback file)
* Concurrent, fast reads

This database keeps the SAME performance no
matter how big the data set grows!

We were using RocksDB, which was fairly good, but as the size
of our distributed ledger grew, the performance started to
go down. Some investigation showed that RocksDB allocates
memory to cache the various bloom filters and indexes that
it needs to implement the log-structured merge algorithm.

We took a step back and said, if we're going to have an
insert-only database that is huge (hundreds of terabytes),
with a random access pattern (keys uniformly randomly
distributed, as when the key is a cryptographic digest
like SHA256 of the data), then no amount of RAM for caches
is going to help.

The thinking was to write a new key/value store that implements
a hash table but on disk. Every lookup for an item would
require, on average, only a single I/O to read the block from
the key file (and a subsequent I/O to read the value). This
is by no means novel; there are other implementations that
do this, such as Berkeley DB, Sparkey, et al. But we believe
we have invented something new, in the treatment of full buckets,
that is performing spectacularly in our production Ripple
environment, and we'd like to share it with you: https://github.com/vinniefalco/NuDB |
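
To make the "hash table but on disk" idea concrete, here is a toy C++11 sketch of the lookup path: one read of a bucket from the key file, then one read of the record from the data file. Everything in it is an assumption made for illustration (the `ToyStore`, `Bucket`, and `Entry` names, the fixed four entries per bucket, and the in-memory containers standing in for files); it is not NuDB's file format or API. It also does not attempt NuDB's treatment of full buckets, which is not described here, and it does not check for duplicate keys. An insert into a full bucket simply fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy model only: hypothetical layout and names, NOT NuDB's format or API.
// A vector and a string stand in for the key file and the data file.

struct Entry {
    bool          used = false;
    std::uint64_t hash = 0;        // full hash of the key
    std::uint64_t offset = 0;      // where the record starts in the data file
    std::uint32_t key_size = 0;
    std::uint32_t value_size = 0;
};

constexpr std::size_t kEntriesPerBucket = 4;

// One bucket models one fixed-size block of the key file.
struct Bucket {
    Entry entries[kEntriesPerBucket];
};

struct ToyStore {
    std::vector<Bucket> key_file;   // stand-in for the on-disk key file
    std::string         data_file;  // stand-in for the append-only data file
    std::size_t         key_reads = 0;   // simulated key-file I/Os
    std::size_t         data_reads = 0;  // simulated data-file I/Os

    explicit ToyStore(std::size_t bucket_count) : key_file(bucket_count) {
        assert(bucket_count > 0);
    }

    // Appends key+value to the data file and records it in the key's bucket.
    // Returns false if the bucket is full (overflow is not handled here).
    bool insert(const std::string& key, const std::string& value) {
        const std::uint64_t h = std::hash<std::string>{}(key);
        Bucket& b = key_file[h % key_file.size()];
        for (Entry& e : b.entries) {
            if (!e.used) {
                e.used = true;
                e.hash = h;
                e.offset = data_file.size();
                e.key_size = static_cast<std::uint32_t>(key.size());
                e.value_size = static_cast<std::uint32_t>(value.size());
                data_file += key;
                data_file += value;
                return true;
            }
        }
        return false;
    }

    // One bucket read, then one record read per entry whose hash matches.
    bool fetch(const std::string& key, std::string& value_out) {
        const std::uint64_t h = std::hash<std::string>{}(key);
        const Bucket& b = key_file[h % key_file.size()];
        ++key_reads;  // the single key-file I/O
        for (const Entry& e : b.entries) {
            if (!e.used || e.hash != h) continue;
            ++data_reads;  // the subsequent data-file I/O
            const std::size_t off = static_cast<std::size_t>(e.offset);
            if (data_file.compare(off, e.key_size, key) == 0) {
                value_out = data_file.substr(off + e.key_size, e.value_size);
                return true;
            }
        }
        return false;
    }
};
```

Constructing `ToyStore db(16)`, inserting a few pairs, and calling `db.fetch(key, out)` bumps `key_reads` by exactly one per lookup no matter how many records are stored, which is the property the post is describing. A lookup for a missing key usually costs only the bucket read.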