Very nice work, and the wiki is also quite nice -- I wish more projects had a page like https://github.com/facebook/rocksdb/wiki/Rocksdb-Architectur.... It's really nice to see a clear, terse summary of what makes this project interesting relative to its predecessors.

At my company (scalyr.com), we've built a more-or-less clone of LevelDB in Java, with a similar goal of extracting more performance on high-powered servers (and better integration with our Java codebase). I'll be digging through rocksdb to see what ideas we might borrow.

A few things we've implemented that might be interesting for rocksdb:

* The application can force segments to be split at specified keys. This is very helpful if you write a block of data all at once and then don't touch it for a long time. The initial memtable compaction places this data in its own segment and then we can push that segment down to the deepest level without ever compacting it again. It can also eliminate the need for bloom filters for many use cases, as you often wind up with only one segment overlapping a particular key range.

* The application can specify different compression schemes for different parts of the keyspace. This is useful if you are storing different kinds of data in the same database.

* We don't use timestamps anywhere other than the memtable. This puts some constraints on snapshot management, but streamlines get/scan operations and reduces file size for small values.

Do you have benchmarks for scan performance? This is an important area for us. I don't have exact figures handy, but we get something like 2GB/second (using 8 threads) on an EC2 h1.4xlarge, uncached (reading from SSD) and decompressing on the fly. This is an area we've focused on.

I'd enjoy getting together to compare notes -- send me an e-mail if you're interested. steve @ (the domain mentioned above).
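
For readers who want to picture the forced-split idea from the first bullet, here is a minimal sketch in Java. It is only an illustration under stated assumptions: the class `ForcedSplitSketch`, the methods `flush` and `slice`, the string keys, and the in-memory `TreeMap` standing in for a memtable are all hypothetical, and nothing here is taken from the Scalyr or RocksDB code. It assumes each application-supplied split key starts a new segment when the memtable is flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: flush a sorted memtable into segments,
// cutting a new segment at every application-specified split key.
class ForcedSplitSketch {

    // Returns the non-empty segments in key order. Each split key is the
    // inclusive lower bound of the segment that follows it.
    static List<NavigableMap<String, String>> flush(
            NavigableMap<String, String> memtable, SortedSet<String> splitKeys) {
        List<NavigableMap<String, String>> segments = new ArrayList<>();
        String lower = null; // null means "no lower bound yet"
        for (String boundary : splitKeys) {
            NavigableMap<String, String> segment = slice(memtable, lower, boundary);
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
            lower = boundary;
        }
        NavigableMap<String, String> last = slice(memtable, lower, null);
        if (!last.isEmpty()) {
            segments.add(last);
        }
        return segments;
    }

    // Copies the entries with from <= key < to; a null bound is open-ended.
    static NavigableMap<String, String> slice(
            NavigableMap<String, String> map, String from, String to) {
        NavigableMap<String, String> view = map;
        if (from != null) {
            view = view.tailMap(from, true);
        }
        if (to != null) {
            view = view.headMap(to, false);
        }
        return new TreeMap<>(view);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> memtable = new TreeMap<>();
        memtable.put("log/2013-11-01/a", "v1");
        memtable.put("log/2013-11-01/b", "v2");
        memtable.put("log/2013-11-02/a", "v3");
        memtable.put("user/1", "v4");

        // The application asks for a cut at the start of each day's block
        // and at the start of the user records.
        SortedSet<String> splitKeys =
                new TreeSet<>(Arrays.asList("log/2013-11-02", "user/"));

        for (NavigableMap<String, String> segment : flush(memtable, splitKeys)) {
            System.out.println(segment.keySet());
        }
    }
}
```

With these inputs the flush yields three segments: the two `log/2013-11-01` keys, the single `log/2013-11-02` key, and the `user/1` key. Because each block written at once lands in a segment whose key range no other segment overlaps, a store built this way could move that segment to a deeper level without rewriting it, which is the benefit the bullet describes.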