| There seem to be three things they are doing to reach their higher performance: 1. Multi-threaded synchronized write ordering. I'm interested in the internal synchronization mechanism... "LevelDB uses very coarse-grained synchronization which forces all writes to proceed in an ordered, first-come-first-served fashion, effectively reducing throughput to that of a single thread. HyperLevelDB increases concurrency by allowing multiple threads to agree on the order of their respective writes, and then independently apply the writes in a manner consistent with the agreed-upon order." 2. Tuned write delay on compaction. What external instrumentation/markers are they passing into hyperleveldb to tune write delay? "HyperLevelDB removes this artificial delay, allowing the application (in our case, HyperDex) to independently decide to delay writes, using information available outside the scope of LevelDB." 3. Tuned intra-level re-writes. "LevelDB's compaction algorithm is not efficient, and in the "fillrand" benchmark will, on average, rewrite 3MB of data in the upper level for every 1MB of data in the lower level. HyperLevelDB avoids this waste by selecting the compaction with the smallest overhead." |
1. Our internal synchronization mechanism is a simple change. The stock LevelDB does the following:
HyperLevelDB does this a little differently. We made the log and the memtable concurrent data structures, so that multiple threads can write to each of them at the same time. We then do a little synchronization to ensure that we don't reveal the writes to readers in the wrong order. For the actual implementations, check out the code for LevelDB (Lines 1135-1196 of https://github.com/rescrv/HyperLevelDB/blob/28dad918f2ffb80f...) and HyperLevelDB (Lines 1307-1428 of https://github.com/rescrv/HyperLevelDB/blob/master/db/db_imp...). Effectively, this change moves from a model where there is exactly one writer at a time to one where the bulk of the work (inserting into the log and memtable) is done in parallel by the writer threads.
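To make the idea concrete, here is a minimal C++ sketch of the general pattern: agree on an order, apply in parallel, publish in order. It is not the HyperLevelDB code. `OrderedWriter`, its fixed-size slot array, and the spin-wait are assumptions made purely for illustration, and the slot array merely stands in for the concurrent log and memtable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical illustration only; names and layout are not from HyperLevelDB.
class OrderedWriter {
 public:
  explicit OrderedWriter(std::size_t capacity) : slots_(capacity) {}

  // Stores `value` and returns the sequence number it was assigned.
  uint64_t Write(const std::string& value) {
    // Step 1: agree on an order. This is the only serialized step.
    const uint64_t seq = next_seq_.fetch_add(1, std::memory_order_relaxed);
    assert(seq < slots_.size());

    // Step 2: apply the write independently. Each thread touches only its
    // own slot, so many threads can do this work at the same time.
    slots_[seq] = value;

    // Step 3: publish in the agreed order. Wait until every earlier write
    // is visible, then make this one visible.
    while (visible_.load(std::memory_order_acquire) != seq) {
      std::this_thread::yield();
    }
    visible_.store(seq + 1, std::memory_order_release);
    return seq;
  }

  // Readers may only look at sequence numbers below this count.
  uint64_t VisibleCount() const {
    return visible_.load(std::memory_order_acquire);
  }

  const std::string& Get(uint64_t seq) const { return slots_[seq]; }

 private:
  std::atomic<uint64_t> next_seq_{0};  // next sequence number to hand out
  std::atomic<uint64_t> visible_{0};   // writes [0, visible_) are published
  std::vector<std::string> slots_;     // stand-in for log + memtable
};
```

The expensive part (step 2) runs without any lock held; only the hand-off of sequence numbers and the final publish are ordered, which is the sense in which readers never see writes out of order.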
2. LevelDB provides a GetProperty call. We can inspect the number of files in Level-0 and back off where appropriate. There is no write delay in HyperLevelDB itself. By the end-to-end principle, the storage server is in a better position to decide whether to delay writes or to just keep pushing them into the database.
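As a rough sketch of what such a back-off could look like on the application side: the `leveldb.num-files-at-level0` property name is one that LevelDB's `GetProperty` accepts, but `PropertyReader`, `WriteDelay`, the thresholds, and the linear ramp are all hypothetical choices for illustration, not what HyperDex actually does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical stand-in for a property lookup such as LevelDB's
// DB::GetProperty(property, &value). Returns false if the property is unknown.
using PropertyReader = std::function<bool(const std::string&, std::string*)>;

// Hypothetical policy: no delay at or below soft_limit Level-0 files,
// max_delay at or above hard_limit, and a linear ramp in between.
std::chrono::milliseconds WriteDelay(const PropertyReader& get_property,
                                     int soft_limit, int hard_limit,
                                     std::chrono::milliseconds max_delay) {
  assert(soft_limit < hard_limit);

  std::string value;
  if (!get_property("leveldb.num-files-at-level0", &value)) {
    // No information available, so do not delay.
    return std::chrono::milliseconds(0);
  }

  const long files = std::strtol(value.c_str(), nullptr, 10);
  if (files <= soft_limit) return std::chrono::milliseconds(0);
  if (files >= hard_limit) return max_delay;

  // Scale the delay by how far we are between the two limits.
  return std::chrono::milliseconds(max_delay.count() * (files - soft_limit) /
                                   (hard_limit - soft_limit));
}
```

With a real database handle, the reader would presumably be a small lambda wrapping `db->GetProperty(name, value)`, and the caller would sleep for the returned duration (or shed load some other way) before issuing the next write.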