I'm the HyperDex developer who did the work on HyperLevelDB. I'll answer your questions in order as best I can.

1. Our internal synchronization mechanism is a simple change. The stock LevelDB does the following:

    place our current thread on the back of wait_queue
    wait for (our thread to be head of wait_queue || thread ahead of us to do our work)
    if work done: exit
    possibly build a batch of our writes and those of the writers queued behind us
    append data to the log
    insert data into the memtable
    signal the next writer, and any writer whose work we finished
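
To make those steps concrete, here is a minimal C++ sketch of the single-writer queue. It is not the actual LevelDB code: the names `QueuedStore`, `Writer`, and `RunQueuedWriters` are made up for illustration, and the log and memtable are stood in for by plain vectors.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the stock write path; not the real LevelDB code.
class QueuedStore {
 public:
  void Write(const std::string& record) {
    Writer w(record);
    std::unique_lock<std::mutex> lock(mu_);
    wait_queue_.push_back(&w);
    // Wait until we are the head, or a writer ahead of us did our work.
    while (!w.done && wait_queue_.front() != &w) w.cv.wait(lock);
    if (w.done) return;
    assert(wait_queue_.front() == &w);

    // Build a batch: our write plus everything queued behind us right now.
    std::vector<Writer*> batch(wait_queue_.begin(), wait_queue_.end());

    // Only the head writer gets here, so the mutex can be dropped while it
    // appends to the log and inserts into the memtable.
    lock.unlock();
    for (Writer* m : batch) {
      log_.push_back(m->record);
      memtable_.push_back(m->record);
    }
    lock.lock();

    // Retire the batch and wake every writer whose work we finished.
    for (Writer* m : batch) {
      wait_queue_.pop_front();
      if (m != &w) {
        m->done = true;
        m->cv.notify_one();
      }
    }
    // Signal the next writer, which becomes the new head.
    if (!wait_queue_.empty()) wait_queue_.front()->cv.notify_one();
  }

  // Call these only when no writes are in flight.
  std::size_t LogSize() const { return log_.size(); }
  std::size_t MemtableSize() const { return memtable_.size(); }

 private:
  struct Writer {
    explicit Writer(std::string r) : record(std::move(r)) {}
    std::string record;
    bool done = false;
    std::condition_variable cv;
  };

  std::mutex mu_;
  std::deque<Writer*> wait_queue_;
  std::vector<std::string> log_;       // stand-in for the write-ahead log
  std::vector<std::string> memtable_;  // stand-in for the memtable
};

// Usage: spawn `threads` writers that each issue `writes_each` writes.
void RunQueuedWriters(QueuedStore& store, int threads, int writes_each) {
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&store, t, writes_each] {
      for (int i = 0; i < writes_each; ++i) {
        store.Write("t" + std::to_string(t) + "-" + std::to_string(i));
      }
    });
  }
  for (std::thread& th : pool) th.join();
}
```

Note that however many threads call `Write`, only the head of the queue ever touches the log and the memtable; everyone else is parked on a condition variable.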

HyperLevelDB does this a little differently. We made the log and the memtable concurrent data structures, so that multiple threads can write to each of them at the same time. We then do a little synchronization to ensure that we don't reveal the writes to readers in the wrong order.

    get a ticket, indicating the order of our writes
    insert the data into the log
    insert the data into the memtable
    wait for writes with a lower ticket to complete
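
The ticket scheme above can be sketched in C++ along the following lines. This is an illustration under assumptions, not the HyperLevelDB implementation: `TicketedStore` and `RunTicketedWriters` are hypothetical names, and the "concurrent" log and memtable are approximated by vectors with their own short-lived locks. The point is only that the inserts happen outside the ordering lock, and the ticket is used purely to decide when a write becomes visible.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the ticketed write path; not the real code.
class TicketedStore {
 public:
  void Write(const std::string& record) {
    // Get a ticket, indicating the order of our write.
    const std::uint64_t ticket = next_ticket_.fetch_add(1);

    // Insert into the log and the memtable. Writers do not wait for each
    // other's tickets here, so this part runs in parallel across threads.
    {
      std::lock_guard<std::mutex> guard(log_mu_);
      log_.push_back(record);
    }
    {
      std::lock_guard<std::mutex> guard(memtable_mu_);
      memtable_.push_back(record);
    }

    // Wait for writes with a lower ticket to complete, then reveal ours.
    std::unique_lock<std::mutex> lock(publish_mu_);
    publish_cv_.wait(lock, [&] { return published_ == ticket; });
    assert(revealed_.size() == ticket);  // exactly `ticket` writes precede us
    revealed_.push_back(ticket);
    ++published_;
    publish_cv_.notify_all();
  }

  // Tickets in the order in which their writes became visible.
  std::vector<std::uint64_t> RevealedOrder() {
    std::lock_guard<std::mutex> guard(publish_mu_);
    return revealed_;
  }

  std::size_t LogSize() {
    std::lock_guard<std::mutex> guard(log_mu_);
    return log_.size();
  }

 private:
  std::atomic<std::uint64_t> next_ticket_{0};

  std::mutex log_mu_;
  std::vector<std::string> log_;       // stand-in for the concurrent log
  std::mutex memtable_mu_;
  std::vector<std::string> memtable_;  // stand-in for the concurrent memtable

  std::mutex publish_mu_;
  std::condition_variable publish_cv_;
  std::uint64_t published_ = 0;        // next ticket allowed to become visible
  std::vector<std::uint64_t> revealed_;
};

// Usage: spawn `threads` writers that each issue `writes_each` writes.
void RunTicketedWriters(TicketedStore& store, int threads, int writes_each) {
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&store, t, writes_each] {
      for (int i = 0; i < writes_each; ++i) {
        store.Write("t" + std::to_string(t) + "-" + std::to_string(i));
      }
    });
  }
  for (std::thread& th : pool) th.join();
}
```

In this sketch a slow writer holding a low ticket delays the visibility of later writes, but it never blocks them from doing their own log and memtable inserts.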

For the actual implementations, check out the code for LevelDB (Lines 1135-1196 of https://github.com/rescrv/HyperLevelDB/blob/28dad918f2ffb80f...) and HyperLevelDB (Lines 1307-1428 of https://github.com/rescrv/HyperLevelDB/blob/master/db/db_imp...).

Effectively, this change moves from a model where there is exactly one writer at a time, to one where the bulk of the work (inserting into log/memtable) is done in parallel by writer threads.

2. LevelDB provides a GetProperty call. We can inspect the number of files in Level-0 and back off where appropriate. There is no write delay in LevelDB itself. By the end-to-end principle, the storage server is in a better position to decide whether to delay writes, or just keep pushing them into the database.
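
As a rough illustration of what such a back-off could look like on the storage-server side, here is a small C++ sketch. The policy itself (the `soft_limit` and `max_ms` thresholds, and the helper names `ParseFileCount` and `WriteDelay`) is invented for this example and is not what HyperDex does. It assumes the Level-0 file count arrives as the decimal string that `GetProperty` fills in for the `"leveldb.num-files-at-level0"` property.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <limits>
#include <string>

// Parse the string obtained from something like
//   db->GetProperty("leveldb.num-files-at-level0", &value);
// Returns -1 if the text is not a non-negative integer that fits in an int.
int ParseFileCount(const std::string& value) {
  if (value.empty()) return -1;
  char* end = nullptr;
  const long n = std::strtol(value.c_str(), &end, 10);
  if (*end != '\0' || n < 0 || n > std::numeric_limits<int>::max()) return -1;
  return static_cast<int>(n);
}

// Hypothetical policy: no delay up to `soft_limit` Level-0 files, then one
// millisecond per extra file, capped at `max_ms` milliseconds.
std::chrono::milliseconds WriteDelay(int l0_files, int soft_limit = 8,
                                     int max_ms = 100) {
  assert(soft_limit >= 0 && max_ms >= 0);
  if (l0_files <= soft_limit) return std::chrono::milliseconds(0);
  return std::chrono::milliseconds(std::min(l0_files - soft_limit, max_ms));
}
```

A server would check the count every so often, and sleep for `WriteDelay(count)` before issuing its next write whenever the result is non-zero. Keeping the policy outside the database means it can be tuned per deployment, or skipped entirely.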