by mmcgrana 6003 days ago
The database and metadata are not locked during compaction, at least to the extent that you seem to think they are.

When a compaction is requested, the compact query enters the write queue. When the query reaches the head of the queue, an asynchronous compaction is started on a snapshot of the database as of that moment, writing to a file in /tmp. At the same time, a buffer is added to the database metadata to hold subsequent write queries. After this essentially instant operation, the database goes back to processing write queries as normal.

While the compaction is ongoing, write queries are also appended to that buffer. When the compaction thread finishes writing its snapshot, it inserts a request to finalize compaction into the write queue. When this request reaches the head of the queue, it appends all the buffered queries to the compacted database file and then swaps the compacted file in /tmp into the regular database path.

So writes are blocked twice: once for an instant, and once for however long it takes to append the buffered queries, which shouldn't be long either. Note that reads are never blocked by compaction; indeed, they are never blocked in general.
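A minimal Python sketch of the scheme described above, under loud assumptions: a toy key/value store whose log holds one JSON line per write, a single `threading.Lock` standing in for the write queue, and hypothetical names throughout (`LogDB`, `start_compaction`, `replay`). This is not FleetDB's actual code. Unlike the description, the temporary file is created next to the log rather than in /tmp, so that `os.replace` is a same-filesystem rename.

```python
import json
import os
import tempfile
import threading


class LogDB:
    """Toy key/value store backed by an append-only log of write queries.
    All names are hypothetical; this only illustrates the compaction flow."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.buffer = None                  # list of queries, only while compacting
        self.write_lock = threading.Lock()  # stands in for the single write queue
        open(path, "a").close()             # make sure the log file exists

    def read(self, key):
        # Reads go straight to the in-memory data and never take the write lock.
        return self.data.get(key)

    def write(self, key, value):
        query = {"k": key, "v": value}
        with self.write_lock:
            self.data[key] = value
            with open(self.path, "a") as f:
                f.write(json.dumps(query) + "\n")
            if self.buffer is not None:
                self.buffer.append(query)   # remember writes made mid-compaction

    def start_compaction(self):
        # First (brief) critical section: snapshot the data and install the buffer.
        with self.write_lock:
            snapshot = dict(self.data)
            self.buffer = []
        worker = threading.Thread(target=self._compact, args=(snapshot,))
        worker.start()
        return worker

    def _compact(self, snapshot):
        # Runs without the lock, so writes continue while the snapshot is written.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".compact")
        with os.fdopen(fd, "w") as f:
            for k, v in snapshot.items():
                f.write(json.dumps({"k": k, "v": v}) + "\n")
        # Acquiring the lock plays the role of queueing a "finalize" request.
        self._finalize(tmp_path)

    def _finalize(self, tmp_path):
        # Second critical section: append buffered writes, then swap files.
        with self.write_lock:
            with open(tmp_path, "a") as f:
                for query in self.buffer:
                    f.write(json.dumps(query) + "\n")
            os.replace(tmp_path, self.path)
            self.buffer = None


def replay(path):
    """Rebuild the key/value state by replaying a log file from the start."""
    state = {}
    with open(path) as f:
        for line in f:
            query = json.loads(line)
            state[query["k"]] = query["v"]
    return state
```

Both critical sections are short: the first only takes the snapshot and installs the buffer, and the second only replays the buffered writes. Here the snapshot is a plain dict copy; a store built on persistent data structures could take it without copying.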


Thank you for explaining how compaction works in greater detail. As I understand it, the "buffer" used for write queries (and more) while compaction is active is in memory, correct? An ArrayList? http://github.com/mmcgrana/fleetdb/blob/master/src/clj/fleet...

Why not just open a new file when compaction starts instead of using an in-memory buffer? When compaction ends, append the newly opened file to the compacted file, then swap in the compacted file as the current log file.
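The on-disk alternative proposed here could look like the following sketch. `FileWriteBuffer` and `finish_compaction` are hypothetical names; the only idea illustrated is that mid-compaction writes go to a side file, which is copied onto the end of the compacted file with `shutil.copyfileobj` before the swap.

```python
import os
import shutil
import tempfile


class FileWriteBuffer:
    """Buffers mid-compaction writes in a side file instead of memory (hypothetical)."""

    def __init__(self, directory):
        fd, self.path = tempfile.mkstemp(dir=directory, suffix=".buf")
        self.f = os.fdopen(fd, "w")

    def append(self, line):
        self.f.write(line + "\n")

    def drain_into(self, compacted_path):
        # Append the whole side file to the compacted log, then delete it.
        self.f.close()
        with open(self.path) as src, open(compacted_path, "a") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(self.path)


def finish_compaction(compacted_path, log_path, buf):
    # Meant to run while writes are blocked, like the in-memory finalize step.
    buf.drain_into(compacted_path)
    os.replace(compacted_path, log_path)  # swap in as the current log file
```

The trade-off is an extra sequential file copy at finalize time in exchange for memory use that no longer grows with the number of buffered writes.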

I suppose deciding whether to buffer in memory or on disk would depend on several factors:

1) how much compaction is required and thus how long compaction might take to complete

2) historical write-rate average

3) buffer size threshold

4) compaction time threshold

By thresholding I mean: start buffering in memory, then switch to a file on disk if compaction starts taking "too long" to complete or the in-memory buffer becomes "too large".
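The hybrid policy in the list above might be sketched like this. `SpillingBuffer`, `max_items`, and `max_seconds` are hypothetical, and the default threshold values are arbitrary placeholders; the clock is injectable only so the time threshold can be tested. Once either threshold trips, the in-memory contents move to an anonymous `tempfile.TemporaryFile` and every later write goes there.

```python
import os
import tempfile
import time


class SpillingBuffer:
    """In-memory write buffer that spills to a temp file once it gets too large
    or compaction has run too long. All names and defaults are hypothetical."""

    def __init__(self, max_items=10_000, max_seconds=30.0, clock=time.monotonic):
        self.max_items = max_items
        self.max_seconds = max_seconds
        self.clock = clock
        self.started = clock()
        self.memory = []
        self.file = None  # opened on the first spill

    @property
    def spilled(self):
        return self.file is not None

    def _over_threshold(self):
        too_large = len(self.memory) >= self.max_items
        too_long = self.clock() - self.started >= self.max_seconds
        return too_large or too_long

    def append(self, line):
        if not self.spilled and self._over_threshold():
            self._spill()
        if self.spilled:
            self.file.write(line + "\n")
        else:
            self.memory.append(line)

    def _spill(self):
        # Move everything buffered so far to disk, preserving order.
        self.file = tempfile.TemporaryFile(mode="w+")
        for line in self.memory:
            self.file.write(line + "\n")
        self.memory = []

    def lines(self):
        """Return buffered lines in insertion order, wherever they live."""
        if not self.spilled:
            return list(self.memory)
        self.file.seek(0)
        out = [line.rstrip("\n") for line in self.file]
        self.file.seek(0, os.SEEK_END)  # leave the position at the end for appends
        return out

    def close(self):
        if self.file is not None:
            self.file.close()
```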

How do you recover from a compaction thread that hung or silently died? You could end up with quite a buffer to append in one go.

edit: very nice project, btw. I always fancied writing a db in Lisp.

Right, there are actually two potential problems here: 1) the initial compaction takes a long time, so there is a correspondingly large number of buffered writes to append, and 2) the compaction never finishes, because it hung or died, and then we keep adding to the buffer indefinitely.

I don't think 1) will be much of a problem: even for a large db, the number of writes processed during a compaction should be small relative to how quickly they can be appended to the compacted file.
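The argument can be made explicit with a back-of-envelope bound. The symbols are introduced here for illustration only: $r_w$ is the rate at which the database processes writes, $T_c$ is how long the background compaction takes, and $r_a$ is the rate at which already-serialized queries can be appended to the compacted file.

```latex
N_{\text{buf}} = r_w \, T_c,
\qquad
T_{\text{final}} = \frac{N_{\text{buf}}}{r_a} = T_c \cdot \frac{r_w}{r_a}
```

If, as the reply assumes, $r_a$ is much larger than $r_w$, the finalize step that blocks writes lasts only a small fraction of $T_c$. The same expression shows why case 2) is different: if the compaction never finishes, $T_c$, and with it $N_{\text{buf}}$, grows without bound.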

2) is more problematic; I'll need to add a timeout-like guard to prevent a runaway write buffer.
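One possible shape for such a guard, sketched in Python with hypothetical names (`CompactionGuard`, `max_seconds`, `max_buffered`): a `threading.Timer` aborts the compaction if it outlives a deadline, and a size cap aborts it if the buffer grows too large. This assumes, as in a log-structured store, that writes keep going to the existing log during compaction, so aborting can simply stop buffering and leave that log authoritative. A hung Python thread cannot be killed, so the sketch abandons it instead: a generation counter lets a worker that finishes late see that its compaction was dropped.

```python
import threading


class CompactionGuard:
    """Hypothetical watchdog that bounds the mid-compaction write buffer."""

    def __init__(self, max_seconds, max_buffered):
        self.max_seconds = max_seconds
        self.max_buffered = max_buffered
        self.lock = threading.Lock()
        self.buffer = None    # list while a compaction is live, else None
        self.generation = 0   # bumped on every start and every abort
        self.timer = None

    def start(self):
        with self.lock:
            self.generation += 1
            self.buffer = []
            gen = self.generation
        # Deadline: abort this generation if it has not finished in time.
        self.timer = threading.Timer(self.max_seconds, self.abort, args=(gen,))
        self.timer.daemon = True
        self.timer.start()
        return gen

    def record(self, query):
        with self.lock:
            if self.buffer is None:
                return  # no compaction in progress, nothing to buffer
            self.buffer.append(query)
            if len(self.buffer) > self.max_buffered:
                self._abort_locked()  # size cap: runaway buffer

    def abort(self, gen):
        with self.lock:
            if gen == self.generation and self.buffer is not None:
                self._abort_locked()

    def _abort_locked(self):
        self.buffer = None       # stop buffering; the existing log stays in use
        self.generation += 1     # any late-finishing worker now holds a stale gen
        if self.timer is not None:
            self.timer.cancel()

    def finish(self, gen):
        """Called by the compaction worker. Returns the buffered queries to
        append, or None if this compaction was abandoned."""
        with self.lock:
            if gen != self.generation or self.buffer is None:
                return None
            queries, self.buffer = self.buffer, None
        self.timer.cancel()
        return queries
```

A worker that gets `None` back would just delete its temporary file instead of swapping it in.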

This is more or less exactly like the Redis BGREWRITEAOF command. I wonder whether we both happened to arrive at the same solution or whether you knew about Redis and got the idea from it. I would love to know that a Redis algorithm could be of general use for other DB implementations.

Thanks for your work, and welcome to the in-memory database developers crew ;)