by mmcgrana
6003 days ago
The database and metadata are not locked during compaction, at least not to the extent that you seem to think they are. When a compaction is requested, the compact query enters the write queue. When the query reaches the head of the queue, an asynchronous compaction is started on a snapshot of the database as of that moment, writing to a file in /tmp. At the same time, a buffer is added to the database metadata in which subsequent write queries will be stored. After this essentially instant operation, the database proceeds to process write queries as normal.

While the compaction is ongoing, write queries are appended to the buffer mentioned above. When the compaction thread that was spawned earlier finishes writing its snapshot, it inserts into the write queue a request to finalize compaction. When this request reaches the head of the queue, it appends all the buffered queries to the compacted database file. Finally, it swaps the compacted file in /tmp into the regular database path.

So writes are blocked twice: once for an instant when compaction starts, and once for however long it takes to append the buffered queries, which shouldn't be long either. Note that reads are never blocked by compaction; indeed they are never blocked in general.
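To make the sequence above concrete, here is a minimal Python sketch of the same protocol. It is not the database's actual code: the class `LogDB`, its method names, and the JSON-lines log format are hypothetical, a single lock stands in for the write queue, and the temporary file is created next to the log rather than in /tmp so that the final swap is an atomic rename on the same filesystem.

```python
import json
import os
import tempfile
import threading


class LogDB:
    """Toy append-only key/value log with non-blocking compaction (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self.buffer = None              # list of records while compacting, else None
        self.lock = threading.Lock()    # stands in for the single write queue
        open(path, "a").close()         # make sure the log file exists

    def write(self, key, value):
        record = json.dumps([key, value])
        with self.lock:
            with open(self.path, "a") as log:
                log.write(record + "\n")
            self.state[key] = value
            if self.buffer is not None:
                # Compaction is running: remember this write for finalization.
                self.buffer.append(record)

    def start_compaction(self):
        # First brief block: take a snapshot and install the buffer.
        with self.lock:
            snapshot = dict(self.state)
            self.buffer = []
        worker = threading.Thread(target=self._compact, args=(snapshot,))
        worker.start()
        return worker

    def _compact(self, snapshot):
        # Runs without holding the lock, so writes proceed in the meantime.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as out:
            for key, value in snapshot.items():
                out.write(json.dumps([key, value]) + "\n")
        self._finalize(tmp_path)

    def _finalize(self, tmp_path):
        # Second brief block: append buffered writes, then swap the files.
        with self.lock:
            with open(tmp_path, "a") as out:
                for record in self.buffer:
                    out.write(record + "\n")
            self.buffer = None
            os.replace(tmp_path, self.path)
```

Calling `start_compaction()` returns the worker thread, so a caller can `join()` it when it needs to know the swap has happened. Reads served from `self.state` never touch the lock in this sketch, which mirrors the point that reads are not blocked.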
Why not just open a new file at compaction start instead of using an in-memory buffer? When compaction ends, append the newly opened file to the compacted file, then swap in the compacted file as the current log file.
I suppose deciding on whether to buffer in memory or on disk would depend on several factors:
1) how much compaction is required and thus how long compaction might take to complete
2) historical write-rate average
3) buffer size threshold
4) compaction time threshold
By thresholding I mean: start buffering in memory, then switch to a file on disk if compaction takes "too long" to complete or the in-memory buffer becomes "too large".
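The thresholding idea could be sketched roughly like this in Python. Everything here is hypothetical and for illustration only: the class `SpillBuffer`, the parameters `max_bytes` and `max_seconds`, and the assumption that records are newline-terminated byte strings. The clock is injectable so the time threshold can be exercised without waiting.

```python
import tempfile
import time


class SpillBuffer:
    """Buffers records in memory, spilling to a temp file past a threshold."""

    def __init__(self, max_bytes, max_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes        # buffer size threshold
        self.max_seconds = max_seconds    # compaction time threshold
        self.clock = clock
        self.started = clock()            # when compaction (and buffering) began
        self.memory = []
        self.size = 0
        self.file = None                  # set once we have spilled to disk

    @property
    def on_disk(self):
        return self.file is not None

    def append(self, record):
        """Add one newline-terminated bytes record."""
        if self.file is not None:
            self.file.write(record)
            return
        self.memory.append(record)
        self.size += len(record)
        too_large = self.size > self.max_bytes
        too_long = self.clock() - self.started > self.max_seconds
        if too_large or too_long:
            self._spill()

    def _spill(self):
        # Move everything buffered so far into an anonymous temp file.
        self.file = tempfile.TemporaryFile()
        self.file.writelines(self.memory)
        self.memory = []

    def drain(self):
        """Return all buffered records in the order they were appended."""
        if self.file is None:
            return list(self.memory)
        self.file.seek(0)
        return self.file.read().splitlines(keepends=True)
```

The time check only runs when a write arrives, so a quiet database stays in memory however long compaction takes, which seems fine since nothing is accumulating. Factors 1 and 2 from the list (expected compaction time and historical write rate) could be used to pick the two thresholds up front rather than being checked at runtime.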