Hacker News new | ask | show | jobs
by jsmith45 2230 days ago
MSSQL does this with a two pronged approach.

If a dead record is encountered in a scan, it is queued up to cleaned up by a background task, that runs every 5 seconds. This typically happens soon enough that the record is still cached, so no extra read I/O is needed, and in any case the location of the record is already known, so no extra scans are needed.

To handle other records, it maintains a bitmap of pages with ghost records. Each time it is awakened, it cleans up the queue, and then does up to 10 pages indicated in the bitmap. Then it goes back to sleep, and only wakes up again 5 seconds later.

This super slow cleanup of dead records that have not been recently scanned helps to avoid overloading the system, but does ensure cleanup will eventually occur, even if the table never gets scanned.

Pedantic note: (Technically it is a bytemap, as the bit is part of a larger structure, that holds other flags about a page.)