| It uses a background thread to do most of the work, and it appears the 7ns latency numbers are a little cooked: 1. The paper's 7ns like number is 8ns for microbenchmarks but 18ns in applications. The 7ns number I'm guessing is microbenchmarks, and the true application level number is prob more in the 17ns range. 2. It isn't precisely clear what that is measuring. The says that is the invocation time of the logging thread. Considering the thread making the call to log just passes most of the work to a background threads through a multi-producer single consumer queue of some sort, this is likely the time to dump it in the queue. So you really aren't logging in 7ns. The way I'm reading this is you're dumping on a queue in 17ns and letting a background thread do the actual work. The workload is cut down by preprocessing the creating a dictionary of static elements do reduce the I/O cost of the thread doing the actual writing (I assume this just means take the format strings and index them, which you could build at runtime, so i'm not sure the pre-processing step is really needed). My logger than dumps binary blobs onto a ring buffer for another process to log might be able to beat this invocation latency. This isn't really groundbreaking. I know a few place that log the binary blobs and format them later. None of them do the dictionary part, but when that is going to a background thread, I'm not sure how much that matters. |
And yes, it boils down to writing data to a message queue. Most of the overhead is probably the call to the hardware timestamp counter.