Hacker News new | ask | show | jobs
by roskilli 1809 days ago
Right exactly. As a point of reference, within M3DB each unique time series has a list of “in-order” compressed timestamp/float64 tuple streams. When a datapoint is written the series finds an encoder that it can append while keeping the stream in order (timestamp ascending), and if no such stream exists a new stream is created and becomes writeable for any datapoints that arrive with time stamps greater than the last written point.

At query time these streams are read by evaluating the next timestamp of all written streams for a block of time and then taking the datapoint with the lowest timestamp of the streams.

M3DB also runs a background tick that targets to complete within a few minutes each run to amortize CPU. During this process each series merges any streams that have sibling streams created due to out of order writes, producing one single in order stream. This is done by the same process used at query time to read the datapoints in order and they are consequently written to a new single compressed stream. This way extra computation due to our of order writes is amortized and only if a large percentage of series are written in time descending order do you end up with a significant overhead at write and read time. It also reduces the cost of persisting the current mutable data to a volume on disk (whether for snapshot or for persisting data for a completed time time window).