|
|
|
|
|
by twa927
3570 days ago
|
|
Thanks for the analysis of their benchmark, I wanted to view the details by myself but it required creating an account on their page. > There is no superior "compression" technology Isn't it feasible to employ special encoding for time series data? For example, to encode a series of timestamps like 1473333629, 1473333630, 1473333631 you could encode it as 1473333629, +1, +2 (where +1, +2 are encoded in one byte). And there are many cases of such metrics with adjacent values, like averages, counters. |
|
(On the other hand, columnar storage also has a bunch of tradeoffs/downsides so it's not a superior choice for every db product.)
My point about no "superior compression technology here" was specific to the linked benchmark. I.e. the lack of this potential optimization in cassandra does not appear to be the reason for the space blowup in the benchmark, but rather that they're duplicating the series ID for each sample.