| Nice work!
Averaging percentiles is well known to give terrible results.
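To make that concrete, here is a minimal sketch with made-up numbers: two hypothetical hosts, one busy and fast, one nearly idle and slow. The `percentile` helper (nearest-rank definition) and the latency values are assumptions for illustration only, not taken from the paper or from any library.

```python
def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, avoiding floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical latencies in ms (made-up numbers).
host_a = [10] * 990 + [20] * 10   # busy host, fast
host_b = [500] * 10               # nearly idle host, slow

p99_a = percentile(host_a, 99)                 # 10
p99_b = percentile(host_b, 99)                 # 500
averaged_p99 = (p99_a + p99_b) / 2             # 255.0
true_p99 = percentile(host_a + host_b, 99)     # 20

print(f"average of per-host p99s: {averaged_p99} ms")
print(f"p99 over all requests:    {true_p99} ms")
```

The average ignores how many requests each host served, so the idle host's 10 slow requests dominate the result. A mergeable sketch or histogram avoids this by combining counts before the percentile is computed.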
Glad to see more people taking this problem seriously and providing viable alternatives! A note on accuracy:
At Circonus, we have been using a version of HDR-Histograms [1] for many years to aggregate latency distributions and calculate accurate aggregated percentiles.
Accuracy was never a problem (worst-case error <5%, usually _much_ better). If I read your evaluation results correctly, you also found HDR-Histograms to be as accurate as or more accurate than DDSketches, correct? The differentiator from HDR-Histograms seems to be merging speed and size, where DDSketches seem to have an edge.
One thing that is not immediately clear to me from reading the paper: how much of the distribution function can be reconstructed from the sketch?
E.g., for SLO calculations one is often interested in latency bands: "How many requests were faster than 100ms?" [2]. Is it possible to approximate CDF values ("lower counts") from the sketch with low/bounded error?
[1] https://github.com/circonus-labs/libcircllhist
[2] http://heinrichhartmann.com/pdf/Heinrich%20Hartmann%20-%20La... |