Hacker News new | ask | show | jobs
by heinrichhartman 2484 days ago
> number of bins in terms of memory

Less than that, with the appropriate compression scheme (sparse encoding).

> But then I got completely lost in the math to show why this guarantees anything.

You get guarantees on _relative_ error:

- 100 to 110 is a 10% error

- 1000 to 1010 is a 1% error

Depending on your logarithmic base, you can detect errors up to a certain relative %-tage.

1 comments

That makes a lot sense, thank you! I was stuck on the visualization analogy where choosing the wrong axes for the histogram makes it useless (e.g., when all the data falls into the same bin you can't see the shape of the distribution). But for answering quantile, this doesn't matter. Even if P1 and P99 are in the same bin you still know what their values are up to a multiplicative constant. Fantastic!