|
|
|
|
|
by zawerf
2484 days ago
|
|
No background in this, my layman interpretation is: Track a histogram with log-scaled bins. These are easy to increment and only takes up size of the number of bins in terms of memory. Two histograms from different servers can easily be merged together since they use the same bins. But then I got completely lost in the math to show why this guarantees anything. What's actually preventing the scenario where everything falls into the same or just a few bins? (as so happens in the real world where all your data is of a relatively similar order of magnitude) For example what would happen if my data is random from 1 to 10? And on a different server random 1000 to 1010? |
|
Less than that, with the appropriate compression scheme (sparse encoding).
> But then I got completely lost in the math to show why this guarantees anything.
You get guarantees on _relative_ error:
- 100 to 110 is a 10% error
- 1000 to 1010 is a 1% error
Depending on your logarithmic base, you can detect errors up to a certain relative %-tage.