by noamhacker 3313 days ago
How do you test a system like this for accuracy? Is this done by simulating millions of unique requests?
3 comments

The algorithm's accuracy is known. From the wiki[1]:

    The HyperLogLog algorithm is able to estimate 
    cardinalities of > 10^9 with a typical error rate of 2%
[1] https://en.wikipedia.org/wiki/HyperLogLog
But what about the implementation accuracy? :)
Tests against both historical and synthetic datasets.
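For the synthetic half, here is a rough sketch of what such a test could look like. This is not Reddit's code: `hll_estimate`, the SHA-1 hash, and the `p=12` register setting are all assumptions made for illustration. The idea is to feed the estimator a stream whose true distinct count is known and check that the relative error stays near the theoretical standard error, which is about 1.04/sqrt(m) for m registers (roughly 1.6% at m = 4096).

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the distinct count of `items` with a basic HyperLogLog sketch.

    Illustrative only: p (register index bits) and the SHA-1 based hash
    are assumptions, not taken from any production system.
    """
    m = 1 << p                       # number of registers
    registers = [0] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = h >> (64 - p)                        # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in `rest` (1-based).
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant, m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    true_n = 100_000
    ids = [f"user-{i}" for i in range(true_n)]
    stream = ids + ids[:50_000]      # repeat visitors must not be recounted
    est = hll_estimate(stream)
    print(f"true={true_n} estimate={est:.0f} "
          f"relative error={abs(est - true_n) / true_n:.2%}")
```

Running the same check over many cardinalities and seeds, then over replayed historical logs where an exact count is available, would cover both kinds of dataset mentioned above.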
Reddit probably has enough analytics to be able to show mathematically that it will be accurate without simulating any requests.
Can't you just use Apache Benchmark and some proxies?