Hacker News new | ask | show | jobs
by Lukassus 3313 days ago
The HyperLogLog was a very nice article, but I wanted to ask, is this related to the estimation of Naci tanks during WW2 by Allies? https://www.wired.com/2010/10/how-the-allies-used-math-again...
1 comments

The German Tank Problem guesses the size of a set, given a limited sample and successive serial numbers. If they had randomized the serial numbers it wouldn't have worked.

HyperLogLog is different because you have the entire population (not just a sample), and it's a multiset (the same element can appear more than once). Getting the size of a (non-multi) set is easy, you just keep a counter and increment it for each element; it only takes enough memory to maintain the counter. Counting the distinct members of a multiset takes a lot more memory because you have to remember whether you've already seen a particular element or not.

The tl;dr is that the German Tank Problem is about making an estimate of size when you have imperfect information, and HyperLogLog gives you an estimate when you have perfect information, but it's too expensive to make an exact calculation.