by ajnin
1831 days ago
My intuitive answer to that problem was to say that if we assume the captured serial numbers are randomly distributed, and the numbering starts at 1, then they will have the same average as all the numbers, so the estimate should be the average of the captured serial numbers times 2. That gives a result close to the formula used in this article, but not the same. I'm not sure where the flaw is.

Your error is in stating "if we assume the captured serial numbers are randomly distributed": you're assuming they're -uniformly- distributed. Randomly distributed != uniformly distributed.
Their method would give you 125 as a guess. It includes the known info (i.e., it adds "m", the largest serial number seen) to account for the fact that the captured numbers are not necessarily evenly distributed.
On that note, if you continued to capture tanks with low numbers (3, 4, 6, etc.), averaging gets -less- accurate, because that 99 becomes more and more of an outlier. Their method gets MORE accurate, again because it takes advantage of all the data that is known (we know the numbering goes at least to 99), and averaging doesn't. The new low numbers make it less likely that there are many more tanks, and the formula in the link takes that into account with m/k.
Both methods will be accurate if you have 100% of the data, but taking twice the average ignores known data, so the sparser the data, the less likely it is to be correct.
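
If it helps to see the two approaches side by side, here is a rough Python sketch. It assumes the article's formula is the usual m + m/k - 1 (m = largest serial seen, k = number of tanks captured); the serial numbers are made up for illustration and are not the ones from the article, and the function names are my own.

```python
def twice_mean_estimate(serials):
    """Estimate the total as twice the average of the captured serials."""
    return 2 * sum(serials) / len(serials)


def max_based_estimate(serials):
    """Estimate the total as m + m/k - 1 (assumed form of the article's formula)."""
    m = max(serials)   # largest serial number seen
    k = len(serials)   # number of tanks captured
    return m + m / k - 1


# Hypothetical captured serials; 99 is the largest one seen.
sample = [12, 40, 71, 99]
# The same sample after capturing a few more low-numbered tanks.
more_low = sample + [3, 4, 6]

for serials in (sample, more_low):
    print(len(serials), twice_mean_estimate(serials), max_based_estimate(serials))
```

With the extra low numbers, twice the average drops below 99, which can't be right since a tank numbered 99 was already captured. The max-based estimate can never fall below the largest serial seen.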