If there are 100 tanks, and you get 1, 2, 5, and 99, your method would give about 54 tanks ((1 + 2 + 5 + 99)/4 * 2 = 53.5), which is obviously wrong, since you've already seen tank 99. Your error is in stating "if we assume the captured serial numbers are randomly distributed" - what you're really assuming is that they're -evenly- spread across the range. Randomly drawn != evenly spaced. Their method would give you about 123 as a guess (99 + 99/4 - 1 = 122.75). It's including the known info (i.e., adding "m", the largest serial seen) to take into account the fact that the sample isn't necessarily evenly spread. On that note, if you continued to get tanks at low numbers (3, 4, 6, etc), averaging gets -less- accurate, because that 99 becomes more and more of an outlier. Their method gets MORE accurate, again, because they're taking advantage of all data that is known (we know it goes at least to 99), and averaging doesn't. The new low numbers we've added make it less likely that there are many tanks beyond 99, and the formula in the link takes that into account with m/k, which shrinks as the sample size k grows. Both methods will be accurate if you have 100% of the data, but taking twice the average ignores known data, so the sparser the data the less likely it is to be correct.
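If you want to check the arithmetic yourself, here's a quick sketch. It assumes the formula in the link is the usual m + m/k - 1 (m = largest serial seen, k = number of tanks captured); the function names are just mine for illustration.

```python
def twice_mean(serials):
    """Twice the sample average (the method being criticized)."""
    return 2 * sum(serials) / len(serials)


def max_plus_gap(serials):
    """m + m/k - 1, assumed here to be the formula from the link."""
    m, k = max(serials), len(serials)
    return m + m / k - 1


small = [1, 2, 5, 99]
larger = [1, 2, 3, 4, 5, 6, 99]   # same sample plus the extra low serials
full = list(range(1, 101))        # all 100 tanks captured

for sample in (small, larger, full):
    print(len(sample), round(twice_mean(sample), 2), round(max_plus_gap(sample), 2))
# 4 53.5 122.75
# 7 34.29 112.14
# 100 101.0 100.0
```

With the extra low serials, twice the average drifts further from 100 (53.5 down to about 34), while m + m/k - 1 moves closer (122.75 down to about 112).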
Then the next 50 tanks you find are all from the range [1, 100].
Is it more reasonable to assume that there are around 1258 tanks, or that there are probably closer to 100 tanks, and that first one with the very large serial number was not a sequentially numbered tank?