Hacker News
by rahulcap 2791 days ago
Very interesting article. At the end, I was a bit confused about how you converted the binomial regression to a single number. I understood that the output was the probability that I know each of the 10,000 items, so did you then need to use some cutoff to decide that I "knew it"?

Anyway, I am interested to see what analysis you do after you get more data.

1 comment

Thanks for the interest -- it's actually just a sum of the probabilities for the items from 1 to 10,000. For example, if there's a 0.1 chance you know each of 10 items, it adds up to a total value of 1 -- no cutoff needed.
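As a minimal sketch of that sum (the function name and the list of probabilities are made up for illustration, not taken from the article), the expected number of known items is just the per-item probabilities added together:

```python
def expected_known(probabilities):
    """Expected number of known items: the sum of per-item probabilities."""
    return sum(probabilities)


# Hypothetical input: 10 items, each with a 0.1 chance of being known.
probs = [0.1] * 10
total = expected_known(probs)

# Rounded only to hide floating-point noise in the printed value.
print(round(total, 6))
```

No item is ever classified as "known" or "unknown"; each one contributes its probability to the total.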

Mathematically, there's a trick so you don't even need to compute the sum item by item: I fit the binomial regression, which gives me the two relevant parameters, and from those I can calculate the probability density function (PDF) [1] for an item of a given rank. Then I just evaluate the associated cumulative distribution function (CDF) [2], with the same two parameters, at rank 10,000 -- and that's the final result.

[1] https://en.wikipedia.org/wiki/Probability_density_function

[2] https://en.wikipedia.org/wiki/Cumulative_distribution_functi...
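
A rough sketch of that shortcut, under assumptions that are not stated in this thread: it supposes the regression is logistic, so the chance of knowing the item at a given rank is 1 / (1 + exp(-(a + b * rank))), and the values of a and b below are invented. The closed-form expression stands in for the cumulative function mentioned above, and the code compares it with the item-by-item sum.

```python
import math


def p_known(rank, a, b):
    """Assumed logistic curve: probability of knowing the item at this rank."""
    return 1.0 / (1.0 + math.exp(-(a + b * rank)))


def sum_item_by_item(n, a, b):
    """Add up the per-item probabilities for ranks 1..n."""
    return sum(p_known(r, a, b) for r in range(1, n + 1))


def closed_form(n, a, b):
    """Integral of p_known from 0 to n, using the two parameters only.

    The antiderivative of the logistic curve is (1/b) * log(1 + exp(a + b*r)).
    """
    return (math.log1p(math.exp(a + b * n)) - math.log1p(math.exp(a))) / b


# Hypothetical parameters: knowledge drops off as rank increases (b < 0).
a, b = 4.0, -0.001
n = 10_000

by_sum = sum_item_by_item(n, a, b)
by_formula = closed_form(n, a, b)
print(by_sum, by_formula)
```

Because the assumed curve decreases with rank, the integral and the discrete sum differ by less than one item, so the closed form gives essentially the same total without looping over all 10,000 ranks.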