by wcbeard10
3417 days ago
The funnel shape of the scatter plot immediately reminded me of an article on the "insensitivity to sample size" pitfall [0], which points out that you should expect entities with smaller sample sizes to show up more often in the extremes because of their higher variance. The tags with the biggest differences look like they exemplify this pretty well.
[0] - http://dataremixed.com/2015/01/avoiding-data-pitfalls-part-2...
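To make the funnel concrete, here is a minimal simulation sketch in Python. It assumes every entity has the same true rate (0.5), and the sample sizes and entity counts are made-up numbers, not anything taken from the article's data. The only formula it relies on is the standard error of a proportion, sqrt(p * (1 - p) / n), which shrinks as n grows.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so reruns give the same numbers

TRUE_RATE = 0.5            # assumption: every entity shares one underlying rate
SIZES = [10, 100, 1000]    # hypothetical sample sizes
ENTITIES_PER_SIZE = 200    # hypothetical number of entities at each size


def observed_rate(n, p=TRUE_RATE):
    """Fraction of successes in n independent trials with success probability p."""
    return sum(random.random() < p for _ in range(n)) / n


# One (sample size, observed rate) pair per simulated entity.
results = [(n, observed_rate(n)) for n in SIZES for _ in range(ENTITIES_PER_SIZE)]

# Empirical spread of the observed rates at each sample size, next to the
# textbook standard error sqrt(p * (1 - p) / n).
spread = {}
for n in SIZES:
    rates = [r for size, r in results if size == n]
    spread[n] = statistics.pstdev(rates)
    expected = math.sqrt(TRUE_RATE * (1 - TRUE_RATE) / n)
    print(f"n={n:5d}  observed spread={spread[n]:.4f}  expected={expected:.4f}")

# The 20 entities farthest from the true rate.
extremes = sorted(results, key=lambda pair: abs(pair[1] - TRUE_RATE), reverse=True)[:20]
print("sample sizes among the 20 most extreme:", sorted({n for n, _ in extremes}))
```

Even though nothing differs between the simulated entities except sample size, the ones with the fewest observations dominate the extremes, which is the same funnel you see in the scatter plot.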
I originally got onto this topic while reading Bayesian Methods for Hackers [1]. I'm still hunting for a good way to correct or compensate for this when I do these kinds of comparisons in my own work.
[0] - http://faculty.cord.edu/andersod/MostDangerousEquation.pdf
[1] - https://github.com/CamDavidsonPilon/Probabilistic-Programmin...
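For what it's worth, one option that often comes up for this kind of comparison is shrinking each observed rate toward a prior, so that small samples get pulled in more than large ones. The sketch below uses the posterior mean of a Beta-Binomial model. It is only an illustration, not a recommendation from either linked source: the function name `shrunk_rate`, the prior mean of 0.5, the prior strength of 20 pseudo-observations, and the two example entities are all assumptions.

```python
def shrunk_rate(successes, trials, prior_mean, prior_strength):
    """Posterior mean of a rate under a Beta prior.

    The prior is Beta(a, b) with a = prior_mean * prior_strength and
    b = (1 - prior_mean) * prior_strength, so prior_strength acts like a
    count of pseudo-observations centred on prior_mean.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (a + successes) / (a + b + trials)


PRIOR_MEAN = 0.5       # assumption: hypothetical overall rate
PRIOR_STRENGTH = 20    # assumption: weight of the prior in pseudo-observations

small = (9, 10)        # hypothetical entity: 9 successes in 10 trials
large = (800, 1000)    # hypothetical entity: 800 successes in 1000 trials

raw_small = small[0] / small[1]
raw_large = large[0] / large[1]
adj_small = shrunk_rate(*small, PRIOR_MEAN, PRIOR_STRENGTH)
adj_large = shrunk_rate(*large, PRIOR_MEAN, PRIOR_STRENGTH)

print(f"raw:    small={raw_small:.3f}  large={raw_large:.3f}")
print(f"shrunk: small={adj_small:.3f}  large={adj_large:.3f}")
```

On raw rates the small-sample entity ranks first, but after shrinkage the large-sample entity does, because ten observations barely move the estimate away from the prior. How strong to make the prior is a judgment call that depends on the data.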