|
A less well-known but perhaps more influential phenomenon is Stein's Paradox [1]. Here's a provocative example often given to illustrate it: Say you have a baseball player, a soccer player, and a football player, and you wish to estimate the true mean number of home runs, goals, and touchdowns each scores per year. If you have the last ten seasons' worth of data for each player, the obvious thing to do is to estimate each player's true yearly mean by their average yearly score over those ten years. (E.g., the baseball player hit an average of 20 home runs a year, so let's estimate their true mean yearly home runs as 20.)

Stein's Paradox says that you can actually do better than this, in the sense of a lower expected total squared error across the three estimates. Even crazier, the James-Stein estimator that achieves this uses data about the football player and the soccer player to make predictions about the baseball player (and vice versa). This is deeply unintuitive to most people, since the players aren't related to each other at all. The phenomenon only holds with at least three players; it doesn't work for two. (More generally, Stein's Paradox is the fact that if you have p >= 3 independent Gaussians with a known variance, you can do better at estimating their p-dimensional mean than just using their sample means.)

I've spent a bunch of time trying to understand why this actually works [2]; to be honest, I still don't deeply understand it. Nonetheless, the consensus is that the same shrinkage phenomenon is what improves the performance of a variety of high-dimensional estimators (lasso or ridge regression, e.g.), which makes the paradox very influential.

[1] https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator
[2] https://www.naftaliharris.com/blog/steinviz/ |
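
For anyone who wants to poke at this numerically, here is a minimal simulation sketch of the general statement above (p independent Gaussians with known variance). It assumes one observation per coordinate, which matches the sports example once you treat each ten-year average as a single Gaussian observation. The function names, the true means, and the choice to shrink toward the origin are all my own illustrative assumptions, not something taken from [1] or [2].

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimate, shrinking toward the origin (an assumed target).

    x: array whose last axis holds the p observed coordinates.
    sigma2: the known variance of each coordinate.
    """
    x = np.asarray(x, dtype=float)
    p = x.shape[-1]
    norm2 = np.sum(x ** 2, axis=-1, keepdims=True)
    # The shrinkage factor is the same for every coordinate, so each
    # estimate depends on all of the observations, not just its own.
    return (1.0 - (p - 2) * sigma2 / norm2) * x

def compare_risk(theta, sigma2=1.0, trials=100_000, seed=0):
    """Monte Carlo estimate of total squared error for both estimators."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    noise = np.sqrt(sigma2) * rng.standard_normal((trials, theta.size))
    x = theta + noise  # one noisy observation per coordinate, per trial
    raw_risk = np.mean(np.sum((x - theta) ** 2, axis=1))
    js_risk = np.mean(np.sum((james_stein(x, sigma2) - theta) ** 2, axis=1))
    return raw_risk, js_risk

if __name__ == "__main__":
    # Hypothetical true means, chosen close to the shrinkage target.
    raw_risk, js_risk = compare_risk([1.0, 0.5, -0.5])
    print(f"raw observations: {raw_risk:.3f}")
    print(f"James-Stein:      {js_risk:.3f}")
```

Two things worth noticing. With p = 2 the factor (p - 2) is zero, so this estimator is just the raw observations, which lines up with the "doesn't work for two" part. And the size of the gain depends on how far the true means are from the shrinkage target: move them far from the origin and the improvement shrinks toward nothing, even though it never goes negative.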