by ralfd 3362 days ago
I don't understand. If the average of the last 10 seasons is 20 home runs, what would be a better predicted value? Your explanation is a bit short here.

Your site and the Wiki link are very formula-heavy. Is there an explanation for someone who has forgotten all his statistics courses and those Greek-letter thingies?


This is maybe a better explanation:

https://jmanton.wordpress.com/2010/06/05/comments-on-james-s...

It's still math-heavy, but there is some explanation. It's hard to explain without the math, since the math is fairly integral to it; that's why it's such an amazing discovery. My understanding is that the variables are independent, but it still pays to estimate them jointly rather than one at a time. So in the case of the athletes, it's not that home runs predict touchdowns or goals, but that by using a Stein estimator we would get a more accurate measure of all three in aggregate. The example used in the article is less interesting, but probably better for understanding:

> For example, if i = 1, ..., 3 represents the financial cost of claims a multi-national insurance company will incur in the next year in three different countries, the company may be less concerned with estimating the values of the individual means accurately and more concerned with getting an accurate overall estimate.

> If the average of the last 10 seasons is 20 home runs, what would be a better predicted value?

You are correct, 20 is the best estimate for this single variable (or similarly for any single variable in isolation).

Only if the objective is to minimize the total MSE (Mean Square Error),

          (Ph - h)² + (Pg - g)² + (Pt - t)²
    MSE = ---------------------------------
                          3
then it pays off to bias each estimate (the predictions Ph, Pg, Pt of the true values h, g, t) slightly towards zero. If an observed value is larger than the true value, we improve the estimate by multiplying it by a correction coefficient slightly under 1. If the observation happens to be smaller than the true value, this makes the estimate worse. But the damage done when the observed value is small is smaller (a small value moves only a little when it is scaled down) than the improvement gained when it is large. A set of 3 independent variables is already large enough that this gamble pays off on average (in the combined total error of the 3 estimates).
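
To check this numerically, here is a small simulation sketch. It assumes three independent observations, each equal to a true mean plus normal noise with known unit variance, and uses hypothetical true means (not real home-run, goal, or touchdown numbers). It applies the plain James-Stein estimator, whose common correction coefficient is 1 - (p - 2)σ²/‖x‖², and compares the average combined squared error (the numerator of the MSE formula) of the raw observations against the shrunk estimates.

```python
import random


def james_stein(obs, sigma2=1.0):
    """Plain James-Stein estimate: shrink every observation toward zero by one common factor.

    Assumes obs[i] = theta[i] + noise with independent N(0, sigma2) noise, known sigma2,
    and len(obs) >= 3. The factor is below 1 (it can even go negative for tiny obs;
    the "positive-part" variant clips it at 0).
    """
    p = len(obs)
    norm2 = sum(x * x for x in obs)
    factor = 1.0 - (p - 2) * sigma2 / norm2
    return [factor * x for x in obs]


def total_squared_error(est, truth):
    """Combined squared error of all estimates (the numerator of the MSE formula)."""
    return sum((e - t) ** 2 for e, t in zip(est, truth))


def compare(truth, trials=20_000, sigma2=1.0, seed=0):
    """Average combined squared error of raw observations vs. James-Stein over many trials."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    raw_err = js_err = 0.0
    for _ in range(trials):
        obs = [t + rng.gauss(0.0, sd) for t in truth]
        raw_err += total_squared_error(obs, truth)
        js_err += total_squared_error(james_stein(obs, sigma2), truth)
    return raw_err / trials, js_err / trials


if __name__ == "__main__":
    truth = [0.5, -0.3, 0.8]  # hypothetical true means, chosen only for illustration
    raw, js = compare(truth)
    print(f"raw observations: {raw:.3f}")
    print(f"James-Stein:      {js:.3f}")
```

With unit noise the raw error averages about 3 (one unit of variance per variable), and the shrunk estimates should come out lower; the gain is largest when the true means are close to the point you shrink toward.
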

Here's my intuition. Let's say you have 1000 coin flippers. Each of them flips a coin 10 times; none of them has any special powers, and the coin is fair. Some of them will get an equal number of heads and tails, but there's a good chance you'll get some who get 9 or 10 heads, and also some who get 9 or 10 tails. Since the probability of getting 10 heads in a row is 1/1024, if you see one or two guys who get only heads, or only tails, you will attribute that to the natural variability of the outcomes.
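
The 1/1024 figure, and how many extreme results to expect among 1000 fair flippers, follow directly from the binomial distribution. A small sketch (the function name is only for illustration):

```python
from math import comb


def prob_at_least(k, n=10, p=0.5):
    """Probability of at least k heads in n independent flips with P(heads) = p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


flippers = 1000
p10 = prob_at_least(10)  # exactly 1/1024 for a fair coin
p9 = prob_at_least(9)    # 11/1024: ten ways to get exactly 9 heads, plus one way to get 10

print(f"P(10 heads) = {p10:.5f} -> about {flippers * p10:.1f} of {flippers} flippers")
print(f"P(9+ heads) = {p9:.5f} -> about {flippers * p9:.1f} of {flippers} flippers")
# By symmetry, the same numbers apply to 10 (or 9+) tails.
```

So with 1000 fair flippers you expect roughly one all-heads run and about 11 flippers with 9 or more heads (and as many at the tails end), even though nobody has any skill.
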

Now imagine that these are not coin flippers, but guys who have real skill at something, although the outcome still has a large variability. For example, running backs in the NFL. There are running backs (RBs) who average 2 yards per carry (ypc), and others who average 5. 5 ypc is stellar, by the way, 4 is very good, 3 is decent, and 1 or 2 not so much. But obviously, an RB gains a different number of yards on each carry. Now, let's say you follow the first 4 games of the season and compute the average ypc for each RB, and you would like to predict each RB's average ypc for the rest of the year. The classical statistical answer is that the current average is the best estimator of the future average, but from the extreme example with the coin flippers above, we know that this is not quite the case. Bayesian estimation tells us that a better estimator moves the current average towards the overall mean. This is called shrinkage, and the James-Stein estimator is one such shrinkage estimator. In the case of the coin flippers, you move the average all the way to 1/2, and that estimator is correct, because every flipper's true rate is 1/2. In the case of the running backs, you don't shrink that much, and it's a cute exercise in math to see how much you shrink if you assume some distributions around the overall ypc for RBs in the league and around the ypc of an RB given his average ypc.
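
That "cute exercise" can be sketched under the simplest assumptions: a normal-normal model in which true ypc across the league is N(m, τ²) and each carry's yardage scatters with standard deviation σ around an RB's true ypc. The posterior mean then moves an RB's observed average toward m, keeping a fraction τ²/(τ² + σ²/n) of his deviation from m, where n is his number of carries. All numbers below (league mean 4.2, τ = 0.5, σ = 6, the three RBs) are hypothetical, and this is a Bayesian estimate with known hyperparameters rather than the James-Stein estimator itself, which estimates the amount of shrinkage from the data.

```python
from dataclasses import dataclass


@dataclass
class RunningBack:
    name: str
    carries: int      # carries over the first 4 games
    avg_ypc: float    # observed yards per carry over those games


def shrink_ypc(rb, league_mean, tau, sigma):
    """Posterior mean of an RB's true ypc under an assumed normal-normal model.

    Assumptions (hypothetical): true ypc ~ N(league_mean, tau**2) across RBs, and each
    carry's yardage has standard deviation sigma around that RB's true ypc.
    """
    sampling_var = sigma**2 / rb.carries        # noise in the observed 4-game average
    weight = tau**2 / (tau**2 + sampling_var)   # 1 = trust the average, 0 = use league mean
    return league_mean + weight * (rb.avg_ypc - league_mean)


if __name__ == "__main__":
    rbs = [
        RunningBack("RB A", carries=60, avg_ypc=5.5),
        RunningBack("RB B", carries=70, avg_ypc=3.0),
        RunningBack("RB C", carries=20, avg_ypc=6.5),
    ]
    for rb in rbs:
        est = shrink_ypc(rb, league_mean=4.2, tau=0.5, sigma=6.0)
        print(f"{rb.name}: observed {rb.avg_ypc:.2f}, predicted {est:.2f}")
    # RB A: observed 5.50, predicted 4.58
    # RB B: observed 3.00, predicted 3.81
    # RB C: observed 6.50, predicted 4.48
```

Setting tau=0 (no real differences in skill, like the coin flippers) makes the weight 0, so every estimate collapses to the league mean, just as the flippers' averages collapse to 1/2. RB C's 6.5 on only 20 carries is shrunk hardest because his average is the noisiest.
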

If you want some further intuition, think of the Sports Illustrated curse. It has been observed that NFL players who make it onto the cover of SI magazine are generally "cursed", i.e. they don't do as well afterwards as they did before. One amusing case is (former) New England Patriot Jonas Gray, who made the cover of SI after a phenomenal game against the Indianapolis Colts in 2014 (201 rushing yards, 4 touchdowns), but then showed up late to work and was promptly benched for the rest of the season. Generally, though, players don't do anything stupid like that; they simply "regress to the mean". A player makes the cover after an unusually good stretch, which is partly skill and partly luck, and the luck does not carry over. That regression to the mean is the intuition behind the shrinkage estimator and the Stein paradox.
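
Regression to the mean shows up even when nobody's skill changes. A minimal simulation sketch, assuming (hypothetically) 500 RBs whose true ypc is drawn from N(4.2, 0.5²), 100 carries per half-season, and a per-carry standard deviation of 6 yards: pick the 10 best RBs by first-half average (the "cover" players) and compare with their second-half average.

```python
import random
import statistics


def regression_demo(n_players=500, carries=100, league_mean=4.2, tau=0.5,
                    sigma=6.0, top_k=10, seed=0):
    """Return (first-half avg, second-half avg) of the top_k first-half performers.

    Hypothetical model: each player's true ypc ~ N(league_mean, tau**2), and every
    carry is N(true ypc, sigma**2). True skill is identical in both halves.
    """
    rng = random.Random(seed)
    skills = [rng.gauss(league_mean, tau) for _ in range(n_players)]

    def half_season_avg(skill):
        return statistics.fmean(rng.gauss(skill, sigma) for _ in range(carries))

    first = [half_season_avg(s) for s in skills]
    second = [half_season_avg(s) for s in skills]
    # The "cover" players: best first-half averages, partly skill and partly luck.
    top = sorted(range(n_players), key=lambda i: first[i], reverse=True)[:top_k]
    return (statistics.fmean(first[i] for i in top),
            statistics.fmean(second[i] for i in top))


if __name__ == "__main__":
    before, after = regression_demo()
    print(f"top players, first half:   {before:.2f} ypc")
    print(f"same players, second half: {after:.2f} ypc")
```

The top group's second-half average should fall back toward the league mean while typically staying above it: their skill is real, but the lucky part of the hot streak is not repeated.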