by wavemode 775 days ago
(Disclaimer: stats noob here.) I thought the point was that you have a better chance of being closer to the mean -overall- (i.e., the 3D Euclidean distance between your guess and the vector of true means would be smaller on average), even though you may not have improved your odds of guessing any single individual mean.

So it's not that "you get a better estimate of the US wheat yield by also estimating the number of Wimbledon spectators and the weight of a candy bar in a shop"; it's simply that you get a better estimate of the combined vector of the three means. (In this case, that vector is probably meaningless, since the three data sets are entirely unrelated. But we could also imagine scenarios where it is meaningful.)
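To make that concrete, here is a rough simulation sketch of what I mean. It is my own illustration, not taken from the article: it assumes unit-variance Gaussian noise, one observation per mean, shrinkage toward the origin, and the positive-part variant of the James-Stein estimator. The names `james_stein` and `simulate` and the example means `[0, 0, 3]` are made up for the demo.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein: shrink each observed vector toward the origin."""
    p = x.shape[-1]
    norm2 = np.sum(x**2, axis=-1, keepdims=True)
    # Shrinkage factor, clipped at zero so the estimate never flips sign.
    shrink = np.maximum(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return shrink * x

def simulate(theta, trials=200_000, seed=0):
    """Compare the raw observation with James-Stein for a fixed vector of true means."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    # One noisy observation of every mean per trial (assumed noise variance 1).
    x = theta + rng.standard_normal((trials, theta.size))
    js = james_stein(x)
    raw_per = np.mean((x - theta) ** 2, axis=0)   # squared error per coordinate
    js_per = np.mean((js - theta) ** 2, axis=0)
    return raw_per.sum(), js_per.sum(), raw_per, js_per

if __name__ == "__main__":
    raw_total, js_total, raw_per, js_per = simulate([0.0, 0.0, 3.0])
    print("total squared error  raw:", raw_total, " JS:", js_total)
    print("per-coordinate       raw:", raw_per)
    print("per-coordinate       JS: ", js_per)
```

With these assumed means, the total squared error of the shrunk estimate should come out lower than that of the raw observations, while the coordinate whose true mean is far from the shrinkage target (the third one here) should come out worse on its own. That is the "better for the vector, not necessarily for each mean" distinction.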

Am I misunderstanding something?

1 comments

You are most likely right.

I am personally bothered by the way it is presented as a "paradox", with the implication that it would have real world applications.

I have zero doubt that you can't improve the estimate of the US wheat yield by looking at some other unrelated things, like candy bars. Presenting the result as if it is a real "improvement" is false advertising.

On the other hand, if we look at related observations, then the improvement is not a paradox at all. Let's say I want to estimate the average temperature in the US and in Europe. They are related, and combining the estimates will give a better result, to nobody's surprise.

Since when does “paradox” imply real world application?

In your last paragraph, what you’re describing is just inference based on correlation, which is unrelated to this topic.