by mitthrowaway2 775 days ago
The James-Stein estimator does not respect translational symmetry. If I do a change of variables x2 = (x - offset), for an arbitrary offset, it gives me a different result! By contrast, an estimator that just says I should guess that the mean is x is unaffected by a change of coordinate system.
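
To make the complaint concrete, here is a minimal Python sketch, assuming the origin-centered form of the estimator for a single observation with known unit variance in d = 3 dimensions. The helper name `js_origin` and the numbers are made up for illustration; they are not from the article.

```python
def js_origin(x, sigma2=1.0):
    """Plain James-Stein estimate from one observation x, shrinking toward 0."""
    d = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]

x = [3.0, 4.0, 12.0]         # hypothetical observation, d = 3
offset = [10.0, -5.0, 2.0]   # arbitrary change of origin

# Estimate directly in the original coordinates.
direct = js_origin(x)

# Shift coordinates, estimate there, then shift the answer back.
shifted = [a - b for a, b in zip(x, offset)]
round_trip = [a + b for a, b in zip(js_origin(shifted), offset)]

print(direct)
print(round_trip)
```

The two printed vectors differ, whereas the plain estimate x would survive the same round trip unchanged.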

This is a big problem if the coordinate system itself is not intended to contain information about the location of the mean.

Shrinking toward zero makes sense if "zero" is physically meaningful, for example if negative values are not allowed in the problem domain (number of spectators at Wimbledon stadium, etc.). Although in that case, my distribution probably shouldn't be Gaussian!


This is what the original paper from Stein says:

"We choose an arbitrary point in the sample space independent of the outcome of the experiment and call it the origin. Of course, in the way we have expressed the problem this choice has already been made, but in a correct coordinate-free presentation, it would appear as an arbitrary choice of one point in an affine space."

The James-Stein estimator in its general form is about shrinking towards an arbitrary point (which usually is not the origin). It respects translational symmetry if you transform that arbitrary point like everything else.
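
For reference, one common way to write this general form, assuming a single observation x drawn from N(μ, σ²I) in d ≥ 3 dimensions with known σ² (assumptions the comment above does not state; c is just a name for the translation vector):

```latex
\hat{\mu}_p(x) = p + \left(1 - \frac{(d-2)\,\sigma^2}{\lVert x - p \rVert^2}\right)(x - p)

% Translating x and p by the same vector c translates the estimate by c:
\hat{\mu}_{p-c}(x - c) = \hat{\mu}_p(x) - c
```

The second line holds because x - p is unchanged when both points move together, so only the leading p picks up the shift.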

That just means that it's assuming arbitrary additional prior information about the problem, which is different from having zero information.
I don't understand what you mean. Who assumes what?

Take any point and shrink your least-squares estimator in that direction. You get an estimator that is strictly better, in some technical sense, which renders the original estimator inadmissible, in some technical sense.

That's a mathematical fact, it has nothing to do with prior information about the problem.
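
The "technical sense" here is expected squared error. As a sketch of the standard statement, assume x is drawn from N(μ, σ²I) in d ≥ 3 dimensions with known σ², and p is fixed before seeing x (none of which is spelled out in the thread). Writing \hat{\mu}_p for the estimate shrunk toward p with factor 1 - (d-2)σ²/‖x - p‖²:

```latex
\mathbb{E}\,\lVert \hat{\mu}_p(x) - \mu \rVert^2
  = d\,\sigma^2 - (d-2)^2 \sigma^4\, \mathbb{E}\!\left[\frac{1}{\lVert x - p \rVert^2}\right]
  < d\,\sigma^2
  = \mathbb{E}\,\lVert x - \mu \rVert^2
```

The subtracted term is positive for every μ and every p, but it shrinks toward zero as p moves far from μ.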

The article's presentation of the James-Stein estimator sets the arbitrary point at the origin. (My previous comments should be read in this context.) Of course, we could set it anywhere, including [42,...]. Let's call it p. Regardless of where you set it, the estimator suggests that your best estimate û of the mean μ should be nudged a little away from x and towards p.

My point is that the choice of 'p' (or, in the article's presentation, the choice of origin) cannot truly be arbitrary because if it reduces the expected squared difference between μ and û, then it necessarily contains information about μ. If all you truly know about μ is x and σ, then you will have no way to guess in which direction you should even shift your estimate û to reduce that error.

If you do have some additional information about μ, beyond just x alone, then sure, take advantage of it! But then don't call it a paradox.

(I cannot speak for the original article. I haven't put in the effort to fully understand it, so I won't categorically say it's wrong, but it didn't seem right to me.)

The “paradox” is that it can truly be arbitrary! Pick a random point. Shrink your least-squares estimator. You've got yourself a “better” estimator, without having any additional information.
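
One way to check this claim empirically is a small Monte Carlo run. The sketch below is an illustration under assumptions not stated in the thread: d = 10, unit variance, a made-up true mean `mu`, and shrink targets `p` drawn uniformly from a box without looking at the data. The names `js_toward` and `empirical_risks` are hypothetical helpers.

```python
import random

def js_toward(x, p, sigma2=1.0):
    """James-Stein estimate from one observation x, shrinking toward p."""
    d = len(x)
    diff = [a - b for a, b in zip(x, p)]
    norm2 = sum(v * v for v in diff)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [b + factor * v for b, v in zip(p, diff)]

def sq_err(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def empirical_risks(mu, p, trials=4000, seed=0):
    """Average squared error of x itself and of the estimate shrunk toward p."""
    rng = random.Random(seed)
    plain = shrunk = 0.0
    for _ in range(trials):
        x = [rng.gauss(m, 1.0) for m in mu]   # one noisy observation of mu
        plain += sq_err(x, mu)
        shrunk += sq_err(js_toward(x, p), mu)
    return plain / trials, shrunk / trials

d = 10
mu = [1.0] * d                 # hypothetical true mean, unknown to the estimator
picker = random.Random(42)     # picks shrink targets independently of the data
results = []
for i in range(5):
    p = [picker.uniform(-5.0, 5.0) for _ in range(d)]
    plain, shrunk = empirical_risks(mu, p, seed=i)
    dist2 = sq_err(p, mu)
    results.append((dist2, plain, shrunk))
    print(f"|p - mu|^2 = {dist2:7.2f}   plain = {plain:.3f}   shrunk = {shrunk:.3f}")
```

By the dominance result, the shrunk column should come out lower for each p, with a smaller gap when p lies farther from mu. This is a simulation, so the exact numbers depend on the seeds and trial count.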

That’s why the “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution” paper had the impact that it had.

Then you'll have to clarify what you mean by "random" when you say "pick a random point".

Unless you mean that every point on a spherical surface centered on x would have a lower expected squared error than x itself?