by kgwgk 775 days ago
There is nothing magical about the origin. The shrinkage can be done towards any point, and in fact when estimating multiple means it's customary to move each point closer to their average.

https://www.math.drexel.edu/~tolya/EfronMorris.pdf

There is something magical about the origin when the result does not respect translational symmetry.

In fact, in a real world setting I would probably use my first measurement to define the origin, having no other reference to reach for.

What does not respect translational symmetry?

You have an estimator. If you apply shrinkage towards the origin you have another estimator. If you apply shrinkage towards [42, 42, ..., 42] you have yet another estimator. Etc. Is it a problem that different estimators produce different results?

That's my understanding as well, FWIW. This is how I would phrase it:

Shrinking helps. In R^d there's no such thing as shrinking generally, only shrinking in the direction of some point. (The point that's the fixed point of the shrinking.) Regardless of what that point is, it's a good idea to shrink.

The James-Stein estimator does not respect translational symmetry. If I do a change of variables x2 = (x - offset), for an arbitrary offset, it gives me a different result! Whereas an estimator that just says I should guess that the mean is x is unaffected by a change of coordinate system.

This is a big problem if the coordinate system itself is not intended to contain information about the location of the mean.

This makes sense if "zero" is physically meaningful, for example if negative values are not allowed in the problem domain (number of spectators at Wimbledon stadium, etc). Although in that case, my distribution probably shouldn't be Gaussian!
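
For what it's worth, the change-of-variables point can be checked numerically. This is only a sketch under assumptions that are mine, not from the thread: a single observation with unit variance, d = 3, and made-up values for `x` and `offset`. The function name `james_stein` is just a label for the plain shrink-towards-the-origin form.

```python
def james_stein(x, sigma2=1.0):
    """James-Stein estimate that shrinks the observation x towards the origin.

    Assumes a single observation with known variance sigma2 per coordinate.
    """
    d = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]


x = [3.0, 4.0, 12.0]        # made-up observation
offset = [1.0, 1.0, 1.0]    # made-up change of coordinates

# Estimate in the original coordinates.
direct = james_stein(x)

# Estimate after x2 = x - offset, mapped back to the original coordinates.
x2 = [v - o for v, o in zip(x, offset)]
via_shift = [e + o for e, o in zip(james_stein(x2), offset)]

# The "just guess x" estimator, run through the same round trip.
identity_via_shift = [v + o for v, o in zip(x2, offset)]

print(direct)
print(via_shift)
print(identity_via_shift)
```

The first two printed vectors differ, while the round trip for the identity estimator gives back `x`.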

This is what the original paper from Stein says:

"We choose an arbitrary point in the sample space independent of the outcome of the experiment and call it the origin. Of course, in the way we have expressed the problem this choice has already been made, but in a correct coordinate-free presentation, it would appear as an arbitrary choice of one point in an affine space."

The James-Stein estimator in its general form is about shrinking towards an arbitrary point (which usually is not the origin). It respects translational symmetry if you transform that arbitrary point like everything else.
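
A rough sketch of what "transform that arbitrary point like everything else" means in practice. The function name `james_stein_towards`, the unit variance, and the numbers are my own assumptions for illustration.

```python
def james_stein_towards(x, target, sigma2=1.0):
    """James-Stein estimate that shrinks the observation x towards `target`.

    Assumes a single observation with known variance sigma2 per coordinate.
    """
    d = len(x)
    diff = [v - t for v, t in zip(x, target)]
    norm2 = sum(v * v for v in diff)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [t + factor * dv for t, dv in zip(target, diff)]


x = [3.0, 4.0, 12.0]         # made-up observation
target = [0.0, 0.0, 0.0]     # the arbitrary point chosen as "origin"
offset = [1.0, 1.0, 1.0]     # made-up change of coordinates

# Estimate in the original coordinates.
direct = james_stein_towards(x, target)

# Shift BOTH the observation and the target, estimate, then shift back.
x2 = [v - o for v, o in zip(x, offset)]
target2 = [t - o for t, o in zip(target, offset)]
via_shift = [e + o for e, o in zip(james_stein_towards(x2, target2), offset)]

print(direct)
print(via_shift)
```

Up to floating-point rounding, the two printed vectors agree, because the estimate only depends on `x - target`.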

That just means that it's assuming arbitrary additional prior information about the problem, which is different than zero information.

I don't understand what you mean. Who assumes what?

Take any point and shrink your least-squares estimator in that direction. You get an estimator that is strictly better - in some technical sense - which renders the original estimator inadmissible - in some technical sense.

That's a mathematical fact, it has nothing to do with prior information about the problem.
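
If anyone wants to see this empirically, here is a small Monte Carlo sketch. Everything specific in it is an assumption of mine: d = 10 (the classical result needs at least three dimensions), unit-variance Gaussian noise, a made-up true mean `theta`, and a made-up shrinkage point `target`. "Better" here means lower average squared error.

```python
import random


def james_stein_towards(x, target, sigma2=1.0):
    """James-Stein estimate that shrinks the observation x towards `target`."""
    d = len(x)
    diff = [v - t for v, t in zip(x, target)]
    norm2 = sum(v * v for v in diff)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [t + factor * dv for t, dv in zip(target, diff)]


def squared_error(estimate, truth):
    """Sum of squared coordinate-wise errors."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


random.seed(0)          # fixed seed so the run is repeatable
d = 10
theta = [1.0] * d       # made-up true mean
target = [2.0] * d      # made-up point to shrink towards
trials = 2000

plain_total = 0.0
js_total = 0.0
for _ in range(trials):
    # One noisy observation of theta; the plain estimator just reports x.
    x = [random.gauss(m, 1.0) for m in theta]
    plain_total += squared_error(x, theta)
    js_total += squared_error(james_stein_towards(x, target), theta)

plain_risk = plain_total / trials
js_risk = js_total / trials
print(plain_risk, js_risk)
```

The gap shrinks as `target` moves further from `theta`, so a target far away needs many more trials before the difference shows up above the simulation noise.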