by mitthrowaway2 776 days ago
Sorry, I'm siding with the physicists here. If you're going to declare that your seemingly arbitrary choice of coordinate system is actually not arbitrary and part of your prior information about where the mean of the distribution is suspected to be, you have to put that in the initial problem statement.

There is nothing magical about the origin: the shrinkage can be done towards any point, and in fact when estimating multiple means it's customary to move each point closer to their average.

https://www.math.drexel.edu/~tolya/EfronMorris.pdf
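
For anyone who wants to see the "shrink toward the average" version concretely, here is a minimal Python sketch. It assumes k independent measurements with a known common variance, and it uses the (k - 3) constant usually quoted for shrinking toward the grand mean, clipped at zero. The function name `shrink_toward_average` and the sample values are made up for illustration.

```python
def shrink_toward_average(y, sigma2=1.0):
    """Pull each measurement toward the average of all measurements.

    Assumes every y[i] has the same known variance sigma2 (an assumption
    of this sketch, not something stated in the thread).
    """
    k = len(y)
    if k < 4:
        raise ValueError("need at least 4 measurements for the (k - 3) constant")
    ybar = sum(y) / k
    spread = sum((v - ybar) ** 2 for v in y)
    if spread == 0.0:
        # All measurements coincide, so there is nothing to shrink.
        return list(y)
    # Shrinkage factor, clipped at zero so points never overshoot the average.
    factor = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [ybar + factor * (v - ybar) for v in y]


y = [0.5, 2.0, 3.5, 7.0, 9.0]  # hypothetical measurements
est = shrink_toward_average(y)
for raw, shrunk in zip(y, est):
    print(raw, shrunk)
```

The fixed point of the shrinking here is the sample average, not zero, so nothing about the coordinate origin enters the calculation.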

There is something magical about the origin when the result does not respect translational symmetry.

In fact, in a real world setting I would probably use my first measurement to define the origin, having no other reference to reach for.

What does not respect translational symmetry?

You have an estimator. If you apply shrinkage towards the origin you have another estimator. If you apply shrinkage towards [42, 42, ..., 42] you have yet another estimator. Etc. Is it a problem that different estimators produce different results?

That's my understanding as well, FWIW. This is how I would phrase it:

Shrinking helps. In R^d there's no such thing as shrinking generally, only shrinking in the direction of some point. (The point that's the fixed point of the shrinking.) Regardless of what that point is, it's a good idea to shrink.

The James-Stein estimator does not respect translational symmetry. If I do a change of variables x2 = (x - offset), for an arbitrary offset, it gives me a different result! Whereas an estimator that just says I should guess that the mean is x is unaffected by a change of coordinate system.
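
The change-of-variables point can be checked with a few lines of Python. This is only a sketch: it assumes the plain (not positive-part) James-Stein form with unit variance, shrinking toward whatever point the coordinates call zero, and the vectors `x` and `offset` are arbitrary made-up values.

```python
def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimate, shrinking x toward the coordinate origin.

    Assumes dimension d >= 3 and x != 0.
    """
    d = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [factor * v for v in x]


x = [3.0, 4.0, 12.0]         # one observation in R^3 (hypothetical)
offset = [10.0, -5.0, 2.0]   # arbitrary change of origin (hypothetical)

# Estimate in the original coordinates.
direct = james_stein(x)

# Change variables x2 = x - offset, estimate there, then map back.
x2 = [a - b for a, b in zip(x, offset)]
via_shift = [e + b for e, b in zip(james_stein(x2), offset)]

# The naive estimator (guess that the mean is x) round-trips unchanged.
naive_via_shift = [a + b for a, b in zip(x2, offset)]

print("direct:   ", direct)
print("via shift:", via_shift)
print("naive:    ", naive_via_shift)
```

The two James-Stein answers differ because the point being shrunk toward stayed at "zero" in each coordinate system, which is a different physical point after the shift.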

This is a big problem if the coordinate system itself is not intended to contain information about the location of the mean.

Privileging the origin makes sense if "zero" is physically meaningful, for example if negative values are not allowed in the problem domain (number of spectators at Wimbledon stadium, etc.). Although in that case, my distribution probably shouldn't be Gaussian!

This is what the original paper from Stein says:

"We choose an arbitrary point in the sample space independent of the outcome of the experiment and call it the origin. Of course, in the way we have expressed the problem this choice has already been made, but in a correct coordinate-free presentation, it would appear as an arbitrary choice of one point in an affine space."

The James-Stein estimator in its general form is about shrinking towards an arbitrary point (which usually is not the origin). It respects translational symmetry if you transform that arbitrary point like everything else.
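
A small Python sketch of that general form, under the assumption of unit variance and a target point chosen independently of the data. The name `james_stein_toward` and all the numbers are hypothetical. It shows that translating the observation and the target together leaves the answer unchanged once you map back.

```python
def james_stein_toward(x, target, sigma2=1.0):
    """Plain James-Stein estimate, shrinking x toward an arbitrary target point.

    Assumes dimension d >= 3 and x != target.
    """
    d = len(x)
    diff = [a - t for a, t in zip(x, target)]
    norm2 = sum(v * v for v in diff)
    factor = 1.0 - (d - 2) * sigma2 / norm2
    return [t + factor * v for t, v in zip(target, diff)]


x = [3.0, 4.0, 12.0]        # observation (hypothetical)
target = [1.0, 1.0, 1.0]    # point to shrink toward (hypothetical)
offset = [10.0, -5.0, 2.0]  # arbitrary translation (hypothetical)

est = james_stein_toward(x, target)

# Translate both the observation and the target, estimate, then map back.
x_shifted = [a - b for a, b in zip(x, offset)]
target_shifted = [t - b for t, b in zip(target, offset)]
est_shifted = james_stein_toward(x_shifted, target_shifted)
back = [e + b for e, b in zip(est_shifted, offset)]

print(est)
print(back)
```

The estimate depends only on x - target, so the coordinate system drops out; the remaining question in the thread is where the target comes from.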

That just means that it's assuming arbitrary additional prior information about the problem, which is different from having zero information.

You can put the origin anywhere and for almost all choices the adjustment is almost zero. But if the choice happens to be very close to the sample point, against all (prior) probabilities, then that fact affects the prior.

Not quite: If the origin is within a standard deviation of |x|² or so (depending on D), then the term inside the ReLU is negative, and the adjustment is exactly zero. If the origin is moderately far away from x, then the adjustment is large. If the origin is a vast distance from x, then the adjustment is small in relative terms, but not in absolute terms. The scaling factor approaches zero for large |x| but the displacement between x and û converges toward a constant.

Either way, this is absurd unless we have some additional background information about μ other than our sample x itself. But it's easy to resolve the paradox: Since the choice of origin is arbitrary (unless it isn't!), select our coordinate system such that x = 0. Then the adjustment is also zero, and the James-Stein estimator agrees that û = x = 0.
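
To make the origin-at-the-sample case concrete, here is a hedged Python sketch. It assumes the positive-part form û = ReLU(1 - (D - 2)σ²/|x|²)·x with unit variance, and it adds a guard for |x| = 0 that is purely a convention of this sketch (the formula itself divides by zero there). The function name and the sample values are hypothetical.

```python
def positive_part_james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate, shrinking toward the coordinate origin.

    Assumes dimension d >= 3.
    """
    d = len(x)
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        # Convention assumed for this sketch: a sample sitting exactly on
        # the origin is returned as-is, since scaling zero gives zero.
        return [0.0] * d
    # The max(0, ...) is the ReLU: the scale factor never goes negative.
    factor = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [factor * v for v in x]


x = [3.0, 4.0, 12.0]  # observation in some original coordinates (hypothetical)

# Choose the coordinate origin to be the sample itself.
origin = list(x)
x_rel = [a - o for a, o in zip(x, origin)]  # all zeros

# Estimate in the new coordinates, then map back to the original ones.
est = [e + o for e, o in zip(positive_part_james_stein(x_rel), origin)]

print(est)
```

With that choice of origin the estimate coincides with the sample, which is the same answer the naive estimator gives.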