Hacker News
by rssoconnor 775 days ago
I think the part on "How arbitrary is the origin, really?" is not correct. The origin is arbitrary. As the Wikipedia article points out, you can pick any point, whether or not it is the origin, and use the James-Stein estimator to push your estimate towards that point, and (in three or more dimensions) it will lower your mean squared error.

If you pick a point to the left of your sample, then moving your estimate to the left will improve your mean squared error on average. If you pick a point to the right of your sample, then moving your estimate to the right will improve your mean squared error as well.

I'm still trying to come to grips with this, and what follows is conjecture on my part. Imagine sampling many points from a 3-D Gaussian distribution (with identity covariance), making a nice cloud of points. Next, choose any point P. P could be close to the cloud or far away; it doesn't matter. No matter which point P you pick, if you adjust all the points in your cloud of samples in accordance with the James-Stein formula, moving each of them towards P by varying amounts, then on average they will move closer to the center of your Gaussian distribution. This happens no matter where P is.

The cloud is, of course, centered on the center of the Gaussian distribution. As the points are pulled towards this arbitrary point P, some will be pulled away from the center of the Gaussian, some will be pulled towards it, and some will be squeezed, moving away from the center in the parallel direction but closer to it in the perpendicular direction. Anyhow, apparently everything ends up, on average, closer to the center of the Gaussian.
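A quick Monte Carlo check of this conjecture, as a sketch: sample a cloud from a 3-D Gaussian with identity covariance, shrink every point toward a chosen P with the plain (not positive-part) James-Stein formula P + (1 - (p - 2)/||X - P||^2)(X - P), and compare the average squared distance to the true center before and after. The center theta, the three choices of P, and the function names are hypothetical, picked only for illustration.

```python
import random


def james_stein_toward(x, p_point):
    """Plain (not positive-part) James-Stein shrinkage of x toward p_point, identity covariance."""
    dim = len(x)
    diff = [xi - pi for xi, pi in zip(x, p_point)]
    norm_sq = sum(d * d for d in diff)  # ||x - P||^2, nonzero with probability 1
    factor = 1.0 - (dim - 2) / norm_sq
    return [pi + factor * d for pi, d in zip(p_point, diff)]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def compare_mse(theta, p_point, n_samples=20000, seed=0):
    """Average squared distance to theta before and after shrinking a cloud X ~ N(theta, I) toward p_point."""
    rng = random.Random(seed)  # same seed -> same cloud for every P, so the comparison is paired
    naive_total = 0.0
    js_total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(t, 1.0) for t in theta]
        naive_total += squared_distance(x, theta)
        js_total += squared_distance(james_stein_toward(x, p_point), theta)
    return naive_total / n_samples, js_total / n_samples


if __name__ == "__main__":
    theta = [1.0, -2.0, 0.5]  # hypothetical center, unknown to the estimator
    # Hypothetical choices of P: the origin, the true center, and a farther point.
    for p_point in ([0.0, 0.0, 0.0], [1.0, -2.0, 0.5], [4.0, 3.0, -1.0]):
        naive, js = compare_mse(theta, p_point)
        print(f"P={p_point}: naive MSE {naive:.3f}, James-Stein MSE {js:.3f}")
```

In this sketch the gap between the two averages should be largest when P sits at the true center and smaller for the farther P; it only covers the 3-D case described above.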

I'm not entirely sure what to make of this result. Perhaps it means that mean squared error is a silly error metric?


Your visualization helped me understand this! If the center of the distribution is far from P, then all the lines from P to the points in your cluster are basically parallel, and you just shift your point cluster, which doesn’t help your estimate. But if P is close to the mean, then it sits near the middle of your cluster, so pulling all points towards P is “shrinking” the cluster more than “shifting” it.

Here are some links that might help visualize what is going on:

https://www.naftaliharris.com/blog/steinviz/

https://www.youtube.com/watch?v=cUqoHQDinCM (this video actually references the original post)

My takeaway is that the points which get worse as they are pulled towards P lie in some region R. As the number of dimensions increases, R's volume shrinks as a fraction of the total cloud volume, making it much less likely that a sample is drawn from that region. In other words, you are more likely to sample points that move closer to the center than away from it, which is why the estimator is an improvement on average.
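A rough way to probe region R, as a sketch: count how many points of a simulated cloud end up farther from the true center after James-Stein shrinkage toward P. Working through the algebra, with Z = X - theta the point's offset from the center, a point is made worse exactly when ||Z||^2 + (theta - P).Z < (p - 2)/2. The function name, sample sizes, and the choice of putting P at the true center in the usage lines are assumptions for illustration; other P can be passed in to test the takeaway away from the center.

```python
import random


def fraction_made_worse(dim, theta, p_point, n_samples=20000, seed=0):
    """Share of samples X ~ N(theta, I) that James-Stein toward p_point moves farther from theta.

    With Z = X - theta and Y = X - P, the shrunk point satisfies delta - theta = Z - c*Y
    where c = (dim - 2) / ||Y||^2, and it is farther from theta exactly when
    ||Z||^2 + (theta - P).Z < (dim - 2) / 2.
    """
    rng = random.Random(seed)
    worse = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Z = X - theta
        y = [t + zi - p for t, zi, p in zip(theta, z, p_point)]  # Y = X - P
        c = (dim - 2) / sum(v * v for v in y)
        old_sq = sum(zi * zi for zi in z)  # ||X - theta||^2
        new_sq = sum((zi - c * yi) ** 2 for zi, yi in zip(z, y))  # ||delta - theta||^2
        if new_sq > old_sq:
            worse += 1
    return worse / n_samples


if __name__ == "__main__":
    for dim in (3, 10, 50):
        center = [0.0] * dim  # hypothetical: true center at the origin
        # Illustrative choice: shrink toward the true center itself (P = theta).
        print(dim, fraction_made_worse(dim, center, center))
```

With P at the center, the condition reduces to ||Z||^2 < (p - 2)/2, so the fraction is exactly P(chi^2_p < (p - 2)/2): roughly 8% for p = 3, about 5% for p = 10, and well under 1% for p = 50.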

You make a valid point, but I feel there is something in the direction the article is gesturing at...

The mean of the n-dimensional Gaussian is an element of R^n, an unbounded space. There's no uninformed prior over this space, so there is always a choice of origin implicit in some way...

As you say, you can shrink towards any point and you get a valid James-Stein estimator that is strictly better than the naive estimator. But if you send the point you are shrinking towards to infinity you get the naive estimator again. So it feels like the fact that you are implicitly selecting a finite chunk of R^n around an origin plays a role in the paradox...

> But if you send the point you are shrinking towards to infinity you get the naive estimator again.

You get close to it, but strictly speaking, wouldn’t it always be better than the naive estimator?

Right, it's a limit at infinity.

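One way to see both the "strictly better" and the "limit at infinity" points is the standard risk formula for James-Stein shrinkage toward a fixed point P, for X ~ N_p(theta, I) with p >= 3: E||delta_P(X) - theta||^2 = p - (p - 2)^2 E[1/||X - P||^2]. The sketch below evaluates the improvement term exactly, using the Poisson-mixture representation of the noncentral chi-square distribution; the function name is hypothetical, and `dist` stands for ||theta - P||.

```python
import math


def js_risk_improvement(dim, dist):
    """Exact risk reduction (dim - 2)^2 * E[1 / ||X - P||^2] for James-Stein toward P.

    X ~ N(theta, I) in `dim` >= 3 dimensions and dist = ||theta - P||. Then ||X - P||^2 is
    noncentral chi-square, i.e. a Poisson(dist^2 / 2) mixture of central chi-squares with
    dim + 2k degrees of freedom, and E[1 / chi2_m] = 1 / (m - 2) for m > 2.
    """
    if dim < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 dimensions")
    half_lam = dist * dist / 2.0
    if half_lam == 0.0:
        return float(dim - 2)  # P equals theta: (dim - 2)^2 / (dim - 2)
    expectation = 0.0
    # Truncate the Poisson sum far into its upper tail; weights computed in log space.
    k_max = int(half_lam + 20.0 * math.sqrt(half_lam) + 50)
    for k in range(k_max + 1):
        log_weight = -half_lam + k * math.log(half_lam) - math.lgamma(k + 1)
        expectation += math.exp(log_weight) / (dim - 2 + 2 * k)
    return (dim - 2) ** 2 * expectation


if __name__ == "__main__":
    dim = 3
    for dist in (0.0, 1.0, 3.0, 10.0, 100.0):
        gain = js_risk_improvement(dim, dist)
        print(f"||theta - P|| = {dist:>5}: naive risk {dim}, James-Stein risk {dim - gain:.4f}")
```

The improvement is positive for every finite distance, equals p - 2 when P is the true mean, and tends to 0 as P moves off to infinity, which matches the picture of "shrinking" gradually turning into mere "shifting".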
> There's no uninformed prior over this space, so there is always a choice of origin implicit in some way...

You could use an uninformed improper prior.

You would just need to come up with a way to pick a point at random uniformly from an unbounded space.

You can just use the function that is constantly 1 everywhere as your improper prior.

Improper priors are not distributions, so they don't need to integrate to 1. You cannot sample from them. However, you can still apply Bayes' rule using improper priors, and you usually get a posterior distribution that is proper.
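As a worked example of that last point, assume the model discussed in this thread: a single observation x ~ N_p(theta, I) and the constant prior pi(theta) = 1 on R^p.

```latex
\begin{aligned}
\pi(\theta \mid x) &\propto \exp\!\left(-\tfrac{1}{2}\lVert x - \theta \rVert^2\right) \cdot 1, \\
\int_{\mathbb{R}^p} \exp\!\left(-\tfrac{1}{2}\lVert x - \theta \rVert^2\right) d\theta &= (2\pi)^{p/2} < \infty, \\
\theta \mid x &\sim \mathcal{N}_p(x, I), \qquad \mathbb{E}[\theta \mid x] = x.
\end{aligned}
```

So the flat prior yields a proper posterior whose mean is the naive estimate x itself, which is the estimator that James-Stein improves on when p >= 3.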

Sure.

The point is that you wrote that « you can pick any point […] » and when toth pointed out that « there is always a choice of origin implicit in some way » you replied that « you could use an uninformed improper prior. »

However, it seems we agree that you cannot pick a point using an uninformed improper prior, and any method for picking a point will involve some implicit departure from that (improper) uniform distribution.

Oh.

When I said "you can pick any point P", I meant universal quantification, i.e., "for all points P", rather than a randomly chosen P.

I did say "choose P", which was pretty bad phrasing on my part.