| HN Mirror


by kgwgk 775 days ago

I don't understand what you mean. Who assumes what?

Take any point and shrink your least-squares estimator in that direction. You get an estimator that is strictly better, in some technical sense, which renders the original estimator inadmissible, in some technical sense.

That's a mathematical fact, it has nothing to do with prior information about the problem.


mitthrowaway2 775 days ago

The article's presentation of the James-Stein estimator sets the arbitrary point at the origin. (My previous comments should be read in this context). Of course, we could set it anywhere, including [42,...]. Let's call it p. Regardless of where you set it, the estimator suggests that your best estimate û, of the mean μ, should be nudged a little away from x and towards p.

My point is that the choice of 'p' (or, in the article's presentation, the choice of origin) cannot truly be arbitrary because if it reduces the expected squared difference between μ and û, then it necessarily contains information about μ. If all you truly know about μ is x and σ, then you will have no way to guess in which direction you should even shift your estimate û to reduce that error.

If you do have some additional information about μ, beyond just x alone, then sure, take advantage of it! But then don't call it a paradox.


kgwgk 775 days ago

(I cannot speak for the original article. I’ve not put in the effort to fully understand it, so I won’t categorically say it’s wrong, but it didn’t seem right to me.)

The “paradox” is that it can truly be arbitrary! Pick a random point. Shrink your least-squares estimator. You got yourself a “better” estimator - without having any additional information.

That’s why the “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution” paper had the impact that it had.


mitthrowaway2 775 days ago

Then you'll have to clarify what you mean by "random" when you say "pick a random point".

Unless you mean that every point on a spherical surface centered on x would have a lower expected squared error than x itself?


kgwgk 775 days ago

We may be talking about different things.

Let's say that you have a multivariate normal with identity covariance and unknown mean mu = [a, b, c].

The usual maximum-likelihood estimator of the unknown mean when you get an observation is to take the observed value as estimate. If you observe [x, y, z] the "naive" estimator gives you the estimate mû = [x, y, z].

For any arbitrary point [p, q, r] you can define another estimator. If you observe [x, y, z] this "shrinkage" estimator gives you an estimate which is no longer precisely at [x, y, z] but is displaced in the direction of [p, q, r]. For simplicity let's say the resulting estimate is mû' = [x', y', z'].
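To make the "displaced in the direction of [p, q, r]" step concrete, here is one standard way to write such an estimator, the James-Stein form with an arbitrary shrinkage target. The comment does not spell out a formula, so the notation (v for the target point, X for the observation, d for the dimension) is an assumption added for illustration, with unit variance as in the setup above.

```latex
% v = (p, q, r) is the arbitrary target, X = (x, y, z) the observation, d = 3.
\hat{\mu}' \;=\; v + \left(1 - \frac{d-2}{\lVert X - v \rVert^{2}}\right)(X - v),
\qquad
\mathbb{E}\,\lVert \hat{\mu}' - \mu \rVert^{2}
\;=\; d - (d-2)^{2}\,\mathbb{E}\!\left[\frac{1}{\lVert X - v \rVert^{2}}\right]
\;<\; d
\;=\; \mathbb{E}\,\lVert X - \mu \rVert^{2}.
```

The gain is positive for every v when d is at least 3, but it gets small when v is far from the true mean.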

Whatever choice you make for [p, q, r], the "shrinkage" estimator has lower mean squared error than the "naive" estimator: the expected value of (x'-a)²+(y'-b)²+(z'-c)² is lower than the expected value of (x-a)²+(y-b)²+(z-c)². (This needs at least three dimensions, as in this example.)
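A quick way to check this claim numerically is a Monte Carlo simulation. The sketch below is only an illustration, not anything from the thread: it assumes the James-Stein form of the shrinkage, unit variance, and a hypothetical helper name `compare_risks`; the true mean and target used in the usage note are made-up values.

```python
import numpy as np


def compare_risks(mu, p, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the mean squared error of both estimators.

    Returns (naive_risk, shrinkage_risk) for one observation X ~ N(mu, I).
    `compare_risks` is a hypothetical helper written for this illustration.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    p = np.asarray(p, dtype=float)
    d = mu.size  # the James-Stein result needs d >= 3

    # One observation per trial, identity covariance.
    x = mu + rng.standard_normal((n_trials, d))

    # Shrink each observation toward p using the James-Stein factor
    # (assumed form; the thread does not give an explicit formula).
    diff = x - p
    sq_dist = np.sum(diff**2, axis=1, keepdims=True)
    x_shrunk = p + (1.0 - (d - 2) / sq_dist) * diff

    # Average squared distance to the true mean for each estimator.
    naive_risk = np.mean(np.sum((x - mu) ** 2, axis=1))
    shrinkage_risk = np.mean(np.sum((x_shrunk - mu) ** 2, axis=1))
    return naive_risk, shrinkage_risk
```

For example, `compare_risks([1, 2, 3], [4, 0, -1])` returns the two estimated risks for a true mean of [1, 2, 3] and a target of [4, 0, -1]. The naive risk should come out close to 3, and the shrinkage risk slightly below it. Moving the target far from the true mean shrinks the gap toward zero, so resolving it needs more trials.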
