| HN Mirror



	by cocoablazing 2987 days ago
	The issue is intra-predictor correlation. In the extreme case that a predictor is duplicated, the correct beta might be {a·beta, (1-a)·beta} for a in [0, 1], which an algorithm may not estimate in a stable manner. A significant degree of correlation introduces this general problem.
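A minimal sketch of the duplicated-predictor case. The values of x, beta, and a are made up for illustration and are not from the thread. Every split a reproduces the same predictions, so the data alone cannot say which split is the right one.

```python
# Hypothetical data: one predictor x, entered twice as two identical columns.
x = [0.5, 1.0, 1.5, 2.0]
beta = 3.0  # coefficient of the single, un-duplicated predictor

baseline = [beta * xi for xi in x]

gaps = {}
for a in (0.0, 0.25, 0.5, 1.0):
    # Coefficients {a*beta, (1-a)*beta} applied to the two identical columns.
    preds = [a * beta * xi + (1 - a) * beta * xi for xi in x]
    gaps[a] = max(abs(p - b) for p, b in zip(preds, baseline))

print(gaps)  # every gap is zero up to floating-point rounding
```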

2 comments

beagle3 2987 days ago

... or worse; it is still true for any a, not just a in [0, 1]. You could easily get {1,000,001, -1,000,000}, which for perfectly clean, precise, representable data is equivalent, but which magnifies any noise/error in one of the predictors by a million. Or a billion.
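A rough sketch of that magnification, under assumed numbers: a clean signal, a second copy carrying a small measurement error (standard deviation 1e-3), and a true combined coefficient of 1. The seed, sample size, and noise level are illustrative choices. Both coefficient pairs sum to 1, so they agree on clean data, but the extreme pair multiplies the error in the second copy by a million.

```python
import random

random.seed(0)
n = 1000
signal = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.gauss(0.0, 1e-3) for _ in range(n)]  # small error on the second copy

x1 = signal
x2 = [s + e for s, e in zip(signal, noise)]


def rms_error(b1, b2):
    """Root-mean-square gap between b1*x1 + b2*x2 and the clean signal."""
    preds = [b1 * u + b2 * v for u, v in zip(x1, x2)]
    return (sum((p - s) ** 2 for p, s in zip(preds, signal)) / n) ** 0.5


balanced = rms_error(0.5, 0.5)
extreme = rms_error(1_000_001.0, -1_000_000.0)

print("balanced:", balanced)
print("extreme: ", extreme)
print("ratio:   ", extreme / balanced)
```

The error is 0.5 times the noise in the balanced case and a million times the noise in the extreme case, so the ratio lands near two million.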


MichailP 2987 days ago

So say you have 3 predictors that have high intra-predictor correlation. Can you still pick one of them and discard the remaining 2? Or can't you pick any one of them?


beagle3 2987 days ago

Using ridge regression (mentioned in TFA) would prefer a (1/3,1/3,1/3) average of those predictors (or a better combination, depending on their respective noises).

Using lasso (also mentioned in TFA) would prefer to pick the best of the three and drop the others.

Using elastic net would be a combination of both.

Note, though, that any method other than simple regression has tuning parameters -- depending on those, you could still end up with a result equivalent to plain least squares.
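The behaviour described above can be sketched with a small coordinate-descent solver. This is an illustrative toy, not how any particular library implements these methods. It assumes the limiting case of three exactly duplicated predictors and noise-free y = 2x, and the names `elastic_net_cd`, `l1`, and `l2` are hypothetical. Setting `l1 = 0` gives ridge, `l2 = 0` gives lasso, and both at zero gives plain least squares.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(columns, y, l1, l2, sweeps=500):
    """Coordinate descent on
    0.5 * ||y - X b||^2 + l1 * ||b||_1 + 0.5 * l2 * ||b||^2."""
    coefs = [0.0] * len(columns)
    residual = list(y)  # y - X b, with b starting at zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Inner product of column j with the residual that leaves out its own term.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual))
            norm_sq = sum(x * x for x in col)
            new = soft_threshold(rho, l1) / (norm_sq + l2)
            residual = [r + x * (coefs[j] - new) for x, r in zip(col, residual)]
            coefs[j] = new
    return coefs


# Assumed toy data: three identical predictors, true combined coefficient 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
columns = [x, x, x]
y = [2.0 * v for v in x]

ridge = elastic_net_cd(columns, y, l1=0.0, l2=1.0)
lasso = elastic_net_cd(columns, y, l1=1.0, l2=0.0)
enet = elastic_net_cd(columns, y, l1=1.0, l2=1.0)
ols = elastic_net_cd(columns, y, l1=0.0, l2=0.0)

for name, coefs in [("ridge", ridge), ("lasso", lasso), ("enet", enet), ("ols", ols)]:
    print(name, [round(c, 4) for c in coefs])
```

With these settings ridge spreads the weight evenly (each coefficient is 20/31, a shrunken one-third share), while lasso keeps a single column and leaves the others at zero up to rounding. With exact duplicates the lasso solution is not unique, so which column survives here is an artifact of the update order. The elastic net run also splits evenly because its ridge term makes the solution unique.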


cocoablazing 2987 days ago

You can, but why trash information that is present when you can leverage it with a different approach?


MichailP 2987 days ago

Like PCA? But that way you lose the physical meaning of the predictors.


closed 2986 days ago

PCA is a special case of factor analysis, so you are representing them as observations of a latent variable (which is often a narrative people use when explaining why two x variables are correlated).
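A small sketch of the latent-variable picture, on assumed synthetic data: two predictors that are each a noisy reading of one hidden variable. The seed, sample size, and noise level are illustrative choices. The first principal component of their covariance matrix captures nearly all the variance and loads about equally on both predictors, which is also why it no longer maps to either physical predictor alone.

```python
import math
import random

random.seed(1)
n = 2000
latent = [random.gauss(0.0, 1.0) for _ in range(n)]
x1 = [z + random.gauss(0.0, 0.1) for z in latent]  # noisy reading 1
x2 = [z + random.gauss(0.0, 0.1) for z in latent]  # noisy reading 2


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


a, b, d = cov(x1, x1), cov(x1, x2), cov(x2, x2)

# Eigenvalues of the symmetric 2x2 covariance matrix [[a, b], [b, d]].
half_gap = math.hypot((a - d) / 2.0, b)
top = (a + d) / 2.0 + half_gap
bottom = (a + d) / 2.0 - half_gap

# Unit eigenvector for the top eigenvalue: the first principal component.
vx, vy = b, top - a
length = math.hypot(vx, vy)
loadings = (vx / length, vy / length)

explained = top / (top + bottom)
print("explained variance ratio:", explained)
print("loadings:", loadings)
```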
