by cocoablazing 2987 days ago
The issue is intra-predictor correlation. In the extreme case that a predictor is duplicated, the correct beta might be any {beta*a, beta*(1-a)} for a in [0, 1], which an algorithm may not estimate in a stable manner. A significant degree of correlation introduces this general problem.
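A minimal sketch of the duplicated-predictor case, in plain Python. The data, the value beta = 2, and the helper name `sum_squared_error` are made up for illustration. It checks that every split {beta*a, beta*(1-a)} fits equally well, and that the normal-equations matrix is singular, which is why a solver has no unique answer to return.

```python
# Hypothetical data: one signal, entered into the model twice as x1 and x2.
beta = 2.0
x = [0.5, 1.0, 1.5, 2.0]
y = [beta * v for v in x]


def sum_squared_error(b1, b2):
    """SSE of the fit y ~ b1*x1 + b2*x2 when x1 and x2 are both x."""
    return sum((b1 * v + b2 * v - t) ** 2 for v, t in zip(x, y))


# Every split of beta across the two copies fits equally well.
errors = {
    a: sum_squared_error(beta * a, beta * (1 - a))
    for a in (0.0, 0.25, 0.5, 1.0)
}

# The 2x2 matrix X^T X has four identical entries (all equal to sum(x^2)),
# so its determinant is zero and it cannot be inverted.
sxx = sum(v * v for v in x)
determinant = sxx * sxx - sxx * sxx

print(errors)
print("det(X^T X) =", determinant)
```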

... or worse; it is still true for any a, not just a in [0, 1]. You could easily get {1,000,001, -1,000,000}, which for perfectly clean, precise, representable data is equivalent, but which magnifies any noise/error in one of the predictors by a million. Or a billion.
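To put numbers on that, here is a small sketch with invented data: the second predictor is a copy of the first plus noise of size 1e-6. Both coefficient pairs sum to 1, so on clean data they would agree; the names `balanced`, `extreme`, and `worst_error` are illustrative only.

```python
# Hypothetical data: x2 is x1 plus a tiny measurement error.
x1 = [1.0, 2.0, 3.0]
noise = [1e-6, -1e-6, 1e-6]
x2 = [v + e for v, e in zip(x1, noise)]
target = x1  # true relationship: y = 1 * x

balanced = (0.5, 0.5)
extreme = (1_000_001.0, -1_000_000.0)  # also sums to 1


def worst_error(coefs):
    """Largest absolute prediction error over the sample."""
    b1, b2 = coefs
    return max(abs(b1 * u + b2 * v - t) for u, v, t in zip(x1, x2, target))


balanced_error = worst_error(balanced)
extreme_error = worst_error(extreme)

print("balanced:", balanced_error)
print("extreme: ", extreme_error)
```

The balanced pair halves the noise, while the extreme pair multiplies it by about a million, so an error of 1e-6 in the input becomes an error of about 1 in the prediction.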
So say you have 3 predictors that have high intra-predictor correlation. Can you still pick one of them and discard the remaining 2? Or can't you pick any one of them?
Using ridge regression (mentioned in TFA) would prefer a (1/3,1/3,1/3) average of those predictors (or a better combination, depending on their respective noises).

Using lasso (also mentioned in TFA) would prefer to pick the best of the three and drop the others.

Using elastic net would be a combination of both.

Note, though, that any method other than simple regression has tuning parameters -- depending on those, you could still end up with a result equivalent to plain least squares.
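A rough sketch of the three penalties, using a hand-rolled coordinate descent rather than any particular library. Assumptions: the three predictors are exact copies of each other, the penalty values 0.1 are arbitrary, and the objective is (1/2n)*||y - Xw||^2 + l1*|w|_1 + (l2/2)*|w|^2. With exact copies, which predictor lasso keeps is arbitrary (here it is whichever gets updated first); with unequal noise it would tend to favor the cleanest one.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, l1, l2, sweeps=1000):
    """Coordinate descent; l1=0 gives ridge, l2=0 gives lasso."""
    n = len(y)
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual of the fit that leaves predictor j out.
            partial = [
                y[i] - sum(w[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            w[j] = soft_threshold(rho, l1) / (scale + l2)
    return w


# Hypothetical data: three identical predictors, y equal to the shared signal.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
columns = [x, x, x]
y = list(x)

ridge = elastic_net(columns, y, l1=0.0, l2=0.1)
lasso = elastic_net(columns, y, l1=0.1, l2=0.0)
enet = elastic_net(columns, y, l1=0.1, l2=0.1)

for name, w in (("ridge", ridge), ("lasso", lasso), ("elastic net", enet)):
    print(name, [round(c, 4) for c in w])
```

On this toy data ridge spreads the weight evenly, lasso puts it all on one copy, and elastic net spreads it evenly with extra shrinkage. Setting both penalties close to zero moves the total weight back toward the least-squares value of 1, which is the tuning-parameter caveat above.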

You can, but why trash information that is present when you can leverage it with a different approach?
Like PCA? But that way you lose the physical meaning of the predictors.
PCA is a special case of factor analysis, so you are representing them as observations of a latent variable (which is often a narrative people use when explaining why two x variables are correlated).
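For what the latent-variable reading looks like in practice, here is a small sketch on made-up data: three predictors that are the same signal plus small, fixed perturbations. It finds the first principal component by power iteration on the covariance matrix (a stand-in for a proper eigen-solver; the function names are illustrative).

```python
import math


def center(col):
    """Subtract the column mean."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]


def covariance_matrix(cols):
    """Population covariance matrix of the given columns."""
    n = len(cols[0])
    centered = [center(c) for c in cols]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in centered]
        for ci in centered
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov, found by power iteration."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in cov]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical data: one latent signal observed three times.
signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = list(signal)
x2 = [s + d for s, d in zip(signal, [0.1, -0.1, 0.1, -0.1, 0.1])]
x3 = [s + d for s, d in zip(signal, [-0.1, 0.1, 0.0, 0.1, -0.1])]

loadings = first_component(covariance_matrix([x1, x2, x3]))
print([round(c, 3) for c in loadings])
```

The loadings come out close to equal, so the first component is roughly the average of the three predictors: an estimate of the shared signal, but no longer any single measured quantity.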