by MichailP 2983 days ago
Thanks for the answer. And what is the correct approach here, if you can only include or exclude each predictor in the final set? Discard all the multicollinear predictors, or pick just one of them?

Keeping just to linear regression: if those variables measure the same construct, pick the best one or use a method to combine their scores. If they measure different constructs but are highly correlated, you may need to drop one, depending on the variance inflation factor (VIF), which you can compute for each predictor.
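A minimal sketch of the VIF check in Python with NumPy, assuming the predictors are the columns of a numeric array. The `vif` function is a hypothetical helper written for illustration, not a library API: it regresses each predictor on all the others (plus an intercept) and reports 1 / (1 - R^2). A common rule of thumb treats values above roughly 5 or 10 as problematic, but that cutoff is a convention, not a formal test.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 from regressing
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: intercept plus every predictor except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: x2 is a noisy copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
v = vif(np.column_stack([x1, x2, x3]))
print(v)
```

With this synthetic data, `x1` and `x2` should both get a large VIF while the independent `x3` stays close to 1.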

As the article mentions, however, there are regression methods designed for these situations (e.g. ridge regression).
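To illustrate the ridge idea, here is a minimal closed-form sketch in NumPy. The `ridge` helper is hypothetical, written only for this example (libraries such as scikit-learn ship their own implementations). It centers the data so the intercept is not penalized, then solves (X^T X + λI) b = X^T y. With two nearly collinear predictors, the penalty shrinks the coefficient vector relative to ordinary least squares (λ = 0) instead of letting the two coefficients swing against each other.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge fit: minimize ||y - a - X b||^2 + lam * ||b||^2.

    X and y are centered so the intercept `a` is not penalized;
    `a` is recovered from the means afterwards.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    a = y_mean - x_mean @ b
    return a, b

# Synthetic example: x2 is a near-copy of x1; only x1 drives y.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 3.0 * x1 + rng.normal(size=n)

_, b_ols = ridge(X, y, lam=0.0)      # lam = 0 reduces to ordinary least squares
_, b_ridge = ridge(X, y, lam=10.0)   # arbitrary penalty chosen for illustration
print("OLS:  ", b_ols)
print("ridge:", b_ridge)
```

In practice predictors are usually standardized before fitting and λ is chosen by cross-validation; the value 10 here is arbitrary.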

One thing worth mentioning, though: with polynomial terms, e.g. y ~ x + x^2, there will be a lot of multicollinearity between the terms, but that multicollinearity is OK. Just be sure to center your variables.
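A quick NumPy check of why centering helps, using a synthetic, evenly spaced, strictly positive predictor (an assumption chosen for illustration): x and x^2 are almost perfectly correlated, while the centered versions are not. The fitted values are unchanged, because [1, x, x^2] and [1, x - mean, (x - mean)^2] span the same column space; centering only changes how the lower-order coefficient is interpreted.

```python
import numpy as np

x = np.linspace(1, 10, 100)          # synthetic, strictly positive predictor
xc = x - x.mean()                    # centered copy

raw_corr = np.corrcoef(x, x ** 2)[0, 1]
centered_corr = np.corrcoef(xc, xc ** 2)[0, 1]
print(f"corr(x, x^2)   = {raw_corr:.3f}")
print(f"corr(xc, xc^2) = {centered_corr:.3f}")

# Fitted values are identical: both designs span the same column space.
rng = np.random.default_rng(1)
y = 2.0 + 0.5 * x + 0.3 * x ** 2 + rng.normal(size=x.size)
A_raw = np.column_stack([np.ones_like(x), x, x ** 2])
A_cen = np.column_stack([np.ones_like(xc), xc, xc ** 2])
fit_raw = A_raw @ np.linalg.lstsq(A_raw, y, rcond=None)[0]
fit_cen = A_cen @ np.linalg.lstsq(A_cen, y, rcond=None)[0]
print(np.allclose(fit_raw, fit_cen))
```

The centered correlation is essentially zero here only because this x is symmetric about its mean; with skewed data, centering reduces the correlation but does not eliminate it.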