by SubiculumCode 2988 days ago
There is no need to be rude or yell.

Yes, if your variables are perfectly linearly dependent they get dropped. Did anyone say otherwise? I did not think about this case because most correlated measures causing multicollinearity problems aren't perfectly linearly dependent. In practice, linear dependency usually only comes up if you miscoded some of your independent dummy variables (e.g. adding both 'male[0,1]' and 'not male[0,1]' as two categorical predictors). So I am not really sure of your point.
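To make the miscoding case concrete, here is a minimal sketch, assuming NumPy and six made-up observations (the variable names are only illustrative). With an intercept in the model, 'male' and 'not male' add up to the intercept column, so the design matrix loses a rank and one column has to go.

```python
import numpy as np

# Hypothetical toy data: six observations with a binary 'male' indicator.
male = np.array([1, 0, 1, 1, 0, 0], dtype=float)
not_male = 1.0 - male               # the redundant, miscoded second dummy
intercept = np.ones_like(male)

X = np.column_stack([intercept, male, not_male])

# Three columns but only two independent directions, because
# intercept == male + not_male in every row.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)

# Dropping the redundant dummy restores full column rank.
X_ok = X[:, :2]
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
```

Which column a given stats package drops in this situation is up to that package; the sketch only shows why something has to be dropped.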

As to your second point, the estimate might be unbiased, but the statistical inference (i.e. the p-value) would be incorrect with multicollinearity. So again, I am not sure of your point when you are only repeating what I said.

Moreover, even if the parameter estimate is unbiased, it may not be particularly meaningful to the researcher. One frequently finds with multicollinearity that the signs of effects switch (- to +, or + to -) as you add highly correlated predictors to a model, often in theoretically questionable ways. That does serve to remind one that the parameter estimates are only meaningful in the context of the other predictors in the model.
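A small simulation of the sign switch, as a sketch only: the data-generating process, seed, and coefficients below are invented for illustration, and NumPy is assumed. x2 is nearly a copy of x1, and the outcome is built so that x1 has a negative partial effect but a positive marginal association.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds the intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)               # fixed seed so the sketch is repeatable
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)           # x2 is almost a copy of x1
y = -1.0 * x1 + 2.0 * x2 + 0.1 * rng.normal(size=n)

ones = np.ones(n)
b_alone = ols(np.column_stack([ones, x1]), y)[1]       # x1 as the only predictor
b_joint = ols(np.column_stack([ones, x1, x2]), y)[1]   # x1 alongside x2

print(f"x1 alone: {b_alone:+.2f}   x1 with x2: {b_joint:+.2f}")
```

Alone, x1 picks up the variance it shares with x2 and comes out positive; next to x2, it is left with only its unique contribution, which is negative by construction.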

1 comments

There's this other thing called the FWL (Frisch-Waugh-Lovell) theorem.

As long as the unexplained term is uncorrelated (in the probabilistic model; linear regression will force this to be the case computationally) with the included variables, your coefficients will remain unchanged. So adding/removing variables shouldn't change results at all -- unless the model is mis-specified and you're including variables that correlate with unobserved factors in unexpected ways.
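For reference, what FWL states is that the coefficient on x1 in a regression of y on x1 and x2 equals the coefficient you get after first regressing both y and x1 on x2 and then regressing the y residuals on the x1 residuals. A minimal numerical check, assuming NumPy and simulated data (the helper names `ols` and `residualize` are made up for this sketch):

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residualize(v, Z):
    """Part of v left over after projecting it on the columns of Z."""
    return v - Z @ ols(Z, v)

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)           # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), x2])        # intercept plus the "other" regressor
X = np.column_stack([Z, x1])                 # full design matrix

beta_full = ols(X, y)[2]                     # coefficient on x1 in the full model

# FWL route: partial x2 (and the intercept) out of both y and x1.
x1_res = residualize(x1, Z)
y_res = residualize(y, Z)
beta_fwl = (x1_res @ y_res) / (x1_res @ x1_res)

print(beta_full, beta_fwl)                   # the two should agree up to rounding
```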

So for example a regression of children's IQ on the income of their parents provides a plausible mechanism; but if you add the arm length of the kids you will have problems, since arm length is correlated with an omitted variable (kids with longer arms are older and perform better on IQ tests).
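A toy version of the arm-length example, as a sketch under loudly invented assumptions: every number in the data-generating process is made up, the outcome is treated as a raw test score that rises with age, income is drawn independently of age, and NumPy is assumed. Arm length has no effect on the score here, yet it earns a clearly positive coefficient until age enters the model.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(6, 16, size=n)                    # years (hypothetical range)
income = rng.normal(size=n)                         # standardized, independent of age
arm = 30 + 3 * age + rng.normal(scale=2, size=n)    # cm, grows with age
# Raw score depends on income and age, NOT on arm length.
score = 0.5 * income + 2.0 * age + rng.normal(size=n)

ones = np.ones(n)
arm_without_age = ols(np.column_stack([ones, income, arm]), score)[2]
arm_with_age = ols(np.column_stack([ones, income, arm, age]), score)[2]

print(f"arm coefficient, age omitted:  {arm_without_age:.3f}")
print(f"arm coefficient, age included: {arm_with_age:.3f}")
```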

That's most of the "in context" story. Nothing to do with multicollinearity.

Thanks for the thoughtful comment and reference.

The 'in context' remark was not so much about multicollinearity as about shared and unique variance.