
> In fact, with linear models it is mathematically impossible to make a "worse" model (in terms of mean squared error) by including more variables (like gender, age, race, etc...).

That's only true if you mean the mean squared error on the training data, which is not usually a good indicator of model quality. Instead you should use the mean squared error on held-out test data, which in expectation gets worse if you add non-predictive variables to the input.

If there are non-predictive variables, the linear model with the lowest expected squared error assigns exactly zero weight to them, which is equivalent to the situation where those variables don't exist. But when you train on a finite sample, that "exactly zero" outcome is extremely unlikely (as in, the probability is 0) if the non-predictive variables vary at all. That variation lets the fit latch onto individual training points, even though any apparent relationship is pure chance and doesn't help generalize to unseen data. In other words, the model overfits to noise.
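
To make this concrete, here is a rough sketch in Python with NumPy. Everything in it is made up for illustration: the function name `fit_and_score`, the sample sizes, the true weight of 2.0, and the choice of 20 pure-noise columns. It fits ordinary least squares twice on the same synthetic data, once with only the predictive variable and once with the noise columns appended, and reports train and test MSE for each.

```python
import numpy as np


def fit_and_score(n_train=30, n_test=10_000, n_noise=20, seed=0):
    """Fit OLS with and without non-predictive columns; return MSEs and weights.

    All sizes and the data-generating process are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    x = rng.normal(size=(n, 1))                 # the one predictive variable
    noise_vars = rng.normal(size=(n, n_noise))  # independent of y by construction
    y = 2.0 * x[:, 0] + rng.normal(size=n)      # true weight on noise_vars is 0

    designs = {
        "predictive only": x,
        "plus noise vars": np.hstack([x, noise_vars]),
    }

    results = {}
    for name, X in designs.items():
        X_tr, X_te = X[:n_train], X[n_train:]
        y_tr, y_te = y[:n_train], y[n_train:]

        # Least-squares fit on the training rows only (no intercept, for brevity).
        w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

        results[name] = {
            "train_mse": float(np.mean((X_tr @ w - y_tr) ** 2)),
            "test_mse": float(np.mean((X_te @ w - y_te) ** 2)),
            "weights": w,
        }
    return results


for name, r in fit_and_score().items():
    print(f"{name:16s} train MSE = {r['train_mse']:.3f}   test MSE = {r['test_mse']:.3f}")
```

The train MSE of the larger model can never be higher, since the smaller model is just the special case where the extra weights are zero. The fitted weights on the noise columns won't be zero though, and with this few training rows relative to the number of columns, the test MSE should come out noticeably worse for the larger model.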