Hacker News
by meanmrmustard92 2994 days ago
It is actually wrong. The assumption is that y is a linear combination of the covariates in X. You can run regressions like y = x + x^2 (i.e. you permit a quadratic relationship) just fine.

It's not wrong, it's just a way of looking at things that speaks to the underlying math rather than the full extent of what you can do with it if you extend it with things like kernel methods.

When you use linear regression to fit a model like

  y ~ ax + b(x^2)

what you're technically doing is fitting a linear function with two parameters on two variables. One variable happens to always be equal to the square of the other variable, but, for the purpose of how the model is usually going to be fit, it is still using the same old analytical method that's based in linear algebra.
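
To make that concrete, here is a minimal sketch in plain Python. The data, the "true" coefficients (2.0 and 0.5), and the names `a_hat` and `b_hat` are all made up for illustration. It treats x and x^2 as two ordinary regressors and solves the 2x2 normal equations in closed form, so nothing nonlinear is ever optimized.

```python
import random

# Hypothetical data: y is quadratic in x, plus a little noise.
random.seed(0)
xs = [i / 10.0 for i in range(-30, 31)]
ys = [2.0 * x + 0.5 * x**2 + random.gauss(0.0, 0.1) for x in xs]

# Two regressors; the second just happens to be the square of the first.
x1 = xs
x2 = [x**2 for x in xs]

# Normal equations (X'X) beta = X'y for the two-column design matrix [x1, x2].
s11 = sum(u * u for u in x1)
s12 = sum(u * v for u, v in zip(x1, x2))
s22 = sum(v * v for v in x2)
t1 = sum(u * y for u, y in zip(x1, ys))
t2 = sum(v * y for v, y in zip(x2, ys))

# Solve the 2x2 linear system in closed form (Cramer's rule).
det = s11 * s22 - s12 * s12
a_hat = (t1 * s22 - t2 * s12) / det
b_hat = (s11 * t2 - s12 * t1) / det

print(a_hat, b_hat)
```

The estimates should land close to the coefficients used to generate the data, even though y is nowhere near a straight-line function of x.
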
Fair enough. Mechanically, all you're ever doing when estimating a parameter vector using OLS is projecting Y onto the span of X, and that requires linearity in the sense that Y = XB. But far too often I've met people who've come away thinking OLS is useless because they mistake linearity in parameters for 'y must be a linear function of x', which they think is too simplistic, and so they go do more complicated methods when OLS would have been just fine as long as they used polynomials and/or interaction terms.
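
The projection view can be checked numerically. The sketch below is only an illustration: the data are invented, and `solve` is a small hypothetical helper (plain Gaussian elimination), not anything from a library. It fits an intercept, x, and x^2 via the normal equations, then confirms that the residual is orthogonal to every column of X, which is what "projecting Y onto the span of X" means.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

# Made-up data where y is clearly not a straight-line function of x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 0.9, 2.1, 4.8, 9.1, 15.2]

# Columns of X: intercept, x, x^2. The model is still linear in B.
cols = [[1.0] * len(xs), xs, [x**2 for x in xs]]

# Normal equations X'X B = X'y.
XtX = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
Xty = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
B = solve(XtX, Xty)

# Fitted values XB and residuals y - XB.
fitted = [sum(b * c[i] for b, c in zip(B, cols)) for i in range(len(xs))]
resid = [y - f for y, f in zip(ys, fitted)]

# Projection property: the residual is orthogonal to each column of X.
dots = [sum(r * u for r, u in zip(resid, c)) for c in cols]
print(dots)
```

The printed dot products should be zero up to floating-point error. Adding an interaction term would just mean appending another column to `cols`.
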
To me, that's a stellar example of why you probably shouldn't have people who don't even have a basic undergraduate "intro to stats" understanding of the subject doing your statistics work.

I get that it's a potential cause of confusion for someone who has no training in stats. But it's also jargon that describes a useful concept, and that is literally transparent if you do have enough understanding of the math to know what "linear" and "parameters" mean in this context.