by theophrastus 3078 days ago
One issue I nearly always find missing from introductory discussions of linear regression is the near-universal assumption that there is no error in the abscissa ("x") values. This is true-ish for time series data (we know for certain which day we collected the data on, but was it the same hour every day?), yet I'd be rich if I had a nickel for every time I saw standard linear regression done when the "x" values had significant (and known) error. In that case you're biasing your slope estimate (toward zero, for classical measurement error) unless you use some sort of 2D regression, like Deming regression.[1]

[1] https://en.wikipedia.org/wiki/Deming_regression
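For anyone who wants to see the effect, here is a minimal NumPy sketch of Deming regression using the standard closed-form slope from the Wikipedia article on Deming regression. `delta` is the assumed ratio of the y-error variance to the x-error variance (delta=1 gives orthogonal regression), and the simulated data (true slope 3, unit-variance noise on both axes) are made up purely for illustration.

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression.

    delta: assumed var(error in y) / var(error in x).
    Assumes cov(x, y) != 0, otherwise the slope formula divides by zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # Sample moments; the 1/n vs 1/(n-1) choice cancels in the slope.
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    d = syy - delta * sxx
    slope = (d + np.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return ym - slope * xm, slope

# Simulated data: true line y = 2 + 3 x, with noise in BOTH x and y.
rng = np.random.default_rng(0)
n = 5000
x_true = rng.normal(0.0, 2.0, n)                      # var(x_true) = 4
x_obs = x_true + rng.normal(0.0, 1.0, n)              # x measured with error
y_obs = 2.0 + 3.0 * x_true + rng.normal(0.0, 1.0, n)  # y measured with error

ols_slope = np.polyfit(x_obs, y_obs, 1)[0]  # theory: converges to 3 * 4/5 = 2.4
b0, b1 = deming_fit(x_obs, y_obs, delta=1.0)
print(f"OLS slope:    {ols_slope:.2f}")
print(f"Deming slope: {b1:.2f}")
```

Note that Deming needs the error-variance ratio from somewhere (replicate measurements, instrument specs); with a wrong `delta` it is biased too.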


Regression with measurement error is usually treated in much higher-level statistics/econometrics classes.

If you're interested in this, you can read more about addressing it with instrumental-variable (IV) methods in Mostly Harmless Econometrics [1].

[1] http://www.development.wne.uw.edu.pl/uploads/Main/recrut_eco...
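As a rough illustration of the IV idea for measurement error: if you happen to have a second, independently noisy measurement of the same x (here a hypothetical `x2`), it can serve as the instrument for the first one. This sketch assumes classical errors that are independent of each other, of the true x, and of the y noise; the data are simulated.

```python
import numpy as np

def iv_slope(y, x, z):
    """Just-identified IV slope (one regressor, one instrument): cov(z, y) / cov(z, x)."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    zc = z - z.mean()
    return np.dot(zc, y - y.mean()) / np.dot(zc, x - x.mean())

rng = np.random.default_rng(1)
n = 5000
x_true = rng.normal(0.0, 2.0, n)
x1 = x_true + rng.normal(0.0, 1.0, n)  # noisy measurement used as the regressor
x2 = x_true + rng.normal(0.0, 1.0, n)  # hypothetical second measurement, used as instrument
y = 1.0 + 3.0 * x_true + rng.normal(0.0, 1.0, n)

ols = np.polyfit(x1, y, 1)[0]  # attenuated: theory says about 3 * 4/5 = 2.4
iv = iv_slope(y, x1, x2)       # consistent for 3 under the stated assumptions
print(f"OLS {ols:.2f}, IV {iv:.2f}")
```

The instrument works because its error is uncorrelated with the error in `x1`, so cov(x2, x1) picks up only the true-x variance.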

To build on this a bit more, there are also generalized linear models that allow you to specify the reliability (i.e., the error level) of a variable.
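To see why knowing the reliability helps, here is the simplest case: a sketch of the classical correction for attenuation, not necessarily what the models mentioned above do internally. With one regressor and classical error, the OLS slope converges to the true slope times the reliability ratio var(x_true) / var(x_obs), so dividing by an assumed-known reliability undoes the bias. The helper name and simulated data are hypothetical.

```python
import numpy as np

def reliability_corrected_slope(x_obs, y, reliability):
    """OLS slope divided by an assumed-known reliability ratio var(x_true) / var(x_obs).

    Only valid for a single regressor with classical (mean-zero, independent) error.
    """
    ols_slope = np.polyfit(x_obs, y, 1)[0]
    return ols_slope / reliability

rng = np.random.default_rng(3)
n = 5000
x_true = rng.normal(0.0, 2.0, n)          # var(x_true) = 4
x_obs = x_true + rng.normal(0.0, 1.0, n)  # error variance 1, so var(x_obs) = 5
y = 3.0 * x_true + rng.normal(0.0, 1.0, n)

naive = np.polyfit(x_obs, y, 1)[0]
corrected = reliability_corrected_slope(x_obs, y, reliability=4.0 / 5.0)
print(f"naive {naive:.2f}, corrected {corrected:.2f}")
```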

Regarding 2SLS models, I find them more useful for accounting for endogeneity in the model than for measurement error. After all, measurement error is usually unobserved (otherwise you would just take it out). 2SLS “just” reweights the point estimates by using the instrument to identify the good variation in the instrumented variable (for example, using draft lottery results to account for the endogenous choice to attend college).
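A minimal sketch of the two stages done by hand, on simulated data loosely modeled on the college example: an unobserved "ability" drives both college attendance and wages, and a hypothetical binary lottery shifts attendance but is independent of ability. All names and coefficients are made up for illustration.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """Manual 2SLS with one endogenous regressor x and one instrument z (plus intercepts).

    Returns the second-stage slope. Standard errors from this naive second stage
    would be wrong; use a proper IV routine for inference.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: regress x on the instrument and keep only the fitted ("good") variation.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress y on the fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(2)
n = 50000
ability = rng.normal(size=n)                           # unobserved confounder
lottery = rng.integers(0, 2, size=n).astype(float)     # hypothetical random instrument
college = (2.0 * lottery + ability + rng.normal(size=n) > 0.5).astype(float)
wage = 1.0 + 2.0 * college + 1.5 * ability + rng.normal(size=n)  # true effect: 2.0

naive = np.polyfit(college, wage, 1)[0]  # biased upward: ability raises both
tsls = two_stage_least_squares(wage, college, lottery)
print(f"naive OLS {naive:.2f}, 2SLS {tsls:.2f}")
```

With a single binary instrument this reduces to the Wald estimator: the difference in mean wages across lottery groups divided by the difference in college rates.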