|
|
|
|
|
by yummyfajitas
3549 days ago
|
|
The point is that the bias is linear, and my model is linear, so the model fixes things. The example is synthetic (so we could know what the right answer is and check if we recover it) so of course I put everything in. In the linked article, I explicitly reference a real world case where the same linear model was used to discover that grades and test scores are biased in favor of blacks: http://ftp.iza.org/dp8733.pdf In more complicated situations, the bias would need to be amenable to detection by a neural network, an SVM or random forest. The entire purpose of models like this is that lots of hidden patterns are detected. Even if relevant measurements are unavailable, one can use redundant encoding to fix bias. Delip Rao explains redundant encoding here, for example, though he is more concerned that ML models might learn facts he wants to remain hidden:
http://deliprao.com/archives/129 |
|
In the first of those two sections you yourself note how bias correction is not possible without the relevant data: "If we scrubbed the data this result would be impossible. Running least squares on scrubbed data yields alpha = [ 0.29878373, 0.30869833] - we can't correct for bias because we don't know the variable being biased on."
I'm not disputing that bias correction is possible, only that it can be much harder than you seem to be implying, with statements like "Most algorithms can and will correct for biases in their inputs.", and "Of course data contains biases. But again, please read the article I linked; algorithms will have a tendency to correct that bias."
I have some experience with bias correction in (ocean) weather forecasting, and in that domain there were problems both with the difficulty of modelling the bias structure, and with obtaining measurements reliable enough for bias correction.