by mrow84
3549 days ago
I don't really understand that claim. You are explicitly adding a bias that is linearly dependent on your race variable, and then allowing your regression to recover that bias by introducing noisy measurements of race (which you as the modeller knew was the thing causing the bias). As you say, that is unsurprising. That result does not, however, address my point, which is that if the structure of the bias is difficult to understand, or perhaps even just difficult to model, and if relevant measurements (with errors that are uncorrelated with your original errors) are unavailable, then bias correction is essentially impossible.
In the linked article, I explicitly reference a real-world case where the same linear model was used to discover that grades and test scores are biased in favor of blacks: http://ftp.iza.org/dp8733.pdf
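To make the linear case concrete, here is a minimal sketch of the kind of simulation under discussion. It is not the code from the linked article or the paper; the names `simulate` and `ols`, the bias of 0.5, the noise levels, and the assumption that group membership is measured without error are all mine, chosen for illustration. A score is inflated by a fixed amount for one group, and a regression of the outcome on score and group recovers that amount.

```python
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=20000, bias=0.5, seed=0):
    """Hypothetical data: the score overstates ability by `bias` for group 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        group = rng.randint(0, 1)
        ability = rng.gauss(0, 1)                       # unobserved
        score = ability + bias * group + rng.gauss(0, 0.5)
        outcome = ability + rng.gauss(0, 0.5)           # depends on ability only
        X.append([1.0, score, float(group)])
        y.append(outcome)
    return X, y

X, y = simulate()
intercept, b_score, b_group = ols(X, y)
# At equal scores, group 1 has lower expected outcomes, so b_group is negative.
# Dividing by the score coefficient converts it back into score units.
estimated_bias = -b_group / b_score
print(round(estimated_bias, 2))
```

The recovery works here because the bias really is linear in a variable the model is given, which is exactly the condition the parent comment objects to.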
In more complicated situations, the bias would need to be detectable by a neural network, an SVM, or a random forest. The entire purpose of models like these is to detect hidden patterns.
Even if relevant measurements are unavailable, one can use redundant encoding to fix bias. Delip Rao explains redundant encoding here, for example, though he is more concerned that ML models might learn facts he wants to remain hidden: http://deliprao.com/archives/129
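As a rough illustration of redundant encoding (my own toy setup, not taken from Rao's post): suppose the group variable is never recorded, but several other features each correlate with it weakly. The feature count, noise level, and threshold below are hypothetical. Any single proxy guesses the group poorly, while combining them guesses it much better, which is how a model can end up with the information anyway.

```python
import random

def proxy_accuracy(n=10000, n_proxies=5, noise=1.0, seed=0):
    """Compare guessing a hidden binary group from one noisy proxy vs. the mean of several."""
    rng = random.Random(seed)
    single_hits = combined_hits = 0
    for _ in range(n):
        group = rng.randint(0, 1)  # hidden from the model
        # Each proxy is the group plus independent noise.
        proxies = [group + rng.gauss(0, noise) for _ in range(n_proxies)]
        single_guess = 1 if proxies[0] > 0.5 else 0
        combined_guess = 1 if sum(proxies) / n_proxies > 0.5 else 0
        single_hits += single_guess == group
        combined_hits += combined_guess == group
    return single_hits / n, combined_hits / n

acc_single, acc_combined = proxy_accuracy()
print(acc_single, acc_combined)
```

Note that this relies on the proxies' errors being independent of each other; if they shared a common error, averaging would help much less.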