|
|
|
|
|
by mrow84
3549 days ago
|
|
To remain with the example in your blog post, your model fixed things because the implicit bias model was correct (linear dependance on race), and the data were available, either directly (via the race variable) in the "What if measurements are biased?" section, or indirectly (via the noisy redundantly-encoded race variables) in the "What if we scrub race, but redundantly encode it?" section. In the first of those two sections you yourself note how bias correction is not possible without the relevant data: "If we scrubbed the data this result would be impossible. Running least squares on scrubbed data yields alpha = [ 0.29878373, 0.30869833] - we can't correct for bias because we don't know the variable being biased on." I'm not disputing that bias correction is possible, only that it can be much harder than you seem to be implying, with statements like "Most algorithms can and will correct for biases in their inputs.", and "Of course data contains biases. But again, please read the article I linked; algorithms will have a tendency to correct that bias." I have some experience with bias correction in (ocean) weather forecasting, and in that domain there were problems both with the difficulty of modelling the bias structure, and with obtaining measurements reliable enough for bias correction. |
|
Machine learning does not have less bias than human researchers. It is simply magnified at scale.
This is fundamentally wrong. Given data on the biasing factor, most algorithms will try to use it and improve things. Sometimes information is unavailable. On net there is a reason why many algorithms will reduce bias, and no particular reason why they would increase it equally in the remaining cases.