by eridius
3549 days ago
Regarding your last paragraph, there are two different angles here. The "machine learning is racist" angle is, I think, quite valid, but it covers a different topic from the one we've been discussing. To be more specific, there are two different ways in which we can end up with racist models:

1. The algorithm is biased in a way that reflects reality but does not reflect how we wish it to behave. This is the "machine learning is racist" angle. A lending algorithm might quite rightly conclude that black people are a higher risk, but acting on that is ethically problematic, because denying loans to black people only compounds the social problem (even though it may make financial sense for your bank).

2. What I'm arguing is that we can also get racist algorithms because the data itself may be biased in a way you're not aware of. To take the red shirt example, something I forgot to say before is that if, say, a fad for wearing red shirts spreads among the black community, you're going to see an uptick in arrests of black people. Your algorithm won't be able to figure out that this is actually due to arresting red-shirted people, so it will conclude that black people in general are more likely to be arrested.
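A rough simulation of the red shirt scenario, as a minimal sketch in stdlib Python. Every number here is a made-up assumption for illustration, not real data: two equal-sized hypothetical groups, red shirts worn by 50% of group B and 10% of group A, and an arrest probability that depends only on shirt colour (20% with a red shirt, 5% without). A "model" that only sees group membership picks up a group gap (about 0.125 vs 0.065 in expectation), and the gap disappears once you condition on the shirt variable it couldn't see.

```python
import random

def simulate(n=100_000, seed=0):
    """Hypothetical population in which arrests depend ONLY on shirt colour."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        in_group_b = rng.random() < 0.5        # two equal-sized groups (assumed)
        p_red = 0.5 if in_group_b else 0.1      # the fad: red shirts far more common in group B
        red_shirt = rng.random() < p_red
        p_arrest = 0.20 if red_shirt else 0.05  # arrest rate ignores group entirely
        arrested = rng.random() < p_arrest
        rows.append((in_group_b, red_shirt, arrested))
    return rows

def arrest_rate(rows, keep):
    """Fraction arrested among rows where keep(in_group_b, red_shirt) is true."""
    outcomes = [arrested for g, r, arrested in rows if keep(g, r)]
    return sum(outcomes) / len(outcomes)

rows = simulate()

# A model that only sees group membership: group B looks riskier.
print("group A:", round(arrest_rate(rows, lambda g, r: not g), 3))
print("group B:", round(arrest_rate(rows, lambda g, r: g), 3))

# Conditioning on the hidden shirt variable makes the group gap vanish.
for red in (False, True):
    a = arrest_rate(rows, lambda g, r: not g and r == red)
    b = arrest_rate(rows, lambda g, r: g and r == red)
    print(f"red_shirt={red}: group A {a:.3f}, group B {b:.3f}")
```

The group gap here is produced entirely by the correlation between group and shirt colour; nothing in the simulated arrest rule looks at group at all.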
(1) is only possible if your data provides access to the biasing variable, perhaps via redundant encoding. This is the standard critique folks make.
As per (1), the biasing variable is available. Now, if the algorithm is expressive enough to describe the functional form of the bias (e.g. the bias is quadratic and the model includes quadratic terms), it will correct for that bias.
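To make the quadratic case concrete, here is a minimal sketch in stdlib Python on hypothetical, noise-free data (not anyone's actual model). The recorded outcome is `1 + 2x + 0.5z^2`, where `z` stands in for the available biasing variable. A plain linear fit in `x` and `z` cannot represent the `z^2` term and leaves a systematic residual, while adding a `z^2` feature lets ordinary least squares recover the bias coefficient. The small Gauss-Jordan solver is hand-rolled only to avoid dependencies; in practice you'd use a library least-squares routine.

```python
def fit_least_squares(features, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(k)]
    m = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]

def max_residual(features, ys, beta):
    """Largest absolute gap between the observed outcome and the fitted value."""
    return max(abs(y - sum(b * f for b, f in zip(beta, row)))
               for row, y in zip(features, ys))

# Hypothetical data: the recorded outcome carries a quadratic bias in z.
points = [(x, z) for x in range(-3, 4) for z in range(-3, 4)]
ys = [1.0 + 2.0 * x + 0.5 * z ** 2 for x, z in points]

linear = [[1.0, x, z] for x, z in points]             # cannot express z^2
quadratic = [[1.0, x, z, z ** 2] for x, z in points]  # can express z^2

beta_lin = fit_least_squares(linear, ys)
beta_quad = fit_least_squares(quadratic, ys)
print("linear coefficients:   ", [round(b, 3) for b in beta_lin])
print("quadratic coefficients:", [round(b, 3) for b in beta_quad])
print("max residual, linear:   ", round(max_residual(linear, ys, beta_lin), 3))
print("max residual, quadratic:", round(max_residual(quadratic, ys, beta_quad), 3))
```

On this noise-free grid the quadratic fit should come out at roughly [1, 2, 0, 0.5] with residuals near zero, while the linear fit's residuals reach about 2.5. With real, noisy data the recovery would only be approximate.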
You're right that there are lots of hidden variables that we can't use in a predictor. Murderous intent and mafia membership are also not available as predictive factors. You could build a more accurate model if you had that data. So what?