HN Mirror


by eridius 3549 days ago

Regarding your last paragraph, there are two different angles here. The "machine learning is racist" angle I think is quite valid, but it covers a different topic than what we've been discussing here. To be more specific, there are two different ways in which we can have racist models:

1. The algorithm is biased in a way that reflects reality but does not reflect how we wish it to behave. This is the "machine learning is racist" angle. A lending algorithm might quite rightly think that black people are a higher risk, but this is ethically problematic to act on, because denying loans to black people only serves to compound the social problem (even though it may make financial sense for your bank).

2. What I'm arguing is that we can have racist algorithms because the data itself may be biased in a way you're not aware of. To take the red shirt example, something I forgot to say before was that if, say, a fad of wearing red shirts spreads among the black community, then you're going to see an uptick in arrests of black people. Your algorithm won't be able to figure out that the uptick is actually due to arresting red-shirted people, so it will conclude that black people in general are more likely to be arrested.
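To make the red shirt case concrete, here is a minimal sketch in Python. All of the numbers and the names `group_a` and `group_b` are made up for illustration; the only assumption that matters is that arrests depend on shirt colour alone while shirt colour is more common in one group.

```python
# Hypothetical numbers, chosen only for illustration.
P_ARREST_RED = 0.10    # arrest probability for someone in a red shirt
P_ARREST_OTHER = 0.02  # arrest probability for everyone else
P_RED = {"group_a": 0.60, "group_b": 0.10}  # share of each group wearing red


def arrest_rate_by_group(group):
    """Arrest rate visible to a model when shirt colour is NOT recorded."""
    p_red = P_RED[group]
    return p_red * P_ARREST_RED + (1 - p_red) * P_ARREST_OTHER


def arrest_rate_given_shirt(group, red):
    """Arrest rate when shirt colour IS recorded: the group plays no role."""
    return P_ARREST_RED if red else P_ARREST_OTHER


# What a model trained without the shirt variable would learn per group.
rates = {group: arrest_rate_by_group(group) for group in P_RED}
```

With these made-up inputs, `rates` comes out at roughly 0.068 for `group_a` and 0.028 for `group_b`, even though group membership has no effect on arrests once shirt colour is known. A model that never sees shirt colour has no way to tell the difference.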

1 comment

yummyfajitas 3549 days ago

Here's my point.

(1) is only possible if your data provides access to the biasing variable, perhaps via redundant encoding. This is the standard critique folks make.

As per (1), the biasing variable is available. Now if the algorithm is expressive enough to describe the functional form of the bias (e.g. the bias is quadratic, and the model includes quadratic terms), it will fix that bias.
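A rough sketch of what that means, under assumptions of my own: the outcome is `2*x` plus a bias term `0.5*z**2`, where `z` is the available biasing variable and is correlated with `x`. The data, the coefficients, and the helper names `solve` and `least_squares` are all hypothetical, and the fit is plain least squares done by hand so that nothing beyond the standard library is needed.

```python
def solve(a, b):
    """Solve a square linear system with Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: x is the legitimate feature, z is the biasing variable,
# and the two are correlated because x = z + e.
points = [(z + e, z) for z in range(4) for e in (0, 1)]
y = [2 * x + 0.5 * z ** 2 for x, z in points]

# Model without z: the slope on x absorbs part of the bias.
naive = least_squares([[1, x] for x, _ in points], y)

# Model with z and z**2: expressive enough to represent the quadratic bias.
full = least_squares([[1, x, z, z ** 2] for x, z in points], y)
```

On this toy data the model without `z` fits a slope of 3.25 on `x`, while the model with the quadratic terms recovers the slope of 2 and puts 0.5 on `z**2`. The bias ends up in its own term instead of contaminating the coefficient on `x`.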

You're right that there are lots of hidden variables that we can't use in a predictor. Murderous intent and mafia membership are also not available as predictive factors. You could build a more accurate model if you had that data. So what?


eridius 3549 days ago

The problem with (2) isn't just that your model isn't as precise as it could be; it's that your model may be inadvertently biased because all of the data it was fed was biased. This comment (https://news.ycombinator.com/item?id=12625917) gives a good example of that. No amount of expressivity in the algorithm will account for the fact that the Friendface model (read the comment) was trained on a predominantly white userbase, whereas FaceSpace's model was trained on a predominantly urban black userbase.
