by Eridrus
3568 days ago
The biggest bias/fairness issues in ML come not from the algorithms or their results, but from the underlying data.

A trivial example: what if you trained a classifier to predict whether a person would be re-arrested before going to trial? Some communities are policed more heavily, so the classifier would tend to reinforce the existing bias and give more ammunition to those arguing for further bias in the system: a feedback loop, if you will.

Or what if some protected group needs a higher down payment because the group is not understood well enough to distinguish those who will repay their loans from those who won't? Maybe educational achievement is a really good predictor for one group but less effective for another. Is it fair to use the protected class (or any information correlated with it) when that is essentially machine-enabled stereotyping?

Recently it has been noted that NLP systems trained on large text corpora tend to exhibit society's biases, assuming for example that nurses are women and programmers are men. From a statistical perspective the correlation is there, but we tend to be more careful than a machine about how we use it. We wouldn't want to restrict our search for people to hire to just those who fulfil our stereotypes, but a machine would. This paper has some details on such issues: http://arxiv.org/abs/1606.06121

I don't think there are any easy solutions here, but I think it's important to be aware that data is only a proxy for reality, and fitting the data perfectly doesn't mean you have achieved fair outcomes.
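To make the re-arrest example concrete, here is a rough toy sketch in Python. Everything in it is an assumption for illustration: the two groups, the identical true re-offence rate, the patrol shares, and the rule that patrols follow observed arrests are all made up, not taken from any real system or from the paper linked above. It only shows that when labels depend on how heavily a group is policed, a model fitted to those labels can keep the initial disparity in place even though the underlying behaviour is the same.

```python
# Toy model of the policing feedback loop. All numbers are hypothetical.
TRUE_REOFFENCE_RATE = 0.20  # assumed identical in both communities


def observed_rates(patrol_share, max_detection=0.9):
    """Expected recorded re-arrest rate per group.

    An offence only becomes a label if it is detected, and detection is
    assumed to scale linearly with the share of patrols a group receives.
    """
    return {
        group: TRUE_REOFFENCE_RATE * max_detection * share
        for group, share in patrol_share.items()
    }


def reallocate(rates):
    """Stand-in for 'the model': send patrols in proportion to observed rates."""
    total = sum(rates.values())
    return {group: rate / total for group, rate in rates.items()}


def simulate(initial_share, rounds=5):
    """Alternate between collecting labels and reallocating patrols."""
    share = dict(initial_share)
    history = []
    for _ in range(rounds):
        rates = observed_rates(share)
        history.append(rates)
        share = reallocate(rates)
    return history


history = simulate({"A": 0.4, "B": 0.6})
for round_no, rates in enumerate(history):
    print(round_no, {g: round(r, 3) for g, r in rates.items()})
```

In this sketch group B always looks riskier than group A in the data, and the gap never closes, because the data only measures where people were looking. A less gentle reallocation rule (say, sending most patrols to the top-ranked group) would presumably widen the gap rather than just preserve it, but that is not modelled here.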