by strbean 2373 days ago
Your aside is pretty much dead-on the big ethical issue with bias in ML right now.

For example, ML can do quite a good job of predicting recidivism rates in convicts, and justice systems have been using this to aid in sentencing and parole hearings. Obviously, these ML approaches are not supposed to consider ethnicity. So the factor that ends up having the greatest weight is "did your father / grandfather spend time in prison", which is an extremely effective proxy for "are you not white".

Basically, when your training data is based on a reality already heavily influenced by bias, your models will end up reflecting and perpetuating that bias.
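Here is a toy simulation of the proxy effect described above. Everything in it is hypothetical: the two groups, the single binary proxy feature (standing in for something like "a parent was incarcerated"), and all of the probabilities are made up. The "model" is deliberately trivial, just the historical label rate for each proxy value. It never sees the group, yet its scores still differ by group because the proxy carries that information.

```python
import random

def simulate(n, seed=0):
    """Generate hypothetical (group, proxy, label) records.

    All probabilities are invented for illustration: the proxy is far more
    common in group 1, and the historical labels have a higher base rate in
    group 1. The label depends on the group only, not on the proxy.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = 1 if rng.random() < 0.5 else 0
        proxy = 1 if rng.random() < (0.8 if group else 0.1) else 0
        label = 1 if rng.random() < (0.5 if group else 0.3) else 0
        records.append((group, proxy, label))
    return records

def fit_proxy_model(records):
    """'Train' a group-blind model: risk score = historical label rate per proxy value."""
    totals = {0: [0, 0], 1: [0, 0]}  # proxy value -> [positive labels, count]
    for _, proxy, label in records:
        totals[proxy][0] += label
        totals[proxy][1] += 1
    return {p: pos / cnt for p, (pos, cnt) in totals.items()}

def mean_risk_by_group(records, model):
    """Average the model's score within each group (used only for auditing)."""
    sums = {0: [0.0, 0], 1: [0.0, 0]}  # group -> [sum of scores, count]
    for group, proxy, _ in records:
        sums[group][0] += model[proxy]
        sums[group][1] += 1
    return {g: s / c for g, (s, c) in sums.items()}

records = simulate(100_000)
model = fit_proxy_model(records)
risk = mean_risk_by_group(records, model)
print(risk)
```

With these made-up numbers, group 1's average score comes out roughly 0.1 higher than group 0's, even though the group was never an input to the model.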


The real problem is that there is an actual racial disparity in recidivism rates, so an algorithm that makes accurate predictions will predict the racial disparity that actually exists. There is no way to solve that without significantly impairing the accuracy of the predictions -- which is to say releasing convicts who we know have an unreasonably high probability of recidivism merely because there were too many other convicts with an unreasonably high probability of recidivism who were the same race.
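To make the accuracy cost concrete, here is a minimal sketch that assumes the "adjustment" means demographic parity (flagging the same fraction of each group), which is only one of several possible definitions. It starts from a hypothetical perfect predictor over two made-up groups with 50% and 30% base rates and counts the fewest errors needed to equalize the flag rates.

```python
def min_errors_for_positive_rate(labels, target_rate):
    """Fewest errors a perfect predictor must accept to flag round(target_rate * n) people.

    Starting from perfect predictions, every person moved across the line
    (a true positive un-flagged, or a true negative flagged) is one error.
    """
    n = len(labels)
    flagged_target = round(target_rate * n)
    return abs(flagged_target - sum(labels))

# Hypothetical groups with different base rates (50% vs 30%).
group_a = [1] * 50 + [0] * 50
group_b = [1] * 30 + [0] * 70

# A perfect predictor flags 50% of A and 30% of B: zero errors, unequal rates.
# Forcing both groups to a common 40% flag rate:
errors = (min_errors_for_positive_rate(group_a, 0.4)
          + min_errors_for_positive_rate(group_b, 0.4))
accuracy = 1 - errors / (len(group_a) + len(group_b))
print(errors, accuracy)  # 20 0.9
```

Equalizing at 40% costs 10 missed positives in one group and 10 wrongly flagged negatives in the other. In this example any common target rate costs at least 20 errors, because the base rates differ by 20 points.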

You can also imagine what happens if you apply this recidivism "adjustment" to gender, which causes a lot of the people advocating it in the case of race to become nervous and defensive.

Accuracy is not the top objective in these systems, fairness is.
In this example, what is fairness, if not the most accurate prediction possible?
Especially when you consider fairness to the community at large. Is it fair to black neighborhoods if we send proportionally more expected recidivist drug dealers and rapists back into their communities than we do to white communities?
Fairness is judging a case based on its merits, rather than correlations between other dimensions that are connected with systematic bias.

I should be judged based on the interpretation of my situation, not because someone who lives in a similar neighborhood was previously a bad bet.

Just to start with:

1. Not punishing someone for the sins of their family.

2. Not punishing someone for the unfair treatment that their family suffered in the past.

The effect of its use on policy.
That is incredibly vague. What effect and what policy?
most of the standard metrics of fairness for machine learning don't just try to equalize proportions of positive/negative labels. they look at error rates.

under these measures of fairness, a perfectly accurate predictor is regarded as perfectly fair, regardless of a disparity in base rates in the two populations.

some of the predictive policing models still fail under these metrics -- they are more prone to make errors on black defendants.
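a minimal sketch of this error-rate style of check, in the spirit of equalized odds: compare false positive and false negative rates across groups rather than the raw share of positive predictions. the function names and the tiny label lists are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for one group."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

def error_rate_gaps(groups):
    """groups: {name: (y_true, y_pred)} -> (largest FPR gap, largest FNR gap)."""
    rates = [error_rates(t, p) for t, p in groups.values()]
    fprs = [r[0] for r in rates]
    fnrs = [r[1] for r in rates]
    return max(fprs) - min(fprs), max(fnrs) - min(fnrs)

# Hypothetical labels with base rates of 50% and 30%.
y_a = [1] * 5 + [0] * 5
y_b = [1] * 3 + [0] * 7

# A perfectly accurate predictor: unequal positive rates, but no error-rate gap.
print(error_rate_gaps({"a": (y_a, y_a), "b": (y_b, y_b)}))  # (0.0, 0.0)

# An imperfect predictor that wrongly flags two negatives, in group b only.
pred_b = [1] * 3 + [1, 1] + [0] * 5
print(error_rate_gaps({"a": (y_a, y_a), "b": (y_b, pred_b)}))
```

the second call reports a false positive rate gap of 2/7 and no false negative gap, which is the kind of disparity these metrics flag.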

> under these measures of fairness, a perfectly accurate predictor is regarded as perfectly fair, regardless of a disparity in base rates in the two populations.

Unless your predictor is perfectly accurate, the errors will be proportional to the base rate. If you're predicting that more X will do Y, then you have more chances to be wrong.
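One way to make this precise follows directly from the confusion-matrix definitions. If a predictor has the same positive predictive value (PPV) and false negative rate (FNR) in two groups, the false positive rate it implies in each group scales with that group's base-rate odds, p/(1-p). The PPV, FNR and base rates below are made-up numbers.

```python
def implied_fpr(prevalence, ppv, fnr):
    """False positive rate implied by a group's base rate, PPV and FNR.

    From the confusion-matrix definitions (N = group size):
      TP  = prevalence * (1 - fnr) * N
      FP  = fpr * (1 - prevalence) * N
      ppv = TP / (TP + FP)
    Solving the last line for fpr gives the expression returned below.
    """
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)

# Hypothetical: one predictor with the same PPV (0.7) and FNR (0.3) in two
# groups whose base rates are 50% and 30%.
high = implied_fpr(0.5, 0.7, 0.3)
low = implied_fpr(0.3, 0.7, 0.3)
print(round(high, 3), round(low, 3))  # 0.3 0.129
```

A predictor with no false positives (PPV of 1) escapes this, since the (1 - ppv) factor becomes zero. Otherwise, equal PPV and FNR force unequal false positive rates whenever the base rates differ.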

Improving accuracy is the only real way to reduce the error rate. If you can't do that then you're left with malicious nonsense like fudging the base rate, which is just trading false positives for false negatives and not actually making anything better.
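Here is a sketch of the trade-off described above. It assumes that "fudging the base rate" amounts to shifting a group's decision threshold. The scores and outcomes are invented for illustration.

```python
def confusion_at(scores, labels, threshold):
    """Count (false positives, false negatives) when flagging score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

# Hypothetical risk scores and true outcomes for one group.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0,   0,   1,    0,   0,    1,   0,   1,   1,   1]

for threshold in (0.3, 0.5, 0.65, 0.85):
    print(threshold, confusion_at(scores, labels, threshold))
```

As the threshold rises, false positives fall and false negatives rise: (3, 0), (2, 1), (1, 2), (0, 3). With these particular scores the total stays at three. On other data the total can move, but shifting the threshold never reorders the scores, so it cannot improve the ranking itself.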