by bluecalm 3070 days ago
>>In other words, black defendants actually are more dangerous to release

Yes, but...

>>and there is no magic algorithm that bypasses this fact.

Maybe there is. It may just be that the method used wasn't able to find it, whether because of limitations of the method itself, not enough information, or bias in the training set.

As a toy example: if you only have race and age to base your decision on, then to optimize for public safety you need to include race to make good decisions. If you have race, age, and the number of friends who have committed crimes, then maybe you don't need race anymore. The problem is that we are likely not collecting enough data, so race ends up acting as a proxy for that uncollected (and maybe uncollectable) data.
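To make the toy example concrete, here is a minimal Python sketch. All numbers are made up, the groups are just labeled A and B, and `h` is a hypothetical stand-in for the uncollected variable (something like the friends count). It assumes the outcome depends only on `h`, which is the optimistic case the toy example describes, not a claim about real data.

```python
# Hypothetical numbers, chosen only for illustration.
P_H_GIVEN_GROUP = {"A": 0.2, "B": 0.6}  # P(h = 1 | group)
P_Y_GIVEN_H = {0: 0.1, 1: 0.5}          # P(y = 1 | h); group has no direct effect


def risk_given_group(group):
    """P(y = 1 | group) when h is not observed: average over h."""
    p_h1 = P_H_GIVEN_GROUP[group]
    return (1 - p_h1) * P_Y_GIVEN_H[0] + p_h1 * P_Y_GIVEN_H[1]


def risk_given_group_and_h(group, h):
    """P(y = 1 | group, h): once h is known, group adds no information."""
    return P_Y_GIVEN_H[h]


def flag_rate(group):
    """Share of a group flagged by a rule that only looks at h (flag if h = 1)."""
    return P_H_GIVEN_GROUP[group]


if __name__ == "__main__":
    for g in ("A", "B"):
        print(g, round(risk_given_group(g), 2), flag_rate(g))
```

With these numbers, risk by group alone is 0.18 vs 0.34, so group membership is predictive when `h` is missing. Once `h` is known, the group adds nothing. A rule that flags everyone with `h = 1` never looks at the group, but it still flags 20% of A and 60% of B.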

2 comments

> If you have race, age, number of friends who committed crime then maybe you don't need race anymore.

I understand that your argument is a toy argument so it doesn't make sense to discuss it specifically, but I feel it's important to point out that the issue here isn't just which variables are used to make the decision, but what the decision ends up actually being. That is, maybe you find that you can make a very good decision based solely on age and number of friends who committed crimes, and don't take race into account at all -- but then if this algorithm ends up yielding "yes" to most white people and "no" to most black people (even if your algorithm doesn't use race at all), you haven't solved anything. [edit: "solved anything" was poor word choice on my part, obviously you have solved something, but you remain in the state described by the paper]

Another issue is that while you can "whitewash" variables, it's very difficult to scrub race out entirely. For example, in practice we can't use "committed crimes" as an indicator because we can never know a ground truth: we'd have to use "were convicted of crimes" instead. Unfortunately, you're far more likely to be convicted of a given crime if you're black than if you're white, so you're already mixing race into your variables even if it isn't named. With the disparity in convictions, enforcement, etc., it's very, very difficult to come up with measurable signals that are not in some way already tainted by racial decisions.
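A rough sketch of that label problem, with made-up numbers: assume two groups (A and B) with the same true rate, but a different probability that an offense ends up as a recorded conviction. The function name and both parameters are hypothetical; the point is only that the recorded label is the product of the two.

```python
def observed_rate(true_rate, detection_prob):
    """Share of a group with a recorded conviction.

    true_rate: share that actually did the thing (unobservable in practice).
    detection_prob: chance that doing it leads to a recorded conviction.
    Both inputs are hypothetical illustration values.
    """
    return true_rate * detection_prob


# Same underlying behavior, different enforcement.
group_a = observed_rate(true_rate=0.10, detection_prob=0.3)
group_b = observed_rate(true_rate=0.10, detection_prob=0.6)

if __name__ == "__main__":
    print(round(group_a, 2), round(group_b, 2))
```

A model trained on the recorded labels would see a 2x difference between the groups here, even though the true rates were set to be identical.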

>>That is, maybe you find that you can make a very good decision based solely on age and number of friends who committed crimes, and don't take race into account at all -- but then this algorithm ends up yielding "yes" to most white people and "no" to most black people, you haven't solved anything.

It would actually solve the problem. It's ok if I give more "no's" to black people as long as black people are more dangerous in general. It's only not ok if I punish a specific non-dangerous black person just because they are black.

That's what fairness is: you get what you deserve because of your decisions and wrongdoing, not because of how you look or where you were born. That some groups end up with more convictions is expected and doesn't contradict the fairness principle.

>> there is no magic algorithm that bypasses this fact.

> Maybe there is

No, there isn't. We actually have a mathematical proof (which is quite simple) of why this is impossible.

Specifically, the following conditions can't all be true at the same time:

1. Groups differ in base rate.

2. Prediction isn't perfect.

3. The decision is correct at the same rate for each group.

4. The decision is correct at the same rate for each group, restricted to the positive/negative class.

1 is a brute fact. Your toy example hints at 2. 3 is called calibration and is what machine learning usually optimizes. When people say an algorithm is unfair, they usually mean 4.

https://arxiv.org/abs/1609.05807
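A small numeric sketch of the tradeoff, not the proof from the paper: assume a score with only two levels (0.2 and 0.8) that is calibrated in both groups, and a rule that flags everyone with the high score. The function name, the score levels, and the per-group shares are all hypothetical.

```python
def error_rates(p_high, s_low=0.2, s_high=0.8):
    """Exact rates for one group under a calibrated two-level score.

    p_high: share of the group that receives the high score (hypothetical).
    Calibration here means P(y = 1 | score) equals the score in every group.
    Decision rule: flag exactly the people with the high score.
    """
    base_rate = (1 - p_high) * s_low + p_high * s_high
    # Flagged people who are actually negative, over all negatives.
    fpr = p_high * (1 - s_high) / (1 - base_rate)
    # Unflagged people who are actually positive, over all positives.
    fnr = (1 - p_high) * s_low / base_rate
    # Share of flagged people who are actually positive; equal across groups by construction.
    ppv = s_high
    return {"base_rate": base_rate, "ppv": ppv, "fpr": fpr, "fnr": fnr}


group_a = error_rates(p_high=0.25)  # base rate 0.35
group_b = error_rates(p_high=0.50)  # base rate 0.50

if __name__ == "__main__":
    for name, rates in (("A", group_a), ("B", group_b)):
        print(name, {k: round(v, 3) for k, v in rates.items()})
```

With these made-up shares, a flag means the same thing in both groups (80% of flagged people are positive), but the false positive rate is about 7.7% for A vs 20% for B, and the false negative rate is about 42.9% vs 20%. Equalizing those error rates would mean giving up calibration, unless the base rates match or the prediction is perfect.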