by sean_anandale 3070 days ago
So in sum: "analyzing data from Broward County, we find that optimizing for public safety yields stark racial disparities; conversely, satisfying past fairness definitions means releasing more high-risk defendants, adversely affecting public safety."

In other words, black defendants actually are more dangerous to release and there is no magic algorithm that bypasses this fact.

>>In other words, black defendants actually are more dangerous to release

Yes, but...

>>and there is no magic algorithm that bypasses this fact.

Maybe there is; it's just that the method used wasn't able to find it, whether due to limitations of the method itself, not enough information, or bias in the training set.

As a toy example: assuming you only have race and age to make your decision on, then to optimize for public safety you need to include race to make good decisions. If you have race, age, number of friends who committed crime then maybe you don't need race anymore. The problem is that we are likely not getting enough data, so race becomes a proxy for that uncollected (and maybe uncollectable) data.
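
To make the toy example concrete, here is a minimal sketch with made-up numbers. It assumes a hypothetical hidden factor (standing in for something like "number of friends who committed crime") that fully determines risk and is distributed differently across two groups labelled A and B. All names and probabilities are invented for illustration; nothing here comes from the Broward County data.

```python
# Hypothetical numbers, chosen only for illustration.
P_H_GIVEN_G = {"A": 0.2, "B": 0.5}  # P(hidden factor present | group)
P_Y_GIVEN_H = {1: 0.6, 0: 0.1}      # P(outcome | hidden factor), same for both groups


def risk_given_group(group):
    """P(outcome | group) when the hidden factor is not observed."""
    p_h = P_H_GIVEN_G[group]
    return p_h * P_Y_GIVEN_H[1] + (1 - p_h) * P_Y_GIVEN_H[0]


def risk_given_group_and_factor(group, h):
    """P(outcome | group, hidden factor).

    By construction the group label drops out once h is known.
    """
    return P_Y_GIVEN_H[h]


for g in ("A", "B"):
    print(g, round(risk_given_group(g), 3))
```

With only the group label available, the two groups look different (about 0.20 vs 0.35 here). Once the hidden factor is observed, the group label carries no extra information, but only because the sketch was built that way.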

> If you have race, age, number of friends who committed crime then maybe you don't need race anymore.

I understand that your argument is a toy argument so it doesn't make sense to discuss it specifically, but I feel it's important to point out that the issue here isn't just which variables are used to make the decision, but what the decision ends up actually being. That is, maybe you find that you can make a very good decision based solely on age and number of friends who committed crimes, and don't take race into account at all -- but then if this algorithm ends up yielding "yes" to most white people and "no" to most black people (even if your algorithm doesn't use race at all), you haven't solved anything. [edit: "solved anything" was poor word choice on my part, obviously you have solved something, but you remain in the state described by the paper]

Another issue is that while you can "whitewash" variables, it's very difficult to scrub race out entirely. For example, in practice we can't use "committed crimes" as an indicator because we can never know a ground truth: we'd have to use "were convicted of crimes" instead. Unfortunately, you're far more likely to be convicted of a given crime if you're black than if you're white, so you're already mixing race into your variables even if it isn't named. With the disparity in convictions, enforcement, etc., it's very, very difficult to come up with measurable signals that are not in some way already tainted by racial decisions.
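
A rough arithmetic sketch of that last point, with invented numbers: if two groups offend at the same true rate but one is convicted more often per offense, the recorded conviction rates differ even though the underlying behavior does not. The function name and all rates below are hypothetical, and the sketch ignores wrongful convictions.

```python
def recorded_conviction_rate(true_offense_rate, p_convicted_given_offense):
    """Fraction of the group that ends up with a conviction on record."""
    return true_offense_rate * p_convicted_given_offense


# Hypothetical: identical true offense rate, different conviction probability.
TRUE_RATE = 0.10
rate_a = recorded_conviction_rate(TRUE_RATE, p_convicted_given_offense=0.3)
rate_b = recorded_conviction_rate(TRUE_RATE, p_convicted_given_offense=0.6)

print(round(rate_a, 3), round(rate_b, 3), round(rate_b / rate_a, 2))
```

A model trained on the recorded rates would see a 2x gap between groups that, by construction, behave identically.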

>>That is, maybe you find that you can make a very good decision based solely on age and number of friends who committed crimes, and don't take race into account at all -- but then this algorithm ends up yielding "yes" to most white people and "no" to most black people, you haven't solved anything.

It would actually solve the problem. It's ok if I give more "no's" to black people as long as black people are more dangerous in general. It's only not ok if I punish a specific non-dangerous black person just because they are black.

That's what fairness is: you get what you deserve because of your decisions and wrongdoing, not because of how you look or where you were born. That some groups end up with more convictions is expected and doesn't contradict the fairness principle.

>> there is no magic algorithm that bypasses this fact.

> Maybe there is

No, there isn't. We actually have a mathematical proof (which is quite simple) of why this is impossible.

Specifically, the following conditions can't all be true at the same time: 1. groups differ in base rate; 2. prediction isn't perfect; 3. decisions are correct at the same rate across groups; 4. decisions are correct at the same rate across groups when restricted to the positive or negative class.

1 is a brute fact. Your toy example hints at 2. 3 is called calibration and is what machine learning usually optimizes. When people say an algorithm is unfair, they usually mean that 4 fails.

https://arxiv.org/abs/1609.05807
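
A small numeric sketch of why the conditions collide, using a standard confusion-matrix identity. It reads condition 3 as "equal positive predictive value (PPV)" and condition 4 as "equal false positive and false negative rates (FPR, FNR)"; that mapping is an interpretation of the comment above, not necessarily the exact formulation in the linked paper, and the numbers are invented.

```python
def false_positive_rate(base_rate, ppv, fnr):
    """FPR implied by a group's base rate, PPV, and FNR.

    With p = base rate, TP = p * (1 - FNR) and FP = (1 - p) * FPR,
    PPV = TP / (TP + FP) rearranges to:
        FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical numbers: same PPV and FNR for both groups, different base rates.
fpr_low = false_positive_rate(base_rate=0.3, ppv=0.6, fnr=0.4)
fpr_high = false_positive_rate(base_rate=0.5, ppv=0.6, fnr=0.4)

print(round(fpr_low, 3), round(fpr_high, 3))
```

Holding PPV and FNR equal across the two groups forces the false positive rates apart (roughly 0.17 vs 0.40 here). The only ways out are equal base rates or a perfect predictor, which is what conditions 1 and 2 rule out.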

If a group of people has, statistically, higher-than-average recidivism rates, should we be punishing all members of that group? That all but guarantees unfair treatment of individuals even if it makes statistical sense.

Even in your post you go from a statement that amounts to "statistically some groups have a larger number of dangerous individuals" to "black people are more dangerous". The two sentences do not mean the same thing!

You've got it backwards, though. The algorithm isn't saying "this person is black and therefore shouldn't get bail". It's saying "this person shouldn't get bail (according to a calculated flight risk based on a bunch of reasonable criteria)", and a disproportionate percentage of the people who are assessed as high flight risks just happen to be black.

And the dataset that this algorithm derives its predictions from is presumably a real-world dataset, i.e. one where black people form a disproportionately large portion of convicts and recidivists.

The point stands: using statistics to mete out justice IMO amounts to collective punishment. Of course that leaves the question of whether more conventional methods are any better, but now we're opening up a new can of worms, namely what is the goal of criminal justice systems and how should those goals be achieved...

So maybe, rather than sticking our collective head in the sand and saying "no, it's impossible that black people are more likely to break bail or reoffend", we say "holy crap that's obviously caused by something" and try to address the root cause?

You cannot fix a problem by pretending it doesn't exist.

There's a difference between saying a disproportionately large number of recidivists are black and saying black people are more likely to reoffend. The latter frames the issue in terms of "what black people are like", and worse, gives the impression any given black person is more of a criminal than any given white person.

I'm not usually such a stickler for language, but in this instance it does seem to me that this kind of profiling serves to perpetuate that narrative of criminality being a feature of a group of people, and doesn't help to get at the causes.

> In other words, black defendants actually are more dangerous to release and there is no magic algorithm that bypasses this fact.

You are right, but that's not the problem with the algorithm.

A critical assumption with large-scale data mining is that past trends continue - the problem is that the existing data fits the algorithm. It is just a conservative what-if decision maker that operates on existing facts (i.e. the bad present-day situation), wrapped into code (or worse, encoded as opaque literal "biases" in a decision tree).

I see somewhat similar patterns in lending interest data (redline zipcode -> credit ratings), and the problem is that the bigger the historical trend data, the less forgiving a "past trends" algorithm will end up being.

Since this ends up being a prisoner's dilemma, if you are a rational actor in this system and the system keeps playing a defect card on you, then the obvious move is to always defect - cut your losses.

Algorithms can't improve the job prospects of the people released. And that's not the algorithm's fault.

And therefore, you're right - the algorithm can't change the world beyond its output result.

> A critical assumption with large-scale data mining is that past trends continue

Not necessarily. It would be easy to use only recent data and ignore data from decades ago. The prediction accuracy wouldn't go down much as long as there is a large enough sample size in the recent data. It may even go up if the old data really is inapplicable now.

Then if a past trend stops happening, the old data gets purged eventually and only the post-trend data is considered.
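
A minimal sketch of the "use only recent data" idea, assuming records are (date, outcome) pairs and a five-year window. The function name, window length, and data are all made up for illustration.

```python
from datetime import date, timedelta


def recent_rate(records, today, window_days=5 * 365):
    """Outcome rate using only records inside the recent window.

    records: iterable of (date, outcome) pairs with outcome in {0, 1}.
    Returns None if no record falls inside the window.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [outcome for day, outcome in records if day >= cutoff]
    if not recent:
        return None
    return sum(recent) / len(recent)


# Made-up data: an old trend (high rate) that later stopped.
records = [
    (date(1995, 1, 1), 1), (date(1996, 1, 1), 1), (date(1997, 1, 1), 1),
    (date(2015, 1, 1), 0), (date(2016, 1, 1), 1), (date(2016, 6, 1), 0),
    (date(2017, 1, 1), 0),
]
today = date(2017, 6, 1)

all_time = sum(outcome for _, outcome in records) / len(records)
windowed = recent_rate(records, today)
print(round(all_time, 3), windowed)
```

The all-time rate is dominated by the 1990s records, while the windowed rate only reflects the last five years. The trade-off is sample size: the window here holds just four records, which is far too few for a real estimate.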

Which is why we shouldn't build our lives or policy around simple algorithms that do not take into account the breadth of human values, philosophy, and desire for change beyond the bad present day.

You know the phrase "round up the usual suspects"?

It's the same root idea: the goal of the system is not to find the person who committed the crime. The goal of the system is, for each crime, to find a person who can be nailed for it. And you can manufacture a class of people such that whenever you need an offender you can go grab some of them and stand a good chance at getting a guilty plea or a conviction.

You start by selectively hyper-enforcing small violations against a chosen subset of the population. Get them for minor traffic violations (which you can ensure turn into arrest warrants by setting the fines and fees and other costs high enough!), get them for "paraphernalia" offenses (where you assume everyday objects in Person A's possession are evidence of drug habits while they wouldn't be assumed evidence in the possession of Person B!), get them for all sorts of things.

Now whenever there's a crime you just go pick some random people from the demographic you've been doing this to, figure out who you can bribe to testify against whom, and then march into the courtroom with witnesses and a defendant who has a thick file of previous "encounters" with the system, and off to a cell they go for a while.

And it gets even easier each go-round: people with criminal records don't usually have the resources to move on and rebuild their lives, so once they're released they're going to be right back where you found them last time. And they've got an even longer record now, which is just proof that you've been doing a great job in figuring out who these dangerous recidivists are! The model works!

People who downvoted this (original comment is at -1): seriously, just go do some research on over-policing and corruption in police departments and prosecutors' offices.

I know it goes against some cherished beliefs, but this really is how much of the US handles "justice".

And "public safety" conspicuously excludes the harms caused by incarceration (for both innocent and guilty suspects, and the innocent members of their families and communities).

(Not to mention that "public safety" still excludes the harms of centuries of slavery and decades of Jim Crow...)

Are you suggesting that we discard notions of justice / law due to historical sentiment? Or do you mean that we should address issues or biases which might be formed because of these events?

On a topic like this it’s important I think to address the elephant in the room. I don’t think this implies a genetic issue! I’m not an expert but from what I know this shouldn’t lead me to the conclusion that “black Americans are predisposed to violence” and, if you know approximately the same things I do, I think that’s probably fair for you too!

I don't think anyone here on HN thinks that it implies a genetic issue, but rather an issue caused by generations of discrimination and failed attempts to stop the cycle of poverty and crime that plagues many of these communities.

I am 100% certain, based on 5 years of observation, that there is a significant portion of users on this site who absolutely believe that racial disparities are genetic and immutable.

I would be amazed if most important physical, mental, or social characteristics had zero underlying biological or genetic drivers, or if they were completely driven by underlying biology. Even small underlying differences can be amplified through compounding effects over a lifetime of decisions, through culture, etc.

(The easy ones are male/female; racial differences are far smaller.)

But they are, according to virtually all the data that we have. However, there's no indication at all that the cause is genetic rather than cultural in the US.