| > That's not a good facsimile for deciding sentencing

Sure. I'm not saying my example is the smartest (although, to be fair, there are people trying to use ML to do this: http://www.fakenewschallenge.org/). I was unaware of your experience with ML, so I was working under the assumption you had none and trying to pick a simple example (even if it's dumb) to explain how bias can sneak into models in general, rather than in the specific case of sentencing criminals.

Let's loop back to what you originally said:

> If you're writing a machine learning application to take a dataset and match future inputs to past results I don't see how these biases can sneak into the program

You then go on to describe a number of factors that you think should go into sentencing models, all of which leave plenty of scope for bias:

* "If they meant to do it" - this is a judgement made by a person and clearly reflects the view of the person making the decision.
* If they feel bad about doing it - again, someone has to judge whether someone is legitimately remorseful or is just pretending in order to get a lighter sentence.
* "If they have done it before" - This will reflect things like policing tactics. For instance, poorer areas might be subject to higher rates of policing, especially in areas adopting the broken windows theory of policing (https://en.wikipedia.org/wiki/Broken_windows_theory#New_York...), where petty crimes are often cracked down on more frequently. This means that people there are more likely to have run-ins with the law, and so less likely to get jobs because convictions show up in background checks, which in turn increases their likelihood of reoffending.
* What severity this crime is - I'm not sure what you mean by this. Do you mean e.g. murder being more severe than petty theft, or things like how severe an assault was? I'm assuming the latter, since the former is often just covered by things like sentencing guidelines anyway.
If someone commits an assault, then how do you rate this in a way that a model can understand? How do you ensure consistency across different cases and judges?

At any rate, the point of these models is usually to remove the biases that judges might have about people of certain backgrounds from sentencing decisions and to produce a score that estimates how likely the convict is to reoffend (I believe), so your proposal isn't how this works in practice. In practice they're trying to avoid exactly the kinds of subjective assessments you proposed and replace them with supposedly objective predictors of reoffending.

From the article linked from the post: https://www.propublica.org/article/machine-bias-risk-assessm...

> Northpointe’s software is among the most widely used assessment tools in the country. The company does not publicly disclose the calculations used to arrive at defendants’ risk scores, so it is not possible for either defendants or the public to see what might be driving the disparity. (On Sunday, Northpointe gave ProPublica the basics of its future-crime formula — which includes factors such as education levels, and whether a defendant has a job. It did not share the specific calculations, which it said are proprietary.)

> Northpointe’s core product is a set of scores derived from 137 questions that are either answered by defendants or pulled from criminal records. Race is not one of the questions.
The survey asks defendants such things as: “Was one of your parents ever sent to jail or prison?” “How many of your friends/acquaintances are taking drugs illegally?” and “How often did you get in fights while at school?” The questionnaire also asks people to agree or disagree with statements such as “A hungry person has a right to steal” and “If people make me angry or lose my temper, I can be dangerous.”

Given that independent research seems to confirm that the company's model favours higher sentences for people of colour, it's pretty clear from that description where biases could sneak into the model, I hope? |
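To make the "If they have done it before" point concrete, here is a toy sketch. Everything in it is made up for illustration: the group names, the offence counts, the policing rates and the scoring rule are all hypothetical, and it has nothing to do with Northpointe's actual formula, which isn't public. Both groups behave identically and reoffend identically; the only difference is how many petty offences end up on record. The "model" never sees the group, only the recorded priors.

```python
# Toy illustration only: every number and name here is hypothetical.

# Same underlying behaviour in both groups: petty offences per person.
OFFENCES = [0, 1, 2, 3, 4] * 2

# Hypothetical policing intensity: how many of every 4 petty offences
# end up on someone's record in each area.
RECORDED_PER_4 = {"low_policing": 1, "high_policing": 3}


def build_population():
    """Build both groups from identical behaviour, differing only in policing."""
    people = []
    for group, recorded_per_4 in RECORDED_PER_4.items():
        for offences in OFFENCES:
            people.append({
                "group": group,
                # What the model gets to see: priors that made it onto the record.
                "recorded_priors": offences * recorded_per_4 // 4,
                # Ground truth depends only on behaviour, not on the group.
                "reoffended": offences >= 4,
            })
    return people


def high_risk(person):
    """A group-blind rule: flag anyone with at least one recorded prior."""
    return person["recorded_priors"] >= 1


def false_positive_rate_by_group(people):
    """Share of people who did NOT reoffend but were flagged, per group."""
    flagged = {}
    did_not_reoffend = {}
    for person in people:
        if person["reoffended"]:
            continue
        group = person["group"]
        did_not_reoffend[group] = did_not_reoffend.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + int(high_risk(person))
    return {g: flagged[g] / did_not_reoffend[g] for g in did_not_reoffend}


rates = false_positive_rate_by_group(build_population())
print(rates)  # {'low_policing': 0.0, 'high_policing': 0.5}
```

With these invented inputs, nobody in the low-policing group who stayed out of trouble gets flagged, while half of the equivalent people in the high-policing group do, even though the rule never looks at which group anyone belongs to. Whether anything like this happens in a real tool is an empirical question; the sketch only shows the mechanism.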
Source: Flores, Bechtel, Lowenkamp; Federal Probation Journal, September 2016, "False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.”", URL http://www.uscourts.gov/statistics-reports/publications/fede...
In fact the ProPublica analysis was so poorly done that the authors of the above study wrote in the conclusion:
> "It is noteworthy that the ProPublica code of ethics advises investigative journalists that "when in doubt, ask" numerous times. We feel that Larson et al.'s (2016) omissions and mistakes could have been avoided had they just asked. Perhaps they might have even asked...a criminologist? We certainly respect the mission of ProPublica, which is to "practice and promote investigative journalism in the public interest." However, we also feel that the journalists at ProPublica strayed from their own code of ethics in that they did not present the facts accurately, their presentation of the existing literature was incomplete, and they failed to "ask." While we aren’t inferring that they had an agenda in writing their story, we believe that they are better equipped to report the research news, rather than attempt to make the research news."
What I find remarkable is that in the follow-up coverage ProPublica published on this subject in December 2016, they interviewed a bunch more people, but none of the folks who criticized their analysis (and that critique was published in September). Make of that what you will.