by randomtask 3331 days ago
As with anything, it's hard to generalise without oversimplifying, but here goes. You don't generally just have a data set and a machine learning algorithm that somehow magics outputs from it. Usually people have to make decisions along the way, whether in training the model, selecting which variables to include, etc.

Here's a simple example. Say you're trying to come up with an algorithm that decides whether articles in a data set are "fake news" (topical, I know). We have to tell the algorithm whether a given article in the training set is fake or legitimate; otherwise, how would it know? Clearly this will reflect the views of whoever is tagging the articles. When we run the model on a test set, we need to score how well it did; again, this will reflect the opinion of the person doing the scoring.
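To make this concrete, here is a toy sketch in Python. Everything in it is hypothetical: the outlets, the headlines, the two annotators, and the deliberately trivial "model" (predict the majority label seen for each source). The same data and the same learning rule give different predictions depending on who tagged the training set, and the same model scores differently depending on whose labels are used as the answer key.

```python
from collections import Counter, defaultdict

# Hypothetical training articles: (source, headline). All names are invented.
articles = [
    ("outlet_x", "Minister resigns amid scandal"),
    ("outlet_x", "Poll shows surprise lead"),
    ("outlet_y", "Central bank holds rates"),
    ("outlet_y", "Storm closes airport"),
]

# Two hypothetical annotators who agree about outlet_y but disagree about outlet_x.
labels_a = ["fake", "fake", "legit", "legit"]
labels_b = ["legit", "legit", "legit", "legit"]

def train_majority_by_source(rows, labels):
    """Deliberately trivial 'model': predict the most common label seen for each source."""
    counts = defaultdict(Counter)
    for (source, _headline), label in zip(rows, labels):
        counts[source][label] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the given reference labels."""
    hits = sum(model[source] == label for (source, _), label in zip(rows, labels))
    return hits / len(rows)

model_a = train_majority_by_source(articles, labels_a)
model_b = train_majority_by_source(articles, labels_b)

# Same unseen article, different verdicts depending on who labelled the training data.
new_article = ("outlet_x", "Celebrity endorses candidate")
print(model_a[new_article[0]], model_b[new_article[0]])  # fake legit

# Same model, different score depending on whose labels are the answer key.
print(accuracy(model_a, articles, labels_a), accuracy(model_a, articles, labels_b))  # 1.0 0.5
```

A real classifier would use far richer features, but its dependence on whoever supplies the labels works the same way.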

For a real example: https://mathbabe.org/2016/05/12/algorithms-are-as-biased-as-....


I've done some machine learning in the past. I get how you tackle machine learning problems, and the first thing I would say is that your proposed example is not currently possible.

> Say you're trying to come up with an algorithm that decides whether articles in a data set are "fake news" (topical, I know).

Current AI cannot do this. This would take finding sources, pulling data out of those sources, cross-referencing multiple sources, and repeating the process recursively for those articles to a certain depth.

That's not a good facsimile for deciding sentencing. Sentencing is more like a classification problem. You have a history of previous cases where the defendant was found guilty. You then have a pile of factors that played into the judge's sentencing decision. For example:

* If they meant to do it
* If they feel bad about doing it
* If they did do it (beyond a reasonable doubt)
* If they have done it before
* What severity this crime is
* ... etc.

The judge then uses their experience in law, previous case law, and statutes to arrive at a proper punishment. This takes the form of:

* Time served
* Fines
* Privileges revoked
* Community service

These factors and outcomes would then be fed into a classification engine. You leave all of the existing infrastructure in place (judge, jury, lawyers) and just use their decisions as the input to sentencing.
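A minimal sketch of that proposal, under assumptions the comment doesn't spell out: each past case is encoded as a few hypothetical numeric factors and paired with the sentence the judge actually gave, and a k-nearest-neighbour majority vote stands in for the "classification engine". The field names, scales, and cases are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    # Hypothetical encoding of the factors listed above.
    intent: int          # 1 if they meant to do it, else 0
    remorse: int         # 1 if they feel bad about it, else 0
    prior_offences: int  # number of previous convictions
    severity: int        # 1 (minor) .. 5 (severe), however that gets rated

def distance(a: Case, b: Case) -> int:
    """Manhattan distance between two factor vectors."""
    return (abs(a.intent - b.intent) + abs(a.remorse - b.remorse)
            + abs(a.prior_offences - b.prior_offences) + abs(a.severity - b.severity))

def predict_sentence(history, new_case: Case, k: int = 3) -> str:
    """Return the most common sentence among the k most similar past cases."""
    nearest = sorted(history, key=lambda item: distance(item[0], new_case))[:k]
    votes = Counter(sentence for _case, sentence in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical past cases: (factors, sentence the judge actually gave).
history = [
    (Case(1, 0, 2, 4), "time served"),
    (Case(1, 0, 3, 5), "time served"),
    (Case(1, 1, 1, 4), "time served"),
    (Case(0, 1, 0, 1), "community service"),
    (Case(0, 1, 0, 2), "fine"),
    (Case(0, 0, 0, 1), "community service"),
]

print(predict_sentence(history, Case(0, 1, 0, 1)))  # community service
```

Because the prediction is just a vote over past sentences, the model reproduces whatever patterns, good or bad, are present in `history`.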

Deciding the validity of claims is not within the scope of modern-day machine learning (as of 2017). Classification engines are very much within the scope of today's machine learning.

I don't see how case factors could be biased. I don't see how historical cases (when stripped of all identifying information) could be biased. I don't see why a system like this would be bad.

Everyone's treatment would converge toward a uniform handling of cases.

> That's not a good facsimile for deciding sentencing

Sure. I'm not saying my example is the smartest one (although, to be fair, there are people trying to use ML to do this: http://www.fakenewschallenge.org/). I was unaware of your experience with ML, so I assumed you had none and tried to pick a simple example (even if it's dumb) to explain how bias can sneak into models in general, rather than in the specific case of sentencing criminals.

Let's loop back to what you originally said:

> If you're writing a machine learning application to take a dataset and match future inputs to past results I don't see how these biases can sneak into the program

You then go on to describe a number of factors that you think should go into sentencing models, and they leave plenty of scope for bias:

* "If they meant to do it" - this is a judgement made by a person and clearly reflects the view of the person making the decision.

* "If they feel bad about doing it" - again, someone has to judge whether the defendant is genuinely remorseful or only pretending to be in order to get a lighter sentence.

* "If they have done it before" - This will reflect things like policing tactics. For instance, poorer areas might be subject to heavier policing, especially in areas adopting the broken windows theory of policing (https://en.wikipedia.org/wiki/Broken_windows_theory#New_York...), and in these areas petty crimes are often cracked down on more frequently. This means that people are more likely to have run-ins with the law, which makes them less likely to get jobs because their convictions show up in background checks, which in turn increases their likelihood of reoffending.

* What severity this crime is - I'm not sure what you mean by this. Do you mean e.g. murder being more severe than petty theft, or things like how severe an assault was? I'm assuming the latter since the former is often just covered by things like sentencing guidelines anyway. If someone commits an assault, then how do you rate this in a way that a model can understand? How do you ensure consistency across different cases and judges?
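A deterministic back-of-the-envelope sketch of the "If they have done it before" point, assuming two hypothetical areas whose residents commit petty offences at the same rate but are policed with different intensity. The offence rate and detection probabilities are invented numbers.

```python
from fractions import Fraction

# Invented numbers: both areas behave identically; only policing intensity differs.
OFFENCES_PER_YEAR = 2  # petty offences per person per year, same in both areas
DETECTION = {
    "heavily_policed": Fraction(1, 2),   # hypothetical chance an offence ends in a conviction
    "lightly_policed": Fraction(1, 10),
}

def expected_priors(area: str, years: int) -> Fraction:
    """Expected number of recorded prior convictions after `years` of identical behaviour."""
    return OFFENCES_PER_YEAR * years * DETECTION[area]

for area in DETECTION:
    print(area, expected_priors(area, years=5))
# heavily_policed 5
# lightly_policed 1
```

With identical behaviour, the person from the heavily policed area ends up with five times the recorded priors, and a model that uses prior convictions as a feature would treat them as higher risk. The employment feedback loop described above would only widen that gap.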

At any rate, the point of these models is usually (I believe) to remove from sentencing the biases that judges might have about people of certain backgrounds, and to produce a score that estimates how likely the convict is to reoffend, so your proposal isn't how this works in practice. In practice they're trying to avoid exactly the kinds of subjective assessments you proposed and replace them with supposedly objective predictors of the likelihood that the person in question will reoffend. From the article linked from the post: https://www.propublica.org/article/machine-bias-risk-assessm...

> Northpointe’s software is among the most widely used assessment tools in the country. The company does not publicly disclose the calculations used to arrive at defendants’ risk scores, so it is not possible for either defendants or the public to see what might be driving the disparity. (On Sunday, Northpointe gave ProPublica the basics of its future-crime formula — which includes factors such as education levels, and whether a defendant has a job. It did not share the specific calculations, which it said are proprietary.)

> Northpointe’s core product is a set of scores derived from 137 questions that are either answered by defendants or pulled from criminal records. Race is not one of the questions. The survey asks defendants such things as: “Was one of your parents ever sent to jail or prison?” “How many of your friends/acquaintances are taking drugs illegally?” and “How often did you get in fights while at school?” The questionnaire also asks people to agree or disagree with statements such as “A hungry person has a right to steal” and “If people make me angry or lose my temper, I can be dangerous.”

Given that independent research seems to confirm that the company's model favours higher sentences for people of colour, it's pretty clear from that description where biases could sneak into the model, I hope?
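A rough sketch of how a score can differ by group even though race is not one of the questions, assuming a hypothetical one-item scoring rule based on a question like "Was one of your parents ever sent to jail or prison?" and two invented populations that answer "yes" at different rates. The weights and populations are made up; Northpointe's actual formula is not public.

```python
from statistics import mean

# Hypothetical scoring rule: everyone starts at 1; a "yes" to the proxy question adds 2.
BASE_SCORE = 1
ITEM_WEIGHT = 2

def score(answers: dict) -> int:
    """Hypothetical risk score: sees only questionnaire answers, never group membership."""
    return BASE_SCORE + ITEM_WEIGHT * answers["parent_jailed"]

# Invented populations of 10 people each; they differ only in how often the proxy
# question is answered "yes" (1) rather than "no" (0). The repeated dicts are shared
# references, which is fine because they are only read.
population = {
    "group_a": [{"parent_jailed": 1}] * 3 + [{"parent_jailed": 0}] * 7,
    "group_b": [{"parent_jailed": 1}] * 1 + [{"parent_jailed": 0}] * 9,
}

for group, people in population.items():
    print(group, mean([score(p) for p in people]))
# group_a 1.6
# group_b 1.2
```

The `score` function never sees group membership, yet the group averages differ because the answer itself is correlated with the group.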

The "Machine Bias" report that ProPublica published about the Northpointe software (and that is cited in OP) has been shown to be wrong.

Source: Flores, Bechtel, Lowenkamp; Federal Probation Journal, September 2016, "False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.”", URL http://www.uscourts.gov/statistics-reports/publications/fede...

In fact, the ProPublica analysis was so poorly done that the authors of the above study wrote in their conclusion:

> "It is noteworthy that the ProPublica code of ethics advises investigative journalists that "when in doubt, ask" numerous times. We feel that Larson et al.'s (2016) omissions and mistakes could have been avoided had they just asked. Perhaps they might have even asked...a criminologist? We certainly respect the mission of ProPublica, which is to "practice and promote investigative journalism in the public interest." However, we also feel that the journalists at ProPublica strayed from their own code of ethics in that they did not present the facts accurately, their presentation of the existing literature was incomplete, and they failed to "ask." While we aren’t inferring that they had an agenda in writing their story, we believe that they are better equipped to report the research news, rather than attempt to make the research news."

What I find remarkable is that, in the ongoing coverage ProPublica published on this subject in December 2016, they interviewed a bunch more people, but none of the people whose criticism of their analysis was published in September. Make of that what you will.

I cannot believe that the paragraph you cite actually appears in a journal article. It's really rather unprofessional and silly.

I don't want to end up defending the methods ProPublica have used as I am certainly not qualified to do that and have no skin in this game anyway. I posted here initially in response to a very general question about bias in models, and I'd rather not be drawn into a lengthy discussion about this specific piece.

However, I do have one or two issues with the conclusions you seem to be drawing in your comment:

> What I find remarkable is that in the ongoing coverage ProPublica has published on this subject in December 2016 they interviewed a bunch of more people, but none of the folks that have criticized their analysis (published in September). Make of that what you will

I'm not sure it's possible to conclude anything from that, actually. There could be plenty of non-nefarious reasons for the omission. For instance, one researcher they cite in the follow-up article [1] has written a paper, which cites the article you linked to, showing that "the differences in false positive and false negative rates cited as evidence of racial bias in the ProPublica article are a direct consequence of applying an instrument that is free from predictive bias to a population in which recidivism prevalence differs across groups" [2]. The Flores et al. paper seems to claim that showing the absence of predictive bias is enough, which, it would seem, is not the case. If racial bias can still appear in the situations where the model is actually applied, then perhaps the ProPublica authors felt that [2] adequately addressed the criticism in the paper you cited, and decided not to reference the FPJ article in their follow-up for the sake of clarity. I think discounting their work because of a single omission would be throwing the baby out with the bathwater.
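The quoted result from [2] can be checked with a small worked example. Assume a hypothetical two-level score that is perfectly calibrated in both groups (80% of people scored "high" and 20% of people scored "low" reoffend, whatever their group), and that "high" is treated as a prediction of reoffending. The only difference between the two invented groups is the share of people scored "high", and hence the prevalence of reoffending.

```python
from fractions import Fraction

# Hypothetical calibrated score: P(reoffend | level) is the same in every group.
P_REOFFEND = {"high": Fraction(8, 10), "low": Fraction(2, 10)}

def error_rates(share_high: Fraction) -> dict:
    """Expected prevalence, PPV, FPR and FNR when 'high' counts as a positive prediction."""
    share_low = 1 - share_high
    tp = share_high * P_REOFFEND["high"]          # flagged and reoffend
    fp = share_high * (1 - P_REOFFEND["high"])    # flagged, do not reoffend
    fn = share_low * P_REOFFEND["low"]            # not flagged, reoffend
    tn = share_low * (1 - P_REOFFEND["low"])      # not flagged, do not reoffend
    return {
        "prevalence": tp + fn,
        "ppv": tp / (tp + fp),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

# Invented group compositions: only the share scored 'high' differs.
for group, share in {"group_a": Fraction(1, 2), "group_b": Fraction(1, 5)}.items():
    rates = error_rates(share)
    print(group, {name: round(float(v), 3) for name, v in rates.items()})
```

Positive predictive value is 0.8 in both groups, so the score has no predictive bias in this sense, yet the higher-prevalence group gets a false positive rate of 0.2 versus roughly 0.059, while the lower-prevalence group gets the higher false negative rate (0.5 versus 0.2).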

The ProPublica authors cite plenty of other research in the area in their follow-ups. Sure, that research largely agrees with their conclusions or goes further, but does this matter unless those publications are incorrect? The ProPublica authors are writing for a news publication, not an academic journal, and are therefore not obligated to cite every relevant publication. As long as they can do this without forcing a conclusion, I don't see the problem. Perhaps they deliberately ignored the paper. Who knows?

[1] https://www.propublica.org/article/bias-in-criminal-risk-sco...

[2] https://arxiv.org/abs/1610.07524