by daveguy 3685 days ago
They analyzed what they could -- the outcomes of the algorithm (recommendation) and the accuracy of those recommendations. They picked out specific examples, but the analysis was over the whole data set. I think you missed these relevant parts from the article:

> We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.

> The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.

> The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.

> Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender.

> Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.


Go read the description of the statistical analysis or just view their R notebook:

https://github.com/propublica/compas-analysis/blob/master/Co...

Their own analysis shows (p ~= 0) that the high and medium risk factors are predictive. It also shows that the racial bias terms (race_factorAfrican-American:score_factorHigh, etc.) are probably not predictive (p > 0.05).

Your quotes are not evidence of bias, though I see how they might confuse an innumerate reader. It's interesting how good a job this article is doing confusing the innumerate - it's almost as if it was written to mislead without technically lying.

For example, black defendants being pegged as more likely to commit crimes can be caused by one of two things: bias, or perhaps black defendants actually are more likely to commit crimes. According to ProPublica's own analysis (see race_factorAfrican-American), the latter is actually the case. This is true with p = 4.52e-06 - see line [36].
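To make the "one of two things" point concrete, here is a minimal sketch of the standard confusion-matrix identity that ties a group's false positive rate to its base rate. Everything in it is an assumption for illustration: the function name `false_positive_rate` and all the numbers are hypothetical and are not taken from the ProPublica notebook. It assumes both groups get the same positive predictive value (PPV) and the same false negative rate (FNR), and only the base rate differs.

```python
def false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate implied by a base rate, PPV and FNR.

    Derivation, per person in the group:
      TP = base_rate * (1 - fnr)
      FP = fpr * (1 - base_rate)
      PPV = TP / (TP + FP)
    Solving for fpr gives the expression below.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical values, chosen only to illustrate the identity.
ppv, fnr = 0.6, 0.4

fpr_low_base = false_positive_rate(base_rate=0.3, ppv=ppv, fnr=fnr)
fpr_high_base = false_positive_rate(base_rate=0.5, ppv=ppv, fnr=fnr)

print(f"FPR at base rate 0.3: {fpr_low_base:.3f}")
print(f"FPR at base rate 0.5: {fpr_high_base:.3f}")
```

Under these assumptions the group with the higher base rate ends up with the higher false positive rate even though the score is equally predictive for both. That does not by itself say which reading of the article is right; it only shows the two quantities measure different things.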

I read through the entire analysis. It appears that you stopped reading after you saw a p-value that supported your bias. That is bias in the sense of a preconceived notion. You then proceeded to pedantically argue that the well-demonstrated bias of the algorithm (more false positives for blacks than whites, about 40% vs. 20%) does not exist because of a p-value that came in between 0.05 and 0.1 instead of below 0.05.
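For anyone following along, a false positive rate here means: of the people who did not reoffend, what fraction were flagged as high risk. A minimal sketch of that calculation follows. The `Confusion` class and the counts are hypothetical, picked so the rates land near the 40% vs. 20% figures above; they are not the actual Broward County numbers.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one group (hypothetical helper)."""
    tp: int  # flagged high risk, did reoffend
    fp: int  # flagged high risk, did not reoffend
    fn: int  # flagged low risk, did reoffend
    tn: int  # flagged low risk, did not reoffend

    def false_positive_rate(self) -> float:
        # Share of non-reoffenders who were wrongly flagged high risk.
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        # Share of reoffenders who were wrongly flagged low risk.
        return self.fn / (self.fn + self.tp)

    def ppv(self) -> float:
        # Share of high-risk flags that turned out to be correct.
        return self.tp / (self.tp + self.fp)


# Made-up counts for illustration only.
group_a = Confusion(tp=300, fp=200, fn=200, tn=300)
group_b = Confusion(tp=150, fp=100, fn=250, tn=400)

for name, g in (("group_a", group_a), ("group_b", group_b)):
    print(
        f"{name}: FPR={g.false_positive_rate():.2f} "
        f"FNR={g.false_negative_rate():.2f} PPV={g.ppv():.2f}"
    )
```

With these made-up counts the two groups have the same PPV but different false positive and false negative rates, which is why a regression coefficient on the score and a table of error rates can appear to tell different stories.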

Please let me know when your reading comprehension catches up with your mediocre statistics comprehension.

Maybe you just didn't realize that the 20-20 hindsight data -- prediction vs recidivism -- is included right there in the analysis. Or maybe you did realize it later and just decided you'd dug in so much that you didn't want to admit your ignorance.

Or maybe you still haven't comprehended the difference between the meanings of the word bias.