| HN Mirror



	by jdp23 3685 days ago
	"Oh sure, this algorithm is much more likely to have false positives on blacks, and much more likely to have false negatives on whites, and the results are that blacks are more likely to be treated more harshly by the system. But it's not biased because of the definition of bias I'm using!" Orwell would have loved "disparate impact isn't bias" :)


yummyfajitas 3685 days ago

Clearly statistics terminology is confusing you. The definition of bias is E[\hat{\theta} - \theta]. The definition of disparate impact is a predictor computing different means/quantiles for different protected classes.

https://en.wikipedia.org/wiki/Bias_of_an_estimator

https://en.wikipedia.org/wiki/Disparate_impact
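
Written out side by side (a sketch in my own notation: $G$ for the protected class and $a$, $b$ for two of its values are symbols introduced here, not taken from the linked articles):

```latex
% Bias of an estimator \hat{\theta} of \theta
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta} - \theta]

% The same quantity restricted to one group g
\mathrm{Bias}_g(\hat{\theta}) = E[\hat{\theta} - \theta \mid G = g]

% Disparate impact, in the sense used above: group means of the prediction differ
E[\hat{\theta} \mid G = a] \neq E[\hat{\theta} \mid G = b]
```

The first two compare the prediction to the truth. The third compares predictions across groups and never mentions $\theta$.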

To understand this intuitively, here's a simple thought experiment.

Consider Captain Hindsight, a predictor which returns the right answer 100% of the time. By definition, E[\hat{\theta} - \theta] = 0, i.e. zero bias. (Also zero variance.)

Now suppose that blacks have a higher recidivism rate (hardly implausible, ProPublica's analysis suggests they do with p < 0.01).

Captain Hindsight - being 100% accurate and having no bias - must predict that blacks have a higher recidivism rate. Yet because Captain Hindsight predicts a higher recidivism rate for blacks, he now has disparate impact.
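
If it helps, the thought experiment can be run as a toy simulation. This is only a sketch under stated assumptions: the group labels "A" and "B" and the base rates 0.5 and 0.3 are made up for illustration and are not taken from ProPublica's data.

```python
import random


def simulate(n=10_000, rates=None, seed=0):
    """Simulate outcomes for two groups and score a perfect predictor."""
    # Hypothetical base rates, chosen only for illustration.
    rates = rates or {"A": 0.5, "B": 0.3}
    rng = random.Random(seed)
    report = {}
    for group, rate in rates.items():
        # 1 = reoffended, 0 = did not
        outcomes = [1 if rng.random() < rate else 0 for _ in range(n)]
        # Captain Hindsight: the prediction always equals the outcome
        predictions = list(outcomes)
        errors = [p - y for p, y in zip(predictions, outcomes)]
        report[group] = {
            "bias": sum(errors) / n,  # empirical mean of (prediction - truth)
            "mean_prediction": sum(predictions) / n,
        }
    return report


report = simulate()
for group, stats in report.items():
    print(group, stats)
```

The bias comes out as exactly zero in both groups, while the mean prediction is higher for the group with the higher base rate.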

Seriously, you are calling standard mathematical terminology Orwellian? What's your angle here?

zyxley 3685 days ago

Your thought experiment here is incorrect, given that the analysis compares COMPAS results to actual recidivism rates and shows over- and under-prediction in comparison to them.

yummyfajitas 3685 days ago

The thought experiment is a mathematical proof that the two concepts are causally unrelated, nothing more. I really suggest you brush up on your basic math - you seem to not be following along.

Your claims about the empirical means of recidivism rates do not prove what you think they prove. Different races might be misclassified at different rates for a variety of reasons - e.g., one race might be affected more by some high-variance predictor, or there could be composition effects (e.g. the pdf of blacks|high score might be different than whites|high score).
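
Here is one way that can happen, as a minimal sketch rather than a model of COMPAS. Assume a score that equals each person's true risk (so it is unbiased within every group), a "high risk" cutoff of 0.5, and two hypothetical groups that differ only in how many members sit at each risk level. All numbers are invented for illustration.

```python
def error_rates(risk_mix, threshold=0.5):
    """Exact bias, FPR and FNR for a score equal to each person's true risk.

    risk_mix maps a true reoffence probability to the share of the group
    that has it (shares sum to 1).
    """
    mean_score = sum(w * r for r, w in risk_mix.items())
    # Expected outcome for someone with risk r is r itself
    mean_outcome = sum(w * r for r, w in risk_mix.items())
    # Flagged high risk but did not reoffend
    false_pos = sum(w * (1 - r) for r, w in risk_mix.items() if r >= threshold)
    negatives = sum(w * (1 - r) for r, w in risk_mix.items())
    # Flagged low risk but did reoffend
    false_neg = sum(w * r for r, w in risk_mix.items() if r < threshold)
    positives = sum(w * r for r, w in risk_mix.items())
    return {
        "bias": mean_score - mean_outcome,
        "fpr": false_pos / negatives,
        "fnr": false_neg / positives,
    }


# Hypothetical groups: same two risk levels, different mixes
group_a = error_rates({0.2: 0.4, 0.7: 0.6})
group_b = error_rates({0.2: 0.8, 0.7: 0.2})
print(group_a)
print(group_b)
```

With these made-up mixes, group A gets a false positive rate of 0.36 against roughly 0.086 for group B, and a false negative rate of 0.16 against roughly 0.533, even though the bias is zero in both groups.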

The way to factor out whether the scores are biased is to do the Cox survival analysis with interaction terms. Which they did. You just don't like the result.
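
For reference, a Cox model with an interaction term has roughly this shape (a sketch: $s$ for the score and $r$ for a race indicator are my labels, and the exact covariates in ProPublica's fit may differ):

```latex
% Hazard of reoffending at time t, given score s and group indicator r
h(t \mid s, r) = h_0(t) \exp(\beta_s s + \beta_r r + \beta_{sr}\, s r)

% Null hypothesis for the interaction: the score's effect on the
% hazard is the same in both groups
H_0 : \beta_{sr} = 0
```

The interaction coefficient $\beta_{sr}$ measures whether a change in score shifts the hazard by a different amount in one group than in the other.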

Could you clearly lay out the statistical argument that you believe implies that E[\hat{\theta} - \theta] > 0?

zyxley 3685 days ago

> Could you clearly lay out the statistical argument that you believe implies that E[\hat{\theta} - \theta] > 0?

Again: I don't care about your domain-specific definition of "bias". I care about whether the end result of this secret algorithm is unequal and inaccurate treatment of different demographic groups.

yummyfajitas 3685 days ago

> inaccurate treatment of different demographic groups.

This is exactly what the standard mathematical definition of bias (restricted to a given group) addresses. The authors of this article ran exactly that analysis - see lines [36] and [46].

I know that you are trying to retreat from statistics, since the stats don't support your mood affiliation, but don't retreat to "accuracy". Retreat to something vague and undefined instead. It'll work better.

As for "unequal", I don't know what you mean. Do you consider disparate impact to be "unequal"? If so, then I'm sorry to tell you that reality is imposing an unfortunate choice on you: equal or accurate, you can't have both. (According to ProPublica this algorithm chooses accurate.)

Criticizing an algorithm for revealing this unfortunate fact is like blaming telescopes for Saturn having rings.

zyxley 3685 days ago

> equal or accurate, you can't have both

By "equal and accurate", I mean a result that corresponds to actual recidivism rates for as many demographic groups as possible (males, females, different races, different ages, combinations of the above, etc.). The analysis shows that it is substantially likely (certainly well beyond the oft-vaunted "reasonable doubt" standard, never mind the specifics of p-values being slightly above 0.5) that the algorithm being used doesn't provide such a result.
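
To make that criterion concrete, here is a rough sketch of the kind of check I have in mind: the mean predicted risk minus the observed rate for every subgroup. The record fields (`sex`, `age`, `score`, `reoffended`) and the four records are invented for illustration and are not the layout of the real data.

```python
from collections import defaultdict


def calibration_gaps(records, keys):
    """Mean predicted risk minus observed rate, per subgroup defined by keys."""
    # Per subgroup: [sum of scores, sum of outcomes, count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for rec in records:
        group = tuple(rec[k] for k in keys)
        entry = totals[group]
        entry[0] += rec["score"]
        entry[1] += rec["reoffended"]
        entry[2] += 1
    return {g: (s - y) / n for g, (s, y, n) in totals.items()}


# Made-up records, for illustration only
records = [
    {"sex": "M", "age": "<25", "score": 0.6, "reoffended": 1},
    {"sex": "M", "age": "<25", "score": 0.6, "reoffended": 0},
    {"sex": "F", "age": "<25", "score": 0.5, "reoffended": 0},
    {"sex": "F", "age": "<25", "score": 0.5, "reoffended": 0},
]
gaps = calibration_gaps(records, keys=("sex", "age"))
print(gaps)
```

A gap near zero for every subgroup is what I would call equal and accurate; a large positive gap means that subgroup is being over-predicted.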

jdp23 3685 days ago

Clearly the idea that words have meanings beyond statistics terminology is confusing you.

And no, I'm not calling standard mathematical terminology Orwellian. What I'm calling Orwellian is your describing a biased system as unbiased (by attempting to reframe the discussion around a specific statistical definition, chosen by you).
