by bobcostas55 3216 days ago
>This might have less statistical bias, but would bias against individuals of a race who are in truth not at higher risk of recidivism.

This is true of all predictive variables and all predictive models. It doesn't justify the recent panic aimed at ML at all.

2 comments

Journalists may not be using the term bias in a statistically appropriate way, but they do seem to be capturing a colloquial sense of what bothers people about these models: the potential for the individual to be subsumed into his demographic, and the use of suspect classifications (like race in the US) as highly salient but 'silent' factors in the model. I don't think this is a misleading use of the term for the average reader, who I suspect would consider the FICO redlining example as exhibiting 'bias' as the term is commonly understood by laypersons. The fact that these models are difficult for the average citizen to understand and interrogate is not exactly a point in their favor, since a lot of consensual governance is based on transparency and information symmetry, even at the expense of optimization/efficiency.
Of course all models are biased, but we can try to build models that minimize the types of bias we care most about.

I think the so-called panic comes from the fact that more complicated algorithms are difficult for an untrained person to understand. For example, it's illegal to discriminate in housing/lending based on race, but if I hide that discrimination inside of a sufficiently complicated ML model, I may be able to get away with it.
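
To make the "hidden inside the model" point concrete, here is a minimal sketch with made-up numbers. The groups, zip codes, counts, and the approve() rule are all hypothetical, chosen only for illustration. The toy model never sees group membership, only zip code, yet approval rates still split sharply by group because zip code acts as a proxy:

```python
from collections import defaultdict

# Hypothetical synthetic population: (group, zip_code, number_of_applicants).
# Group membership is strongly correlated with zip code, as in redlining.
POPULATION = [
    ("A", "zip_1", 90),
    ("A", "zip_2", 10),
    ("B", "zip_1", 20),
    ("B", "zip_2", 80),
]


def approve(zip_code: str) -> bool:
    """Toy 'model' that takes only zip code as input, never group."""
    return zip_code == "zip_1"


def approval_rate_by_group(population):
    """Return the fraction of applicants approved within each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, zip_code, count in population:
        total[group] += count
        if approve(zip_code):
            approved[group] += count
    return {group: approved[group] / total[group] for group in total}


rates = approval_rate_by_group(POPULATION)
print(rates)  # {'A': 0.9, 'B': 0.2}
```

A real model would be far less legible than a one-line rule, which is the worry: auditing outcomes by group, as approval_rate_by_group does here, still works even when the model itself can't be read.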