by okusername 2516 days ago
Is it "bias" if it's true? The issue is that we want to override statistics and distort the analysis for ideological reasons. That's not a fault of the algorithm; it's a feature request.

> Is it "bias" if it's true?

I think it's quite common that the quantity you're interested in isn't observable, so you need to proxy it with something. The GP's example is "Person X is likely to commit a crime". If that could be estimated reliably, it would be extremely useful for allocating governmental resources like policing and education.

The problem is that "Person X is likely to commit a crime" isn't observable, so a careless researcher might proxy it with "Person X is likely to be convicted of a crime". The latter is actually very different from the former, since it includes factors like a defendant's ability to hire a good lawyer, existing police presence around Person X's neighbourhood, and government priorities on which crimes to prosecute (think crack vs. powder cocaine in the 1980s).
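A toy calculation of how a conviction proxy can manufacture a gap between two groups whose real offense rates are identical. All numbers are made up for illustration; the only assumption is that the chance an offense ends in a conviction differs by group (policing density, access to lawyers, etc.):

```python
# Hypothetical numbers: both groups offend at the SAME true rate, but an
# offense in group B is twice as likely to end in a conviction.
TRUE_OFFENSE_RATE = {"A": 0.10, "B": 0.10}
P_CONVICTED_GIVEN_OFFENSE = {"A": 0.30, "B": 0.60}


def conviction_rate(group: str) -> float:
    """Rate a model would 'learn' if convictions were used as the label."""
    return TRUE_OFFENSE_RATE[group] * P_CONVICTED_GIVEN_OFFENSE[group]


for g in ("A", "B"):
    print(g, "true:", TRUE_OFFENSE_RATE[g], "proxy:", round(conviction_rate(g), 3))

# How much riskier the proxy label makes B look, despite equal true rates.
ratio = conviction_rate("B") / conviction_rate("A")
print("proxy says B is", round(ratio, 2), "times as risky as A")
```

With these numbers the proxy reports 0.03 for A and 0.06 for B, so B looks twice as risky even though the quantity you actually care about is the same for both.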

Any good social scientist or economist will be aware of all of this. But once you bake it into a model that doesn't explain itself, you have a mess on your hands, especially if people who don't understand the model give it more credence than it deserves.

Often, yes, because the statistics themselves can be distorted. For example, consider any of the openly racist police forces in the Jim Crow South. Any naive system built on their data would mirror the racism of their practices.

And that's ignoring more complicated feedback loops. Since colonial times, American whites have often used their dominance to keep black people impoverished. [1] Poverty and crime are correlated. Wealth is correlated with getting away with crime. So if a system looks at crime statistics without considering the history, it would be easy to perpetuate the ugly parts of that history.

[1] See Kendi's "Stamped from the Beginning" for the colonial-era laws and practices, and Loewen's "Sundown Towns" for the Nadir up through suburbanization.
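The feedback-loop point can be sketched with a deliberately crude, hypothetical model (not any real department's system): two neighborhoods with identical true crime rates, patrols allocated each round in proportion to last round's recorded crime, and crime recorded only in proportion to patrol presence.

```python
# Hypothetical toy model of a policing feedback loop.
TRUE_RATE = [0.05, 0.05]  # identical underlying crime rates
TOTAL_PATROLS = 100.0


def simulate(initial_patrols: list[float], rounds: int) -> list[list[float]]:
    """Return the patrol allocation for each round, starting with the initial one."""
    patrols = list(initial_patrols)
    history = [patrols]
    for _ in range(rounds):
        # Crime only gets recorded where officers are present to see it.
        recorded = [rate * p for rate, p in zip(TRUE_RATE, patrols)]
        total = sum(recorded)
        # Next round's allocation follows the recorded statistics.
        patrols = [TOTAL_PATROLS * r / total for r in recorded]
        history.append(patrols)
    return history


# Start from a historically skewed 70/30 allocation.
hist = simulate([70.0, 30.0], rounds=10)
print(hist[-1])  # still (up to float rounding) a 70/30 split
```

Because the data the allocation learns from is produced by the allocation itself, the loop has no way to notice the initial skew: with equal true rates it just reproduces the historical split every round.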

ML algorithms per se are in theory neutral on this subject, sure, but a trained model acquires, and can amplify, the biases of its source dataset.

You say “ideological”, but it is also sound science. Machine learning models of this kind are built on observational data, so they find correlations without knowing anything about causation; you have to put in a lot of effort to find and avoid problematic correlations. Of course, then the magic disappears, because an ML algorithm can't actually tell you whether someone is a criminal.

We know, based on statements from ex-agents within the DEA, that the DEA didn't go after drug use in white suburban communities because it would have been politically untenable.

The incarceration patterns therefore reflect this political bias.

Though I think your statement is likely true overall, in a sub-thread about biased conclusions from data it's important to note that from the statements of those ex-agents, all you can conclude is that parts of the DEA didn't go after drug use in white suburbia...

> Is it "bias" if it's true?

Yes. Statistics are for populations, not for individuals.

Using a statistical correlation as the basis for an individual decision is inherently biased, yet unfortunately it happens all too often.

That's correct, but we're talking about an essential feature request.

If the algorithm is going to insist on treating all black people like criminals because their crime rate is higher, then it's a bad algorithm and needs to be fixed before shipping, or scrapped altogether.