Hacker News
by backpropaganda 3316 days ago
If I were training a classifier to predict whether a sentence is talking about household activities vs. not, wouldn't the occurrence of man/woman in the sentence be a good feature? Today, women do perform household activities more (whether we like it or not), and wouldn't it make sense to use that piece of information when performing some predictive analysis?

The technical sense of "bias" arises when the train and test distributions differ. Obviously if you train on a dataset of text from a foreign country's news and then apply the model to an American context, the difference in the data distributions will introduce bias, but why do we need a social twist to this already well-functioning term? If the same classifier is trained and evaluated in India (with its sexist roles, say), then there's no (technical) bias and I don't see why it's a bad application.
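
A minimal sketch of that train/test mismatch, in Python. Every number here is invented for illustration, and `fit_majority_rule` and `accuracy` are hypothetical helpers, not any library's API. A rule learned from one distribution scores well there and drops to chance on a shifted one:

```python
def fit_majority_rule(joint):
    """joint maps (feature, label) -> probability.
    Returns a dict mapping each feature to its most probable label."""
    features = {feature for feature, _ in joint}
    return {
        feature: max((0, 1), key=lambda label: joint.get((feature, label), 0.0))
        for feature in features
    }


def accuracy(rule, joint):
    """Probability mass on which the rule's prediction matches the label."""
    return sum(p for (feature, label), p in joint.items() if rule[feature] == label)


# Label 1 = "sentence is about household activities". Made-up numbers.
train_dist = {("woman", 1): 0.35, ("woman", 0): 0.15,
              ("man", 1): 0.10, ("man", 0): 0.40}
# A different population where the gender word carries no signal.
shifted_dist = {("woman", 1): 0.20, ("woman", 0): 0.30,
                ("man", 1): 0.20, ("man", 0): 0.30}

rule = fit_majority_rule(train_dist)
in_distribution = accuracy(rule, train_dist)
out_of_distribution = accuracy(rule, shifted_dist)
print(rule, in_distribution, out_of_distribution)
```

Under these assumed numbers the rule is right about three times in four where it was trained and about half the time after the shift, which is the purely statistical sense of "bias" described above.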

3 comments

>wouldn't it make sense to use that piece of information when performing some predictive analysis?

No, because eventually your system will graduate from predicting the results of society's bias to reinforcing society's bias. That is a bad thing.

Can you give an example of a situation where an ML application would be reinforcing a problematic bias but still have good performance metrics? My point is that a wrongly-applied ML application would suffer in just plain accuracy. For instance, an Automatic Career Counsellor might give "homemaker" as a suggested career choice to women, but before we start calling it biased, it would already be wrong. If the same algorithm had dug deeper, it would have learned that the woman in question would be a great programmer.
Recidivism prediction systems will usually tell you that black people are more likely to get arrested/convicted again. They do so accurately, but also result in longer sentences for black people.

https://arxiv.org/abs/1610.07524
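
To make the "accurate but unequal" point concrete, here is a rough sketch. It uses an identity that follows directly from the definitions of prevalence p, positive predictive value (PPV), false negative rate (FNR) and false positive rate (FPR): FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR). The prevalences and the PPV/FNR values below are made up, and `implied_fpr` is a hypothetical helper name.

```python
def implied_fpr(prevalence, ppv, fnr):
    """False positive rate forced by a given prevalence, PPV and FNR.

    Derived from PPV = TP / (TP + FP) with TP = p * (1 - FNR)
    and FP = (1 - p) * FPR, solved for FPR.
    """
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * (1 - fnr)


# Same predictive value and same miss rate for both groups (assumed values),
# but different base rates of re-arrest in the data.
ppv, fnr = 0.6, 0.3
fpr_low_base_rate = implied_fpr(prevalence=0.3, ppv=ppv, fnr=fnr)
fpr_high_base_rate = implied_fpr(prevalence=0.5, ppv=ppv, fnr=fnr)
print(fpr_low_base_rate, fpr_high_base_rate)
```

With these assumed inputs the group with the higher base rate ends up with a false positive rate more than twice as high, even though a "high risk" label means the same thing in both groups. So a tool can look equally accurate for everyone by one metric while people who would not reoffend are flagged at very different rates.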

Yeah but doesn't that have more to do with the way the predictions are used?

It seems to me to be a stupid thing to do: this person seems more likely to get convicted again, so lock 'em up longer. Instead, why not ask why this person is more likely to get convicted again? Can we prevent this in a redemptive, non-punitive way?

It's really useful to have that prediction/data, but how you use it is more important.

the problem is that a layperson doesn't necessarily know what a prediction means without a deep understanding of how the system is making its predictions, let alone how to apply it.

worse, the fact that the prediction is coming from a computer lends it an air of authority, which another article called "bias laundering". the general belief is that computers are objective and cannot have bias, which in a sense is true, but people don't tend to think a step further about the problems and biases in the people who programmed the computer.

so one thing usually missing from these discussions is that the people using these systems generally don't know how they work, and believe they predict or imply things that they don't.

I mean, that same algorithm could be used to determine that blacks or other at-risk groups should receive extra attention or support. An accurate picture of reality can be used poorly or well.
There is a much more fundamental problem, which is that people are bad at understanding the difference between "is" and "should". No amount of information about what the world looks like tells you anything about what course of action is the most moral (and vice versa). If you are building a system that predicts recidivism rates (figuring out what "is"), then any piece of information that improves your accuracy is good. If you use that system to suggest sentences (making decisions about "should"), then you are going to run into a lot of problems.
I still don't understand why information should be withheld from judges during sentencing. If public officials use data to worsen issues like recidivism rather than improve them, then those officials should be removed. If a judge can't be trusted to act responsibly and morally with accurate information about a defendant, then why would we even begin to think that they're competent?

It's just the reality of how the justice system works. We have trust in the approximation of justice that the judiciary provides and constantly struggle to improve that judiciary.

That sounds like exactly the kind of thing you'd expect to happen when you treat people as feature clusters instead of, you know, people.
I think you have a really good point here. The problem is that we have this current bias in society and people wish to change it. I think there is a fear that if we reflect this bias in the way we talk, we reinforce the bias.

It seems an effective tool: if you want to change thinking, police the way words can be used around the topic. It is however worrying that machines could start playing a role in this. It could become a powerful tool in steering public opinion. This doesn't seem too bad, but it could be used to favour an incumbent political party or, more than likely, to sell products we otherwise don't really want.

But you are right: machines need accuracy, and removing that bias could be detrimental to the task they're solving.

My point isn't that accuracy and bias are orthogonal, but that bias is contained in the accuracy metric.
That's fine if you are measuring the bias component too. If you're not, you risk perpetuating it. It's natural if I read a sentence about hands to conclude that the text refers to people and not fish, but if I read about 'the hands of a surgeon' and assume that those hands are male based on the current demographics of the surgical profession, I'm making an unwarranted assumption on insufficient information. It's wise to grant utility to uncertainty by maintaining a 'Don't know' option rather than being in a rush to make a determination before it is necessary, not least because of the computational cost of unwinding incorrect assumptions.
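
A 'Don't know' option can be as simple as refusing to commit below a confidence threshold. This is only a sketch: `predict_or_abstain`, the 0.9 threshold and the probabilities are all made up for illustration, not taken from any real model.

```python
DONT_KNOW = "don't know"


def predict_or_abstain(probabilities, threshold=0.9):
    """Return the most probable label, or DONT_KNOW when the model's
    confidence in that label is below the threshold."""
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else DONT_KNOW


# "hands" in a sentence: people vs. fish is close to certain.
hands_referent = predict_or_abstain({"person": 0.99, "fish": 0.01})
# "the hands of a surgeon": a demographic skew is not enough to commit.
surgeon_gender = predict_or_abstain({"male": 0.6, "female": 0.4})
print(hands_referent, surgeon_gender)
```

Where to set the threshold is a judgment call that depends on what a wrong guess costs downstream.
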
Totally agree with you. I'm not at all trying to say accuracy and bias can be orthogonal.

I'm trying to say some people think they have a good enough reason to throw away accuracy if that means they can change a societal bias. But that can only be a good thing if you agree with the change being made.

Isn't it preferable to accurately measure and account for bias? The same metrics that reinforce bias can be used to combat it. Without accurate metrics, how do you form your argument against bias? Feelings?
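
As a sketch of what measuring it could look like: break an error metric down by group instead of reporting one overall number. The records below are toy data and `false_positive_rate_by_group` is a hypothetical helper, so treat this as an illustration rather than a recommended audit procedure.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, predicted_positive, actually_positive).
    Returns {group: FP / (FP + TN)} for every group with at least one
    actual negative."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {group: false_positives[group] / count
            for group, count in negatives.items()}


# Toy data: (group, model said "positive", ground truth was positive).
records = [
    ("a", True, False), ("a", False, False), ("a", True, True),
    ("b", False, False), ("b", False, False), ("b", True, True),
]
rates = false_positive_rate_by_group(records)
print(rates)
```

A single accuracy figure over all six records would hide the fact that every false alarm in this toy data falls on group "a".
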
No, it would not be a good feature. For one thing, baking in the bias of existing practices (as opposed to actual constraints) risks reinforcing those practices as more and more decision-making is left to ML. Second, it makes your system vulnerable to verbal paradoxes designed to exploit that bias.