| HN Mirror


by Blikkentrekker 1641 days ago

> Gebru, a widely respected leader in AI ethics research, is known for coauthoring a groundbreaking paper that showed facial recognition to be less accurate at identifying women and people of color, which means its use can end up discriminating against them.

Surely this is a function of location? I understand the U.S.-English term “person of color” to be convoluted language for “not white”. One simple thing I notice is that if I search for, say, “child” on Google Image Search, the images indeed tend to look like what one would expect from the average inhabitant of an English-speaking nation; when I search “子供”, I indeed mostly see what I would expect from Japan. Similarly, if I search for “huis”, what I find tends to look like a house most likely situated in the Netherlands; with “บ้าน”, it more closely resembles stereotypical Thai architecture.

I would assume that a.i.'s made in, say, Japan would yield different results.

4 comments

b9a2cab5 1641 days ago

The meaning in woke terminology is more subtle than just "not white". For example Asians would likely be excluded in this case, and Middle Easterners and other minorities. "People of color" in this case means blacks and dark skinned Latinos.

The idea that AI itself can be biased (as opposed to the dataset) also has some significant problems. The lead of Facebook AI Research got canceled on Twitter because he pointed out that it's the bias in the dataset used to train the AI that results in bias in the AI and not the AI itself that's biased. I'd also question whether Gebru is a "widely respected leader in AI ethics research". Model interpretability is not even close to a solved problem so just because you can demonstrate some correlation between images of black people and worse performance does not imply that "black person" is a causative factor. It could literally be dataset distribution or image contrast or any number of other plausible explanations that are easily fixable by an ML engineer.

tsimionescu 1641 days ago

The AI is the final product of applying a learning algorithm on a training set.

Claiming that "the AI is not biased, the training set is" is like saying "this running program isn't buggy, it's just the source code that is buggy".

mschuster91 1641 days ago

Searching for "child" or "house" will yield what has been classified as such in training - and searching for Japanese or Thai labels will do the same. No surprise there, if the labels don't get normalized before training.

And normally, that's harmless - as you said, you'd expect to see an AI finding pictures of houses in the region/culture you are searching it. But in a multi-cultural/multi-ethnic society, searching for "people" and showing only what is considered the "majority" has a whole different set of ethical implications.

Identifying and ideally remediating such issues is why ethics research is so sorely needed.

Blikkentrekker 1641 days ago

> And normally, that's harmless - as you said, you'd expect to see an AI finding pictures of houses in the region/culture you are searching it.

I am not actually; I am searching for “huis”, not “Nederlands huis”; I'd expect the result I obtain from the former with the latter.

I'd actually expect “house” and “huis” to return similar results from a good search engine. Obviously this is not easily possible given how it is trained on corpora in a specific language, but from a usability standpoint I think this is undesirable: if I specifically want Dutch houses, I can always add that term as a specification, yet there is no way to simply search for houses, wherever they might be, in Dutch, or English, or Thai, or any other language.

That is to say, I'm not arguing that there is no problem; I'm arguing that the problem is highly dependent upon location, and that the article should not take such a U.S.A.-centric stance and act as though the rest of the world does not exist.

b9a2cab5 1641 days ago

No, remediating such issues (only predicting the maximum likelihood class in the dataset) is a problem of _machine learning and optimization_ research, not ethics research. There is nothing an ethicist can do to solve this problem. It is easy to point out problems with existing AI and write a bunch of papers to get yourself tenure. It is very hard to fundamentally advance our understanding of deep learning models past a fancy maximum likelihood estimation problem.

mschuster91 1641 days ago

> _machine learning and optimization_ research

Ethics education is (unfortunately) not really seen as necessary across the tech field, which is why ethics researchers need to be part of all stages of AI development.

And for what it's worth, ethics researchers should be part of all technology development - the "racist soap dispenser" should have been more than enough proof of how even a very simple, innocent product can contribute to ethnic discrimination.

xmprt 1641 days ago

The problem is that AI (and the English language to some extent) transcends borders. So even if it's an AI developed in the US, it can potentially impact people outside the US and it makes ethical sense to build something that doesn't exclude groups based on arbitrary conditions.

Blikkentrekker 1641 days ago

Yes, but to offset that, many a.i. in English were also made outside of English-speaking regions, in what one assumes to be proportional degree.

This is probably why there is more variance when searching for English terms as well, since English serves as a lingua franca. If I search “house” I do see some styles of architecture not commonly found in Anglo-Saxon nations, whereas all occurrences of “huis” do seem to be situated in the Netherlands.

sangnoir 1641 days ago

> many a.i. in English were also made outside of English-speaking regions

Different regions, yes - but where did the training and benchmark datasets come from? AI research is surprisingly monocultural (or use "standardized benchmarks" if you're feeling charitable). Not too long ago, there was a paper posted on HN that showed that a bunch of the datasets contain mislabeled data, which means a lot of "different" models are encoding similar biases.

baxter001 1641 days ago

Completely not the focus of the article, and you've turned the result of an error rate of 0.8 percent for gender classification of light-skinned men and a 34.7 percent error rate for the same classifier on dark-skinned women - into some kind of google image search language game?

I can only quote Joy Buolamwini on this:

“To fail on one in three, in a commercial system, on something that’s been reduced to a binary classification task, you have to ask, would that have been permitted if those failure rates were in a different subgroup?”

b9a2cab5 1641 days ago

The answer would probably be yes if that subgroup wasn't a large percentage of the dataset used for training and testing. Or if that subgroup wasn't a large percentage of the user base.

Come on, if you've worked at any large company using ML you know model performance is literally just taking the average accuracy/ROC/precision/etc over your training dataset plus some hold out sets. Then you track proxy metrics like engagement to see if your model actually works in production. At no point does race come into the equation. Naturally, if your choice of subgroup happens to not be a large proportion of either the dataset or the userbase then you don't see the poor performance on that subgroup show up in your metrics so you don't care to fix it.
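To make the averaging point concrete, here is a minimal sketch in Python. The two error rates are the ones quoted upthread (0.8 percent and 34.7 percent); the 95/5 dataset split and the `aggregate_error` helper are assumptions invented for illustration, not figures or code from the paper.

```python
def aggregate_error(groups):
    """Share-weighted error rate over {name: (share, error_rate)} entries."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * err for share, err in groups.values()) / total_share


# Error rates are the ones quoted upthread; the 95/5 split is an ASSUMED mix.
groups = {
    "light-skinned men": (0.95, 0.008),
    "dark-skinned women": (0.05, 0.347),
}

overall = aggregate_error(groups)
worst = max(err for _, err in groups.values())

print(f"overall error: {overall:.2%}")
print(f"worst subgroup error: {worst:.1%}")
```

Under that assumed mix the single headline number stays around 2.5 percent even though one subgroup fails roughly one time in three, which is why a per-subgroup breakdown has to be reported separately if anyone is to notice it.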

indigo945 1641 days ago

Obviously, but the question is, why were there no Black women in the data set, and what care can be taken to prevent racialized bias when selecting the data set in the future?

Blikkentrekker 1641 days ago

I would assume these data sets are not manually selected but imported from some mechanism.

Other issues that are sure to arise are that the a.i. will have trouble with people who aren't smiling, and that the data set probably contains people who look better than average and almost certainly does not include people with injuries or deformities in appropriate proportions.

Perhaps an interesting project would simply be the compilation of a vast dataset of “world proportional pictures of people”.

tsimionescu 1641 days ago

World proportional is not good enough for this type of task. If we are to rely on AI for things like identifying people in pictures in a trial, we would need equal representation in the data set, so the AI doesn't have any kind of systematic bias. Otherwise, the AI's bias will compound errors in the real world. So you would need as many pictures of Aboriginal Australians in the data set as of Han Chinese people if you wanted to be sure there isn't a risk that a random person would be confused for someone of the over- or under-represented groups.
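As a rough illustration of equal (rather than proportional) representation, the sketch below draws the same number of examples from every group. The `balanced_sample` helper, the toy `pool`, and the group labels are all hypothetical; a real dataset would also need reliable group labels to begin with, which is its own problem.

```python
import random
from collections import defaultdict


def balanced_sample(items, group_of, per_group, seed=0):
    """Draw exactly `per_group` items from each group, without replacement."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for item in items:
        by_group[group_of(item)].append(item)

    sample = []
    for group, members in by_group.items():
        if len(members) < per_group:
            # Equal representation is impossible without collecting more data.
            raise ValueError(f"group {group!r} has only {len(members)} items")
        sample.extend(rng.sample(members, per_group))
    return sample


# Toy pool with a 90/10 imbalance between two hypothetical groups.
pool = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
picked = balanced_sample(pool, group_of=lambda item: item[0], per_group=50)

counts = defaultdict(int)
for group, _ in picked:
    counts[group] += 1
print(dict(counts))
```

Note that the smallest group caps how much balanced data you can draw, so in practice this approach means collecting more examples for under-represented groups rather than just resampling what you already have.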

b9a2cab5 1641 days ago

Certainly you can ask these questions but these are business process issues, not technical ones. They're unrelated to AI.

My personal take is you won't see any tangible movement on this until black women (or whatever group you choose) comprise a tangible proportion of revenue generating users. Corporations operate for money and nothing else.

notahacker 1641 days ago

Of course they are related to what we call AI, because what we call AI is primarily dependent on the quality of the business processes behind data selection and testing. If there is a strong trend of business processes to create systematic errors in the results the technology generates (an AI trained in China sucking at recognising white people wouldn't be a counter example of this phenomenon, it would be the same issue) it's an underlying weakness of the technology, and the utility of the technology needs to be viewed in the context that it's likely compromised by biases in the business processes of its developers.

Black women or other groups not viewed as the mainstream target for an AI solution aren't going to form a tangible proportion of revenue generating users if the software doesn't function properly for them. And a lot of the use cases for AI analysis don't involve the unrepresented-in-corpus minority group being the consumer anyway, they involve it being used to screen them by a third party who's been sold the tool on the false premise that it's free from human bias.