by anonthrowaway2 2832 days ago
Almost everyone I know with at least a faint understanding of ML is surprised by models picking up racism etc. when there was zero intent to do so, because of systemic racism etc. in the available data. Or at least surprised by how much can be picked up. You're bubbled if no one you know is surprised.
4 comments

"because of systemic racism"

Sometimes data might be 'racist' (e.g. human-written corpus text)... but sometimes data is just data.

Are facts racist?

It would seem the world is rather diverse, i.e. 'people are different', and as we are different, AI is going to pick up on that. That's the whole point.

Now ... there are some bad examples, like the one here, where positive/negative inferences get taken the wrong way. Or actual systemic racism showing up in bad ways, e.g. maybe some groups are more likely to be monitored than others, thereby showing up more frequently in bad terms, etc.
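
To make the monitoring point concrete, here is a minimal sketch with made-up numbers (the group labels, population size, rates, and monitoring counts are all hypothetical, not taken from any real dataset). Both groups have the same underlying rate, but one is observed three times as often, so it shows up three times as often in the records:

```python
# All numbers below are hypothetical and chosen only for illustration.
POPULATION = 10_000            # people in each group
INCIDENTS_PER_1000 = 50        # same underlying rate in both groups
MONITORED = {"A": 1_000, "B": 3_000}  # people actually observed per group


def recorded_incidents(monitored: int) -> int:
    """Expected number of incidents that make it into the dataset."""
    return monitored * INCIDENTS_PER_1000 // 1000


# What the dataset contains: incidents only exist where someone was looking.
records = {group: recorded_incidents(n) for group, n in MONITORED.items()}

# A naive per-capita rate computed from the records alone, ignoring
# how many people were monitored in each group.
naive_rate = {group: count / POPULATION for group, count in records.items()}

# How over-represented group B is in the records relative to group A.
overrepresentation = records["B"] / records["A"]

print(records)
print(naive_rate)
print(overrepresentation)
```

A model trained only on `records` has no way to tell that the gap comes from the monitoring counts rather than from the groups themselves, unless the monitoring rate is also in the data.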

Why is this surprising? ML models are just recognizers and bias on the basis of ancestry is observable in all human cultures at all times.

If we nobly insist that the models describe the world as we wish it were and ought to be, then we won't be describing the data accurately. Maybe that trade-off is worthwhile if it somehow reforms human attitudes along lines we find more agreeable?

Conversely, almost everyone I know with at least a faint understanding of ML is entirely unsurprised about this.

Then again, my personal social bubble leans heavily liberal and hard left. And I think that has a lot more to do with it than with how much people understand ML. When you explain this sort of thing to people who have no idea about ML, in very simple terms ("we give the robot the text that humans wrote, so that it can pick up the patterns" etc), they see why it does that very quickly, as well - if their politics makes them aware of bias in general.

Hmmm...I'm no expert, but my master's thesis topic in the 90's was on neural networks that use R-squared (a measure of correlation), and when I saw the news about Microsoft's chatbot going Nazi, I was not at all surprised. Not saying no one you knew was surprised, but I had "at least a faint understanding of ML", and the primary thing I learned about it was that it learns what's in the data, whether that's the part of the data that you intended it to learn or not.

Tay was trolled hard by 4chan, that's why she went hardcore Nazi almost immediately. It was amusing, but not a fair & controlled experiment by any means.

The real world is neither fair nor a controlled experiment.

Which is why I'm surprised about all this "AI is biased" outrage. A decent algorithm will learn what's in the data. Cast on a wide enough scale, the data is roughly what the world is. If your bot learns from a newspaper corpus, then it learns how the world looks through the lens of news publishing. If news publishing is somewhat racist, and your algorithm does not pick up on that, then your algorithm has a bug in it.

It seems to me like the people writing about how AI is bad because it picks up biases from data are wishing the ML would learn the world as it ought to be. But that's wrong, and that would make such algorithms not useful. ML is meant to learn the world as it is. Which is, as you wrote, neither fair nor a controlled experiment.

Well put. The people complaining about how AI is bad are the same people who push "diversity hires" to try to pretend that the population of software developers is equal parts male/female, and white/black.