I struggle to think of a single person with the faintest understanding of what machine learning algorithms are being surprised by this. Who are these "technologists" you're speaking of?
Almost everyone I know with at least a faint understanding of ML is surprised by models picking up racism etc when there was zero intent to do so, because of systemic racism etc in available data. Or at least surprised by how much can be picked up. You're bubbled if no one you know is surprised.
Sometimes data might be 'racist' (i.e. human written corpus text)... but sometimes data is just data.
Are facts racist?
It would seem the world is rather diverse, i.e. 'people are different', and as we are different, AI is going to pick up on that. That's the whole point.
Now, there are some bad examples, like this one, where positive/negative inferences get taken the wrong way. Or actual systemic racism shows up in bad ways, e.g. maybe some groups are more likely to be monitored than others, and so show up more frequently in bad terms.
Why is this surprising? ML models are just recognizers and bias on the basis of ancestry is observable in all human cultures at all times.
If we nobly insist that the models describe the world as we wish it were and ought to be, then we won't be describing the data accurately. Maybe that trade-off is worthwhile if it somehow reforms human attitudes along lines we find more agreeable?
Conversely, almost everyone I know with at least a faint understanding of ML is entirely unsurprised about this.
Then again, my personal social bubble leans heavily liberal and hard left. And I think that has a lot more to do with it than with how much people understand ML. When you explain this sort of thing to people who have no idea about ML, in very simple terms ("we give the robot the text that humans wrote, so that it can pick up the patterns" etc), they see why it does that very quickly, as well - if their politics makes them aware of bias in general.
Hmmm...I'm no expert, but my master's thesis topic in the 90's was on neural networks that use R-squared (a measure of correlation), and when I saw the news about Microsoft's chatbot going Nazi, I was not at all surprised. Not saying no one you knew was surprised, but I had "at least a faint understanding of ML", and the primary thing I learned about it was that it learns what's in the data, whether that's the part of the data that you intended it to learn or not.
Tay was trolled hard by 4chan, that's why she went hardcore Nazi almost immediately. It was amusing, but not a fair & controlled experiment by any means.
Which is why I'm surprised about all this "AI is biased" outrage. A decent algorithm will learn what's in the data. Cast on a wide enough scale, the data is roughly what the world is. If your bot learns from a newspaper corpus, then it learns how the world looks through the lens of news publishing. If news publishing is somewhat racist, and your algorithm does not pick up on that, then your algorithm has a bug in it.
It seems to me like the people writing about how AI is bad because it picks up biases from data are wishing the ML would learn the world as it ought to be. But that's wrong, and that would make such algorithms not useful. ML is meant to learn the world as it is. Which is, as you wrote, neither fair nor a controlled experiment.
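To make the "it learns what's in the data" point concrete, here is a minimal sketch, assuming a made-up toy corpus and a deliberately naive word-counting "model". The names `alice` and `bob`, the sentences, and the `word_scores` helper are all hypothetical; nothing here comes from the article or from any real system.

```python
from collections import defaultdict

# Hypothetical toy corpus: (text, label) pairs, label 1 = positive, 0 = negative.
# The skew between the two placeholder names is planted on purpose.
corpus = [
    ("alice won an award", 1),
    ("alice helped a neighbor", 1),
    ("alice was arrested", 0),
    ("bob was arrested", 0),
    ("bob was fined", 0),
    ("bob won an award", 1),
    ("bob was arrested again", 0),
]

def word_scores(pairs):
    """Return, for each word, the fraction of sentences containing it that are positive."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for text, label in pairs:
        for word in set(text.split()):
            total[word] += 1
            positive[word] += label
    return {word: positive[word] / total[word] for word in total}

scores = word_scores(corpus)
print(scores["alice"], scores["bob"])
```

Nobody told the counter to score one name lower than the other. It did so because the corpus mentions that name more often in negative sentences, which is exactly the "more likely to be monitored" effect described upthread.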
Well put. The people complaining about how AI is bad are the same people who push "diversity hires" to try to pretend that the population of software developers is equal parts male/female, and white/black.
It’s because most tech people have the default position that racism is not really a big deal, an edge case in modern society. That certainly is the message the political center and right are pushing.
Given that the data showed a massive range by name within the same race and a much smaller skew between different races, couldn't this data be said to support that conclusion?
Disclaimer: I don't know enough about the data or the algorithm to determine this mathematically, but I think it's worth pointing out. It would have been nice to see some statistical analysis instead of just assuming the charts speak for themselves.
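For what it's worth, one simple way to put a number on "range within a group vs. skew between groups" is the share of total variance explained by group membership (eta squared, as in a one-way ANOVA). A rough sketch, assuming invented per-name scores; the numbers and group labels below are placeholders, not the article's data:

```python
from statistics import mean

# Hypothetical per-name sentiment scores, keyed by group. NOT the article's data.
groups = {
    "group_a": [-2.0, -0.5, 0.5, 2.0],
    "group_b": [-1.5, 0.0, 1.0, 2.5],
}

def eta_squared(groups):
    """Fraction of total variance explained by group membership (SS_between / SS_total)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total

print(f"eta squared: {eta_squared(groups):.3f}")
```

A small value only says that most of the spread is between individual names rather than between groups. It doesn't by itself say whether the remaining between-group gap matters in practice, so it's a starting point rather than an answer.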