Hacker News
by gsich 2812 days ago
Or: don't include gender in the training data.

They didn’t. It was discovered through other signals (mention of membership in “women’s” clubs, etc.).
So they did. It should be obvious that if you don't want to include gender, then you have to sanitize gender-related data.
That's not as easy as one might think.

Machine learning generally doesn't have any prior opinions about things and will learn any possible correlation in the data.

It could, for example, discover that certain words or sentence structures used in a resume are more often associated with bad candidates. Later you find out that <protected class> includes a large number of people who use those words/structures, while most other people don't.

And now the AI discriminates against them.

ML will pick up on any possible signal, including noise.
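
To make the proxy effect concrete, here is a minimal sketch on made-up numbers. Everything in it is an assumption for illustration: the two groups "A" and "B", the single "proxy word" feature, and the historical hire rates are invented, and the "model" is just a hire rate per feature value rather than any real ML system. The group label is never shown to the model.

```python
from collections import defaultdict


def make_rows():
    """Build synthetic (group, has_proxy_word, hired) rows. All numbers are invented."""
    rows = []
    # Group A: 100 resumes, 10 use the proxy word, 60 were hired historically.
    for i in range(100):
        rows.append(("A", i < 10, i % 10 < 6))
    # Group B: 100 resumes, 90 use the proxy word, 30 were hired historically.
    for i in range(100):
        rows.append(("B", i < 90, i % 10 < 3))
    return rows


def fit_rate_by_feature(rows):
    """'Train' on the proxy word only: hire rate for each feature value."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [hired, total]
    for _group, has_word, hired in rows:
        counts[has_word][0] += int(hired)
        counts[has_word][1] += 1
    return {value: hired / total for value, (hired, total) in counts.items()}


def mean_score_by_group(rows, model):
    """Average model score per group; the group is used only for this audit."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [score sum, count]
    for group, has_word, _hired in rows:
        sums[group][0] += model[has_word]
        sums[group][1] += 1
    return {group: total / n for group, (total, n) in sums.items()}


rows = make_rows()
model = fit_rate_by_feature(rows)
scores = mean_score_by_group(rows, model)
print(model)
print(scores)
```

Within each group the proxy word carries no information at all in this toy data (the hire rate is the same with or without it), yet the model still scores group A at roughly 0.55 and group B at roughly 0.35 on average, because the word mostly just identifies the group.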

More than that, though. Graduates of all-women colleges were also caught. If you're using school as a data point, that's extremely hard to sanitize.
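
A rough sketch of why this is hard to sanitize, assuming a naive keyword scrubber. The word list, the `scrub` helper, and "Hypothetical College" (standing in for an all-women school) are all invented for illustration, not anything from the actual system:

```python
import re

# Hypothetical, deliberately naive list of explicitly gendered words.
GENDERED = re.compile(r"\b(women|woman|men|man|female|male)('s|s)?\b", re.IGNORECASE)


def scrub(text):
    """Replace explicitly gendered words with a placeholder."""
    return GENDERED.sub("[REDACTED]", text)


resume_line = "Captain of the women's chess club; B.A., Hypothetical College"
cleaned = scrub(resume_line)
print(cleaned)
```

The club mention is redacted, but the school name passes through untouched. If that school only admits women, the model still gets the same signal unless you drop school names entirely.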
Then what is the purpose of this? At some point you want this thing to "discriminate" (or "select", if this is a better word) people based on what they have done in life. Which is not negative per se.
But you don't want it to select based on gender.
Would it, though? A school name is essentially just that: there's no gender information in it, even with the "women" prefix. If you discriminate against other schools, you can do the same with those. FWIW, there could be a difference in performance which the ML finds.