HN Mirror


by was_boring 2811 days ago

There are a few ways you can tackle this issue: 1) use the same algorithm for each group, but train separately (so in the end you have two different sets of weights); 2) oversample the group that is underrepresented in the data; 3) during training, make the penalty for guessing wrongly more severe for female than for male applicants; 4) apply weights to the gender encoding; 5) use more than just resumes as data.
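To make options 2 and 3 a bit more concrete, here is a minimal sketch in plain Python. It assumes a hypothetical toy record format of `(features, group, label)`; the function names are made up for illustration and don't correspond to any particular library. `oversample` duplicates random rows from smaller groups until all groups match the largest one, and `group_weights` plus `weighted_log_loss` scale each example's loss so that a smaller group's errors cost more during training.

```python
import math
import random
from collections import Counter

# Hypothetical toy record: (features, group, label). In the scenario above,
# "group" would be the applicant's gender and "label" the hiring outcome.
Record = tuple[dict[str, float], str, int]


def oversample(records: list[Record], seed: int = 0) -> list[Record]:
    """Option 2: duplicate randomly chosen records from smaller groups
    until every group is as large as the largest one."""
    rng = random.Random(seed)
    by_group: dict[str, list[Record]] = {}
    for rec in records:
        by_group.setdefault(rec[1], []).append(rec)
    target = max(len(rows) for rows in by_group.values())
    balanced: list[Record] = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


def group_weights(records: list[Record]) -> dict[str, float]:
    """Option 3: per-group loss multipliers inversely proportional to group
    size, so the summed weight of every group comes out the same."""
    counts = Counter(rec[1] for rec in records)
    total = len(records)
    return {g: total / (len(counts) * n) for g, n in counts.items()}


def weighted_log_loss(probs: list[float], records: list[Record],
                      weights: dict[str, float]) -> float:
    """Binary cross-entropy where each example's loss is scaled by its group weight."""
    eps = 1e-12
    total = 0.0
    for p, (_, group, label) in zip(probs, records):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
        total += weights[group] * loss
    return total / len(records)
```

For example, with 8 records from group "m" and 2 from group "f", `group_weights` returns 0.625 for "m" and 2.5 for "f", so a wrong guess on an "f" applicant costs four times as much. Oversampling and loss weighting target the same imbalance; the weighting route avoids training on literal duplicates.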

This isn't an insurmountable problem, but it does require more work than just "encode, throw it in and see what happens".

Amazon scrapped only the original team; it formed a new one for which diversity in the output is a goal.


gsich 2810 days ago

Or: don't include gender in the training data.


kareemsabri 2810 days ago

They didn’t. It was discovered through other signals (mentions of membership in “women’s” clubs, etc.).


gsich 2810 days ago

So they did. It should be obvious that if you don't want to include gender, then you have to sanitize gender-related data.
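A naive version of that sanitizing step might look like the sketch below: drop an explicit gender field and redact a list of gendered words from the remaining text. The term list, the `gender` field name, and the `[REDACTED]` placeholder are all hypothetical choices for illustration, and the list is deliberately incomplete.

```python
import re

# Hypothetical and deliberately incomplete list of gender-related terms.
GENDERED_TERMS = [
    "women's", "woman's", "women", "woman", "female",
    "men's", "man's", "men", "male", "sorority", "fraternity",
]

# Longest terms first so "women's" wins over "women"; \b keeps "men" from
# matching inside "women" or "mentor", and "male" from matching inside "female".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(GENDERED_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def sanitize(resume: dict[str, str]) -> dict[str, str]:
    """Drop an explicit 'gender' field and redact gendered words in every other field."""
    return {
        field: _PATTERN.sub("[REDACTED]", text)
        for field, text in resume.items()
        if field != "gender"
    }
```

Running `sanitize({"gender": "F", "activities": "Captain of the Women's chess club"})` yields `{"activities": "Captain of the [REDACTED] chess club"}`. Anything not on the list, such as the name of a school, passes through untouched, which is the difficulty raised in the replies below.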


zaarn 2810 days ago

That's not as easy as one might think.

Machine learning generally doesn't have any prior opinions about things and will learn any possible correlation in the data.

It could, for example, discover that certain words or sentence structures in resumes are more often associated with bad candidates. Later you find out that a large share of <protected class> use these words/structures while most other people don't.

And now the AI discriminates against them.

ML will pick up on any possible signal, including noise.
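Here is a tiny synthetic illustration of that effect. Gender is never stored in the training data, but a hypothetical proxy word (`womens_club`) appears only on women's resumes, and the historical labels are assumed to be biased (only skilled men were hired). The "model" is a deliberately naive scorer that learns each word's historical hire rate; it stands in for a real learner only to show the mechanism.

```python
from collections import defaultdict

History = list[tuple[list[str], bool]]


def make_history(per_cell: int = 50) -> History:
    """Synthetic, hypothetical past decisions. Gender drives a proxy word but
    is NOT stored; the historical labels favoured skilled men only."""
    history: History = []
    for gender in ("m", "f"):
        for skilled in (True, False):
            for _ in range(per_cell):
                words = ["python"] if skilled else ["excel"]
                if gender == "f":
                    words.append("womens_club")  # proxy for gender
                history.append((words, skilled and gender == "m"))
    return history


def fit_word_rates(history: History) -> dict[str, float]:
    """Naive scorer: for each word, the historical hire rate of resumes containing it."""
    seen: dict[str, int] = defaultdict(int)
    hired: dict[str, int] = defaultdict(int)
    for words, was_hired in history:
        for w in set(words):
            seen[w] += 1
            hired[w] += int(was_hired)
    return {w: hired[w] / seen[w] for w in seen}


def score(words: list[str], rates: dict[str, float]) -> float:
    """Average the learned rates of the known words on a resume."""
    known = [rates[w] for w in words if w in rates]
    return sum(known) / len(known) if known else 0.0


rates = fit_word_rates(make_history())
print(score(["python"], rates))                 # equally skilled man   → 0.5
print(score(["python", "womens_club"], rates))  # equally skilled woman → 0.25
```

Two candidates with the same skill get different scores purely because the proxy word learned a hire rate of 0 from the biased history.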


s73v3r_ 2810 days ago

More than that, though. Graduates of all-women colleges were also caught. If you're using school as a data point, that's extremely hard to sanitize.
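One way to at least detect a proxy like school is to check how well that feature alone predicts the protected attribute. The sketch below, using hypothetical school names and labels, compares "guess the majority gender for each school" against "always guess the overall majority", and flags schools whose students are almost all from one group. The 0.95 threshold is an arbitrary illustrative choice.

```python
from collections import Counter, defaultdict


def _tally(values: list[str], groups: list[str]) -> dict[str, Counter]:
    """Count group membership separately for each feature value."""
    by_value: dict[str, Counter] = defaultdict(Counter)
    for v, g in zip(values, groups):
        by_value[v][g] += 1
    return by_value


def proxy_accuracy(values: list[str], groups: list[str]) -> float:
    """Accuracy of guessing the group from this feature alone (majority group per value)."""
    correct = sum(c.most_common(1)[0][1] for c in _tally(values, groups).values())
    return correct / len(values)


def baseline_accuracy(groups: list[str]) -> float:
    """Accuracy of always guessing the overall majority group."""
    return Counter(groups).most_common(1)[0][1] / len(groups)


def flag_values(values: list[str], groups: list[str], threshold: float = 0.95) -> list[str]:
    """Feature values whose holders are (almost) all from one group."""
    flagged = []
    for v, c in _tally(values, groups).items():
        if c.most_common(1)[0][1] / sum(c.values()) >= threshold:
            flagged.append(v)
    return sorted(flagged)
```

With `schools = ["State U"] * 6 + ["College A"] * 4` and `genders = ["m"] * 4 + ["f"] * 6`, the school alone predicts gender with accuracy 0.8 versus a 0.6 baseline, and `flag_values` returns `["College A"]`. Finding the proxy is the easy part; deciding what to do with a feature that is also legitimately informative is the hard one.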


gsich 2810 days ago

Then what is the purpose of this? At some point you want this thing to "discriminate" between people (or "select" them, if that is a better word) based on what they have done in life, which is not negative per se.


s73v3r_ 2810 days ago

But you don't want it to select based on gender.
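Whatever goes into the model, a simple way to check whether its output ends up selecting by gender is to compare selection rates per group. This is a sketch with hypothetical data; the "four-fifths" threshold from the US EEOC's Uniform Guidelines on Employee Selection Procedures is one commonly cited rule of thumb, and a low ratio flags a disparity without explaining its cause.

```python
from collections import Counter


def selection_rates(selected: list[bool], groups: list[str]) -> dict[str, float]:
    """Fraction of each group that the model selected."""
    totals = Counter(groups)
    picked = Counter(g for s, g in zip(selected, groups) if s)
    return {g: picked[g] / n for g, n in totals.items()}


def impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 = parity)."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no one was selected; the ratio is undefined")
    return min(rates.values()) / highest
```

For instance, if the model picks 5 of 10 male and 2 of 10 female applicants, the rates are 0.5 and 0.2 and the impact ratio is 0.4, well under 0.8.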
