| > The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates). This is incorrect. The key thing to keep in mind is that they are not just predicting who is a good candidate, they are also ranking by the certainty of their prediction. A lower number of female candidates could plausibly lead to lower certainty from the prediction model, since it would have less data on those people. I've never trained a model on resumes, but I definitely often see this "lower certainty on minorities" effect in models I do train. The lower certainty would in turn lead to lower rankings for women even without any bias in the data. Now, I'm not saying that Amazon's data isn't biased. I would not be surprised if it were. I'm just saying we should be careful in understanding what is evidence of bias and what is not. |
gp: "The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates)."
This is false for typical models.
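
To make the certainty argument concrete, here is a toy sketch. It is not Amazon's system or any real resume model. It assumes, purely for illustration, that candidates are ranked by the lower end of a 95% Wilson score interval on their group's observed success rate, and all counts are made up. The helper name `wilson_lower_bound` is hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = p_hat + z**2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom


# Hypothetical counts: both groups have the same observed 50% success rate,
# so there is no difference between them in the data itself.
majority = wilson_lower_bound(successes=500, n=1000)
minority = wilson_lower_bound(successes=5, n=10)

# The "1 woman, and she was hired" case from the quoted comment.
single = wilson_lower_bound(successes=1, n=1)

print(f"majority lower bound:   {majority:.3f}")
print(f"minority lower bound:   {minority:.3f}")
print(f"single-example bound:   {single:.3f}")
```

Under these assumptions the smaller group gets a lower score despite an identical observed rate, and a single positive example scores lower still, because the interval is wide when there is little data. A real model's uncertainty behaves differently in detail, but the direction of the effect is the point being argued.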