|
|
|
|
|
by titzer
2813 days ago
|
|
There's not enough information about how their ML algorithm works, nor how large their dataset was for any of the above reasoning to be justified. Fwiw, many ranking functions do indeed take certainty into account, penalizing populations with few data points. |
|
Unless they presented lots of unqualified resumes of people not in tech as part of the training, which seems like something someone might think reasonable. Then, the model would (correctly) determine that very few people coming from women's colleges are CS majors, and penalize them. However, I'd still expect a well built model to adjust so that if someone was a CS major, it would adjust accordingly and get rid of any default penalty for being at a particular college.
If the whole thing was hand-engineered, then of course all bets are off. It's hard to deal well with unbalanced classes, and as you mentioned, without knowing what their data looks like we can only speculate on what really happened.
But I will say this: this is not a general failure of ML, these sorts of problems can be avoided if you know what you're doing, unless your data is garbage.