by deogeo 2620 days ago
Sounds like an extraordinarily poor AI system if it depends on absolute numbers, and not per capita. And wouldn't the number of unsuccessful hires also skew male?
3 comments

"Sounds like an extraordinarily poor AI system if it depends on absolute numbers, and not per capita."

To some extent, you're bringing in your human bias to prefer human biases when you make that statement. We humans have a hierarchy of important attributes, and for various reasons believe race and gender are more important than eye color or height. But the machine learning algorithm just gets a multidimensional point in hyperspace. It doesn't, a priori, "know" that it needs to do a "per capita" adjustment based on FIELD_1 any more than it knows it needs to do a per capita adjustment on FIELD_2. And you can't "adjust" on all the fields because that'll just cancel out.

We are also in the weird position of wanting the machine to do adjustments based on FIELD_1, but without us having to actually admit to ourselves that we're doing it. From a technical perspective, probably the best answer is to do a straight-up training based on the data, then have a cleanly separated after-the-fact cleanup process to perform whatever social adjustments it is we want on the outcome. But nobody is willing to admit that's what we want, and to put those adjustments down on paper in the form of code, because the instant they're concrete, pretty much everybody is going to decide they're wrong, and no two people are going to agree on the manner in which they are wrong, and an epic, national-front-page-news shitstorm will ensue. So here we are, trying to make adjustments without making adjustments, or, alternatively, trying to make adjustments in a place where we can blame the AI rather than humans.
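To make "put those adjustments down on paper in the form of code" concrete, here is a minimal sketch of a separate cleanup step that runs after an unadjusted model has scored everyone. The function name, the `rate` parameter, the equal-rate-per-group policy and the toy scores are all assumptions made up for illustration; nothing here is a policy anyone has actually proposed or shipped.

```python
from collections import defaultdict

def select_with_group_rate(candidates, rate):
    """Pick the top `rate` fraction of candidates within each group.

    candidates: list of (candidate_id, group, score) tuples, where `score`
    comes from a model trained on the raw data with no adjustment.
    The policy (same selection rate in every group) is a hypothetical
    example of an explicit, written-down adjustment.
    """
    by_group = defaultdict(list)
    for cand_id, group, score in candidates:
        by_group[group].append((score, cand_id))

    selected = set()
    for scored in by_group.values():
        scored.sort(reverse=True)       # highest score first
        k = round(rate * len(scored))   # the one visible policy knob
        selected.update(cand_id for _, cand_id in scored[:k])
    return selected

# Toy data: FIELD_1 takes the values "g1" and "g2".
candidates = [
    ("a", "g1", 0.9), ("b", "g1", 0.8), ("c", "g1", 0.7), ("d", "g1", 0.6),
    ("e", "g2", 0.5), ("f", "g2", 0.4),
]
chosen = select_with_group_rate(candidates, rate=0.5)
```

The point of keeping it separate is that the adjustment is one short function anyone can read and argue about, instead of being smeared across the training process.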

(The ironic thing is that because we can't admit what we're trying to do, we're going to end up doing a really poor job of it. Tools will be applied haphazardly, the results can't be measured except very grossly at the very end of the process, and the goals won't be obtained and the system is always going to be quirky and weird. If we could clearly declare what it is we actually wanted, it would be fairly easy to get it from the AIs.)

The basic "resumes skewed male so the algorithm did too" explanation appears to be incorrect. But it's found in the original Reuters story and most derived stories, and finding it here implies it's reached the level of urban legend.

Going by the details of the Reuters story and several others, it appears that what actually happened was a training/task mismatch. Amazon wanted an algorithm to do resume discovery, which recruiters would run and get quality predictions as they viewed resumes. But they trained it on resume results, giving it past resumes which had been submitted to Amazon and telling it to seek similar resumes. None of the stories make it clear if there even was negative training data; it looks like the tool was simply told to compute degree-of-similarity to past inputs, and possibly told to prioritize resumes which were ultimately hired.

As a result, the tool was trying to convert a relatively gender-neutral pool (resumes found online) to a skewed one (Amazon applicant resumes), and did so by weighting gendered terms. It also seems to have underweighted technical terms, failing to appreciate them as mandatory or strictly position-specific.
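A toy sketch of why "score by similarity to past positive examples" penalizes vocabulary that is merely rare in the past pool. This is not Amazon's method (the stories don't describe the scoring in any detail); the bag-of-words representation, the cosine similarity, and the resumes below are all invented for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one resume."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical "past applicant" pool: positive examples only.
past_resumes = [
    "python engineer chess club captain",
    "java engineer football team",
    "python engineer chess club",
]
centroid = Counter()
for resume in past_resumes:
    centroid.update(vectorize(resume))

# Two new resumes that differ by a single term absent from the pool.
resume_a = "python engineer chess club"
resume_b = "python engineer women's chess club"

score_a = cosine(vectorize(resume_a), centroid)
score_b = cosine(vectorize(resume_b), centroid)
# score_b comes out lower than score_a: the extra term matches nothing in
# the pool, so it only adds length to the vector and drags similarity down.
```

With no negative examples, the scorer has no way to learn that the extra term is irrelevant to the job; anything unfamiliar just counts against the candidate.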

The developers were sufficiently aware of that to catch and correct the known gender biases (e.g. devaluing women's colleges or the literal word "women's"), but were scared there were other uncaught biases. And the results were apparently terrible all around, so the tool was scrapped. Which is pretty much what you'd expect from something trained on exclusively positive, sample-biased examples. The story has been seriously distorted, but the real plan also seems terrible...

Consider the possibility that the (pre-AI system) probability of success for a female applicant is the same as the probability of success of a male applicant. You could make a "per capita" quota as a kind of goal. That's not a problem, but how would you make sure the quota was met?

The typical AI system doesn't work on the basis of selecting candidates entirely at random, pro rata, in order to meet a quota. It works on the basis of criteria for success. One thing it might learn (unfortunately) is that most posts at the company are filled by men.

From a machine learning point of view, one can just add the constraint that the probability of being in the "yes" bucket is the same for both male and female candidates. Doing this will give a worse fit than an unconstrained optimization, but it is fairer.
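A rough sketch of that idea, using a soft penalty instead of a hard constraint: a one-feature logistic regression whose loss adds `lam` times the squared gap between the two groups' mean predicted "yes" probability. The function names, `lam`, and the ten data points are made up for illustration, and a penalty term is only one of several ways to impose this kind of constraint.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def group_mean(values, members):
    return sum(values[i] for i in members) / len(members)

def fit(xs, ys, groups, lam, steps=5000, lr=0.1):
    """Gradient descent on mean log loss + lam * gap**2, where gap is the
    difference in mean predicted probability between group 0 and group 1."""
    w = b = 0.0
    n = len(xs)
    idx = {g: [i for i in range(n) if groups[i] == g] for g in (0, 1)}
    for _ in range(steps):
        ps = [sigmoid(w * x + b) for x in xs]
        # Gradient of the mean log loss.
        gw = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        gb = sum(p - y for p, y in zip(ps, ys)) / n
        # Gradient of the parity penalty lam * gap**2.
        slope = [p * (1 - p) for p in ps]             # dp/dz
        slope_x = [s * x for s, x in zip(slope, xs)]  # dp/dw
        gap = group_mean(ps, idx[0]) - group_mean(ps, idx[1])
        gw += 2 * lam * gap * (group_mean(slope_x, idx[0]) - group_mean(slope_x, idx[1]))
        gb += 2 * lam * gap * (group_mean(slope, idx[0]) - group_mean(slope, idx[1]))
        w -= lr * gw
        b -= lr * gb
    return w, b

def evaluate(w, b, xs, ys, groups):
    """Return (mean log loss, absolute parity gap) for a fitted model."""
    ps = [sigmoid(w * x + b) for x in xs]
    loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(ps, ys)) / len(xs)
    idx = {g: [i for i in range(len(xs)) if groups[i] == g] for g in (0, 1)}
    return loss, abs(group_mean(ps, idx[0]) - group_mean(ps, idx[1]))

# Toy data: the feature x is distributed differently in the two groups.
xs     = [2.0, 1.5, 1.0, 1.0, 0.5, 0.0, 1.0, 0.5, 0.0, -0.5]
ys     = [1,   1,   1,   0,   1,   0,   1,   0,   0,    0]
groups = [0,   0,   0,   0,   0,   0,   1,   1,   1,    1]

loss_plain, gap_plain = evaluate(*fit(xs, ys, groups, lam=0.0), xs, ys, groups)
loss_fair, gap_fair = evaluate(*fit(xs, ys, groups, lam=50.0), xs, ys, groups)
```

On this toy data the penalized fit trades a higher log loss for a smaller gap, which is the "worse fit, but fairer" tradeoff described above. Note that with only one feature the model can mostly shrink the gap by ignoring that feature, which hints at why this is not as simple as it sounds.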

More sophisticated approaches are possible.

There's no "just" to any aspect of this topic. I think what you are talking about is what is sometimes called "classification parity", and there are problems with it, and with everything else we've come up with to combat bias.

https://arxiv.org/abs/1808.00023