| HN Mirror



	by gacgacgac 1 hour ago
	Yes. You missed it. They are using a test dataset of 83k resumes, generated in 2022 for this paper (https://www.nber.org/papers/w29053), and comparing it as a baseline against their observational data. The dataset is deliberately constructed to hold candidate performance constant and vary only the candidates' names, so that each name appears to be associated with a specific race.

3 comments

AStrangeMorrow 1 hour ago

From looking at how that was done, it seems they (the paper you linked) relied on an older paper that looked at which names are frequent enough and strongly skewed toward a certain demographic (90% of that name's occurrences fall within that demographic).

But they picked 9 family names per group, which sounds quite low, and combined those with first names to reach 500 first+last name combinations per group.

I wonder how much of the bias we see has to do with the specific names picked versus it being racially motivated (absolutely not denying that this is probably a factor, but it might not be the only one).

For example, in France there is the national BAC end-of-high-school exam. If you look at the name × grade distribution, and at the top “very good” bracket in particular, some names are heavily under-represented (less than 5% of, say, “Jordan” get that grade) while some are over-represented (35% of “Josephine” get such a grade). The exam is for the most part anonymous, but some names are definitely heavily correlated with lower/higher income groups. So nothing surprising: Josephines tend to come from richer families, thus on average get better education/support, thus better grades. The same is true of family names, to a smaller extent.

So I wonder how much of the bias we see, whether from real people or the AI, has more to do with class than with race. Again, those are not neatly separable things, but still.


pc86 16 minutes ago

Race and socioeconomic status are pretty strongly correlated, but I'd imagine it's possible to do a study to see how much influence each has. You'd need to find "high socioeconomic" names that are also strongly associated with a race that is itself correlated with low socioeconomic status, and vice versa, which honestly might be the hardest part. Separating the two effects statistically doesn't seem that difficult once you have the data.
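
For what it's worth, here is a minimal sketch of that last step in Python, assuming you already had callback counts from a fully crossed design: names signalling race group A or B, each in a high-SES and a low-SES variant. The group labels, the function names (callback_rates, decompose), and all of the counts are made up for illustration; none of it comes from either paper.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (race_signal, ses_signal), both hypothetical labels


def callback_rates(counts: Dict[Cell, Tuple[int, int]]) -> Dict[Cell, float]:
    """Turn (callbacks, applications) per cell into a callback rate."""
    return {cell: callbacks / sent for cell, (callbacks, sent) in counts.items()}


def decompose(counts: Dict[Cell, Tuple[int, int]]) -> Dict[str, float]:
    """Split the callback gap of a 2x2 design into race, SES and interaction terms."""
    r = callback_rates(counts)
    race_gap_high = r[("A", "high")] - r[("B", "high")]  # race gap among high-SES names
    race_gap_low = r[("A", "low")] - r[("B", "low")]     # race gap among low-SES names
    ses_gap_a = r[("A", "high")] - r[("A", "low")]       # SES gap within group A
    ses_gap_b = r[("B", "high")] - r[("B", "low")]       # SES gap within group B
    return {
        "race": (race_gap_high + race_gap_low) / 2,  # race effect, SES held fixed
        "ses": (ses_gap_a + ses_gap_b) / 2,          # SES effect, race held fixed
        "interaction": race_gap_high - race_gap_low, # does the race gap depend on SES?
    }


# Invented counts: (callbacks, applications sent) for each name category.
example_counts = {
    ("A", "high"): (120, 1000),
    ("A", "low"): (90, 1000),
    ("B", "high"): (100, 1000),
    ("B", "low"): (70, 1000),
}

effects = decompose(example_counts)
for name, value in effects.items():
    print(f"{name}: {value:+.3f}")
```

With these invented counts the race gap is 2 points, the SES gap is 3 points, and there is no interaction. A real analysis would also need confidence intervals and enough names per cell that a handful of unusual names can't drive the result, which is the same worry as the 9 family names above.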


xp84 27 minutes ago

Wow. So, all the 'people' and 'resumes' involved are fake, but they submitted them to real jobs?

Cool.

In any event, I'd happily support a ban on all parts of the ATS that could be involved in automated approval, rejection, or scoring being able to see candidate names. But I sense the author of this has a bigger agenda.


rayiner 17 minutes ago

That’s an earlier paper. This one involves 3 million real applicants, with no control for applicant quality.
