by gacgacgac
1 hour ago
Yes, you missed it. They are using a test dataset of 83k resumes, generated in 2022 for this paper (https://www.nber.org/papers/w29053), as a baseline to compare against their observational data. The dataset is deliberately constructed to hold candidate performance constant while varying the candidates' names so that they appear to be associated with a specific race.
But they picked only 9 family names per group, which sounds quite low, and combined them with first names to reach 500 first+last name combinations per group.
I wonder how much of the bias we see has to do with the particular names picked rather than being racially motivated (I'm absolutely not denying that racial motivation is probably a factor, but it might not be the only one).
For example, in France there is the BAC, the national end-of-high-school exam. If you look at the grade distribution by first name, specifically at the top “very good” bracket, some names are heavily under-represented (fewer than 5% of, say, “Jordan”s get that grade) while others are over-represented (35% of “Josephine”s do). The exam is for the most part anonymous, but some names are strongly correlated with lower- or higher-income groups. So nothing surprising: Josephines tend to come from richer families, so on average they get better education and support, and therefore better grades. The same is true of family names, to a lesser extent.
So I wonder how much of the bias we see, whether from real people or from the AI, is about class rather than race. Again, those are not neatly separable things, but still.
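To make the class-versus-race point concrete, here is a minimal Python sketch with entirely hypothetical numbers (not the real BAC figures). In this toy model the chance of a “very good” grade depends only on family income, and two made-up names, “Name A” and “Name B”, differ only in the share of their bearers who come from high-income families. The names still end up with very different grade rates, even though nothing in the model depends on the name itself.

```python
# Hypothetical rates: the chance of a "very good" grade depends ONLY on family income.
P_VERY_GOOD = {"high_income": 0.30, "low_income": 0.05}

# Hypothetical share of high-income families among bearers of each (made-up) name.
HIGH_INCOME_SHARE = {"Name A": 0.9, "Name B": 0.1}


def very_good_rate(name: str) -> float:
    """Expected share of 'very good' grades for a name (law of total probability).

    The name enters only through the income mix of its bearers; there is no
    direct name effect anywhere in this model.
    """
    p_high = HIGH_INCOME_SHARE[name]
    return (p_high * P_VERY_GOOD["high_income"]
            + (1 - p_high) * P_VERY_GOOD["low_income"])


if __name__ == "__main__":
    for name in HIGH_INCOME_SHARE:
        print(f"{name}: {very_good_rate(name):.1%}")
```

With these assumed inputs it prints 27.5% for Name A and 7.5% for Name B. The whole gap comes from the income mix behind each name, which is the kind of confounding that looking at outcomes by name alone cannot separate from a direct effect of the name.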