Hacker News
by AStrangeMorrow 2 hours ago
From looking at how that was done, it seems the paper you linked relied on an older paper that identified names which are both frequent enough and strongly skewed toward one demographic (90% of that name's occurrences fall within that demographic).

But they picked only 9 family names per group, which sounds quite low, and combined those with first names to reach 500 first+last names per group.

I wonder how much of the bias we see has to do with the names actually picked versus it being racially motivated (absolutely not denying that this probably is a factor, but it might not be the only one).

For example, in France there is the national BAC exam at the end of high school. If you look at the name × grade distribution, and specifically at the top “very good” bracket, some names are heavily under-represented (less than 5% of, say, “Jordan” get that grade) while some are over-represented (35% of “Josephine” get such a grade). The exam is for the most part anonymous, but some names are definitely heavily correlated with lower/higher income groups. So nothing surprising: Josephines tend to come from richer families, thus on average get better education/support, thus better grades. The same is true of family names, to a smaller extent.

So I wonder how much of the bias we see, whether it comes from real people or from the AI, has more to do with class than with race. Again, those are not neatly separate things, but still.

1 comment

Race and socioeconomic status are pretty strongly correlated, but I'd imagine it's possible to do a study to see how much influence each one has. You'd need to find "high socioeconomic" names that are also strongly correlated with a race that is itself correlated with low socioeconomic status, and vice versa, which honestly might be the hardest part. The statistical disambiguation doesn't seem that difficult once you have the data.
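
To make the "once you have the data" part concrete, here is a rough sketch of the 2x2 design described above, run on simulated data. Everything in it is an assumption for illustration: the outcome is a hypothetical yes/no "callback", the effect sizes are made up, and the four name cells (race signal crossed with class signal) are assumed to already exist, which is exactly the hard part. With all four cells filled, each signal's effect can be read off by averaging the difference it makes within each level of the other signal.

```python
import random

def simulate(n_per_cell=5000, base=0.30, race_effect=-0.05,
             class_effect=0.20, seed=0):
    """Simulate binary outcomes for a 2x2 design of name types.

    Each cell pairs a race signal (0 or 1) with a class signal (0 or 1).
    The effect sizes are invented for illustration, not estimates.
    """
    rng = random.Random(seed)
    rows = []
    for race in (0, 1):
        for high_class in (0, 1):
            p = base + race_effect * race + class_effect * high_class
            for _ in range(n_per_cell):
                rows.append((race, high_class, 1 if rng.random() < p else 0))
    return rows

def cell_means(rows):
    """Mean outcome for each (race, high_class) cell."""
    sums, counts = {}, {}
    for race, high_class, y in rows:
        key = (race, high_class)
        sums[key] = sums.get(key, 0) + y
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def main_effects(means):
    """Average effect of each signal, holding the other signal fixed."""
    race = ((means[(1, 0)] - means[(0, 0)]) +
            (means[(1, 1)] - means[(0, 1)])) / 2
    cls = ((means[(0, 1)] - means[(0, 0)]) +
           (means[(1, 1)] - means[(1, 0)])) / 2
    return race, cls

rows = simulate()
means = cell_means(rows)
race_est, class_est = main_effects(means)
print(f"estimated race effect:  {race_est:+.3f}")
print(f"estimated class effect: {class_est:+.3f}")
```

If the name list only covered two of the four cells (say, race 0 with high class and race 1 with low class), the two effects could not be told apart, which is the concern raised about the original study's names.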