Hacker News
by wand3r 1 hour ago
Did I miss the part of the article where they break down how they determined race? Is the algorithm blind to race? It looks like they specifically looked at 83k people applying to ~100 companies, notably Fortune 500 companies. Could there simply be differences between the candidates themselves here? The full methodology is hard for me to follow, but it doesn't necessarily seem either malicious or that well structured. Don't you need a control group of applicants who are similar on paper? To allege DISCRIMINATION is quite bold.

Definitely open to opposing or critical views.


The 83,000 applications to Fortune 500 companies come from a different, earlier study that they compared their results against. This paper's takeaway is that, unlike in that Fortune 500 data, the applications here that went through an ML vendor's screening process showed evidence of "systemic rejection," where some applicants got rejected across the board at higher rates than you'd expect if they were facing independent would-be employers.
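To make "systemic rejection" concrete, here is a minimal Python sketch of the comparison described above, under a simplifying assumption that is mine rather than the paper's: if each employer decided independently, an applicant's chance of being rejected everywhere would be the product of their per-employer rejection probabilities. The function names and toy numbers are hypothetical, and the paper's actual estimator is likely more involved.

```python
from math import prod

def expected_all_rejected(rejection_probs):
    """Expected number of applicants rejected by every employer they applied to,
    assuming each employer decides independently (hypothetical baseline)."""
    return sum(prod(probs) for probs in rejection_probs)

def observed_all_rejected(outcomes):
    """Count applicants whose every outcome is a rejection (True = rejected)."""
    return sum(all(applicant) for applicant in outcomes)

# Toy data: 4 applicants, each applying to 3 employers that reject half the time.
probs = [[0.5, 0.5, 0.5]] * 4
outcomes = [
    [True, True, True],
    [True, True, True],
    [False, True, False],
    [True, False, False],
]

expected = expected_all_rejected(probs)     # 4 * 0.125 = 0.5
observed = observed_all_rejected(outcomes)  # 2
print(expected, observed, observed / expected)
```

Seeing 2 applicants rejected everywhere when independence predicts 0.5 is the kind of excess being called systemic rejection; with real data you would also have to account for sampling noise.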

That’s not the data set used for this paper: https://algorithmichiring.github.io/

If you click through, the paper says the race is self-reported.

“Our data tracks 4,197,168 applications. It includes applicant gameplay features and for each application, the application date, the position name and employer, metadata about the position and employer, and the numerical score and final recommendation each applicant received for each completed application. 40.2% of applicants self-report race with a breakdown of 16.8% Asian, 14.2% White, 3.6% Black, 3.0% Hispanic, and all other racial categories below 2% (i.e. fewer than 100,000 applicants).”

I'd expect any algorithm to learn race from other properties in the data.

It's going to be in the rest of the data, because race is meaningfully correlated with, and in plenty of cases causally linked to, being disadvantaged in real ways that can also affect the ability to do certain jobs.

For example, interstates and freeways were deliberately built through Black communities to harm them, and the result is a lot of noise and particulate pollution, which is bad for developing brains.

You won't be able to do meritocratic, non-racist hiring without fixing the environmental racism. Otherwise you're just mirroring racism other people built for you.
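A minimal sketch of the proxy effect described above, using made-up toy data: the hypothetical `race_blind_score` only looks at a zip code, yet because zip code is correlated with group membership, selection rates still differ by group. The hire rates and threshold are illustrative assumptions, not figures from either paper.

```python
from collections import defaultdict

# Hypothetical toy applicants: (group, zip_code). The scorer never sees group.
applicants = [
    ("A", "10001"), ("A", "10001"), ("A", "10001"), ("A", "20002"),
    ("B", "20002"), ("B", "20002"), ("B", "20002"), ("B", "10001"),
]

# Hypothetical historical hire rates per zip code, e.g. learned from past
# decisions that were themselves shaped by unequal conditions.
historical_hire_rate = {"10001": 0.6, "20002": 0.3}

def race_blind_score(zip_code):
    # Only the zip code is used; group is never an input.
    return historical_hire_rate[zip_code]

def selection_rate_by_group(applicants, threshold=0.5):
    selected = defaultdict(int)
    total = defaultdict(int)
    for group, zip_code in applicants:
        total[group] += 1
        if race_blind_score(zip_code) >= threshold:
            selected[group] += 1
    return {g: selected[g] / total[g] for g in total}

rates = selection_rate_by_group(applicants)
print(rates)  # {'A': 0.75, 'B': 0.25}
```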

Yes. You missed it. They are using a test dataset of 83k resumes generated in 2022 for this paper and comparing it as a baseline against their observational data: https://www.nber.org/papers/w29053

The dataset is constructed, deliberately, to hold candidate performance constant and vary the names of candidates to appear to be associated with a specific race.
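A rough Python sketch of that correspondence-study design, assuming the simplest possible setup: each resume body is sent once per group, and only the name changes. `make_matched_applications` and its inputs are hypothetical; the NBER paper's actual protocol (how names were paired, how many applications went to each job, and so on) is not reproduced here.

```python
import random

def make_matched_applications(resume_bodies, names_by_group, seed=0):
    """Send each resume body once per group; only the name differs between copies."""
    rng = random.Random(seed)  # seeded so the name assignment is reproducible
    applications = []
    for body in resume_bodies:
        for group, names in names_by_group.items():
            applications.append({"group": group, "name": rng.choice(names), "body": body})
    return applications

# Placeholder inputs, not the names or resumes used in the study.
bodies = ["resume 1: 3 years retail", "resume 2: BA, 2 years admin"]
names_by_group = {"group_1": ["name_a", "name_b"], "group_2": ["name_c", "name_d"]}
apps = make_matched_applications(bodies, names_by_group)
```

Because the bodies are identical across groups, a gap in callback rates between groups can be attributed to the name rather than to applicant quality.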

From looking at how that was done, it seems they (the paper you linked) relied on an older paper that identified names that are both frequent enough and strongly associated with a particular demographic (90% of occurrences of the name fall within that demographic).
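The selection rule described above can be sketched like this. The 90% share comes from the comment; `min_total` (the "frequent enough" cutoff), the input format, and the placeholder names and counts are assumptions for illustration only.

```python
def pick_distinctive_names(name_counts, min_total=1000, min_share=0.9):
    """Return {name: group} for names that are common enough and whose
    occurrences fall at least `min_share` within a single group.

    name_counts maps name -> {group: count} (hypothetical input format).
    """
    picked = {}
    for name, counts in name_counts.items():
        total = sum(counts.values())
        if total < min_total:
            continue  # too rare to use
        group, top = max(counts.items(), key=lambda kv: kv[1])
        if top / total >= min_share:
            picked[name] = group
    return picked

# Placeholder counts, invented purely to exercise the function.
name_counts = {
    "name_a": {"g1": 950, "g2": 50},    # 95% g1 -> kept
    "name_b": {"g1": 600, "g2": 400},   # only 60% -> dropped
    "name_c": {"g1": 10, "g2": 490},    # too rare -> dropped
    "name_d": {"g1": 100, "g2": 1900},  # 95% g2 -> kept
}
print(pick_distinctive_names(name_counts))  # {'name_a': 'g1', 'name_d': 'g2'}
```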

But they picked only 9 family names per group, which sounds quite low, and combined them with first names to reach 500 first+last names per group.
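One way to see why 9 family names feels thin: if first and last names were simply crossed until 500 full names exist (an assumption; the paper may have paired them differently), each family name ends up on roughly 500 / 9 ≈ 55 to 56 of a group's names, so any quirk tied to one surname affects about a ninth of that group. The name lists below are placeholders.

```python
from collections import Counter
from itertools import islice, product

def build_full_names(first_names, last_names, target):
    """Cross first and last names (last name varies fastest) and keep the first `target`."""
    return list(islice(product(first_names, last_names), target))

# Placeholder name lists: 9 family names, as in the study, and 60 first names (assumed).
first_names = [f"first_{i}" for i in range(60)]
last_names = [f"last_{j}" for j in range(9)]

full_names = build_full_names(first_names, last_names, 500)
uses_per_surname = Counter(last for _, last in full_names)
print(len(full_names), sorted(set(uses_per_surname.values())))  # 500 [55, 56]
```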

I wonder how much of the bias we see has to do with the particular names picked versus racial motivation (I'm absolutely not denying that racial motivation is probably a factor, but it might not be the only one).

For example, in France there is the national BAC, the end-of-high-school exam. If you look at the grade distribution by first name and focus on the top “very good” bracket, some names are heavily under-represented (fewer than 5% of, say, “Jordan”s get that grade) while others are over-represented (35% of “Josephine”s get it). The exam is for the most part anonymous, but some names are heavily correlated with lower- or higher-income groups. So nothing surprising: Josephines tend to come from richer families, so on average they get better education and support, and thus better grades. The same is true of family names, to a smaller extent.
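The "share of a name in the top bracket" statistic is a simple group-by. This is a hypothetical sketch: the records are invented placeholders, not real BAC statistics, and "très bien" is used as the label for the top ("very good") mention.

```python
from collections import Counter

def top_bracket_share(results, top="très bien"):
    """results: iterable of (first_name, mention). Returns {name: share with the top mention}."""
    totals, tops = Counter(), Counter()
    for name, mention in results:
        totals[name] += 1
        if mention == top:
            tops[name] += 1
    return {name: tops[name] / totals[name] for name in totals}

# Invented placeholder records.
results = [
    ("name_a", "très bien"), ("name_a", "bien"), ("name_a", "bien"), ("name_a", "assez bien"),
    ("name_b", "très bien"), ("name_b", "très bien"), ("name_b", "bien"), ("name_b", "passable"),
]
print(top_bracket_share(results))  # {'name_a': 0.25, 'name_b': 0.5}
```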

So I wonder how much of the bias we see, whether from real people or from the AI, has more to do with class than with race. Again, those are not neatly separable things, but still.

Race and socioeconomic status are pretty strongly correlated, but I'd imagine it's possible to design a study that measures the extent of each one's influence. You'd need to find "high socioeconomic" names that are also strongly associated with races that are themselves correlated with low socioeconomic status, and vice versa, which honestly might be the hardest part. The disambiguation doesn't seem that difficult from a statistical standpoint once you have the data.
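One simple way to do that disambiguation, sketched in Python with invented toy data: compute callback rates in every (group, socioeconomic level) cell and compare groups within each level. This is a stratified comparison rather than a full regression, and all names and numbers are hypothetical. An empty cell yields `None`, which is exactly the "hardest part" mentioned above: without those off-diagonal names there is nothing to compare.

```python
from collections import defaultdict

def cell_rates(records, key):
    """Callback rate per cell; `key` picks the cell from a (group, ses, called_back) record."""
    hits, totals = defaultdict(int), defaultdict(int)
    for record in records:
        cell = key(record)
        totals[cell] += 1
        hits[cell] += int(record[2])
    return {cell: hits[cell] / totals[cell] for cell in totals}

def within_level_gaps(rates, group_a, group_b, levels):
    """Gap between two groups at each SES level; None where either cell is empty."""
    gaps = {}
    for level in levels:
        a, b = rates.get((group_a, level)), rates.get((group_b, level))
        gaps[level] = None if a is None or b is None else a - b
    return gaps

# Invented toy data: group A is over-represented at the high SES level.
records = (
    [("A", "high", True)] * 4 + [("A", "high", False)] * 4
    + [("A", "low", True)] * 1 + [("A", "low", False)] * 3
    + [("B", "high", True)] * 2 + [("B", "high", False)] * 2
    + [("B", "low", True)] * 2 + [("B", "low", False)] * 6
)

raw = cell_rates(records, key=lambda r: r[0])                 # A: 5/12, B: 4/12
by_cell = cell_rates(records, key=lambda r: (r[0], r[1]))
gaps = within_level_gaps(by_cell, "A", "B", ["high", "low"])  # {'high': 0.0, 'low': 0.0}
```

In this toy example the raw gap between A and B disappears once SES is held fixed, meaning it is entirely a class effect; real data would of course be noisier.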

Wow. So all the 'people' and 'resumes' involved are fake, but they submitted them to real jobs?

Cool.

In any event, I'd happily support a ban on letting any part of an applicant tracking system (ATS) involved in automated approval, rejection, or scoring see candidate names. But I sense the author of this has a bigger agenda.
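A minimal sketch of what such a rule might mean in practice, assuming a hypothetical `Application` record (real ATS schemas differ): blank out the name and the email (which often embeds the name) before scoring, and mask name tokens inside the resume text. This only removes the name itself; as pointed out upthread, other fields can still act as proxies.

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Application:
    # Hypothetical minimal record; real ATS schemas differ.
    candidate_name: str
    email: str
    resume_text: str

def redact_for_scoring(app: Application) -> Application:
    """Return a copy with name and email blanked and name tokens masked in the resume text."""
    text = app.resume_text
    for token in app.candidate_name.split():
        # Whole-word, case-insensitive match so "Jane" does not hit "Janet".
        text = re.sub(rf"\b{re.escape(token)}\b", "[REDACTED]", text, flags=re.IGNORECASE)
    return replace(app, candidate_name="", email="", resume_text=text)

app = Application("Jane Doe", "jane.doe@example.com", "Jane Doe\nData analyst, 5 years of SQL.")
redacted = redact_for_scoring(app)
print(redacted.resume_text)
```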

That’s an earlier paper. This one involves 3 million real applicants, with no control for applicant quality.