by Loughla 1493 days ago
Not to get into a flame war, but I want to present an alternate option to yours.

Because in the US some people have a hard time understanding that all races and genders deserve to be treated equally as humans, with the same access to goods and services. Further, there are disparities in care based on race/ethnicity[1][2] and gender[3][4] because of the racism and sexism present in those systems. This sometimes leads to a requirement that race/ethnicity and gender data be scrubbed, to keep people's own biases from affecting outcomes.

[1] https://www.americanbar.org/groups/crsj/publications/human_r...

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1924616/

[3] https://www.americashealthrankings.org/learn/reports/2019-se...

[4] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965695/


It sometimes makes sense to scrub race/ethnicity/gender information from certain types of data, typically when a human is going to be making individual decisions.

For example, not having race data on resumes is generally productive, because that categorization can't provide a meaningful input to a decision about an individual person. Even if there were some correlation between race and skill at whatever job you're interviewing for[1], the size of the effect is almost certainly small, and in the meantime you've also controlled for any bias in the person doing the reviewing.
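A minimal sketch of what that scrubbing could look like before a record reaches a human reviewer. The field names and the `scrub` helper are hypothetical, made up for illustration, and not taken from any real applicant-tracking system:

```python
# Hypothetical field names; a real applicant record would differ.
PROTECTED_FIELDS = {"race", "ethnicity", "gender"}


def scrub(record: dict) -> dict:
    """Return a copy of the record without the protected fields."""
    return {key: value for key, value in record.items()
            if key not in PROTECTED_FIELDS}


# Placeholder values only; nothing here is real data.
applicant = {
    "applicant_id": "A-102",
    "years_experience": 7,
    "race": "<self-reported>",
    "gender": "<self-reported>",
}

# The reviewer sees only the scrubbed copy; the original is left untouched.
for_reviewer = scrub(applicant)
```

Note that this only removes the explicit fields. Anything else in the record that correlates with them is still there.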

If you're having a machine look at a dataset, and the machine determines that race or ethnicity is a material factor in determining some attribute in that dataset, you're not doing anybody any good by denying that fact and destroying the result.

[1] Let's ignore, for the purposes of this discussion, fields (like certain sports) where extreme competition combines with a position heavily dependent upon racially-linked physical characteristics. Though even in this case, there is still a (different, weaker) argument for suppressing race data in "resumes" (yes, I know, ballplayers don't submit resumes to their local NBA franchise).

Race is a rough, subjective, culturally-bound summary of characteristics. If you're already evaluating characteristics, adding either your guess of race or a self-reported race is like injecting gossip into good data.

If the outcome that you're trying to predict is also affected by perceptions of race, you've built a gossip feedback loop.

Then you should be looking at ethnicity and not "race" as such. For example, Ashkenazi Jews as an ethnic group are genetically very distinct from other Europeans, but are generally considered "white" on self-reported race surveys.

"Very distinct" seems a little exaggerated. Compare the "Autosomal genetic distances" between Ashkenazi Jews and other European groups at https://en.wikipedia.org/wiki/Genetic_studies_on_Jews with a similar table of Intra-European distances at https://en.wikipedia.org/wiki/Fixation_index. Finns and French have like twice the distance that Italians and Ashkenazi do.

Now look just above that latter table, showing distances between East Asians and Europeans. The distances are far greater--more than 10x.

The precision with which we can identify and track ancestry, often based on small fractions of DNA (Y-chromosome in particular wrt Ashkenazi Jews, not mtDNA as one might think), doesn't by itself imply a high degree of genetic distinctiveness.

>If you're having a machine look at a dataset, and the machine determines that race or ethnicity is a material factor in determining some attribute in that dataset...

I think the trickiness is in providing the machine unbiased data to begin with, so that it doesn't learn incorrect associations with features like race. The most egregious examples I'm aware of are the machine learning systems used to suggest criminal sentencing, but, apropos of this topic, I believe there are cases where it may produce erroneous associations in something like skin cancer risk.