Hacker News
by amarshall 1493 days ago
Maybe, maybe not. Hard to say, which is the problem they call out in the paper:

> efforts to control [model race-prediction] when it is undesirable will be challenging and demand further study


The correlation being "undesirable" to the individuals doing the research does not mean that the correlation is inaccurate.

I mean, sure, there are tons of ways for garbage data to sneak into ML models -- though these guys tried pretty hard to control for that -- but if the model actually determined that "race" is a meaningful feature, then that might be because it is, and science should be concerned with what is, not with what we wish were.

If one believes and proclaims that they have controlled for variable X, but they haven’t actually done so, then their results and analysis may well be invalid or misleading because of that. Whether they actually should have controlled for X or not is orthogonal.

Oh, yes, sorry. If by the correlation being possibly undesirable you meant that it was possibly spurious, because some bias in the source data was incompletely controlled for, then yes: conclusions based on a model that picked up such a spurious correlation might be biased in a way that is also inaccurate.

This study appears to have done a good job controlling for known biases that could have been proxies for race, but it is presumably possible that they missed something and tainted the data.

Right, and that’s pretty much the conclusion: our explicit goal was to control for race, and yet, we appear to have failed and don’t know why (so don’t know how to adjust the control yet). So likely others using similar-enough methodologies and techniques are unknowingly failing to control.
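
To make the "believed we controlled for X, but didn't" failure mode concrete, here is a toy sketch. It is not the paper's method or data; the names `simulate`, `recovery_accuracy`, and the `leak` parameter are made up for illustration. The hidden attribute is dropped from the features, yet a single correlated proxy is enough to recover it well above chance.

```python
import random


def simulate(n=2000, leak=2.0, seed=0):
    """Return (group, proxy) pairs. `group` stands in for the variable X.

    `leak` is a hypothetical knob: how far the proxy's mean shifts
    with group membership. leak=0.0 means the proxy carries no signal.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.randint(0, 1)
        proxy = rng.gauss(leak * group, 1.0)
        rows.append((group, proxy))
    return rows


def recovery_accuracy(rows):
    """Guess the group from the proxy alone, never looking at X as an input.

    Uses the midpoint between the two group means as a threshold, which is
    about the simplest classifier possible.
    """
    g0 = [p for g, p in rows if g == 0]
    g1 = [p for g, p in rows if g == 1]
    threshold = (sum(g0) / len(g0) + sum(g1) / len(g1)) / 2
    correct = sum((p > threshold) == (g == 1) for g, p in rows)
    return correct / len(rows)


if __name__ == "__main__":
    print("leaky proxy:", recovery_accuracy(simulate(leak=2.0)))
    print("clean proxy:", recovery_accuracy(simulate(leak=0.0)))
```

With a leaky proxy the accuracy should land well above 50%, and with `leak=0.0` it should hover near coin-flip. The catch the thread is pointing at is that in real data nobody hands you the `leak` knob, so you may not know which feature is doing the leaking.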