| HN Mirror

Yeah, I agree. That's fair. I've thought about this before: a couple of years ago I worked at a bank, and our CCO (Chief Credit Officer, in this instance) wanted to add a rule to our credit decisioning models that would decline anyone whose surname had five or more vowels. It was a naked (and admitted) proxy for Africans, and probably for some other 'ethnic' people too[0].

And it made me think: I suspect lots of our ML/NN models function like that. They pick up on race, or on a proxy for race. Where the 'ground truth' outcome genuinely is racially skewed, it can be hard to tell whether that's happening, and it's just not realistic to demand that people make their models inaccurate for the sake of racial equity.

But it highlights, for me, the unavoidable danger of black-box models. I don't mean a logistic regression or a decision tree, because those, while not literally explaining themselves, can be figured out if you have some domain knowledge of the parameters. But with the overfitting machines we call neural nets, I suspect this is happening everywhere, at a cost to both equity and accuracy/reliability. (The probably apocryphal story of the computer vision model meant to estimate crowd density in a train station, which ended up just reading the clock on the wall, comes to mind.)
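For what it's worth, one crude way to check whether a single input is acting as a proxy is to ask how well that feature alone ranks members of a protected group above non-members. The sketch below computes that as an AUC (the Mann-Whitney U statistic divided by the number of member/non-member pairs). It assumes you actually have group labels to audit against, and the function name `proxy_auc` and the toy numbers are made up for illustration. A value near 0.5 means the feature carries little rank information about group membership; a value near 0 or 1 means it separates the groups well. It only looks at one feature at a time, so it won't catch a proxy that a model assembles from several innocuous-looking inputs, which is exactly the neural-net worry.

```python
from typing import Sequence


def proxy_auc(feature: Sequence[float], in_group: Sequence[bool]) -> float:
    """Probability that a randomly chosen group member has a higher `feature`
    value than a randomly chosen non-member, counting ties as half.

    Hypothetical audit helper: 0.5 means no separation; values near 0 or 1
    mean the feature alone separates the groups and may be acting as a proxy.
    """
    members = [f for f, g in zip(feature, in_group) if g]
    others = [f for f, g in zip(feature, in_group) if not g]
    if not members or not others:
        raise ValueError("need at least one member and one non-member")

    wins = 0.0
    # Compare every member/non-member pair (O(n*m), fine for a quick audit).
    for m in members:
        for o in others:
            if m > o:
                wins += 1.0
            elif m == o:
                wins += 0.5
    return wins / (len(members) * len(others))


if __name__ == "__main__":
    # Made-up toy values for some hypothetical feature and group labels.
    values = [5, 6, 2, 3, 5, 1]
    labels = [True, True, False, False, True, False]
    print(proxy_auc(values, labels))  # 1.0: the feature fully separates the groups
```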

[0] I remember it precisely because, incidentally, it would also have caught me, with my plummy double-barrelled surname, though that's beside the point here.