by closed 3570 days ago
It is a bias if you calculate the cost to people taking out loans, based on color. Green people will pay a higher cost, even though in the ground-truth model their race is not directly related to loan repayment.

For example, if only blue people in Idaho fail to repay loans, green people will still absorb a greater cost in the multiple regression case above (in the sense that they are more likely to be penalized for being Idahoans).

1 comment

Yes, if it's actually (blue & Idaho) ~> default, and your model ignores blue, then the greens will pay a higher cost. If color is redundantly encoded then your model can partially fix this and penalize the blues in Idaho.
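
To make that concrete, here's a toy sketch. The counts are made up and the function names are just for illustration. It assumes only blue Idahoans ever default, and compares each group's true default rate against the rate a model that only sees state would charge.

```python
# Toy population (made-up counts): (color, state) -> (people, defaults).
# Only blue Idahoans default, i.e. (blue & Idaho) ~> default.
population = {
    ("blue", "Idaho"): (100, 40),
    ("green", "Idaho"): (100, 0),
    ("blue", "Elsewhere"): (100, 0),
    ("green", "Elsewhere"): (100, 0),
}


def true_rate(color, state):
    """Actual default rate for one (color, state) cell."""
    people, defaults = population[(color, state)]
    return defaults / people


def state_only_rate(state):
    """Rate a model that ignores color assigns to everyone in `state`."""
    people = sum(n for (_, s), (n, _) in population.items() if s == state)
    defaults = sum(d for (_, s), (_, d) in population.items() if s == state)
    return defaults / people


def overcharge(color, state):
    """Positive means the group is priced above its true risk."""
    return state_only_rate(state) - true_rate(color, state)


for color, state in population:
    print(color, state, overcharge(color, state))
```

With these numbers the color-blind model prices every Idahoan at the pooled Idaho rate, so green Idahoans come out above their true risk and blue Idahoans below it, while nobody outside Idaho is affected.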

Do you consider this situation unjust? If so, you might be unhappy to learn that the entire goal of the field of algorithmic fairness is to do something along these lines.

> Available data on blacks specifically is completely irrelevant if blacks and whites aren't fundamentally different. The white model will generalize.

I should have been clearer that I was responding to this part of your comment: even if blacks and whites aren't fundamentally different (in the sense that your race does not directly cause an outcome of interest), you can produce biases that are essentially a misattribution of the relationship between race and that outcome. Worse, if there _is_ a relationship, you can reverse the direction a model estimates for it (Simpson's paradox).
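
For anyone who hasn't seen the Simpson's paradox reversal before, here's a small sketch with invented counts (the `counts` table and `rate` helper are hypothetical, not from any real data). Within each state greens default more often than blues, but because the colors are distributed unevenly across states, the pooled rates point the other way.

```python
# Invented counts: (color, state) -> (people, defaults).
counts = {
    ("blue", "Idaho"): (90, 27),       # 30% default
    ("green", "Idaho"): (10, 4),       # 40% default
    ("blue", "Elsewhere"): (10, 1),    # 10% default
    ("green", "Elsewhere"): (90, 18),  # 20% default
}


def rate(color, state=None):
    """Default rate for a color, within one state or pooled over all states."""
    cells = [v for (c, s), v in counts.items()
             if c == color and state in (None, s)]
    people = sum(n for n, _ in cells)
    defaults = sum(d for _, d in cells)
    return defaults / people


for state in ("Idaho", "Elsewhere", None):
    print(state or "pooled", "blue:", rate("blue", state),
          "green:", rate("green", state))
```

A model that only sees color would estimate blue as the riskier group, even though in every state it is the other way around.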

> Do you consider this situation unjust? If so, you might be unhappy to learn that the entire goal of the field of algorithmic fairness is to do something along these lines.

I don't think the creation of tools to accommodate this specific purpose is bad, per se. Whether or not they are the appropriate tool to use is a different question.