by joe_the_user 2744 days ago
You have a classifier for credit assignment (giving a loan, etc.). The classifier is 99% accurate on the entire population. The classifier is 55% accurate on a small minority.

Uh, concerning this hypothetical.

I'll admit a scenario of this sort sounds appealing at first blush. But a "99%" accuracy rate for credit assignment is transparently absurd if one considers it for a second. There is a clear, significant limit to the accuracy with which anyone's credit can be assessed, if credit means "actually repaying". The fundamental uncertainty of the economy guarantees this.

The distinction between this ideal (99% vs. 55%) and whatever it might be in reality (say 65% vs. 55%) matters. What the system is actually doing is squeezing a few more percentage points out of the data for a large company. And what is the cost of those percentage points?
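Overall accuracy is just the population-weighted average of the group accuracies, overall = (1 - p) * majority_acc + p * minority_acc, where p is the minority's share of the population. Since majority accuracy can be at most 100%, the hypothetical's numbers cap how large the "small minority" can be. A minimal sketch, assuming only two groups; the function names are illustrative:

```python
def max_minority_share(overall_acc: float, minority_acc: float) -> float:
    """Largest minority share p consistent with the stated accuracies.

    From overall = (1 - p) * majority_acc + p * minority_acc and
    majority_acc <= 1.0, it follows that
        p <= (1 - overall) / (1 - minority_acc).
    """
    if not 0.0 <= minority_acc < 1.0:
        raise ValueError("minority_acc must be in [0, 1)")
    return (1.0 - overall_acc) / (1.0 - minority_acc)


def implied_majority_acc(overall_acc: float, minority_acc: float,
                         minority_share: float) -> float:
    """Majority-group accuracy needed to reach the stated overall accuracy."""
    return (overall_acc - minority_share * minority_acc) / (1.0 - minority_share)


p_max = max_minority_share(0.99, 0.55)
print(f"{p_max:.3f}")  # 0.022, i.e. the minority can be at most ~2.2% of applicants
```

At that maximum share the majority would have to be classified perfectly, which is exactly the kind of accuracy the comment argues no credit model can reach.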

[EDIT: ACTUALLY, the pernicious scenario is a system that isn't any MORE accurate for one group than for any other BUT which is NEGATIVELY biased against one group and POSITIVELY biased toward another group. That situation is EASY to get when one indiscriminately slurps up any data available. The inaccuracy of the predictor is a problem for the company; the bias of the predictor is a problem for the individuals discriminated against.]
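Equal accuracy with opposite bias shows up as soon as errors are split by direction: false denials hurt the applicant, false approvals favor them. A minimal sketch with hypothetical toy labels (1 = would repay / approved); the data and names are made up for illustration, not real results:

```python
from typing import NamedTuple


class GroupStats(NamedTuple):
    accuracy: float
    false_approvals: int  # approved but would not repay: errs in the applicant's favor
    false_denials: int    # denied but would repay: errs against the applicant


def group_stats(y_true: list[int], y_pred: list[int]) -> GroupStats:
    """Accuracy plus the direction of the errors for one group."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    false_approvals = sum(t == 0 and p == 1 for t, p in pairs)
    false_denials = sum(t == 1 and p == 0 for t, p in pairs)
    return GroupStats(correct / len(pairs), false_approvals, false_denials)


# Hypothetical toy data: each group has 5 good and 5 bad risks,
# and the model gets 8 of 10 right for both groups.
truth_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_a  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # two good risks in group A denied
truth_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_b  = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # two bad risks in group B approved

stats_a = group_stats(truth_a, pred_a)
stats_b = group_stats(truth_b, pred_b)
print(stats_a)  # GroupStats(accuracy=0.8, false_approvals=0, false_denials=2)
print(stats_b)  # GroupStats(accuracy=0.8, false_approvals=2, false_denials=0)
```

An accuracy-only audit would report the two groups as treated identically; only the per-direction error counts reveal who is paying for the mistakes.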

The situation is that a company really can get a better overall prediction rate for various desired qualities by using completely biased, unfair markers (white skin, went to "a good school", comes from a wealthy background, dresses well, attractive features...). When one allows "black box optimization" to find those features, what one does is permit the use of these considerations, which are all otherwise legally off-limits. Legal strictures against discrimination say that objective measures of black people's ability need to be used, not because other measures never matter but because those measures are unfair: they don't account for past discrimination.

As a further example, outside of race or gender considerations, some percentage of employees may be forced to care for a sick relative. Maybe that makes them potentially less effective employees, or worse credit risks, or whatever. Human evaluators might hold values that put such questions outside consideration. For an opaque multidimensional analysis, this may be a ding, and the human user doesn't even know whether it's a ding.


Your thinking is too US-centric here. There are jurisdictions where this is allowed.

Also, don't get blinded by the numbers.

Do you unfairly deny 5 minority applicants, or erroneously deny 10 from the general population?

This is reality for today's data scientists.

Right now in the US: you find an informative variable that acts as a proxy for race (such as Facebook likes). Using it is not forbidden by law, but you have the data showing the proxy effect. Will you add it to the CTR model and get a raise, or will you act and speak up?
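"The data showing the proxy effect" can be as simple as checking how well the candidate variable alone predicts the protected attribute, compared with just guessing the most common group. A minimal sketch, assuming a single discrete feature (e.g. a hypothetical "liked page X" flag) and a binary group label; the data and function name are illustrative, and a real audit would use held-out data and combinations of variables:

```python
from collections import Counter


def proxy_strength(feature: list, group: list) -> tuple[float, float]:
    """Return (base_rate, feature_only_accuracy) for predicting `group`.

    base_rate: accuracy of always guessing the most common group.
    feature_only_accuracy: accuracy of the best rule that looks only at
    `feature`, i.e. for each feature value, guess the most common group
    among people with that value.
    """
    n = len(group)
    base_rate = Counter(group).most_common(1)[0][1] / n
    by_value: dict = {}
    for f, g in zip(feature, group):
        by_value.setdefault(f, Counter())[g] += 1
    best = sum(c.most_common(1)[0][1] for c in by_value.values()) / n
    return base_rate, best


# Hypothetical toy data: 1 = liked the page / belongs to the protected group.
likes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
group = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(proxy_strength(likes, group))  # (0.6, 0.9)
```

A large gap between the two numbers means the variable carries much of the same information as the protected attribute, even if the attribute itself never enters the model.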

Will you let the trolley follow its course and run over 10,000 Chinese dissidents, or pull the switch and run over 100 of your colleagues (and friends) who would benefit from a China expansion?