
By "fundamentally different", I mean that the most accurate model will be something like this:

    repayment_probability = 1 x downpayment_frac + 0.5 x credit_score + A x isBlack

for some A != 0. For example, if A = -0.2, then a black borrower with a 60% downpayment is as likely to pay back a loan as a white borrower with a 40% downpayment and the same credit score.
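To check the arithmetic in that example, here is a minimal sketch of the linear model above. The function name `repayment_score`, the 0/1 `group` indicator (standing in for the `isBlack` term), and the credit score of 0.7 on a 0-to-1 scale are assumptions made for illustration, not anything from real lending data.

```python
import math

def repayment_score(downpayment_frac, credit_score, group, A):
    """Linear score from the model above; `group` is the 0/1 indicator term."""
    return 1.0 * downpayment_frac + 0.5 * credit_score + A * group

A = -0.2
credit = 0.7  # hypothetical value, assumed to be on a 0-to-1 scale

# Same credit score, different downpayments, different indicator values.
with_indicator = repayment_score(0.60, credit, group=1, A=A)
without_indicator = repayment_score(0.40, credit, group=0, A=A)

# The extra 0.20 of downpayment exactly offsets A = -0.2.
print(math.isclose(with_indicator, without_indicator))
```

With A = 0 the two scores would differ by the full 0.20 downpayment gap, which is the case where the indicator carries no information.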

If A = 0, then the bias described by tlb and danso won't occur.

What you describe with hidden variables is called "redundant encoding", and it's just a way of recovering the `A x isBlack` term if you remove `isBlack` from your input set. But if black and white borrowers repay their loans at the same rate (holding all else equal, i.e. A = 0), redundant encoding won't happen, because it doesn't actually improve accuracy.
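A rough simulation can illustrate the redundant-encoding point. Everything here is assumed for the sake of the sketch: the data is synthetic, `proxy` is a hypothetical feature correlated with the indicator (something like a neighbourhood code), and the noise levels are arbitrary. The model is fit by ordinary least squares with the indicator itself left out of the inputs.

```python
import numpy as np

def proxy_coefficient(A, n=20000, seed=0):
    """Fit the linear model WITHOUT the group indicator and return the
    weight that least squares assigns to a correlated proxy feature."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n).astype(float)  # 0/1 indicator
    downpayment = rng.uniform(0.0, 1.0, size=n)
    credit = rng.uniform(0.0, 1.0, size=n)
    # Hypothetical proxy: the indicator plus noise.
    proxy = group + rng.normal(0.0, 0.5, size=n)
    noise = rng.normal(0.0, 0.05, size=n)
    # Ground truth follows the model above.
    y = 1.0 * downpayment + 0.5 * credit + A * group + noise
    # The design matrix deliberately omits `group`.
    X = np.column_stack([np.ones(n), downpayment, credit, proxy])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

print("A = -0.2:", proxy_coefficient(-0.2))
print("A =  0.0:", proxy_coefficient(0.0))
```

Under these assumptions, the proxy should pick up a clearly nonzero weight when A = -0.2 (partially recovering the dropped term) and a weight near zero when A = 0, since in that case the proxy adds nothing to predictive accuracy.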

I describe this in more detail here: https://www.chrisstucchio.com/blog/2016/alien_intelligences_...

I agree with you that the core issue is an unspecified true goal. Folks are unwilling to publicly and explicitly state how many bad loans should be issued for fairness or how many unqualified students should be allowed into college for diversity.

Or, for an example closer to home, how much should we lower the bar in order to hire more non-Asian minorities in tech? Daring to ask that question gets you some pretty hostile responses.