| HN Mirror

by platz 3107 days ago

Don't be surprised if those predictions are heavily biased against minorities and poor people. Do you care if they are?

It's a similar problem to using ML to give people credit scores.

If the training data includes a lot of minorities and poor people breaking laws or making delinquent payments, then your ML will simply key on race/economic status as a predictor.

So you've built a system that simply targets those groups.

But you might object and say that this race/economic status targeting gives the highest accuracy! It was only learned in the training data, after all. You can make a great classifier that is extremely unfair.

So you have to realize there is a conflict here between accuracy and fairness. This means there is a conflict between observational data (training), and using that data to produce decisions/outcomes.
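
To make that conflict concrete, here is a toy sketch, not a real model. The groups "A" and "B", the counts, and the two decision rules are all made up for illustration. It compares a rule that flags people purely by group against a rule that ignores group, and reports each rule's accuracy alongside the gap in how often each group gets flagged.

```python
from collections import namedtuple

# Hypothetical record: a group label and whether the person defaulted.
Person = namedtuple("Person", ["group", "defaulted"])

def make_population():
    # Made-up counts: group B defaults more often in the "training data".
    people = [Person("A", True)] * 30 + [Person("A", False)] * 70
    people += [Person("B", True)] * 60 + [Person("B", False)] * 40
    return people

def accuracy(rule, people):
    # Fraction of people whose predicted flag matches the recorded outcome.
    correct = sum(1 for p in people if rule(p) == p.defaulted)
    return correct / len(people)

def flag_rate(rule, people, group):
    # Fraction of one group that the rule flags as a likely defaulter.
    members = [p for p in people if p.group == group]
    return sum(1 for p in members if rule(p)) / len(members)

def parity_gap(rule, people):
    # Absolute difference in flag rates between the two groups.
    return abs(flag_rate(rule, people, "A") - flag_rate(rule, people, "B"))

def group_rule(person):
    # Keys only on group membership: flag everyone in B, no one in A.
    return person.group == "B"

def blind_rule(person):
    # Ignores group entirely: flag no one.
    return False

people = make_population()
for name, rule in [("group_rule", group_rule), ("blind_rule", blind_rule)]:
    print(name, accuracy(rule, people), parity_gap(rule, people))
```

With these invented numbers, the group-keyed rule scores higher accuracy (0.65 vs 0.55) while flagging every member of one group and no member of the other. The group-blind rule loses accuracy but treats both groups identically.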

If you make decisions/outcomes that reinforce the training data, you do not give racial groups/low economic status people a chance to improve their lives.

That is extremely inhuman, predatory, and unfair.

febin 3107 days ago

All I want to predict is time periods/locations which are vulnerable. Nothing more than that.

jjoonathan 3107 days ago

Racism is morally wrong but not mathematically wrong. P(criminal|black) > P(criminal), but if you observe that someone has black skin and treat them poorly because of it, you've done a bad thing. It doesn't matter that you were just following Bayesian reasoning because you're still hurting someone on the basis of something they can't control.

Lady Justice doesn't wear a blindfold as a fashion accessory. Discarding information is a key factor in nearly every established system of justice / morality. Refusing to do so (i.e. "just" running an ML algorithm) places you directly at odds with society's hard-earned best practices.

platz 3107 days ago

> Lady Justice doesn't wear a blindfold as a fashion accessory

I never noticed that before. Thanks for pointing this out!

platz 3107 days ago

> All I want to predict is time periods/locations which are vulnerable.

Ok, and to what end?

I assume someone else will be consuming these predictions, else you wouldn't bother at all.

What are your customers/users going to do with these predictions?

Or is that simply not your responsibility; someone else's problem?

myaso 3107 days ago

Take a look at crimereports.com. You might get lucky and find a good source on a per-city or per-county basis, but overall the data is too fragmented to try this. Different countries might have different documentation standards and publishing guidelines for this kind of data, so it might be worth a shot to look.
