| HN Mirror

by joiguru 1804 days ago

The basic idea is as follows.

Let's say you are building an ML model to decide whether to give someone insurance or not. Let's also assume your past decisions had some bias (say, against some group). An ML model trained on this past data will likely learn that bias.

Part of the modern ML focus is then to understand what bias exists in the data, and how we can train models that use the data but somehow counteract that bias.
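
As a rough illustration of "understanding what bias exists in data", one simple check is to compare approval rates across groups in the historical decisions. The sketch below is only an assumption about how such a check might look: the group labels, the `history` records, and the function names are all hypothetical, and a gap in rates only flags a disparity to investigate; it does not by itself show that the decisions were unfair.

```python
from collections import defaultdict

def approval_rates(records):
    """Return the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in records:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}

def parity_gap(records):
    """Difference between the highest and lowest group approval rates."""
    rates = approval_rates(records)
    return max(rates.values()) - min(rates.values())

# Hypothetical past insurance decisions: (group, approved)
history = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = approval_rates(history)
gap = parity_gap(history)
print(rates, gap)
```

A model trained to imitate `history` would tend to reproduce this gap, which is the sense in which it "learns the bias".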

4 comments

umvi 1804 days ago

How do you tell if something is biased or not? Seems like the current system is "if people cry foul because it seems unfair, then the model is biased" which doesn't seem scientifically rigorous.

This seems like a hard problem. For example, say that you have an ML model that decides whether someone will be a good sports athlete or not purely based on biometrics (blood oxygen level, blood pressure, BMI, reflex time, etc.). If the model starts predicting black people will be better athletes at higher rates than white people, is the ML model biased? Or is the reality that black people have higher-than-average advantageous physical characteristics? How do you tell the difference between bias and reality?

logicalmonster 1804 days ago

> If the model starts predicting black people will be better athletes at higher rates than white people, is the ML model biased?

My comment is the naughtiest of wrong-think by HN standards, but the likely reality is that most human programmers will do their genuine best to deliver algorithms that treat all humans equally, without bias towards race, gender, and other immutable characteristics, and will just try to deliver the best objective result (picking good athletes, getting the most money, getting the best employees, or whatever other task is involved). They will then be forced to do one of two things when the algorithm inevitably yields an unequal outcome or a decision that goes against the political correctness orthodoxy.

1) Reprogram their software to fit modern political correctness standards. Personally I think this is close to impossible. As an example: say you're creating some software to determine healthiness from various available data and it objectively determines that heavier people tend to be less healthy. You're boxed into an impossible corner here of either being politically incorrect or just lying to people about their health.

2) Go back to human decision makers for anything controversial because I don't even know how it will be possible to program an algorithm to take into account all of society's made-up, arbitrary, ever-changing rules on "equitable" outcomes. As far as I'm aware, Amazon had to abandon their effort to replace some of their HR efforts with algorithms because it yielded politically incorrect outcomes despite the programmers seemingly trying to just come up with the best possible employees and nothing else.

dekhn 1804 days ago

The bias would have to be determined by a board of experts who debate things based on facts, but the determination is ultimately subjective and linked to the time and place of the culture.

The ethics in AI folks, for the most part, seem to want models to predict what they would predict, based at least partly on subjective analysis of culture, not entirely based on scientific data.

At least that's what I think I've concluded about algorithmic bias. It's one of the situations where I really want to understand what they're saying before I make too many criticisms and counterarguments.

commandlinefan 1804 days ago

> ML model trained on this past data will likely learn that bias

That's the opposite of what the author is saying, though - or rather, she's saying that data bias exists, but the algorithm itself introduces bias that would be there even if the data itself were somehow totally fair, for some unspecified definition of "fair".

dekhn 1804 days ago

What you just described is a previous bias being encoded in the data. It's not algorithmic bias, because it's not encoded in the structure of the algorithm. Sara addresses that (data re-weighting) but says that's not all.

I honestly don't think it can be what you're describing, or else the debate is a very different one from what Sara and others mean in the "algorithmic bias exists and it is distinct from data bias" sense.
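
For readers unfamiliar with data re-weighting: the thread does not say which scheme is meant, so the following is only a sketch of one well-known variant, in which each training example gets the weight P(group) * P(label) / P(group, label). Under these weights, group and label look statistically independent in the training set. The `samples` data and the function name `reweigh` are hypothetical.

```python
from collections import Counter

def reweigh(samples):
    """Return a weight per (group, label) cell.

    Each weight is P(group) * P(label) / P(group, label), estimated from
    counts, so over-represented cells are down-weighted and
    under-represented cells are up-weighted.
    """
    n = len(samples)
    group_counts = Counter(group for group, _ in samples)
    label_counts = Counter(label for _, label in samples)
    joint_counts = Counter(samples)
    return {
        (group, label): (group_counts[group] * label_counts[label])
        / (n * joint_counts[(group, label)])
        for (group, label) in joint_counts
    }

# Hypothetical labeled history: (group, approved)
samples = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

weights = reweigh(samples)

def weighted_rate(group):
    """Weighted approval rate for one group under the computed weights."""
    total = sum(weights[(g, y)] for g, y in samples if g == group)
    positive = sum(weights[(g, y)] for g, y in samples if g == group and y)
    return positive / total

print(weights, weighted_rate("A"), weighted_rate("B"))
```

This only addresses bias that lives in the data; it says nothing about the separate claim that the algorithm itself can introduce bias.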

ramoz 1804 days ago

A reference I like, based on your last point:

https://www.frontiersin.org/articles/10.3389/fpsyg.2013.0050...
