Maybe? But even something as simple as an analysis of human height has genetic/ethnic, nutritional, age, and gender components, not to mention historical differences. If your input is a name and your output is a predicted height, knowing and testing for bias in your data sources is definitely important.
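A minimal sketch of one such test compares the share of each subgroup in your training sample against the share you expect in the population you'll be predicting for. It assumes you already know, or can estimate, those expected shares. The function name `representation_gaps`, the group labels, the 5% tolerance, and the toy sample are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def representation_gaps(groups, expected_shares, tolerance=0.05):
    """Return {group: observed_share - expected_share} for groups off by more than `tolerance`.

    `groups` is one label per training example; `expected_shares` is a
    hypothetical target mix (e.g. from census data) that should sum to 1.
    """
    counts = Counter(groups)
    total = len(groups)
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps

# Made-up sample for a height model: 80 adults and 20 children, while we
# (hypothetically) expect the target population to be 60% adults, 40% children.
sample = ["adult"] * 80 + ["child"] * 20
print(representation_gaps(sample, {"adult": 0.6, "child": 0.4}))
```

A non-empty result doesn't prove the model is biased; it only flags that some groups are over- or under-represented, which is worth knowing before trusting the predictions for those groups.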
I think where people get caught up is that they don't see the world at large as biased, because they view their understandings as essentially correct. For example, we expect judges to rule fairly on every case, right?
To pick a non-controversial example, the likelihood of parole is apparently affected by how recently the judge ate: https://www.economist.com/node/18557594
Now, parole data accurately reflects how people were actually paroled, so predicting the outcome of clemency requests is a perfectly valid use of that data. If you were using machine learning to help people get paroled, you'd want to leave that bias in as a predictor, because it's unfair but real.
But if you were building a system that makes parole recommendations to new judges based on past judicial decisions, you'd probably want to adjust that data to correct for the "recently ate" bias.
You wouldn't want people to be more likely to be denied just because they came before a judge right before lunch. And if you don't test for a bias like that, how would you be able to tell that the machine learning algorithm had picked it up? It wouldn't even need to be direct: if cases are heard in alphabetical order, time since the judge's last meal becomes tied to surname, and the model could end up with a bias based on name.
Long story short, bias is a real issue: you need to be aware of it and test for it, not assume that your input data isn't affected by human error.