by aisofteng 1994 days ago
As a fellow practitioner, I entirely agree. Actually, reading this article made something click for me regarding the oft-discussed and denigrated “bias in AI” that is always brought up in discussions of the “ethics of AI”: there is no bias problem in the algorithms of AI.

AI algorithms _need_ bias to work. This is the bias-variance tradeoff: https://en.m.wikipedia.org/wiki/Bias–variance_tradeoff

The problem is having the _correct_ bias. If there are physiological differences in a disease between men and women and you have a good dataset, the bias in that dataset is the bias of “people with this disease”. If there is no such well-balanced dataset, what is being revealed is a pre-existing harmful bias in the medical field: sampling bias in its studies.

If anything, we should be thankful that the algorithms used in AI, based on statistical theory that has been carefully developed over decades to be objective, are revealing these problems in the datasets we have been using to frame our understanding of real issues.

Next up, the hard part: eliminating our dataset biases and letting statistical learning theory and friends do what they have been designed to do and can do well.

1 comments

> AI algorithms _need_ bias to work. This is the bias-variance trade off: https://en.m.wikipedia.org/wiki/Bias–variance_tradeoff

To be clear, statistical bias is in fact distinct from the colloquial term ‘bias’ most people use - but the two can be interpreted similarly if given the proper context (which you did).

In machine learning the "bias" that relates to the bias-variance tradeoff is inductive bias, i.e. the bias that a learning system has in selecting one generalisation over another. A good quick introduction to that concept is in the following article:

Why We Need Bias in Machine Learning Algorithms

https://towardsdatascience.com/why-we-need-bias-in-machine-l...

The article is a simplified discussion of an early influential paper on the need for bias in machine learning by Tom Mitchell:

The need for bias in learning generalizations

http://dml.cs.byu.edu/~cgc/docs/mldm_tools/Reading/Need%20fo...

The "dataset bias" that you and the other poster are discussing is better described in terms of sampling error: when sampling data for a training dataset, we are sampling from an unknown real distribution and our sampling distribution has some error with respect to the real one. This error manifests as generalisation error (with respect to real-world data, rather than a held-out test set), because the learning system learns the distribution of its training sample. Unfortunately this kind of error is difficult to measure and is masked by the powerful modelling abilities of systems like deep neural networks, which are very good at modelling their training distribution (and whose accuracy is typically measured on a held-out test set, sampled with the same error as the rest of the training sample). It is this kind of statistical error that is the subject of articles discussing "bias in machine learning".
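
To make the masking effect concrete, here is a rough sketch in plain Python. Everything in it is an assumption chosen for illustration: the target function `true_target`, the sampling ranges, and the straight-line model are all made up, and a line fit stands in for whatever learner you actually use. The training data is collected only from part of the real range, and the held-out set is split off from that same skewed sample.

```python
import random


def true_target(x):
    # Hypothetical "real world" relationship, assumed for illustration.
    return x * x


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    # Mean squared error of the fitted line on the given points.
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(rng, low, high, n):
    # Draw n points uniformly from [low, high] and label them.
    xs = [rng.uniform(low, high) for _ in range(n)]
    return xs, [true_target(x) for x in xs]


rng = random.Random(0)

# Skewed collection: we only ever observe x in [0, 3].
xs, ys = sample(rng, 0.0, 3.0, 400)
train_x, train_y = xs[:300], ys[:300]
held_x, held_y = xs[300:], ys[300:]  # held-out set shares the same skew

# The real distribution covers [-3, 3].
real_x, real_y = sample(rng, -3.0, 3.0, 400)

model = fit_line(train_x, train_y)
held_out_mse = mse(model, held_x, held_y)
real_world_mse = mse(model, real_x, real_y)

print("held-out MSE:  ", held_out_mse)
print("real-world MSE:", real_world_mse)
```

The held-out error looks small because the held-out points were sampled with the same error as the training points; the error only shows up once the model is scored against data drawn from the full range.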

Inductive bias has nothing to do with such "dataset bias" and is in fact independent of it. Rather, inductive bias is a property of the learning system (e.g. a neural net architecture). Consequently, it is not possible to "eliminate" inductive bias - machine learning is impossible without it! The two should absolutely not be confused: they are not similar in any context and should not be interpreted as in any way similar.
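
As a small illustration of inductive bias being a property of the learner rather than of the data, the sketch below trains two toy learners on exactly the same four points. The learners (`linear_learner`, `nearest_neighbour_learner`) and the data are hypothetical and picked only to keep the example short. Both fit the training set perfectly, yet they generalise differently to an unseen input, because each one prefers a different kind of generalisation.

```python
def linear_learner(xs, ys):
    # Inductive bias: assumes the target is a straight line.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def nearest_neighbour_learner(xs, ys):
    # Inductive bias: assumes an unseen input behaves like the closest seen one.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Identical training data for both learners.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 2.0, 3.0]

linear = linear_learner(train_x, train_y)
nearest = nearest_neighbour_learner(train_x, train_y)

query = 10.0  # far outside the training range
linear_pred = linear(query)
nearest_pred = nearest(query)

print(linear_pred, nearest_pred)  # prints "10.0 3.0"
```

Neither answer is wrong given the data alone; the data does not say which generalisation to pick, so the learner has to bring that preference with it.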