by ericmason 488 days ago
Seems like more of a bug than bias. The problem is in ignoring the appearance of the person in the first place. It's a statistical model, and of course there are more black rappers and white investment bankers. If it noticed that the person was white to begin with, and applied that trait, it wouldn't have to guess about the race at all.

> It's a statistical model, and of course there are more black rappers and white investment bankers

Yes, this is what the author is pointing out - there's a statistical bias in the dataset that is showing in the results.

Is it a "statistical bias" if it reflects the underlying data? Is it "bias" to generate mostly male lumberjacks, even though most are male?

Yes. The term of art for this is "demographic bias" and it's exactly what you describe -- the population itself is skewed for or against some demographic.

An ML image generator designed to repaint someone as a lumberjack should work equally well for all users, no matter the actual real world demographics. So the training dataset needs to account for this demographic bias if it wants to not overfit.

This isn't some recent "woke" phenomenon; it has been a known issue in large ML projects for at least a decade, if not longer.

If you are training a model to respond to automated test failures, you don't want to sample real world test data in proportion to automated test results, because most automated tests pass. This is also demographic bias and needs to be handled depending on what you want the model to learn.
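
To make the test-failure example concrete, here is a minimal sketch of one common way to handle the skew: draw the same number of examples from each class instead of sampling in proportion to the raw data. The `balanced_sample` helper, the `runs` records, and the `per_class` parameter are all hypothetical names made up for illustration. Equal-per-class sampling is only one option (reweighting the loss is another), and which is appropriate depends on what you want the model to learn.

```python
import random
from collections import defaultdict

def balanced_sample(records, label_of, per_class, seed=0):
    """Draw per_class examples from every class, regardless of class frequency.

    Hypothetical helper: classes with fewer than per_class examples are
    sampled with replacement, larger classes without replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    sample = []
    for items in by_class.values():
        if len(items) >= per_class:
            sample.extend(rng.sample(items, per_class))     # no duplicates
        else:
            sample.extend(rng.choices(items, k=per_class))  # oversample rare class
    rng.shuffle(sample)
    return sample

# Made-up test log: 95 passing runs and 5 failing runs.
runs = [{"id": i, "result": "fail" if i % 20 == 0 else "pass"} for i in range(100)]

# A proportional sample would be about 95% passes; this one is half and half.
batch = balanced_sample(runs, lambda r: r["result"], per_class=10)
```

Note that the five failures get repeated to fill their quota, so a model trained this way sees the same rare examples several times. That is the usual trade-off with oversampling.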