by civilized 1746 days ago
Deep learning, for all its recent glories, still suffers from relatively crude, slow-converging training algorithms compared to other areas of ML and statistics.

Maybe, to your typical SGD-type algorithm trained on a dataset of mostly light-skinned people, skin tone just looks like a solid first-order feature for distinguishing humans from other primates, while learning to distinguish Black people from primates has a much more marginal, second-order impact on the cost function.

If most of the people in the dataset were black, I predict you wouldn't see this.
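
A toy sketch of that intuition, under heavy assumptions: a two-feature logistic model where the majority group's label is readable from `x1` (the "dominant cue") and the minority group's label is only readable from `x2` (the "subtle cue"). The dataset, the feature names, and the group sizes are all hypothetical and say nothing about any real system. It only shows how the gradient of the mean loss at initialization splits between the two cues.

```python
def make_dataset(n_major, n_minor):
    """Hypothetical (x1, x2, y) examples; labels alternate 0/1 within each group."""
    data = []
    for i in range(n_major):
        y = i % 2
        x1 = 1.0 if y == 1 else -1.0   # dominant cue tracks the label
        data.append((x1, 0.0, y))      # subtle cue carries no signal here
    for i in range(n_minor):
        y = i % 2
        x2 = 1.0 if y == 1 else -1.0   # only the subtle cue tracks the label
        data.append((-1.0, x2, y))     # dominant cue is constant, so uninformative
    return data


def grad_at_zero(examples):
    """Gradient of the mean logistic loss at w = (0, 0), no bias term.

    At w = 0 the model predicts sigmoid(0) = 0.5 for every example, so the
    per-example gradient for weight j is (0.5 - y) * x_j.
    """
    n = len(examples)
    g1 = sum((0.5 - y) * x1 for x1, x2, y in examples) / n
    g2 = sum((0.5 - y) * x2 for x1, x2, y in examples) / n
    return g1, g2


# 95% majority, 5% minority: compare gradient magnitudes for the two cues.
g1, g2 = grad_at_zero(make_dataset(950, 50))
print("dominant cue:", abs(g1), "subtle cue:", abs(g2))
```

With this 950/50 split the first gradient step pushes on the dominant cue about 19 times harder than on the subtle one; with a 500/500 split the two magnitudes come out equal. Real networks and real image data are far messier, so treat this as an illustration of the argument, not evidence for it.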

1 comments

Consider too what they are likely using for inputs: photos with associated comments.

I don't know Facebook's TOS well enough to say whether they are using private groups as source material, but if you're using bigoted content to train pattern recognition, you will replicate bigoted content.

I wonder why this was downvoted. It's an interesting hypothesis.
My guess is that the poster was assuming that a large part of Facebook's images are bigoted content. I am neither agreeing nor disagreeing. But apparently some people got a little emotional about the platform being associated with possibly having a heightened amount of bigoted content.
Not necessarily a large part, simply enough to identify as its own pattern.

In my experience there are a lot of bigoted things on Facebook. If these are serving as source data, and are sufficiently distinguished from other training material, it may well be user behavior the ML system would replicate.