Hacker News
by kayaeb 2458 days ago
In NLP (specifically when vectorizing words, à la word2vec), there's a famous test of whether your training has worked properly: take the vector for "king," subtract the vector for "man," and add the vector for "woman." If your model is properly trained, you should end up with a vector close to "queen" or "princess."
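
For anyone who hasn't seen the test spelled out, here's a rough sketch in plain Python. The vectors in `TOY` are hand-made 3-d values I picked for illustration (hypothetical, not real word2vec output), and `analogy` is just a name I made up. It leaves the query words out of the candidates, which is how this test is usually run.

```python
import math

# Hypothetical hand-made 3-d vectors (axes loosely: royalty, maleness, femaleness).
# Real word2vec vectors are learned from text and have hundreds of dimensions.
TOY = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.7, 0.8, 0.2],
    "apple":  [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(vectors, positive, negative, topn=1):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [
        sum(vectors[w][i] for w in positive) - sum(vectors[w][i] for w in negative)
        for i in range(dim)
    ]
    # The query words themselves are excluded from the candidates.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# king - man + woman
print(analogy(TOY, positive=["king", "woman"], negative=["man"]))
```

With these toy values the top hit is "queen," but only because I built the vectors that way. Whether a trained model passes depends on the corpus and the training setup.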

I wonder if similar things can be done to address specific (e.g. racial or gender) biases in computer vision.

1 comments

There is a similar word-embedding test that definitely rustles people's jimmies:

Doctor - Man + Woman = ?

What normally comes out is Nurse. What "they" think should come out is Doctor!

By "they" I mean people who get upset by this.

Yeah, besides the fact that this compositionality is relatively unique to word2vec, research on the biases that pre-trained models express is readily available. I've linked a few papers below for those interested. Most of the issues come down to the same phenomenon discussed here in the context of ImageNet: the input texts were biased, and the algorithm learned that bias.

[0] https://arxiv.org/abs/1607.06520 "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings"

[1] http://proceedings.mlr.press/v97/brunet19a/brunet19a.pdf "Understanding the Origins of Bias in Word Embeddings"

[2] http://matthewkenney.site/biases.html "Google word2vec biases"
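
To give a feel for what "debiasing" means in [0], here's a heavily simplified sketch. The idea is to find a gender direction in the embedding space, measure how far a word leans along it, and remove that component for words that should be neutral. The paper derives the direction from several word pairs using PCA and has an additional equalization step. Using a single he/she pair is my simplification, and the 2-d vectors and function names below are hypothetical.

```python
import math

# Hypothetical 2-d toy vectors: axis 0 is roughly "medical-ness", axis 1 is gender.
# These values are made up for illustration and are not from any trained model.
TOY_OCCUPATIONS = {
    "he":     [0.0, 1.0],
    "she":    [0.0, -1.0],
    "doctor": [0.9, 0.3],
    "nurse":  [0.9, -0.4],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gender_direction(vectors):
    """Unit vector pointing from 'she' to 'he' (single-pair simplification)."""
    diff = [h - s for h, s in zip(vectors["he"], vectors["she"])]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]

def bias_score(vec, direction):
    """Signed projection onto the gender direction (positive leans 'he')."""
    return dot(vec, direction)

def neutralize(vec, direction):
    """Remove the component of vec that lies along the gender direction."""
    proj = dot(vec, direction)
    return [v - proj * d for v, d in zip(vec, direction)]

g = gender_direction(TOY_OCCUPATIONS)
for word in ("doctor", "nurse"):
    before = bias_score(TOY_OCCUPATIONS[word], g)
    after = bias_score(neutralize(TOY_OCCUPATIONS[word], g), g)
    print(word, before, after)
```

In this toy setup "doctor" leans toward "he" and "nurse" toward "she" before neutralizing, and both projections are zero afterwards. Whether this kind of projection removes bias in any meaningful sense on real embeddings is a separate question.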