by ma2rten
2832 days ago
I think that the bias problem they are highlighting is very important. That said, I'm wondering whether they really didn't try (as the title suggests) or whether they chose this approach on purpose because it highlights the problem.

To explain what happened here: they trained a classifier on a sentiment lexicon to predict the sentiment of words. The lexicon mostly contains emotional words, largely adjectives (like awesome, great, ...). They then use word vectors to generalize from those labeled words to all words. Word vectors work such that words that frequently occur together end up closer in vector space, so the classifier gives any word that sits near negative lexicon words a negative score. So what they have essentially shown is that in Common Crawl and Google News, names of people of certain ethnicities are more likely to occur near words with negative sentiment. However, the sentiment analysis approach they are using amplifies the problem in the worst possible way: they are asking their machine learning model to generalize from training data made of emotional words to people's names.
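A minimal sketch of the mechanism described above, under loud assumptions: the 2-D vectors, the four-word lexicon and the names `name_a`/`name_b` are all invented for illustration, and plain logistic regression stands in for whatever classifier they actually used. The point is only that a model trained solely on emotional words will still score any word that has a vector, so a name that sits near negative words inherits a negative score.

```python
import math

# Hypothetical 2-D "word vectors". Real embeddings trained on a large corpus
# have hundreds of dimensions; these toy values are invented to illustrate
# the geometry, not derived from any actual co-occurrence data.
VECTORS = {
    "awesome": (0.9, 0.8),
    "great": (0.8, 0.9),
    "terrible": (-0.9, -0.8),
    "awful": (-0.8, -0.9),
    # Hypothetical names: name_a is assumed to have co-occurred with positive
    # words in the corpus, name_b with negative words.
    "name_a": (0.6, 0.5),
    "name_b": (-0.5, -0.6),
}

# Toy sentiment lexicon: only emotional words are labeled (1 = positive, 0 = negative).
LEXICON = {"awesome": 1, "great": 1, "terrible": 0, "awful": 0}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(examples, epochs=500, lr=0.5):
    """Batch gradient descent on log-loss; returns (weights, bias)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in examples:
            # Prediction error for this example under the current parameters.
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def sentiment(word, w, b):
    """Probability that `word` is positive, judged only from its vector."""
    x = VECTORS[word]
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


# Train only on the labeled lexicon words.
training = [(VECTORS[word], label) for word, label in LEXICON.items()]
w, b = train_logistic_regression(training)

# The model never saw a name during training, yet it scores them anyway.
for name in ("name_a", "name_b"):
    print(name, round(sentiment(name, w, b), 3))
```

Running it prints a score above 0.5 for `name_a` and below 0.5 for `name_b`, even though neither name appears in the lexicon: the only thing the model knows about a name is which emotional words it happens to sit near.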