by rossdavidh
2832 days ago
I think I would agree. Otherwise you run the risk of having fixed the metric ("Italian" vs. "Mexican", "Chad" vs. "Shaniqua", etc.) without actually fixing the underlying issue. Also, regarding black/white etc., there may legitimately be words with so many different meanings (race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean "human rights", "the right thing to do", or "not left", and there are probably plenty of other words like that. You might do better to keep a list of 100-200 words that are simply excluded for reasons like that.
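A minimal sketch of the exclusion-list idea, assuming a simple lexicon-based scorer that sums per-word scores. The names `TOY_LEXICON`, `EXCLUDED_WORDS`, and `sentiment()` are hypothetical, and the scores are made up for illustration; a real system would use a learned lexicon and a much longer exclusion list.

```python
# Hypothetical toy lexicon: word -> sentiment score (values made up).
TOY_LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "bad": -1.0,
    "awful": -2.0,
    "right": 0.5,   # ambiguous: "human rights", "the right thing", "not left"
    "black": -0.5,  # ambiguous, and potentially race-related
    "white": 0.5,
}

# Hypothetical exclusion list; the comment suggests 100-200 such words.
EXCLUDED_WORDS = {"right", "black", "white"}


def sentiment(text: str, lexicon: dict[str, float], excluded: set[str]) -> float:
    """Sum lexicon scores for the words in text, skipping excluded words."""
    total = 0.0
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        if word in excluded:
            continue  # ambiguous word: contributes nothing to the score
        total += lexicon.get(word, 0.0)  # unknown words count as neutral
    return total


print(sentiment("The food was great, right?", TOY_LEXICON, EXCLUDED_WORDS))
```

Here "right?" is ignored, so only "great" contributes; passing an empty exclusion set would let the ambiguous word shift the score.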
I haven't studied word embeddings beyond the pop-sci level, but wouldn't such words form multiple clusters in the embedding space? I would have thought it would be relatively easy to get different "words" for "right (entitlement)", "right (direction)", etc.?
Edit: A nibling post answers this question.
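A rough sketch of the idea in the question. A standard static embedding gives each word a single vector, so the clusters would have to be found among the *contexts* of each occurrence instead. This assumes each occurrence of "right" is already represented by a context vector (for example, the average embedding of its neighbouring words); the 2-D vectors below are made up, plain k-means is used with k chosen by hand, and `kmeans()` and `sense_tokens()` are hypothetical helpers.

```python
import math
import random

# Hypothetical 2-D "context vectors" for six occurrences of "right".
# Real ones would come from an embedding model; these are made up.
RIGHT_CONTEXTS = [
    (0.9, 0.1), (1.0, 0.0), (0.8, 0.2),  # e.g. "human rights", "voting rights"
    (0.1, 0.9), (0.0, 1.0), (0.2, 0.8),  # e.g. "turn right", "on the right"
]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns one cluster label per point, plus the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


def sense_tokens(word, labels):
    """Rewrite each occurrence of word as a sense-specific token, e.g. right_0."""
    return [f"{word}_{label}" for label in labels]


labels, _ = kmeans(RIGHT_CONTEXTS, k=2)
print(sense_tokens("right", labels))
```

Which cluster gets label 0 depends on the random initialisation, but the entitlement-like and direction-like occurrences end up as two distinct tokens that could then be scored (or excluded) separately.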