HN Mirror

by EB66 2832 days ago

Just thinking out loud here...

It seems to me that if you wanted to root out sentiment bias in this type of algorithm, then you would need to adjust your baseline word embeddings dataset until you have sentiment scores for the words "Italian", "British", "Chinese", "Mexican", "African", etc that are roughly equal, without changing the sentiment scores for all other words. That being said, I have no idea how you'd approach such a task...

I don't think you could ever get equal sentiment scores for "black" and "white" without biasing the dataset in such a way that it would be rendered invalid for other scenarios (e.g., giving "dark black alley" a higher sentiment than it would otherwise have). "Black" and "white" are a harder case because the words have other meanings outside of race/ethnicity.
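A minimal sketch of what "adjust the embeddings for just those words" could look like, assuming the sentiment score is a linear model over the embedding (score = W · v). The three-dimensional vectors and the weight vector `W` are made-up toy values, not anything from the article. Each group word is shifted along `W` until it has the group's mean score; every other vector is left alone.

```python
# Hypothetical 3-d embeddings; real ones (e.g. GloVe) have hundreds of dimensions.
EMBEDDINGS = {
    "italian": [0.9, 0.2, 0.1],
    "mexican": [0.1, 0.8, 0.3],
    "chinese": [0.4, 0.5, 0.2],
    "alley": [0.2, 0.1, 0.9],
}

# Hypothetical weights of a linear sentiment model: score(v) = W . v
W = [0.5, -0.4, -0.6]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(vec):
    return dot(W, vec)


def equalize(embeddings, group):
    """Move each word in `group` along W until it has the group's mean score.

    Vectors for words outside `group` are left untouched.
    """
    target = sum(score(embeddings[g]) for g in group) / len(group)
    w_norm_sq = dot(W, W)
    adjusted = dict(embeddings)  # shallow copy; only group entries get replaced
    for g in group:
        # Adding step * W changes the score by exactly step * |W|^2.
        step = (target - score(embeddings[g])) / w_norm_sq
        adjusted[g] = [x + step * wi for x, wi in zip(embeddings[g], W)]
    return adjusted


fixed = equalize(EMBEDDINGS, ["italian", "mexican", "chinese"])
for word in ["italian", "mexican", "chinese", "alley"]:
    print(word, round(score(EMBEDDINGS[word]), 3), "->", round(score(fixed[word]), 3))
```

Note that this only equalizes the one score being measured; whatever else the vectors associate with those words is unchanged. Running the same trick on "black" would also move "dark black alley", since a single vector can't tell which meaning is in play.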

rossdavidh 2832 days ago

I think I would agree. You otherwise run the risk of having fixed the metric ("Italian" vs. "Mexican", "Chad" vs. "Shaniqua", etc.) without actually fixing the underlying issue.

Also, regarding black/white etc., there might legitimately be words which have so many different meanings (whether race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean "human rights", "the right thing to do", or "not left". There are probably plenty of other words like that. You might do better to have a list of 100-200 words that are simply excluded because of issues like that.
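A rough sketch of the exclusion-list idea, assuming a sentence's sentiment is just the average of its word-level scores. The lexicon values and the three-word `EXCLUDED` set are hypothetical stand-ins; a real list would be the 100-200 words suggested above.

```python
# Hypothetical word-level scores, e.g. produced by a model trained on embeddings.
LEXICON = {
    "right": 0.6, "wrong": -0.7, "great": 0.9,
    "black": -0.3, "white": 0.2, "dark": -0.5, "alley": -0.4,
}

# Hypothetical exclusion list of words too ambiguous to score on their own.
EXCLUDED = {"right", "black", "white"}


def tokenize(text):
    return [w.strip(".,!?\"'").lower() for w in text.split()]


def sentence_score(text, lexicon=LEXICON, excluded=EXCLUDED):
    """Average the scores of known, non-excluded words (0.0 if none are left)."""
    scores = [lexicon[w] for w in tokenize(text) if w in lexicon and w not in excluded]
    return sum(scores) / len(scores) if scores else 0.0


print(sentence_score("The right answer was great"))  # only "great" counts
print(sentence_score("a dark black alley"))          # "black" is skipped
```

The trade-off is that a sentence built mostly from excluded words gets little or no signal at all.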

taneq 2831 days ago

> there might legitimately be words which have so many different meanings

I haven't studied word embeddings past the pop-sci level but wouldn't such words form multiple clusters in the embedding space? I would have thought it would be relatively easy to get different 'words' for 'right (entitlement)', 'right (direction)', etc?

Edit: Nibling post answers this question.

acpetrov 2831 days ago

Would it be worth trying to think of words with different meanings as entirely new words? So, "white" in one sentence may be a different word than "white" in another?
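A toy sketch of "treat each meaning as its own word", assuming every sense already has a centroid vector. Multi-sense embedding methods typically learn such centroids by clustering the contexts a word appears in; here the context vectors and the `white#color` / `white#ethnicity` senses are hypothetical hand-picked values. Each occurrence is tagged with the sense closest to the average of its context words, and each tagged token could then get its own sentiment score.

```python
import math

# Hypothetical toy embeddings for context words.
CONTEXT_VECS = {
    "paint": [1.0, 0.0, 0.0], "wall": [0.9, 0.1, 0.0], "snow": [0.8, 0.0, 0.2],
    "voters": [0.0, 1.0, 0.1], "census": [0.1, 0.9, 0.0], "neighborhood": [0.0, 0.8, 0.3],
}

# Hypothetical sense inventory: one centroid per sense of "white".
SENSES = {
    "white#color": [0.9, 0.05, 0.1],
    "white#ethnicity": [0.05, 0.9, 0.1],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def tag_sense(context_words, senses=SENSES):
    """Return the sense whose centroid is most similar to the mean context vector."""
    vecs = [CONTEXT_VECS[w] for w in context_words if w in CONTEXT_VECS]
    if not vecs:
        return None  # no known context words: can't decide
    ctx = average(vecs)
    return max(senses, key=lambda s: cosine(ctx, senses[s]))


print(tag_sense(["paint", "wall"]))
print(tag_sense(["voters", "census"]))
```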

visarga 2831 days ago

There's a long list of papers on that: 'multi-sense word embeddings'. But more recently we have found that passing the raw character embeddings through a two-layer BiLSTM will resolve the ambiguity of meaning from context: 'ELMo'.

https://arxiv.org/abs/1802.05365 (state of the art)
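For reference, the linked paper collapses the biLM layers for token k into one vector with a learned, task-specific weighted sum. Here h_{k,0} is the context-independent token layer (built from characters with a CNN), h_{k,j} for j ≥ 1 concatenates the forward and backward LSTM states of layer j (L = 2 in the paper), s^{task} are softmax-normalized weights, and γ^{task} is a scalar. Because the LSTM states depend on the whole sentence, "white" gets a different vector in each context.

```latex
\mathbf{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, \mathbf{h}_{k,j}^{LM}
```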

mattkrause 2832 days ago

Does “a dark black alley” have a sentiment at all?

I would argue that it’s pragmatically associated with bad things (e.g., being mugged, overcrowded areas) but it’s not intrinsically bad (or good) itself.

grandmczeb 2832 days ago

> associated with bad things

Is that not what's meant by sentiment?
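One common way to turn "associated with bad things" into a number is to compare a word's embedding with positive and negative seed words. This is a sketch of that association-style score, not necessarily the model the article used; the vectors and seed lists are hypothetical. "alley" comes out negative purely from where it sits in the space, which is the distinction being argued about.

```python
import math

# Hypothetical 3-d embeddings; "alley" sits near the negative seeds by construction.
VECS = {
    "good": [0.9, 0.1, 0.0], "happy": [0.8, 0.2, 0.1],
    "bad": [-0.8, 0.1, 0.2], "danger": [-0.7, 0.3, 0.3],
    "alley": [-0.4, 0.2, 0.8], "garden": [0.5, 0.1, 0.7],
}
POSITIVE = ["good", "happy"]
NEGATIVE = ["bad", "danger"]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def association_score(word):
    """Mean cosine to positive seeds minus mean cosine to negative seeds."""
    v = VECS[word]
    pos = sum(cosine(v, VECS[p]) for p in POSITIVE) / len(POSITIVE)
    neg = sum(cosine(v, VECS[n]) for n in NEGATIVE) / len(NEGATIVE)
    return pos - neg


for word in ["alley", "garden"]:
    print(word, round(association_score(word), 3))
```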

mattkrause 2832 days ago

My intuition is that word-level sentiment is rather pointless. “The Disaster Artist was not bad” has a positive sentiment overall, but each of the individual words, except possibly ‘artist’, is usually thought to be negative. Moreover, you can totally flip the overall sentiment by adding another neutral-ish word: “The Disaster Artist was not even bad.”

Similarly, my guess is that “alley” is rarely found in a positive context, but the actual sentiment comes from elsewhere in the utterance.
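A small sketch of the problem, comparing plain word averaging with a common rule of thumb where a negator flips the next sentiment-bearing word. The lexicon values are hypothetical. Averaging scores "not bad" as negative; the negation rule fixes that, but gives "not even bad" exactly the same positive score, because "even" carries no score of its own.

```python
# Hypothetical word scores; only sentiment-bearing words are listed.
LEXICON = {"disaster": -0.5, "bad": -0.7}
NEGATORS = {"not", "never"}


def tokens(text):
    return [w.strip(".,!?\"'").lower() for w in text.split()]


def average_score(text):
    """Plain word-level model: mean score of the known words."""
    scores = [LEXICON[w] for w in tokens(text) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def negation_score(text):
    """Same, but a negator flips the sign of the next sentiment-bearing word."""
    scores, negate = [], False
    for w in tokens(text):
        if w in NEGATORS:
            negate = True
        elif w in LEXICON:
            scores.append(-LEXICON[w] if negate else LEXICON[w])
            negate = False
    return sum(scores) / len(scores) if scores else 0.0


for s in ["The Disaster Artist was not bad", "The Disaster Artist was not even bad"]:
    print(s, round(average_score(s), 2), round(negation_score(s), 2))
```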

TheCoelacanth 2831 days ago

Word-level sentiment is like spherical cows in a vacuum in physics. Everyone knows it's an extremely flawed model, but it produces good results in a lot of scenarios, and it has the enormous benefit of simplicity, so it will inevitably be used.

monochromatic 2832 days ago

This article is about a simple model. Within that model, it absolutely makes sense for “dark black alley” to get a negative score.

mattkrause 2831 days ago

It certainly gets a sentiment score, but whether that score is in any way meaningful or corresponds to actual human sentiment is important. Otherwise, you’re just playing stupid games, and winning stupid prizes...though I suppose just stupid is a step up from stupid and racist.