Hacker News
by malikolivier 1436 days ago
I assumed I was quite fluent in English, slang included, having seen my fair share of both American and British movies.

Now that I see the examples given, I think I would have mislabeled most of them too, even if I were highly motivated to label them.

Though it's normal for any language, it's very interesting how much English slang varies between dialects and time periods. There is so much regional slang whose nuances I cannot fully understand.

A few examples from this dataset that I would not have labeled correctly:

- daaaaaamn girl! – mislabeled as ANGER

- [NAME] wept. – mislabeled as SADNESS

- [NAME] is bae, how dare you. – mislabeled as ANGER

And don't get me started on Australian/NZ slang. It's a completely different world.

4 comments

I think that in many of these cases the deciding factor is not only fluency in the language, but also the harder problem of context. Labelling individual sentences without context is hard enough, but what makes it worse is that it feeds the mistaken assumption that, given that initial training, sentences can be analysed without context.

I would argue that the very idea of "sentiment analysis" as applied to individual tweets is flawed... and that's even before we get into the much, much harder problems of sarcasm and irony.

Now all we need is for social media companies to have users do the tagging. Then the data they sell will be even more valuable!

Oh. Shoot me now.

Note to future taggers: I am not suicidal.

Some of these would confuse native British English speakers too, for what it's worth. The first and third are African-American vernacular, at least originally. If you haven't seen a US movie where a character literally exclaims "daaaaamn girl!" in an approving voice, you're going to pretty reliably mislabel that one regardless of where or how you learned English. The second is a meme reference, and how you label it depends on how much time you spend on Reddit, not your level of English skill.

Movies don't use the language mislabeled here. YouTube and Twitch do, sometimes excessively so.

I'm pretty sure you would be able to find half of the "mislabeled as negative" sentences shown in a single 15-minute "Among Us" video.

There are others I would have mislabeled too. I think it shows how you need a grasp of the subculture the comment is coming from to reliably label it. Most 15-year-olds would probably ace those examples, no cap.

Really curious what makes Aussie and Kiwi slang so different. I didn't think we were that different.