Hacker News
by aw3c2 4247 days ago
Clicked through ~20 of them and the analysis was completely off in most cases.
2 comments

Yep, although in some cases understandably. One tweet listed as ‘Not good’ had the text ‘Killed it! 🔫’ and a location given as a comedy club. I’m assuming someone had had a good gig, but I’m not surprised a classifier algorithm got that wrong.
Some cases are just difficult, but the overall accuracy could probably be improved considerably if the sentiment analyzer were calibrated for the domain. AFINN (linked above) is calibrated from newspaper articles, which almost certainly have a different distribution of word/sentiment correlations than Tweets do. It's not hard for me to imagine that "killed" is a better predictor of negative sentiment when classifying news articles.
Sentiment.js (which is what we used in devwax, and I am guessing the same is used here) is just AFINN-based sentiment scoring: https://www.npmjs.org/package/sentiment, which you can customize. So in our case we added things like {"barreling": 2} etc. What would be better is a bigram/trigram-based approach, so you could score "Going Off" and "Killed It" etc., but I am not sure if there is a JS library that does that?
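For what it's worth, the phrase-aware idea is not much code. Here is a rough sketch in Python (not JS, and not how Sentiment.js works internally): a word-list scorer that tries the longest phrase first, so "killed it" is matched before "killed". The lexicon, its scores other than {"barreling": 2}, and the names `LEXICON`, `tokenize`, and `score` are all made up for illustration.

```python
import re

# Hypothetical lexicon mixing single words and multi-word phrases.
# Scores are illustrative only and are not taken from AFINN.
LEXICON = {
    "killed": -3,
    "killed it": 3,
    "going off": 2,
    "barreling": 2,
}

# Longest phrase length (in words) present in the lexicon.
MAX_N = max(len(key.split()) for key in LEXICON)


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score(text):
    """Sum lexicon scores, preferring the longest matching phrase."""
    tokens = tokenize(text)
    total = 0
    i = 0
    while i < len(tokens):
        # Try trigrams/bigrams before unigrams at this position.
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                total += LEXICON[phrase]
                i += n  # consume the whole phrase
                break
        else:
            i += 1  # no entry starts here; move on
    return total
```

With this toy lexicon, `score("Killed it! 🔫")` comes out positive while `score("He was killed")` stays negative. The hard part is still getting good phrase scores for the domain, not the matching.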
Makes sense, since AFINN is just a tab-separated list of words and [-5, 5] valence scores. [1]
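To make that concrete, loading a list in that shape and layering custom entries on top is only a few lines. A minimal sketch in Python, assuming one `term<TAB>score` pair per line; the sample rows below are invented for the example rather than copied from the real file, and `parse_afinn` is a hypothetical helper name.

```python
def parse_afinn(text):
    """Parse 'term<TAB>score' lines into a dict of term -> int score."""
    lexicon = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last tab so terms containing spaces stay intact.
        term, sep, value = line.rpartition("\t")
        if not sep:
            continue  # skip lines that are not tab-separated
        lexicon[term] = int(value)
    return lexicon


# Invented sample rows in the assumed format.
SAMPLE = "abandon\t-2\ngood\t3\nnot good\t-2\n"

lexicon = parse_afinn(SAMPLE)
# Domain-specific overrides, like the {"barreling": 2} mentioned above.
lexicon.update({"barreling": 2})
```

Since the whole thing ends up as a plain dict, recalibrating for a different domain is a matter of swapping or overriding entries.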

The impressive part of this for me is the visualization. Really nice.

[1] http://www2.imm.dtu.dk/pubdb/views/publication_details.php?i...