by logfromblammo
2622 days ago
Agreed. It looks like they went through a dictionary and scored the words on a negativity/positivity axis, and then just took the mean of all the scoring words in a post. I have written posts much saltier than the ones scored as saltiest by this ranking algorithm, possibly because I didn't use inherently negative vocabulary to express a highly negative sentiment. It's a fun party trick, but its usefulness is limited without semantic analysis or live-human scoring. 20.23% of my posts are rated as "salty". I wonder what percentage of scoring words are rated as negative.
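To illustrate the failure mode I mean, here is a minimal sketch of mean-of-word-scores ranking. The lexicon and its values are made up for the example (they are not the real dictionary), and `mean_polarity` is a hypothetical helper, not anything from the app:

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
# These values are invented for illustration only.
TOY_LEXICON = {
    "terrible": -1.0,
    "awful": -1.0,
    "bad": -0.7,
    "good": 0.7,
    "great": 0.8,
}


def mean_polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of the words found in the lexicon.

    Words missing from the lexicon are ignored; if no word scores,
    the text is treated as neutral (0.0).
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


blunt = "This is a terrible, awful idea."
veiled = "I am sure the author did their very best with what they understood."

print(mean_polarity(blunt))   # both scoring words are negative
print(mean_polarity(veiled))  # no scoring words at all, so it reads as neutral
```

The second sentence is arguably the saltier of the two, but a word-level average has nothing to score in it.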
|
For the curious, you can see the dictionary here: https://github.com/sloria/TextBlob/blob/eb08c120d364e9086467...
The package we used is a pretty popular one called TextBlob. It is nifty for working with unlabeled data like the Hacker News dataset.
We defined saltiness as a combination of (subjective + negative): a comment has to be both to score high.
We reduced the impact of (objective + negative) comments, because we feel that criticism, while at times painful, isn't necessarily salty if it is presented objectively.
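As a rough sketch of that idea (not our exact formula): TextBlob's `sentiment` gives a polarity in [-1, 1] and a subjectivity in [0, 1], and one simple way to combine them is to weight the negative part of the polarity by the subjectivity. The `saltiness` function below and the multiplication it uses are assumptions made for illustration:

```python
def saltiness(polarity, subjectivity):
    """Hypothetical salt score in [0, 1].

    polarity:     -1.0 (most negative) to 1.0 (most positive)
    subjectivity:  0.0 (objective) to 1.0 (subjective)

    Only the negative part of the polarity counts, and it is scaled
    down for objective text. This weighting is an assumption, not the
    formula the app actually uses.
    """
    negativity = max(0.0, -polarity)
    return negativity * subjectivity


# Same negativity, different subjectivity.
print(saltiness(-0.8, 1.0))   # subjective + negative: scores high
print(saltiness(-0.8, 0.25))  # objective + negative: scores much lower
print(saltiness(0.5, 1.0))    # positive comments get no salt at all
```

With a weighting like this, an objective but harsh critique lands well below an equally negative rant.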
We built this model fast (1 week) and have spent this week iterating toward a fine-tuned BERT model, which we are training on a much broader set of toxicity, demographic, and polarity features. The training set is much larger and of higher quality, so we expect a large jump in precision once it is deployed.
I hope the app gave you some good chuckles as you poked around, though. It's hard to explain how excited I felt when I saw pg_is_a_butt at the top of my pandas data frame the first time I processed the data.
It's doing a little bit right. :)