by Kassius509 2621 days ago
You're spot on. This model is based on a dictionary of scored words!

For the curious you can see the dictionary here: https://github.com/sloria/TextBlob/blob/eb08c120d364e9086467...

The package we used is a pretty popular one called TextBlob. It's nifty for working with unlabeled data like the Hacker News dataset.

We built our definition of saltiness around comments that are both subjective and negative (subjective + negative).

We reduced the weight of (objective + negative) comments, because we feel that criticism, while sometimes painful, isn't necessarily salty if it's presented objectively.
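For anyone curious how "subjective + negative, with objective + negative damped" might turn into a number, here is a minimal sketch. It assumes TextBlob-style inputs (polarity in [-1, 1], subjectivity in [0, 1]). The formula and the `OBJECTIVE_WEIGHT` value are hypothetical illustrations, not the exact scoring we shipped.

```python
# Hypothetical saltiness score built from TextBlob-style sentiment values.
# Assumptions (illustrative only, not the app's actual formula):
#   polarity is in [-1.0, 1.0] and subjectivity is in [0.0, 1.0], as TextBlob reports them;
#   OBJECTIVE_WEIGHT is a made-up damping factor for objective criticism.

OBJECTIVE_WEIGHT = 0.3  # hypothetical: share of negativity that still counts when subjectivity is 0


def saltiness(polarity: float, subjectivity: float,
              objective_weight: float = OBJECTIVE_WEIGHT) -> float:
    """Return a score in [0, 1]; higher means saltier.

    Only negative polarity contributes. That negativity is scaled by a factor
    that rises linearly from `objective_weight` (subjectivity 0) to 1.0
    (subjectivity 1), so subjective + negative comments score highest and
    objective + negative comments are damped rather than ignored.
    """
    if not -1.0 <= polarity <= 1.0 or not 0.0 <= subjectivity <= 1.0:
        raise ValueError("polarity must be in [-1, 1] and subjectivity in [0, 1]")
    negativity = max(0.0, -polarity)  # positive comments contribute nothing
    subjectivity_factor = objective_weight + (1.0 - objective_weight) * subjectivity
    return negativity * subjectivity_factor


if __name__ == "__main__":
    print(saltiness(-0.8, 0.9))  # subjective + negative: scores high
    print(saltiness(-0.8, 0.0))  # objective + negative: damped
    print(saltiness(0.5, 0.9))   # positive: not salty at all
```

With TextBlob installed, you would feed it real values via `s = TextBlob(text).sentiment` and then call `saltiness(s.polarity, s.subjectivity)`. Multiplying by a damping factor, rather than zeroing out objective comments, keeps harsh but objective criticism from disappearing entirely.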

We built this model fast (in one week), and since then we've been iterating on a fine-tuned BERT model that we're training on a much broader set of toxicity, demographic, and polarity features. The training set is much larger and of higher quality, so we expect a big jump in precision once it's deployed.

I hope the app gave you some good chuckles as you poked around, though. It's hard to explain how excited I was when I first processed the data and saw pg_is_a_butt at the top of my pandas DataFrame.

It's doing a little bit right. :)


My saltiest comment was reportedly "If taxing price gougers seems stupid you're going to hate the pitchfork-toting mobs." I'm not sure how to work some kind of contextual analysis into it but I'm pretty sure it's something on your to-do list. Good on you for the creative idea and implementation and putting it out here for us to sprinkle salt on.
TYVM.

This kind of feedback gets me pumped about continuing to work on it.