| HN Mirror



	by _bxg1 2620 days ago
	The scoring heuristic could use some work; I've already encountered multiple comments tagged "salty" that were along the lines of "That sounds awful" in a sympathetic tone, probably flagged because of the word "awful".


ape4 2620 days ago

I got -.25 for a "link of the lazy <url>" comment. And now I will again ;)

logfromblammo 2620 days ago

Agreed. It looks like they went through a dictionary and scored the words on a negativity/positivity axis, and then just took the mean of all the scoring words in a post.

I have written posts much saltier than the ones scored as saltiest by this ranking algorithm, possibly because I didn't use inherently negative vocabulary to express a highly negative sentiment.

It's a fun party trick, but its usefulness is limited without semantic analysis or live-human scoring.

20.23% of my posts are rated as "salty". I wonder what percentage of scoring words are rated as negative.
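A minimal Python sketch of the plain dictionary-averaging scheme described above. The lexicon `LEXICON` and its scores are made up for illustration (they are not TextBlob's values), and `mean_polarity` and `negative_share` are hypothetical helpers. It shows how a sympathetic "That sounds awful" still averages out negative, and how you could measure what share of a lexicon's scoring words are negative. TextBlob's real analyzer may apply extra rules on top of a plain mean, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
# Scores are invented for illustration; they are NOT TextBlob's values.
LEXICON = {
    "awful": -1.0,
    "stupid": -0.8,
    "great": 0.8,
    "nice": 0.6,
    "sorry": -0.5,
}

def mean_polarity(text: str) -> float:
    """Average the polarity of every lexicon word found in `text`.

    Words missing from the lexicon are ignored, so context such as a
    sympathetic tone has no effect on the score.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def negative_share(lexicon: dict) -> float:
    """Fraction of lexicon entries that carry a negative score."""
    return sum(1 for s in lexicon.values() if s < 0) / len(lexicon)

print(mean_polarity("That sounds awful, I'm sorry you went through that."))  # -0.75
print(mean_polarity("That sounds awful"))  # -1.0
print(negative_share(LEXICON))  # 0.6
```

Because words outside the lexicon are ignored, the sympathetic framing contributes nothing; only "awful" and "sorry" are counted, so the kind reply scores as clearly negative.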

Kassius509 2620 days ago

You're spot on. This model is based on a dictionary with scored words!

For the curious you can see the dictionary here: https://github.com/sloria/TextBlob/blob/eb08c120d364e9086467...

The package used is a pretty popular one called TextBlob. It is nifty for working with unlabeled data like we have with the HackerNews dataset.

We built our definition of saltiness around comments that are both subjective and negative (subjective + negative).

We reduced the weight of (objective + negative) comments, because we feel that criticism, while at times painful, isn't necessarily salty if it's presented objectively.
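A minimal sketch of how a (subjective + negative) definition could be turned into a score, assuming a hypothetical formula: negativity is scaled by a weight that grows with subjectivity, so objective negative criticism is damped rather than ignored. The polarity/subjectivity ranges match what `TextBlob(text).sentiment` reports (polarity in [-1, 1], subjectivity in [0, 1]), but `saltiness`, `salty_share`, `OBJECTIVE_WEIGHT`, and `SALTY_THRESHOLD` are made-up names and values, not the app's actual model. `salty_share` shows how a per-user figure like "20.23% of my posts are salty" could be derived.

```python
from typing import Iterable, Tuple

# Hypothetical tuning constants; the app's real values are not published.
OBJECTIVE_WEIGHT = 0.3   # how much a fully objective negative comment still counts
SALTY_THRESHOLD = 0.25   # saltiness at or above this marks a comment "salty"

def saltiness(polarity: float, subjectivity: float) -> float:
    """Score in [0, 1]: negative sentiment, scaled up by subjectivity.

    polarity is in [-1, 1] and subjectivity in [0, 1]. Positive or
    neutral comments always score 0.
    """
    negativity = max(0.0, -polarity)
    weight = OBJECTIVE_WEIGHT + (1.0 - OBJECTIVE_WEIGHT) * subjectivity
    return negativity * weight

def salty_share(sentiments: Iterable[Tuple[float, float]]) -> float:
    """Fraction of a user's (polarity, subjectivity) pairs rated salty."""
    scores = [saltiness(p, s) for p, s in sentiments]
    if not scores:
        return 0.0
    return sum(1 for x in scores if x >= SALTY_THRESHOLD) / len(scores)

# Same negativity, different subjectivity:
print(round(saltiness(-0.8, 1.0), 2))  # 0.8  (subjective rant: salty)
print(round(saltiness(-0.8, 0.0), 2))  # 0.24 (objective criticism: below threshold)
```

The design choice here is a damping weight rather than a hard cutoff on subjectivity, so a very negative but objective comment can still register, just less strongly.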

We built this model fast (in one week), and this week we've been iterating toward a fine-tuned BERT model that we're training on a much broader set of toxicity, demographic, and polarity features. The training set is much larger and higher quality, so we expect a large jump in precision once it's deployed.

I hope the app gave you some good chuckles as you poked around, though. It's hard to explain how excited I felt when I saw pg_is_a_butt at the top of my pandas data frame the first time I processed the data.

It's doing a little bit right. :)

howard941 2620 days ago

My saltiest comment was reportedly "If taxing price gougers seems stupid you're going to hate the pitchfork-toting mobs." I'm not sure how to work some kind of contextual analysis into it, but I'm pretty sure it's on your to-do list. Good on you for the creative idea and implementation, and for putting it out here for us to sprinkle salt on.

Kassius509 2619 days ago

TYVM.

This kind of feedback gets me pumped about continuing to work on it.

_bxg1 2620 days ago

Honestly, I think the whole thing would be more meaningful if it just used total downvotes and/or the ratio of downvotes to upvotes.

Kassius509 2620 days ago

I agree. Would have been much easier too. Unfortunately, HN doesn't have downvotes. The current model does incorporate upvotes, but we're seeing a lot more success training with like-kind labeled datasets + BERT fine tuning.

Thank you for trying the app and hopefully V2 will leave you feeling like the system is more precise.

_bxg1 2620 days ago

"HN doesn't have downvotes"

It certainly does, although it's possible they aren't stored independently and simply "cancel out" an upvote, so maybe that's what you meant.

The interface and the graphs are really nice; even though basing the data on votes alone would be less interesting in one sense, I think the rest of the site would still provide value even with that simpler metric.
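A small sketch of the simpler vote-based metric, assuming hypothetical per-comment up/down tallies (`VoteCounts` is a made-up type; HN doesn't publish such counts). It also illustrates the caveat about downvotes cancelling upvotes: if only the net score is stored, two comments with the same net score can have very different downvote ratios, so the ratio can't be recovered from the net alone.

```python
from dataclasses import dataclass

@dataclass
class VoteCounts:
    """Hypothetical per-comment vote tallies (not available from HN)."""
    up: int
    down: int

    @property
    def net(self) -> int:
        """What remains if each downvote just cancels an upvote."""
        return self.up - self.down

    def down_ratio(self) -> float:
        """Share of all votes that were downvotes (0.0 if no votes)."""
        total = self.up + self.down
        return self.down / total if total else 0.0

# Two very different comments that a net-only store cannot tell apart:
controversial = VoteCounts(up=10, down=8)
quiet = VoteCounts(up=2, down=0)
print(controversial.net, quiet.net)  # 2 2
print(round(controversial.down_ratio(), 2), round(quiet.down_ratio(), 2))  # 0.44 0.0
```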
