Hacker News
by KevinMcAlear 4255 days ago
Hello! I wrote this in one of the other threads about it, so I figured I would leave it here too.

A few things:

1. Thanks for posting my blog post (https://news.ycombinator.com/item?id=8517727), @chippy. :) The actual app (haternews.co) kept getting booted off HN, so thanks also to @melling for posting it now.

2. There have been a lot of interesting comments on the three (now four) threads on here. People pointed out some bugs and overall issues, which I will be fixing (also, the site should not crash half as much now). This is just a fun side project I have been messing around with so I can get better at applying data science in different areas. If you would like to help build it out further for fun, let me know! Also, feel free to submit a bug or a suggestion for an improvement if you really want to (https://github.com/kevinmcalear/hater_news/issues).

3. I wanted to build the "hater score" for two reasons. First, to see how accurately I could build a model to measure insulting comments in the wild, and second (if it's accurate), to see how people would react to seeing how positive or negative they usually are on Hacker News (or other social networks).

4. I wanted to make sure everyone knows that just because something is your "Worst Comment" doesn't mean it is negative. Most people have very low scores, and most of your comments are not identified as insulting. (A comment only counts as insulting if its score is over 50%.) So most people on HN are not actually haters; I just went with a more "hater"-focused design for fun. There are in fact actual haters, though, if you look hard enough.
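
The 50% cutoff can be made concrete with a small sketch. It assumes the model outputs one insult probability per comment (for example from a classifier's predict_proba); the names `summarize`, `HaterSummary` and `INSULT_THRESHOLD` are hypothetical and not taken from the Hater News code. It shows how a "Worst Comment" can still sit well below the cutoff.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical constant: a comment counts as insulting only if its score is over 50%.
INSULT_THRESHOLD = 0.5


@dataclass
class HaterSummary:
    worst_score: float          # highest insult probability among the comments
    worst_is_insulting: bool    # whether that worst comment actually crosses the cutoff
    insulting_share: float      # fraction of comments strictly above the cutoff
    mean_score: float           # average insult probability


def summarize(probs: List[float], threshold: float = INSULT_THRESHOLD) -> HaterSummary:
    """Summarize per-comment insult probabilities for one user (illustrative sketch)."""
    if not probs:
        raise ValueError("need at least one comment score")
    worst = max(probs)
    n_insulting = sum(1 for p in probs if p > threshold)
    return HaterSummary(
        worst_score=worst,
        worst_is_insulting=worst > threshold,
        insulting_share=n_insulting / len(probs),
        mean_score=sum(probs) / len(probs),
    )
```

For example, `summarize([0.10, 0.30, 0.20])` reports a worst score of 0.3 with `worst_is_insulting` set to False: that user still has a "Worst Comment", but none of their comments is classified as insulting.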

5. Something I found interesting is what happens when you tick the "Back In The Day" checkbox. It takes your 50 oldest comments and analyses them instead of your 50 most recent.
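
A minimal sketch of what the "Back In The Day" toggle selects, assuming each fetched comment carries a Unix timestamp. `Comment` and `pick_comments` are hypothetical names, and how the app actually fetches comments from HN isn't described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    id: int
    time: int   # Unix timestamp of the comment (assumed to be available)
    text: str


def pick_comments(comments: List[Comment], back_in_the_day: bool = False,
                  limit: int = 50) -> List[Comment]:
    """Return the `limit` oldest comments if back_in_the_day is set, else the `limit` most recent."""
    # Oldest first when the checkbox is ticked, newest first otherwise.
    ordered = sorted(comments, key=lambda c: c.time, reverse=not back_in_the_day)
    return ordered[:limit]
```

Sorting explicitly by timestamp keeps the selection correct even if the comments arrive in no particular order.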

6. Finally, if you're not sure why some comments are ranked higher than others, feel free to look at the training data I used (it's from a Kaggle competition from a while back) and read my blog post. If you don't want to, here are the additional features I used on top of a standard bag-of-words model (CountVectorizer):

* badwords_count – A count of bad words used in each comment.

* n_words – A count of words used in each comment.

* allcaps – A count of capital letters in each comment.

* allcaps_ratio – A count of capital letters in each comment / the total words used in each comment.

* bad_ratio – A count of bad words used in each comment / the total words used in each comment.

* exclamation – A count of "!" used in each comment.

* addressing – A count of "@" symbols used in each comment.

* spaces – A count of spaces used in each comment.
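
These extra features are simple string counts. The sketch below shows one way to compute them. It assumes that words are whitespace-separated and uses a hypothetical `BAD_WORDS` set, because the real bad-words list isn't given here. `bag_of_words` is only a stand-in that mimics CountVectorizer's default lower-casing and token pattern so the example runs without scikit-learn. In a real pipeline, these columns would typically be stacked next to CountVectorizer's sparse matrix.

```python
import re
import string
from collections import Counter
from typing import Dict, Iterable

# Hypothetical stand-in; the actual bad-words list used by Hater News isn't shown here.
BAD_WORDS = {"idiot", "stupid", "moron"}


def extra_features(comment: str, bad_words: Iterable[str] = BAD_WORDS) -> Dict[str, float]:
    """Compute the hand-made features listed above for a single comment."""
    bad = {w.lower() for w in bad_words}
    words = comment.split()                 # whitespace-separated words
    n_words = len(words)
    # Strip surrounding punctuation and lower-case so "IDIOT!!" matches "idiot".
    tokens = [w.strip(string.punctuation).lower() for w in words]
    badwords_count = sum(1 for t in tokens if t in bad)
    allcaps = sum(1 for ch in comment if ch.isupper())   # capital letters, per the list above
    denom = n_words or 1                    # avoid dividing by zero on empty comments
    return {
        "badwords_count": badwords_count,
        "n_words": n_words,
        "allcaps": allcaps,
        "allcaps_ratio": allcaps / denom,   # capital letters per word, so it can exceed 1
        "bad_ratio": badwords_count / denom,
        "exclamation": comment.count("!"),
        "addressing": comment.count("@"),
        "spaces": comment.count(" "),
    }


def bag_of_words(comment: str) -> Counter:
    """Tiny stand-in for CountVectorizer's defaults: lower-case, tokens of 2+ word characters."""
    return Counter(re.findall(r"\b\w\w+\b", comment.lower()))
```

For "You are an IDIOT!! @bob", the sketch counts 5 words, 1 bad word, 6 capital letters, 2 exclamation marks, 1 "@" and 4 spaces.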

If you have suggestions for other features I could collect, let me know! I'll also be building a way to get actual training data from HN itself by letting HN users decide whether a comment is actually insulting, so that the predictions keep improving.

2 comments

I'd say that the fact that it doesn't even work (it arbitrarily doles out 'hater' marks because of a poor sentiment engine) makes it a good candidate for removal.

This insidious flaw results in two things:

1. accounts being deemed as 'haters' without warrant

2. the belief that the tool is 'correct' - perpetuating #1.

This kind of mindset ("Let's check that I have the least negative impact on the community") has to come from the HN crowd. Would it be relevant to adapt it for Reddit?