A few things:

1. Thanks for posting my blog post, @chippy. :) The actual app (haternews.co) kept getting booted off HN...

2. There have been a lot of interesting comments on the three threads here. People pointed out some bugs and overall issues, which I will be fixing (also, the site should not crash half as much now). This is just a fun side project I have been messing around with so I can get better at using data science in various applications. If you would like to help build it out further for fun, let me know! Also, feel free to submit a bug or a suggestion for an improvement if you really want to (https://github.com/kevinmcalear/hater_news/issues).

3. I wanted to build the "hater score" for two reasons: first, to see how accurately I could build a model to measure insulting comments in the wild, and second (if it's accurate), to see how people would react to seeing how positive or negative they usually are on Hacker News (or other social networks).

4. I want to make sure everyone knows that just because something is your "Worst Comment" doesn't mean it is negative. Most people have very low scores, and most of your comments are not identified as insulting. (A comment would score over 50% if the model actually considers it insulting.) So most people on HN are not actually haters. I just went with a more "hater"-focused design for fun. There are in fact actual haters though, if you look hard enough.

5. Something I found interesting is clicking the "Back In The Day" checkbox. It takes your 50 oldest comments and analyses them, instead of your 50 most recent.

6. Finally, if you're not sure why some comments are getting ranked higher than others, feel free to look at the training data I used (it's from a Kaggle competition from a while back) and read my blog post. If you don't want to, here are the additional features I used on top of standard bag-of-words (CountVectorizer):

* badwords_count – A count of bad words used in each comment.
* n_words – A count of words used in each comment.
* allcaps – A count of capital letters in each comment.
* allcaps_ratio – The count of capital letters in each comment divided by the total words used in that comment.
* bad_ratio – The count of bad words used in each comment divided by the total words used in that comment.
* exclamation – A count of "!" used in each comment.
* addressing – A count of "@" symbols used in each comment.
* spaces – A count of spaces used in each comment.

If you have suggestions on other features I could collect, let me know! I'll also be building a way to get actual training data from HN itself, letting HN users mark whether a comment is actually insulting or not so that the predictions constantly improve.
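For anyone curious, the hand-built features listed above could be computed roughly like this. This is only a minimal sketch and not the site's actual code: the `BAD_WORDS` set and the `extract_features` name are hypothetical placeholders, and splitting on whitespace to count words is an assumption.

```python
import string

# Hypothetical placeholder list; the real bad-word list is not given in the post.
BAD_WORDS = {"idiot", "stupid", "moron"}


def extract_features(comment: str) -> dict:
    """Compute the hand-built features for one comment (illustrative only)."""
    words = comment.split()  # assumption: words are whitespace-separated tokens
    n_words = len(words)
    # Strip surrounding punctuation and lowercase before the bad-word lookup.
    cleaned = [w.strip(string.punctuation).lower() for w in words]
    badwords_count = sum(1 for w in cleaned if w in BAD_WORDS)
    allcaps = sum(1 for ch in comment if ch.isupper())
    denom = n_words or 1  # avoid dividing by zero on an empty comment
    return {
        "badwords_count": badwords_count,
        "n_words": n_words,
        "allcaps": allcaps,
        "allcaps_ratio": allcaps / denom,
        "bad_ratio": badwords_count / denom,
        "exclamation": comment.count("!"),
        "addressing": comment.count("@"),
        "spaces": comment.count(" "),
    }


if __name__ == "__main__":
    print(extract_features("@bob You are an IDIOT!!"))
```

In a real pipeline these numbers would presumably be stacked next to the CountVectorizer bag-of-words matrix (for example with `scipy.sparse.hstack`) before fitting the classifier, but how the two are combined is not stated in the post.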
Also, the "most hateful" comment was me quoting someone else's rather unpleasant comment, whereas I'd prefer my distaste for lousy ideas show through more directly.