by KevinMcAlear 4246 days ago
A few things:

1. Thanks for posting my blog post @chippy. :) The actual app ( haternews.co ) kept getting booted off HN...

2. There have been a lot of interesting comments across the three threads on here. People pointed out some bugs and overall issues, which I will be fixing (also, the site should not crash half as much now). This is just a fun side project I have been messing around with so I can get better at using data science in various applications. If you would like to help build it out further for fun, let me know! Also, feel free to submit a bug report or a suggestion for an improvement if you really want to (https://github.com/kevinmcalear/hater_news/issues).

3. I wanted to build the "hater score" for two reasons: first, to see how accurately I could build a model to measure insulting comments in the wild, and second (if it's accurate), to see how people would react to seeing how positive or negative they usually are on Hacker News (or other social networks).

4. I wanted to make sure everyone knows that just because something is your "Worst Comment" doesn't mean it is negative. Most people have very low scores, and most of your comments are not identified as insulting. (A comment would score over 50% if it were actually classified as insulting.) So most people on HN are not actually haters; I just went with a more "hater"-focused design for fun. There are in fact actual haters though, if you look hard enough.

5. Something I found interesting is clicking the "Back In The Day" checkbox. It takes your 50 oldest comments and analyses them, instead of your 50 most recent.

6. Finally, if you're not sure why some comments are getting ranked higher than others, feel free to look at the training data I used (it's from a Kaggle competition from a while back) and read my blog post. If you don't want to, here are the additional features I used on top of standard bag-of-words (CountVectorizer):

* badwords_count – A count of bad words used in each comment.

* n_words – A count of words used in each comment.

* allcaps – A count of capital letters in each comment.

* allcaps_ratio – The count of capital letters in each comment divided by the total words used in each comment.

* bad_ratio – The count of bad words used in each comment divided by the total words used in each comment.

* exclamation – A count of "!" used in each comment.

* addressing – A count of "@" symbols used in each comment.

* spaces – A count of spaces used in each comment.

If you have suggestions on other features I could collect let me know! I'll also be building a way to get actual training data from HN itself and letting HN users determine if a comment is actually insulting or not so that the predictions constantly improve.
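For anyone who wants to play with the extra features listed above, here is a rough sketch of how they could be computed. It is not the app's actual code: the `BAD_WORDS` set is a made-up stand-in (the real list is not shown in this thread), the function name `extra_features` is hypothetical, and splitting on whitespace is an assumption about how words are counted.

```python
import re

# Hypothetical stand-in list; the real app's bad-word list is not shown here.
BAD_WORDS = {"idiot", "stupid", "moron"}


def extra_features(comment: str) -> dict:
    """Compute the hand-built features described in the post for one comment."""
    words = comment.split()  # assumption: words are whitespace-separated tokens
    n_words = len(words)

    # Lowercase each token and strip surrounding punctuation before the lookup.
    cleaned = [re.sub(r"^\W+|\W+$", "", w).lower() for w in words]
    badwords_count = sum(1 for w in cleaned if w in BAD_WORDS)

    # "allcaps" is described as a count of capital letters, not all-caps words.
    allcaps = sum(1 for ch in comment if ch.isupper())

    denom = max(n_words, 1)  # avoid dividing by zero on empty comments
    return {
        "badwords_count": badwords_count,
        "n_words": n_words,
        "allcaps": allcaps,
        "allcaps_ratio": allcaps / denom,
        "bad_ratio": badwords_count / denom,
        "exclamation": comment.count("!"),
        "addressing": comment.count("@"),
        "spaces": comment.count(" "),
    }
```

These numbers would then be stacked next to the bag-of-words columns before training; how the app actually combines them is not described in this thread.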

4 comments

Have you ever considered that machine learning isn't fairly dust, and that you can't sprinkle algorithms on a criterion that's poorly formulated to begin with and get an objective criterion for evaluation? I mean, what is "hate": insults? Expressions of frustration? Sly insults? Sarcasm?

Also, the "most hateful" comment was me quoting someone else's rather unpleasant comment, whereas I'd prefer my distaste for lousy ideas show through more directly.

Great point.

1. I wish everything was made from *fairy dust. How awesome would that be? :)

2. "Hate" is definitely hard to quantify. It's in fact quite difficult to map words to their intentions and get it right consistently (especially within a proper context). So difficult that people set up Kaggle competitions on exactly this. I actually got my "magical" training data from a competition that paid out $10k, which I explained in the article but here it is again:

https://www.kaggle.com/c/detecting-insults-in-social-comment...

They did a great job building a baseline training data set to evaluate several different models on, all of which are briefly explained, or at least shown in code, in the article. What "hate" actually means here is the probability that a comment is considered insulting. The "hater score" is just the average of those probabilities over your most recent (or oldest, depending on your settings) comments.

3. I read and looked at several different attempts to build something similar by various data scientists who were kind enough to share their findings, including a huge contributor to scikit-learn (https://github.com/amueller).

4. Taking out quoted text would be a great feature to add. I have about 5 or 6 new features I will probably add to see if the model works any better for it. Thanks for the suggestion (another person suggested the same thing). :)

5. This was just to see how well "sprinkled algorithms" and magical coding work in the wild world of actual comments. I love learning and improving my knowledge base with actual experience, so I figured why not build something and see what happens. :)
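To make the scoring described above concrete, here is a minimal sketch of averaging per-comment insult probabilities into a "hater score", plus one possible way to drop quoted text first. The function names are hypothetical, the 0.5 cutoff comes from the "over 50%" remark in the original post, and treating lines that start with ">" as quotes is only an assumption about how people quote on HN (it would not catch text quoted inside quotation marks).

```python
from statistics import mean

INSULT_THRESHOLD = 0.5  # "over 50%" counts as insulting, per the original post


def strip_quoted_lines(comment: str) -> str:
    """Drop lines that look like quotes (assumption: they start with '>')."""
    kept = [
        line for line in comment.splitlines()
        if not line.lstrip().startswith(">")
    ]
    return "\n".join(kept).strip()


def is_insulting(probability: float) -> bool:
    """A comment counts as insulting only when its probability is over 50%."""
    return probability > INSULT_THRESHOLD


def hater_score(probabilities: list) -> float:
    """Average the per-comment insult probabilities; 0.0 if there are none."""
    return mean(probabilities) if probabilities else 0.0
```

In this sketch the probabilities would come from whatever classifier is trained on the Kaggle data; the model itself is left out on purpose.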

My favorite part about your algorithm is that it even detects SELF-HATRED! That puts an interesting, and depressing!, spin on the project. My pseudo-unconscious self-loathing was uncovered through my meager number of HN posts. Eerie.

The worst comment detected by your project is an expression of relief in learning that onions will not, in fact, brown in fewer than 30 minutes.

"All your life you're just thinking, "I'M AN IDIOT, WHY WON'T THESE BROWN!?" and then, one September's day, you find out it was all lies... the entire time. Lies all the way down."

It's interesting to see some of the comments from my history that the system decided were negative.

Of all of them, only one was actually negative (and that one was about people who steal AWS resources for cryptocoin mining).

Hopefully I'll have some time one evening to have a bit more of a play with your work :)

Great! I'm excited to hear about your findings. I'll try to incorporate them into the app! :)

It doesn't seem to work on shadowbanned users, e.g. https://news.ycombinator.com/user?id=TempleOS

I'm not sure if this is a limitation of the API or not though.

Hmmm... seems to work for me. https://transfer.sh/TclxF/templeos.png

Here is the API call too. https://hacker-news.firebaseio.com/v0/user/TempleOS.json
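A small sketch of that call, plus picking the 50 newest or oldest items for the "Back In The Day" option. Assumptions, not the app's real code: the helper names are made up, the user JSON is assumed to carry a `submitted` list of item IDs, and sorting by ID is used as a stand-in for chronological order since IDs appear to be handed out in increasing order. As far as I know the API treats the user id as case-sensitive, so the username is passed through unchanged.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"


def user_url(username: str) -> str:
    """Build the user endpoint URL; the username's case is preserved."""
    return f"{API_BASE}/user/{quote(username, safe='')}.json"


def fetch_user(username: str) -> dict:
    """Fetch the user's JSON record (needs network access)."""
    with urlopen(user_url(username)) as resp:
        return json.load(resp)


def pick_items(submitted: list, limit: int = 50, oldest: bool = False) -> list:
    """Return up to `limit` item IDs, newest first by default.

    Assumption: larger item IDs are more recent submissions.
    """
    ordered = sorted(submitted, reverse=not oldest)
    return ordered[:limit]
```

The `submitted` list mixes stories and comments, so a real version would still need to fetch each item and keep only the comments.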

Something must have gone wrong, sorry!

Right you are. I think I typed it all lowercase or camel case. Thanks for the clarification :)

No worries! :)