Hacker News
by sbenario 4246 days ago
@KevinMcAlear - can you share any details on the algo? :-)
2 comments

Sure! I wrote a dry blog post about it actually. http://kevinmcalear.com/thoughts/building-hater-news/

@krapp there were some challenges building a great model, but you can download the whole repo, pull out just the machine learning part, and see what I did. I have it all written up with comments in an IPython Notebook. :)

The Repo: https://github.com/kevinmcalear/hater_news

It basically tokenizes words with scikit-learn's CountVectorizer and adds some extra features of my own: "bad words", the ratio of bad words to total words, speaking in all CAPS, and a few others. I then feed those features into a logistic regression to predict the likelihood that a specific comment is insulting, and average all of a user's comments into one score.
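For anyone who wants the gist without opening the notebook, here is a rough sketch of that pipeline. It is not the code from the repo: the bad-word list, the toy training comments, and the helper names (`handcrafted_features`, `build_matrix`, `user_score`) are all made up for illustration, and the real model was trained on the Kaggle data rather than eight hand-written lines.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical word list; the project's actual list is not reproduced here.
BAD_WORDS = {"idiot", "stupid", "moron", "dumb"}


def handcrafted_features(comment):
    """Return [bad-word count, bad-word ratio, all-caps word ratio]."""
    words = comment.split()
    total = max(len(words), 1)  # avoid dividing by zero on empty comments
    bad = sum(w.strip(".,!?").lower() in BAD_WORDS for w in words)
    caps = sum(w.isupper() and len(w) > 1 for w in words)
    return [bad, bad / total, caps / total]


def build_matrix(vectorizer, comments, fit=False):
    """Stack bag-of-words counts next to the handcrafted features."""
    if fit:
        counts = vectorizer.fit_transform(comments)
    else:
        counts = vectorizer.transform(comments)
    extra = csr_matrix(np.array([handcrafted_features(c) for c in comments]))
    return hstack([counts, extra]).tocsr()


# Toy training data, invented for this sketch: 1 = insulting, 0 = not.
train_comments = [
    "you are a stupid idiot",
    "what a dumb moron",
    "SHUT UP you idiot",
    "this is stupid and so are you",
    "thanks for sharing this",
    "great write up, very helpful",
    "I enjoyed reading the post",
    "interesting approach to the problem",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = CountVectorizer()
model = LogisticRegression()
model.fit(build_matrix(vectorizer, train_comments, fit=True), train_labels)


def user_score(comments):
    """Average the per-comment insult probabilities into one user score."""
    features = build_matrix(vectorizer, comments)
    probs = model.predict_proba(features)[:, 1]  # column 1 = P(insulting)
    return float(probs.mean())


if __name__ == "__main__":
    print(user_score(["you are such an idiot", "stupid moron"]))
    print(user_score(["thanks, very helpful post"]))
```

Stacking the handcrafted columns next to the sparse count matrix keeps everything in one feature matrix, so a single logistic regression sees both the word counts and the extra signals. Averaging probabilities is only one way to roll comments up into a user score; the sketch assumes a plain mean because that is what the description above suggests.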

I used training data from a Kaggle competition and was able to score near the same level as the winners, but it will definitely improve as I keep working on it.

It's open source: https://github.com/kevinmcalear/hater_news

Looks like a counted bag of words + https://github.com/kevinmcalear/hater_news/blob/master/its_p... are fed into a logistic regression