by KevinMcAlear
4247 days ago
Sure! I actually wrote a dry blog post about it:
http://kevinmcalear.com/thoughts/building-hater-news/

@krapp there were some challenges in building out a great model, but you can download the whole repo, pull out just the machine learning part, and see what I did. I have it all commented in an IPython Notebook. :)

The repo: https://github.com/kevinmcalear/hater_news

It basically does word tokenization with scikit-learn's CountVectorizer, plus some extra features I added: "bad words", the ratio of bad words to total words used, speaking in all CAPS, and a few others. I then took those features and used logistic regression to predict the likelihood that a specific comment is insulting, and averaged all of a user's comments into one score.

I used training data from a Kaggle competition and was able to score near the same level as the winners, but it will definitely be improved as I keep working on it.
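
A minimal sketch of the hand-crafted features and the per-user averaging described above, in plain Python. This is not the code from the repo: the word list, the function names, and the exact definition of "all caps" are assumptions made for illustration, and the per-comment probabilities passed to `user_scores` would in practice come from the trained logistic regression model.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical word list; the list used in the actual project is not shown here.
BAD_WORDS = {"idiot", "stupid", "moron"}


def extra_features(comment):
    """Return illustrative hand-crafted features for one comment."""
    words = re.findall(r"[A-Za-z']+", comment)
    total = len(words)
    bad = sum(1 for w in words if w.lower() in BAD_WORDS)
    # Assumption: a word counts as "all caps" if it has 2+ characters, all uppercase.
    caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return {
        "bad_word_count": bad,
        "bad_word_ratio": bad / total if total else 0.0,
        "caps_ratio": caps / total if total else 0.0,
    }


def user_scores(comment_probs):
    """Average per-comment insult probabilities into one score per user.

    comment_probs: iterable of (user, probability) pairs.
    """
    by_user = defaultdict(list)
    for user, prob in comment_probs:
        by_user[user].append(prob)
    return {user: mean(probs) for user, probs in by_user.items()}
```

For example, `extra_features("You are a STUPID idiot")` counts two bad words out of five, and `user_scores` collapses any number of scored comments into a single number per user. In a full pipeline these features would be combined with the token counts before fitting the classifier.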