Interesting content, but I second the spellcheck point, since you are posting about natural language. Also, pie charts are horrible for anything with a ton of data points: the second graph kind of pulls it off because "en" is so large, but I can't make much out of the first graph. Third, while Five Thirty Eight is certainly well known, they definitely make mistakes, as was seen in their complete miss in predicting Trump's presidency. They are no better than Rasmussen, which currently has Trump at a split 49% approval rating; you may want to add them as another source to better balance your factual statements.
"complete miss"? They put his probability of winning somewhere between a quarter and a half -- rather uniquely, among a field of predictors who on average put his probability of winning under 10%.
It sounds like because he won, you believe they should have given him a probability greater than 1/2. That represents a misunderstanding of what probability means.
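To make that concrete, here is a rough sketch using the Brier score (squared error between the forecast probability and the 0/1 outcome, lower is better). The 0.30 and 0.10 values are illustrative stand-ins for "between a quarter and a half" and "under 10%", not the actual published forecasts, and the forecaster names are made up.

```python
def brier_score(forecast_probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (forecast_probability - outcome) ** 2

# Illustrative numbers only, not the actual published forecasts.
forecasts = {"forecaster_a": 0.30, "forecaster_b": 0.10}
outcome = 1  # the event happened

scores = {name: brier_score(p, outcome) for name, p in forecasts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
# forecaster_a: 0.49
# forecaster_b: 0.81
```

Neither forecast is "right" or "wrong" on a single event, but the one that gave the outcome more probability is penalized less. Judging a forecaster properly takes many such events, not one.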
The only people who think Five Thirty Eight had a complete miss in predicting a Trump victory are those who do not have an understanding of probability, statistics, or predictive models.
It's not just about simple word frequencies. Common words like "like", "need", "second", and "part" show up all over a dataset of whole documents, so they aren't very meaningful in a specific sentence. Googling "tf-idf" will show you more details about this.
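For a rough idea of how that works, here is a minimal sketch of one common tf-idf variant (term frequency times log of inverse document frequency) in plain Python. The three toy documents and the whitespace tokenizer are assumptions made up for illustration; real libraries use smoothing and better tokenization.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of this document's tokens; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy documents, invented for this example.
docs = [
    "we need a second opinion on this part",
    "the second part of the parser needs unicode support",
    "i need the second part by friday",
]
scores = tf_idf(docs)
top_term = max(scores[1], key=scores[1].get)
```

Here "second" and "part" appear in every document, so their idf is log(1) = 0 and they score zero everywhere, while a word like "unicode" that appears in only one document gets a positive score.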
http://www.rasmussenreports.com/public_content/politics/poli...