What I found When I analysis million followers of President Trump with nlp


	What I found When I analysis million followers of President Trump with nlp (insightninja.net)
	14 points by plantpark 3060 days ago

3 comments

gravis7777 3060 days ago

Interesting content, but I also second the spell-check point when you are posting about natural language. Also, pie charts are horrible for things with a ton of data points; the second graph kind of pulls it off because "en" is so large, but I can't make much out of the first graph. Third, while Five Thirty Eight is certainly well known, they definitely make mistakes, as was seen in their complete miss in predicting his presidency. They are no better than Rasmussen, which currently has Trump at a split 49% approval rate; you may want to add them as another source to better balance your factual statements.

http://www.rasmussenreports.com/public_content/politics/poli...


Jeff_Brown 3060 days ago

"complete miss"? They put his probability of winning somewhere between a quarter and a half -- rather uniquely, among a field of predictors who on average put his probability of winning under 10%.

It sounds like because he won, you believe they should have given him a probability greater than 1/2. That represents a misunderstanding of what probability means.


thousandautumns 3060 days ago

The only people who think Five Thirty Eight had a complete miss in predicting a Trump victory are those who do not have an understanding of probability, statistics, or predictive models.


collyw 3060 days ago

Agreed, charts are supposed to make understanding the data easier. This may do the opposite.


plantpark 3060 days ago

I thought a pie chart would make the percentages clearer. Do you have any suggestions for the chart? Thanks!


plantpark 3060 days ago

What kind of chart do you think is better for such data? Looking forward to your advice!


plantpark 3060 days ago

Thanks, I will check the source for more details.


collyw 3060 days ago

Why the need for machine learning for the second part? It seems like a complicated way to do what you could do with some simple database queries.


plantpark 3060 days ago

It's not just about simple word frequencies. Common words like "like", "need", "second", and "part" appear throughout the whole dataset of documents, so they aren't very meaningful in a specific sentence. Googling "tf-idf" will show you more details about this.
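
To illustrate the point, here is a minimal sketch of tf-idf in plain Python. It is not the code from the article; the function name `tf_idf`, the whitespace tokenizer, and the toy `docs` list are all assumptions made up for this example, and it uses the basic unsmoothed form tf * log(N / df), whereas libraries typically use smoothed variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency in this document times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy data, invented purely for illustration.
docs = [
    "i like the second part",
    "i need the tax plan",
    "i like the rally",
]
scores = tf_idf(docs)
```

A word that occurs in every document (here "the") gets a score of zero, while a word specific to one document (here "tax") scores higher than one shared across several, which is the difference from a raw frequency count.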


collyw 3059 days ago

OK, I looked that up. Again, isn't this something that Elasticsearch would do without needing to set up a machine learning system?


bucko 3060 days ago

Good content, but run your text through spell-check before posting, and/or send it to someone to proofread.


plantpark 3060 days ago

Sorry about that, I will check it again.


xnoot3 3060 days ago

I understood it without any significant issues.
