by gus_massa 3909 days ago
Looks like an interesting but difficult project. From the article:

> The question then is to what extent the content of Twitter is representative of voting behaviour and how accurately we can predict the results of an election with Twitter data.

Some demographics (young? city?) are more heavily represented than others (old? farms?). In some cases, it's cool to vote publicly for a party, so people are more eager to say so publicly, or even to lie outright in their tweets. In other cases, people are afraid to say publicly that they will vote for a party ...

I think you will need many magic constants that can't be estimated until you have 3 or 4 previous elections to fit them against.

Can you make a prediction for each province / state?

Good luck, and post your progress and predictions before the election, and a post-mortem analysis afterwards.

2 comments

Yes, I believe that some parties will be overrepresented if you look solely at the volume of tweets. This is simply because Twitter in Turkey is more popular among young and educated people (1). However, it should not be difficult to take this into account in the linear regression algorithm, because the tuning parameters can be determined from data on previous elections.
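
To make the regression idea concrete, here is a minimal sketch, not the actual code from the project. It assumes a single party and a plain one-variable least-squares fit from tweet share to vote share; the names `fit_line` and `predict` are hypothetical, and the numbers are made up purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("tweet shares must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, tweet_share):
    """Map a party's share of tweets to an estimated share of votes."""
    return intercept + slope * tweet_share


# Made-up numbers for one hypothetical party, NOT real election data.
past_tweet_share = [0.30, 0.40, 0.50]  # share of tweets in past elections
past_vote_share = [0.20, 0.25, 0.30]   # share of votes in those elections

intercept, slope = fit_line(past_tweet_share, past_vote_share)
estimate = predict(intercept, slope, 0.45)  # tweet share in the new election
print(f"estimated vote share: {estimate:.3f}")
```

With only a handful of past elections per party, a fit like this has very few data points, which is the concern raised in the parent comment.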

I made a prediction for each province after initially collecting all the tweets, but the results were not accurate. So far I have determined the location of about 33% of the Twitter accounts, and I hope the results will improve if I exclude all Twitter users who are not from the province I am calculating for.
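
The per-province filtering could look roughly like the sketch below. It assumes each tweet has already been labeled with a party and that located accounts sit in a lookup table; the function name, the data layout, and the placeholder ids are all hypothetical.

```python
from collections import Counter


def party_shares_for_province(tweets, account_province, province):
    """Share of tweets per party, using only accounts located in `province`.

    tweets: iterable of (account_id, party) pairs.
    account_province: dict mapping account_id -> province; accounts whose
        location is unknown are simply absent and therefore excluded.
    """
    counts = Counter(
        party
        for account_id, party in tweets
        if account_province.get(account_id) == province
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no located tweets for this province
    return {party: n / total for party, n in counts.items()}


# Placeholder data, not real accounts or parties.
tweets = [("a1", "party_x"), ("a2", "party_y"), ("a1", "party_x"), ("a3", "party_y")]
account_province = {"a1": "province_1", "a2": "province_1"}  # a3 is unlocated

print(party_shares_for_province(tweets, account_province, "province_1"))
```

Dropping unlocated accounts shrinks the sample to roughly the located third, so small provinces may end up with very few tweets.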

Thanks for the feedback :)

(1) http://webrazzi.com/2014/07/17/genart-ve-nielsenin-turkiyede...

I don't see why people would lie on Twitter, instead of simply not tweeting...