I once did a map of the UK using sentiment analysis of the text of geotagged Flickr photos, hoping to find areas that were happier than others. It turned out there was no geographical pattern in that data.
Geographical analysis tools should be used in these types of analyses, rather than just looking at blobs on a map. I used k-means cluster analysis to find groups of happy and sad areas, but again the groups turned out to be inconclusive.
The web GIS company I ended up working for ran sentiment analysis on tweets and aggregated them into regions, so as to find positive and negative areas during a specific timeframe (for example, US elections). The regions had demographics which could be used statistically, and in general some interesting patterns were observed.
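To make the "aggregate into regions" step concrete, here is a minimal sketch, not the company's actual method. It assumes each tweet is already reduced to (longitude, latitude, sentiment score), and that regions are plain bounding boxes; the region names, coordinates and the `min_count` parameter are all made up for illustration. Real regions would be polygons and need a proper point-in-polygon test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical regions as (min_lon, min_lat, max_lon, max_lat) bounding boxes.
REGIONS = {
    "west": (-1.0, 50.0, 0.0, 52.0),
    "east": (0.0, 50.0, 1.0, 52.0),
}


def region_for(lon, lat, regions=REGIONS):
    """Return the name of the first region containing the point, or None."""
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= lon < x1 and y0 <= lat < y1:
            return name
    return None


def aggregate(tweets, min_count=1):
    """Average the sentiment scores per region.

    tweets: iterable of (lon, lat, score) tuples.
    Regions with fewer than min_count tweets are dropped, since a mean
    over a handful of noisy scores says very little.
    """
    buckets = defaultdict(list)
    for lon, lat, score in tweets:
        name = region_for(lon, lat)
        if name is not None:
            buckets[name].append(score)
    return {name: mean(scores) for name, scores in buckets.items()
            if len(scores) >= min_count}
```

Calling `aggregate(tweets, min_count=100)` or similar would then give one number per region that can be joined against demographics.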
What sort of accuracy did you have in your sentiment analysis algorithms? I'm curious because I find the error rate in such algorithms is typically higher than any sort of variance you are seeking, which causes significant problems in terms of any sort of pattern recognition.
When you're using things as short as Tweets, and as broad as "general sentiment", you're probably making accuracy even worse, to the point that simpler demographic analysis or bag-of-words clustering (i.e., cluster areas by diction rather than by sentiment) yields more reliable results, even for sentiment.
I've built a map that takes a geofenced stream of tweets and runs AFINN-111 sentiment analysis on them, and then displays them in real time on a map of London.
Negative sentiments are displayed as Red tweets, happy tweets are Blue.
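For anyone wondering what AFINN-style scoring boils down to, here is a rough sketch in Python (the real thing runs on node-sentiment, so this is not the site's code). The tiny lexicon is a stand-in for the full AFINN-111 word list and its values are illustrative only; the "neutral" bucket for a score of zero is my assumption.

```python
import re

# Stand-in for the AFINN-111 word list: word -> integer valence.
# These few entries and their values are illustrative only.
LEXICON = {"good": 3, "happy": 3, "love": 3, "bad": -3, "sad": -2, "hate": -3}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_afinn(text, lexicon=LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def colour_for(score):
    """Map a score to a marker colour: red for negative, blue for positive."""
    if score < 0:
        return "red"
    if score > 0:
        return "blue"
    return "neutral"  # assumption: zero-score tweets get their own bucket
```

Because it is just a per-word sum, sarcasm, negation ("not good") and slang are all invisible to it.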
The whole thing is built on node.js using node-tweet-stream, node-sentiment and socket.io. The frontend map is leaflet with stamen design's Toner tiles.
It's quite fun to watch, especially when there's a football match or a concert. If you click on the "follow tweets" checkbox, new tweets pop up as they arrive, although currently that makes the map pan north.
Very nice. It also gives me the impression that people in the West End are certainly more angry (or have free time to be angry on twitter) during office hours than those in the City.
Very cool but before clicking on some dots I was wondering why everyone feels the same. The colors are not ideal for red/green colorblind people (is it blue and purple?)
Maybe include a feature to select the colors for happy/sad/average with a button to return to defaults?
Black for sad, light grey for neutral, something like a medium bright green for happy would be my picks.
cool idea - i tried to pick perceptually separate colours and thought blue and red would work, but it turns out it's confusing for some people. I might add a "colorblind" mode and use your colour range.
"colorblind mode" is an antipattern. Using more contrasting colors and using other distinguishing features such as shape and texture benefits all users.
We did the same with Tweets and Surfing. http://devwax.herokuapp.com/ from the meetup: http://www.meetup.com/DevWax/. It was all done in a weekend with some drinking and surfing, so it's a bit rough. The trouble with surfing was that the locations are very disparate and hard to guess. Fun to have a go at though...
Given that most people tweet close to home, and that most people work close to home, you can get the home location of the user from their profile and geocode it to assign a location to the tweet.
This approach only works when aggregating tweets over a larger area, e.g. comparing 10,000 tweets per UK county, or perhaps per city.
For even larger areas (think regions / countries) you could look through the user bios, or previous tweets to pull out any names or locations and do some analysis to work out which broader region they are in.
Cool, we are in Southwark, doing a bunch of graph visualisation stuff (http://blog.stitched.io/), so come by for a beer at some point, would be good to chat.
Could perhaps be more accurately retitled 'London claims to feel on social media' map. There's a lot of literature examining how people present themselves in such venues and how it's often an intentional communication (even if subconscious) to create a certain impression.
Neat site. As others have pointed out, the sentiment analysis is off in many cases. I'd be quite impressed if you managed to correctly classify this one, though: http://i.imgur.com/wmWUitu.jpg
thanks! that's definitely a good idea, how would you go about doing this? counting @mentions vs "tokens"/words and setting a threshold ratio to remove tweets?
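Something like this is what I have in mind for the ratio idea, as a quick Python sketch. The whitespace tokenizer and the 0.5 cutoff are guesses, not tuned values.

```python
def mention_ratio(text):
    """Fraction of whitespace-separated tokens that are @mentions."""
    tokens = text.split()
    if not tokens:
        return 0.0
    mentions = sum(1 for t in tokens if t.startswith("@") and len(t) > 1)
    return mentions / len(tokens)


def keep_tweet(text, max_ratio=0.5):
    """Keep a tweet only if @mentions make up at most max_ratio of its tokens.

    max_ratio=0.5 is an arbitrary starting point and would need tuning
    against real data.
    """
    return mention_ratio(text) <= max_ratio
```

So "@a @b hi" (two mentions out of three tokens) would be dropped, while "great gig tonight @club" would stay.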
Good job, except I find the colors pretty unintuitive - why does red mean negative? It's a color definitely connected with love, anger, war etc. So is blue for cold, or maybe for not showing emotions? I would definitely rethink that. And the sentiment analysis doesn't always work - I got a tweet rated as "sweet" that ended with ":(".
Some of these are unintentionally hilarious without context. Here's a real gem: https://imgur.com/c7Ly6Qm
All in all, though, impressive. Sure some are misclassified but it seems like a significant majority are not, including a lot of the hard ones. Good work!
Yep, although in some cases understandably. One tweet listed as ‘Not good’ had the text ‘Killed it! 🔫’ and a location given as a comedy club. I’m assuming someone had had a good gig, but I’m not surprised a classifier algorithm got that wrong.
Some cases are just difficult, but the overall accuracy could probably be improved considerably if the sentiment analyzer were calibrated for the domain. AFINN (linked above) is calibrated from newspaper articles, which almost certainly have a different distribution of word/sentiment correlations than Tweets do. It's not hard for me to imagine that "killed" is a better predictor of negative sentiment when classifying news articles.
Sentiment.js (which is what we used in devwax, and I am guessing the same here) is just AFINN-based sentiment: https://www.npmjs.org/package/sentiment which you can customize. So in our case we added things like {"barreling": 2} etc. What would be better is a bigram/trigram based approach, so you could score "Going Off" and "Killed It" etc, but I am not sure if there is a js library that does that?
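I don't know of a library for it, but the bigram/trigram idea is simple enough to hand-roll. A rough sketch in Python (not JS, and not based on any existing package): try the longest phrase first, then fall back to single words. Both tables and all their scores are made up for illustration.

```python
import re

# Hypothetical single-word scores in the AFINN style.
UNIGRAMS = {"killed": -3, "good": 3}

# Hypothetical phrase scores; keys are tuples of lowercase tokens.
PHRASES = {("killed", "it"): 3, ("going", "off"): 2}


def score_ngrams(text, max_n=3):
    """Score text, preferring the longest matching phrase at each position."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    i = 0
    while i < len(tokens):
        # Try trigrams, then bigrams, before falling back to one word.
        for n in range(max_n, 1, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in PHRASES:
                total += PHRASES[gram]
                i += n  # consume the whole phrase so its words aren't re-scored
                break
        else:
            total += UNIGRAMS.get(tokens[i], 0)
            i += 1
    return total
```

With these tables "Killed it!" comes out positive while "he was killed" stays negative, which is the whole point of matching phrases before words.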
Cool idea. Unfortunately I've yet to see sentiment analysis even really come close to providing any useful insights. It's just not accurate enough on 140 character tweets.
Clearly has value regarding sentiment analysis. The current problem being that all of the junk gets marked as "average" or similar because sentiment can't be derived from it, which in the overall set skews things greatly.
yep you're right - this is just a cool toy to have a look at what Londoners are up to - the focus isn't scientific, more like "hey what are all those people there tweeting about".
Yep, i realised that - but node-sentiment was pretty quick to implement and gives alright results, so it's a bit of a tradeoff.
Do you know of more accurate sentiment analysis services?