by deltaqueue 4778 days ago
Nearly all questions answered in the "about map" link:

=========================================

The data behind this map is based on every geocoded tweet in the United States from June 2012 - April 2013 containing one of the 'hate words'. This equated to over 150,000 tweets and was drawn from the DOLLY project based at the University of Kentucky. Because algorithmic sentiment analysis would automatically classify any tweet containing 'hate words' as "negative," this project relied upon the HSU students to read the entirety of tweet and classify it as positive, neutral or negative based on a predefined rubric. Only those tweets that were identified by human readers as negative were used in this analysis.

To produce the map all tweets containing each 'hate word' were aggregated to the county level and normalized by the total twitter traffic in each county. Counties were reduced to their centroids and assigned a weight derived from this normalization process. This was used to generate a heat map that demonstrates the variability in the frequency of hateful tweets relative to all tweets over space. Where there is a larger proportion of negative tweets referencing a particular 'hate word' the region appears red on the map, where the proportion is moderate, the word was used less (although still more than the national average) and appears a pale blue on the map. Areas without shading indicate places that have a lower proportion of negative tweets relative to the national average.

The numbers that appear in the map during a mouse hover indicate the total number of hateful tweets and number of unique users sending them in each county.

==========================================
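For anyone who wants to poke at the normalization step described above, here is a minimal sketch of how I read it: each county's share of hateful tweets divided by the national share. The function name, the input format (one county label per tweet), and the toy numbers are all my own assumptions, not anything taken from the project.

```python
from collections import Counter

def county_weights(hateful_counties, all_counties):
    """Return each county's hateful-tweet share divided by the national share.

    Both arguments are iterables with one county label per tweet
    (a hypothetical input format, chosen only for illustration).
    """
    hateful = Counter(hateful_counties)
    total = Counter(all_counties)
    national_share = sum(hateful.values()) / sum(total.values())
    weights = {}
    for county, n_total in total.items():
        share = hateful[county] / n_total
        # > 1 means above the national average, < 1 means below it.
        weights[county] = share / national_share
    return weights

# Made-up toy data: county "A" has little traffic, county "B" has a lot.
all_tweets = ["A"] * 100 + ["B"] * 1000
hateful_tweets = ["A"] * 2 + ["B"] * 9
weights = county_weights(hateful_tweets, all_tweets)
```

In this toy data county A ends up at about twice the national average on the strength of two tweets, while county B with nine tweets lands below average. That is the small-count problem in miniature.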

EDIT: The mouseovers don't appear to work very well in Chrome or Firefox, but from the one or two times I was able to see some numbers, it appears that each red circle may represent a dozen or fewer tweets. Also, the hot zones dissipate significantly the further you zoom in, so without any statistics or numbers it's difficult to draw conclusions.

A very interesting experiment, but given that the data is normalized only by Twitter traffic (non-response bias), this is in no way indicative of the actual distribution of racism.

2 comments

> Because algorithmic sentiment analysis would automatically classify any tweet containing 'hate words' as "negative," this project relied upon the HSU students to read the entirety of tweet and classify it as positive, neutral or negative based on a predefined rubric. Only those tweets that were identified by human readers as negative were used in this analysis.

I wonder how well a Bayesian classifier would work if this was used as a training set. If it worked relatively well, there's no reason why you couldn't create a live version of the map.

Something like http://aworldoftweets.frogdesign.com/ maybe?
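To make the Bayesian classifier idea concrete, here is a rough sketch of a multinomial naive Bayes classifier with add-one smoothing over the three labels from the rubric (positive, neutral, negative). The training examples are invented, `XWORD` is a placeholder token, and the whitespace tokenizer is a simplifying assumption; the real hand-labeled set would replace all of that.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of (text, label) pairs. Returns the fitted counts."""
    word_counts = defaultdict(Counter)   # label -> token -> count
    label_counts = Counter()             # label -> number of tweets
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()    # naive whitespace tokenizer (assumption)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior under naive Bayes."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / n_docs)     # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            # Add-one (Laplace) smoothing so unseen tokens don't zero the score.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented examples; XWORD stands in for one of the tracked words.
examples = [
    ("i hate those XWORD people", "negative"),
    ("XWORD get out of here", "negative"),
    ("reclaiming XWORD with pride", "positive"),
    ("proud XWORD and happy", "positive"),
    ("the word XWORD was discussed in class", "neutral"),
    ("article about the history of XWORD", "neutral"),
]
model = train(examples)
```

Note that the tracked word itself carries little signal here, since it appears under every label; the classifier has to lean on the surrounding words, which is why the human-labeled context matters.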

Not very well. Twitter sentiment is a difficult problem.

Consider using millions of training examples (vs. thousands). This was done as part of the "distant supervision" Twitter sentiment technique. What this means is that tweets with positive emoticons were labeled as positive sentiment, and negative emoticons were labeled as having negative sentiment. Emoticons were stripped before training. This system got 80% accuracy.

http://cs.wmich.edu/~tllake/fileshare/TwitterDistantSupervis...
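The labeling step of distant supervision, as described above, could be sketched like this. The emoticon lists and the choice to skip tweets with no emoticon or with mixed emoticons are my assumptions for illustration, not details taken from the paper.

```python
import re

# Assumed emoticon lists; the original work may have used different ones.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")

# Match longer emoticons first so ":-)" is removed as a whole.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE + NEGATIVE, key=len, reverse=True))
)

def distant_label(tweet):
    """Return (text_without_emoticons, label), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None  # no emoticon at all, or both kinds present
    label = "positive" if has_pos else "negative"
    # Strip the emoticons so the classifier cannot simply memorize them.
    text = " ".join(_EMOTICON_RE.sub(" ", tweet).split())
    return text, label
```

The resulting (text, label) pairs can then be fed to whatever classifier you like; the point is that the labels come for free, which is how you get to millions of examples.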

I want to see that predefined rubric. I am unwilling to believe Iowans are more racist than Mississippians. Being racist in Iowa means hating like 3 people in the next county over.
As someone originally from MS, let me clarify that the state is not what most people imagine it is based on various movies or their U.S. history class. Mississippians take "the hospitality state" seriously.
To be clear, I'm solely focusing on the opportunity for racism. E.g., it's not an accident that anti-Catholic sentiment in the 1850s was significantly stronger in the North than in the South.