| Nearly all questions answered in the "about map" link: ========================================= The data behind this map is based on every geocoded tweet in the United States from June 2012 - April 2013 containing one of the 'hate words'. This equated to over 150,000 tweets and was drawn from the DOLLY project based at the University of Kentucky. Because algorithmic sentiment analysis would automatically classify any tweet containing 'hate words' as "negative," this project relied upon the HSU students to read the entirety of tweet and classify it as positive, neutral or negative based on a predefined rubric. Only those tweets that were identified by human readers as negative were used in this analysis. To produce the map all tweets containing each 'hate word' were aggregated to the county level and normalized by the total twitter traffic in each county. Counties were reduced to their centroids and assigned a weight derived from this normalization process. This was used to generate a heat map that demonstrates the variability in the frequency of hateful tweets relative to all tweets over space. Where there is a larger proportion of negative tweets referencing a particular 'hate word' the region appears red on the map, where the proportion is moderate, the word was used less (although still more than the national average) and appears a pale blue on the map. Areas without shading indicate places that have a lower proportion of negative tweets relative to the national average. The numbers that appear in the map during a mouse hover indicate the total number of hateful tweets and number of unique users sending them in each county. ========================================== EDIT: The mouse-overs don't appear to work very well in Chrome or Firefox, but from the one or two times I was able to see some numbers, it appears that each red circle may represent a dozen or fewer tweets.
Also, the hot zones dissipate significantly the further you zoom in, so without the underlying counts it's difficult to draw conclusions. A very interesting experiment, but because the data is normalized only by Twitter traffic, it reflects only people who tweet (non-response bias) and is in no way indicative of the actual distribution of racism. |
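To make the normalization concern concrete, here is a rough sketch of the weighting the "about map" text describes: each county's share of hateful tweets divided by the national share. The function name `county_weights` and all the numbers are made up for illustration; the project's actual formula isn't published in the quoted text, so treat this as one plausible reading rather than their method.

```python
def county_weights(hate_counts, total_counts):
    """Hateful-tweet share per county, relative to the national share.

    A weight above 1.0 means the county is above the national average.
    Both arguments are dicts keyed by county (hypothetical inputs).
    """
    national_total = sum(total_counts.values())
    national_hate = sum(hate_counts.values())
    if national_total == 0 or national_hate == 0:
        return {}
    national_share = national_hate / national_total

    weights = {}
    for county, total in total_counts.items():
        if total == 0:
            continue  # no Twitter traffic, so no share can be computed
        share = hate_counts.get(county, 0) / total
        weights[county] = share / national_share
    return weights


# Invented numbers: county "A" has only a dozen hateful tweets,
# but very little overall traffic.
hate = {"A": 12, "B": 3, "C": 0}
total = {"A": 400, "B": 6000, "C": 3600}
weights = county_weights(hate, total)
for county in sorted(weights):
    print(county, round(weights[county], 2))
```

With these invented inputs, county "A" ends up far above the national average on the strength of twelve tweets, which is the kind of small-sample hot spot the raw counts would be needed to rule out.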
I wonder how well a Bayesian classifier would work if this were used as a training set. If it worked relatively well, there's no reason why you couldn't create a live version of the map.
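For what it's worth, a bare-bones version of that idea might look like the sketch below: a multinomial naive Bayes classifier with add-one smoothing, trained on (tweet, label) pairs. The class name `NaiveBayes`, the whitespace tokenizer, and the six toy tweets are all placeholders of mine; the real training set would be the human-labeled tweets, which I don't have, so this says nothing about how well it would actually perform.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def train(self, labeled):
        for text, label in labeled:
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-ins for the human-labeled tweets (invented for this sketch).
training = [
    ("i hate those people so much", "negative"),
    ("those people ruin everything", "negative"),
    ("proud of my people today", "positive"),
    ("love my people and my town", "positive"),
    ("people at the game tonight", "neutral"),
    ("watching the game with people", "neutral"),
]

clf = NaiveBayes()
clf.train(training)
print(clf.predict("i hate everything"))
print(clf.predict("love my town"))
```

A live map would then only need to run `predict` on incoming geocoded tweets and keep the ones labeled negative, though holding out part of the labeled set to measure accuracy first would be the obvious sanity check.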
Something like http://aworldoftweets.frogdesign.com/ maybe?