| A few comments on the stats themselves: 1. It looks like the total number of tickets in 2009 and 2010 is about 10% of the 2011 total. I'm guessing that there weren't actually ten times as many tickets given in 2011, so either the data is incomplete (as the author suggested) or there was a typo. If the data is incomplete, I'd suggest normalizing to the 2011 totals; otherwise, the 3-year average doesn't make much sense. 2. The scale of the "normalized" difference graphs (showing "Actual - Expected") seems off to me. The formula given is (actual - expected) / total * 1000 = normalized number. If this is the case, then since the scale goes to about +/- 5, the differences are very small (less than 1% away from what you'd expect!). But from eyeballing the data, that doesn't seem right. In any case, a better scale might be to assume the data is normally distributed and scale the differences to the number of standard deviations. (See, e.g., http://en.wikipedia.org/wiki/Normal_distribution#Standard_de...) |
2) The fact that the normalized numbers were so small was very unintuitive to me at first too, but the important thing to realize is that in that formula you're dividing the difference, not the actual value for the given day, by the total number for the year. When I first ran those numbers, I was confused by the output. I was originally thinking that I'd normalize by saying "X percent of the total for that year," but since I was working with the differences and not the actual values, the numbers came out as too small a fraction.
Either that or I made some huge mistake in my logic...
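To make the arithmetic concrete, here is a minimal sketch of the quoted formula. The numbers are hypothetical (they are not from the ticket data), and the "expected" value assumes tickets are spread evenly over 365 days, which may not match how the post computed it.

```python
def normalized_difference(actual, expected, yearly_total, scale=1000):
    """(actual - expected) / yearly_total * scale, as quoted in the thread."""
    return (actual - expected) / yearly_total * scale


# Hypothetical inputs for illustration only.
yearly_total = 100_000
expected = yearly_total / 365  # assumption: an even spread across the year
actual = 400

# Dividing the difference by the whole year's total gives a small number...
print(round(normalized_difference(actual, expected, yearly_total), 2))  # 1.26

# ...even though the day is far above its own expected value (in percent).
print(round((actual - expected) / expected * 100, 1))  # 46.0
```

So a value of about 1.26 on the graph's scale can correspond to a day that is 46% above expectation, which is why the small numbers don't mean the differences are small relative to a single day.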
WRT the use of standard deviations, like I said in the post, I'm not a statistician, so I wasn't really sure what the canonical way of normalizing data was. I pretty much just made one up. Thanks for pointing that out. I'll look into using standard deviation for the next one. :)
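For reference, scaling to standard deviations could look roughly like the sketch below. It is only one reading of the suggestion: the function name and the sample differences are made up, and it uses the sample standard deviation without checking that the data is actually normally distributed.

```python
from statistics import mean, stdev


def z_scores(differences):
    """Express each difference as a number of sample standard deviations from the mean."""
    mu = mean(differences)
    sigma = stdev(differences)  # sample standard deviation (n - 1 denominator)
    return [(d - mu) / sigma for d in differences]


# Hypothetical daily (actual - expected) differences, for illustration only.
diffs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for d, z in zip(diffs, z_scores(diffs)):
    print(f"difference {d:+.1f} -> {z:+.2f} standard deviations")
```

On this scale, a bar's height no longer depends on the size of the yearly total, so years with very different ticket counts can be compared on the same axis.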