by AustinBGibbons 3590 days ago
Hey, I helped edit the blog post. Definitely agree that the different spiky behaviors are driven by data collection, but that's the beauty of real-world datasets! It would be interesting to learn whether the cause is bulk first-of-month reporting, crimes with no specified day being marked as the first, or some other phenomenon. The next step might be to interview police officers and ask :-)

Here are the top crimes broken down by the first of the month vs. the average for the rest of the month. Some crimes are only reported (or never reported) on the first of the month, and "FRAUD", for example, is twice as likely to be reported on the first of the month: https://sli.mg/lHjo83
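
For anyone who wants to try a similar breakdown themselves, here is a rough sketch of the comparison (not the code behind the image). It assumes the incidents are available as (category, date) pairs, which is a hypothetical layout, and the function name and toy records are made up for illustration. One simplification to be aware of: only days that appear somewhere in the data are counted, so days with zero incidents overall are ignored.

```python
from collections import Counter
from datetime import date


def first_of_month_ratio(records):
    """Compare per-day counts on the 1st of the month with other days.

    `records` is an iterable of (category, date) pairs (assumed layout).
    Returns {category: (avg_per_first_day, avg_per_other_day, ratio)}.
    """
    records = list(records)

    # Days observed anywhere in the data, split into 1sts and all other days.
    all_days = {d for _, d in records}
    n_first = sum(1 for d in all_days if d.day == 1)
    n_rest = len(all_days) - n_first

    first_counts = Counter(c for c, d in records if d.day == 1)
    rest_counts = Counter(c for c, d in records if d.day != 1)

    result = {}
    for category in set(first_counts) | set(rest_counts):
        first_avg = first_counts[category] / n_first if n_first else 0.0
        rest_avg = rest_counts[category] / n_rest if n_rest else 0.0
        # A category seen only on the 1st has no baseline, so report infinity.
        ratio = first_avg / rest_avg if rest_avg else float("inf")
        result[category] = (first_avg, rest_avg, ratio)
    return result


# Toy records, invented purely to exercise the function.
records = [
    ("FRAUD", date(2015, 1, 1)),
    ("FRAUD", date(2015, 1, 1)),
    ("FRAUD", date(2015, 1, 2)),
    ("FRAUD", date(2015, 1, 3)),
    ("THEFT", date(2015, 1, 1)),
    ("THEFT", date(2015, 1, 2)),
    ("THEFT", date(2015, 1, 3)),
]
stats = first_of_month_ratio(records)
```

A ratio near 1 means the first of the month looks like any other day; a ratio of infinity flags categories that only ever show up on the first, which is the pattern worth asking about.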

1 comment

Another thing you'll notice in that image is that sex offenses get treated differently than other crimes. My understanding is that in an effort to protect the privacy of the victims, those crimes never have accurate geolocations, and in your image it looks like they are always reported on the 1st of the month, so they probably also never have accurate dates/times.

There are a LOT of these types of idiosyncrasies in crime data. If you do end up interviewing police officers in SF, ask them why they don't publish homicide data as part of their crime data (I genuinely don't know the answer).