Nice work, I am glad that there is a paragraph talking about exposure. Crash trends based strictly on total number of crashes are easy to predict just based on where there is more traffic. Using crashes per vehicle mile traveled for road segments or crashes per entering vehicle for intersections can help tease out trends. Controlling for severity is also important.
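For anyone who wants to try the exposure normalization described above, here is a rough sketch. It assumes the commonly used conventions of crashes per 100 million vehicle miles traveled (VMT) for segments and crashes per million entering vehicles (MEV) for intersections. The function names and sample numbers are made up for illustration and do not come from the article or any particular agency.

```python
def segment_crash_rate(crashes, aadt, length_miles, years):
    """Crashes per 100 million vehicle miles traveled on a road segment.

    aadt is the average daily traffic on the segment (assumed input).
    """
    vmt = aadt * 365 * years * length_miles
    return crashes * 100_000_000 / vmt


def intersection_crash_rate(crashes, entering_adt, years):
    """Crashes per million entering vehicles at an intersection.

    entering_adt is the average daily number of vehicles entering
    the intersection from all approaches (assumed input).
    """
    entering_vehicles = entering_adt * 365 * years
    return crashes * 1_000_000 / entering_vehicles


# Hypothetical example: same crash count, very different traffic volumes.
busy = intersection_crash_rate(crashes=12, entering_adt=40_000, years=3)
quiet = intersection_crash_rate(crashes=12, entering_adt=8_000, years=3)
print(round(busy, 2), round(quiet, 2))
```

With identical crash counts, the quieter intersection comes out with a rate five times higher, which is the kind of trend raw totals hide.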
When I do a crash analysis for a city, one of the tasks I do regularly for my job, I generate a crash rate and severity index for each intersection. The severity index is basically a weighted average based on severity, non-injury=1, minor injury=3, and severe injury or fatality=8. The crash rate and severity index are divided to create a Severity Rate. While not perfect or statistically valid, it does help identify trends. Also, I am in a rural state so it is rare that there are enough crashes to make any statistically valid conclusions.
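A minimal sketch of the severity index as described above, using the 1/3/8 weights from the comment. The category names and the sample counts are hypothetical. The comment does not spell out exactly how the crash rate and severity index are combined into the Severity Rate, so that step is left out here.

```python
# Weights as stated in the comment; other regions use different values.
SEVERITY_WEIGHTS = {
    "non_injury": 1,
    "minor_injury": 3,
    "severe_or_fatal": 8,
}


def severity_index(counts):
    """Weighted average severity over all crashes at one intersection.

    counts maps each severity category to the number of crashes in it.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no crashes, so nothing to average
    weighted = sum(SEVERITY_WEIGHTS[cat] * n for cat, n in counts.items())
    return weighted / total


# Hypothetical intersection: 10 non-injury, 4 minor injury, 1 severe/fatal.
example = {"non_injury": 10, "minor_injury": 4, "severe_or_fatal": 1}
print(severity_index(example))  # (10*1 + 4*3 + 1*8) / 15 = 2.0
```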
What’s the basis for the severity weights? I’d expect the weights to be way more spread out, like 1/10/100/1000. It would definitely not be a good trade-off to eliminate nine non-injury crashes at the cost of one additional fatality. But I certainly could be missing something about this sort of evaluation.
Fatalities are a tiny minority of crashes and aren't really interesting to study, because you usually wind up studying the behavior of drunk people and people who don't wear seat belts. If you filter those out, there's not much data left, which makes meaningful conclusions hard to draw. Fatal accidents are often just normal accidents with a couple of aggravating variables on top (e.g. a person rear-ends a semi-truck instead of a normal truck, or gets in a minor accident while not wearing a seat belt), so it doesn't make sense to fixate on them. Anything that reduces normal crashes by some amount will also affect fatal crashes.
They’re a relatively high percentage of vehicle-pedestrian and vehicle-bicycle collisions, though. Very important for a pedestrian-oriented city like NYC.
Drunk drivers and people that don't wear seat belts are still worth reviewing. While there are rarely engineering solutions to the fatalities that result, it can help inform education programs and initiatives. Amazingly, buckle-up and don't drive drunk advertising can make a difference.
They absolutely are, but they are rare enough that it's difficult to reach statistical significance when talking in the aggregate. That a particular part of town went from one fatality one year to zero fatalities the next is probably not evidence of the success of any particular safety-related policy intervention; it's just noise. Studying all crashes provides a proxy that hopefully helps decrease the odds that the fatal ones will occur, while making it possible to make robust, data-driven claims about success or failure.
On a project I am currently working on, we saw pedestrian fatalities shift from 7 to 13 in consecutive years. It is roughly an 86% increase, but like you said, it is just noise. This is in a city with around 100,000 residents. Convincing politicians that it is just noise is a whole different story.
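One way to put a number on the "just noise" claim is an exact conditional test for two Poisson counts: if both years share the same underlying rate, then given the combined total, the count in one year follows a Binomial(n, 0.5) distribution. The sketch below assumes equal exposure in both years and treats fatalities as independent events, both of which are simplifications. The function name is made up.

```python
from math import comb


def same_rate_p_value(count_a, count_b):
    """Two-sided exact p-value that two counts share one Poisson rate.

    Conditional on n = count_a + count_b, the larger count is compared
    against the upper tail of Binomial(n, 0.5), then doubled.
    Assumes equal exposure (e.g. one year each).
    """
    n = count_a + count_b
    larger = max(count_a, count_b)
    upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)


p = same_rate_p_value(7, 13)
print(round(p, 3))  # 0.263
```

A p-value around 0.26 means a swing from 7 to 13 is entirely unremarkable under a constant rate, at least by this simple test.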
I came of age at the end of the campaign that made drink driving socially unacceptable in the UK, to the point that you were a social leper if you drove drunk, even after just two or three drinks and not falling-about drunk. So it was normal to me that you don’t combine a night out drinking with your car.
I then moved to Texas where it was still just a ‘naughty boy’ type social offense until a major campaign that is still underway changed minds and hearts about it.
I’d like to see the stats in 10 years, because anecdotally it has been very successful.
Generally it is based on the relative costs of a non-injury vs other injury types. So the weight is equal to cost of fatality divided by cost of non-injury. Keep in mind that these values vary from region to region. We use the 1, 3, 8 grouping because the State uses it and by being consistent we can compare different areas more easily.
I've worked extensively with this dataset on a similar project, http://crashmapper.org, and through that process found that the data is extremely error prone. Perhaps 20% of the collisions recorded are not geocoded (i.e. they lack lat/long coordinates) and don't contain other location information, such as street, cross street, and zip code, that could be used to geocode them. It appears that some precincts of the NYPD do a better job of recording a crash location than others. Even more of the data lacks values for "contributing factors", so that field seems difficult to use as a metric for analysis. Often there is a mismatch between the total number of persons injured or killed and the number of pedestrians, cyclists, or motorists injured or killed. Furthermore, whoever maintains this dataset will periodically go back and update it, seemingly at random, editing existing records or adding new ones that date back months or years. Often the maintainer appears to be changing values for fields such as the number of pedestrians, cyclists, or motorists injured or killed. Presumably this is because more information surfaced about an incident at a later point and the city had to go back and update it. However, this can result in stats from the data not aligning with the NYPD's or DOT's official stats from a previous year. I would advise anyone to keep these facts in mind when using the data for analysis and policy recommendations. Such is open data.
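A quick sanity pass along these lines might look like the sketch below. The record layout and key names are hypothetical stand-ins, not the dataset's actual column names, and the sample records are invented, so adapt them to whatever export you are working with.

```python
def audit_records(records):
    """Count records with no usable location and with inconsistent injury totals.

    Each record is a dict using hypothetical keys (see the sample below).
    """
    no_location = 0
    injury_mismatch = 0
    for r in records:
        has_coords = r.get("latitude") is not None and r.get("longitude") is not None
        has_streets = bool(r.get("on_street")) and bool(r.get("cross_street"))
        has_zip = bool(r.get("zip_code"))
        # Nothing to geocode from: no coordinates, no street pair, no zip.
        if not (has_coords or has_streets or has_zip):
            no_location += 1
        by_mode = (
            r["pedestrians_injured"] + r["cyclists_injured"] + r["motorists_injured"]
        )
        if by_mode != r["persons_injured"]:
            injury_mismatch += 1
    return {
        "total": len(records),
        "no_location": no_location,
        "injury_mismatch": injury_mismatch,
    }


# Invented sample records for illustration only.
sample = [
    {"latitude": 40.76, "longitude": -73.99, "on_street": "9 AVE",
     "cross_street": "W 46 ST", "zip_code": "10036", "persons_injured": 1,
     "pedestrians_injured": 1, "cyclists_injured": 0, "motorists_injured": 0},
    {"latitude": None, "longitude": None, "on_street": "",
     "cross_street": "", "zip_code": "", "persons_injured": 2,
     "pedestrians_injured": 0, "cyclists_injured": 0, "motorists_injured": 2},
    {"latitude": None, "longitude": None, "on_street": "LEXINGTON AVE",
     "cross_street": "E 123 ST", "zip_code": "", "persons_injured": 1,
     "pedestrians_injured": 0, "cyclists_injured": 0, "motorists_injured": 0},
]
report = audit_records(sample)
print(report)
```

Running a check like this on each fresh download, and keeping the results, also gives a rough way to notice when older records have been silently revised.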
Having done something similar for the Long Beach, CA area in college, one of the most interesting takeaways was the relative spatial distribution between fatal and non-fatal accidents.
Non-fatal accidents clearly clustered around high traffic areas, but fatal accidents didn’t reveal the same clustering. Instead they appeared to be uniformly distributed across the city.
I’m sure there is an explanation for this, and this was only 10 years of data for a single city, but it always felt a little spooky that these accidents were equally likely to happen anywhere (though most likely later in the night).
High traffic areas tend to move traffic much slower than lower density areas. Getting into a fatal traffic crash when going 15 miles per hour in stop-and-go traffic is much harder than when you lose control of a car going 50 on an empty street.
I suspect that fatal accidents that are due to roadway design issues are more likely to result in the roadway being changed quickly, so it doesn't happen again. Especially if the fatalities are at the scene.
Minor accidents can happen for years at the same intersection, because of the same design issues, without ever triggering an urgent response from officials.
I'm not sure what constitutes a "collision", but in 2015, I lived on Lexington between 121 and 122 and saw the investigation of a Hit and Run of a homeless man. I talked to a couple of the witnesses who saw it happen.
This incident was at Lexington and 123rd. In the data, I do not see this incident.
The question is whether the highlighted areas are really more dangerous or whether there are just more visitors. Shouldn't one take the traffic counts into account?
Lots of crashes in Hell's Kitchen. That area is full of people going out to bars and restaurants, tiny sidewalks, and lots of impatient drivers trying to get through Manhattan to New Jersey.
Drivers mostly hit other things when there are too many things demanding their attention (poor visibility + difficult left turn + busy traffic + bikes + pedestrians = high risk of accidents), so this is probably just a heat map of the intersections that are the busiest (in terms of things going on, not necessarily throughput).