Mapping Motor Vehicle Collisions in New York City | HN Mirror

	Mapping Motor Vehicle Collisions in New York City (toddwschneider.com)
	72 points by lil_tee 2688 days ago

12 comments

takk309 2688 days ago

Nice work, I am glad that there is a paragraph talking about exposure. Crash trends based strictly on total number of crashes are easy to predict just based on where there is more traffic. Using crashes per vehicle mile traveled for road segments or crashes per entering vehicle for intersections can help tease out trends. Controlling for severity is also important.

When I do a crash analysis for a city, one of the tasks I do regularly for my job, I generate a crash rate and severity index for each intersection. The severity index is basically a weighted average based on severity, non-injury=1, minor injury=3, and severe injury or fatality=8. The crash rate and severity index are divided to create a Severity Rate. While not perfect or statistically valid, it does help identify trends. Also, I am in a rural state so it is rare that there are enough crashes to make any statistically valid conclusions.
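
A minimal sketch of the two per-intersection measures described above, using the quoted weights (1, 3, 8). The function names, the sample counts, and the "per million entering vehicles" normalization are illustrative assumptions, not the commenter's actual workflow. The comment does not say in which direction the two numbers are divided to get the Severity Rate, so that step is left out.

```python
# Weights quoted in the comment: non-injury=1, minor injury=3,
# severe injury or fatality=8.
SEVERITY_WEIGHTS = {"non_injury": 1, "minor_injury": 3, "severe_or_fatal": 8}


def severity_index(counts):
    """Weighted average severity: sum(weight * crashes) / total crashes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = sum(SEVERITY_WEIGHTS[kind] * n for kind, n in counts.items())
    return weighted / total


def crash_rate_per_mev(total_crashes, daily_entering_vehicles, years):
    """Crashes per million entering vehicles (assumed normalization)."""
    entering = daily_entering_vehicles * 365 * years
    return total_crashes * 1_000_000 / entering


# Hypothetical intersection: 30 crashes over 5 years, 12,000 entering vehicles per day.
counts = {"non_injury": 20, "minor_injury": 8, "severe_or_fatal": 2}
si = severity_index(counts)
rate = crash_rate_per_mev(sum(counts.values()), 12_000, 5)
print(f"severity index = {si:.2f}, crash rate = {rate:.2f} per MEV")
```

Normalizing by entering vehicles is what separates a busy intersection from a dangerous one: two sites with the same raw crash count can have very different rates.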

mikeash 2688 days ago

What’s the basis for the severity weights? I’d expect the weights to be way more spread out, like 1/10/100/1000. It would definitely not be a good trade off to eliminate nine non-injury crashes at the cost of one additional fatality. But I certainly could be missing something about this sort of evaluation.

dsfyu404ed 2688 days ago

Fatalities are a tiny minority of crashes and aren't really interesting to study, because you usually wind up studying the behavior of drunk people and people who don't wear seat-belts, and if you filter those out there's not much data left, which makes meaningful conclusions hard to draw. Fatal accidents are often just normal accidents with a couple of aggravating variables on top (e.g. person rear-ends a semi-truck instead of a normal truck, or person gets in a minor accident but is not wearing a seat-belt), so it doesn't make sense to fixate on them. Anything that reduces normal crashes by some amount will also affect fatal crashes.

cimmanom 2688 days ago

They’re a relatively high percentage of vehicle-pedestrian and vehicle-bicycle collisions, though. Very important for a pedestrian-oriented city like NYC.

takk309 2688 days ago

Drunk drivers and people that don't wear seat belts are still worth reviewing. While there are rarely engineering solutions to the fatalities that result, it can help inform education programs and initiatives. Amazingly, buckle-up and don't drive drunk advertising can make a difference.

apendleton 2688 days ago

They absolutely are, but they are rare enough that it's difficult to reach statistical significance when talking in the aggregate. That a particular part of town went from one fatality one year to zero fatalities the next year is probably not evidence of the success of any particular safety-related policy intervention; it's just noise. Studying all crashes provides a proxy that hopefully helps decrease the odds that the fatal ones will occur, while making it possible to make robust, data-driven claims about success or failure.

takk309 2688 days ago

On a project I am currently working on, we saw pedestrian fatalities shift from 7 to 13 in consecutive years. It is a nearly 100% increase but, like you said, it is just noise. This is in a city with around 100,000 residents. Convincing politicians that it is just noise is a whole different story.
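
One rough way to put a number on "just noise" for the 7 and 13 counts: if the underlying yearly rate did not change, then given 20 fatalities in total, each one is equally likely to fall in either year, so the second-year count follows a Binomial(20, 0.5) distribution. This is a sketch under that assumption (independent events, equal exposure in both years), not the method used on the project; `two_sided_p` is a hypothetical helper name.

```python
from math import comb


def two_sided_p(count_a, count_b):
    """Exact conditional test for two Poisson counts with equal exposure.

    Under the null hypothesis of equal rates, the larger count is a draw
    from Binomial(n, 0.5) with n = count_a + count_b.
    """
    n = count_a + count_b
    larger = max(count_a, count_b)
    upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


increase_pct = (13 - 7) / 7 * 100  # about 86%
p = two_sided_p(7, 13)             # about 0.26
print(f"increase = {increase_pct:.0f}%, two-sided p = {p:.2f}")
```

A p-value around 0.26 means a swing at least this large would show up in roughly one of every four pairs of years even with no change in the underlying risk.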

jamiek88 2688 days ago

I came of age at the end of the campaign that made drink driving socially unacceptable in the UK, to the point of being a social leper if you drive drunk, even after two or three drinks and not falling-about drunk. So it was normal to me that you don’t combine a night out drinking with your car.

I then moved to Texas where it was still just a ‘naughty boy’ type social offense until a major campaign that is still underway changed minds and hearts about it.

I’d like to see the stats in 10 years because it has been anecdotally very successful.

anitil 2688 days ago

It's these sorts of comments that make HN.

takk309 2688 days ago

Generally it is based on the relative costs of a non-injury vs other injury types. So the weight is equal to cost of fatality divided by cost of non-injury. Keep in mind that these values vary from region to region. We use the 1, 3, 8 grouping because the State uses it and by being consistent we can compare different areas more easily.

clhenrick 2688 days ago

I've worked extensively with this dataset on a similar project, http://crashmapper.org, and through that process found that the data is extremely error prone. Perhaps 20% of the collisions recorded are not geocoded (i.e., they lack lat/long coordinates) and don't contain other location information such as street, cross street, and zip code that could be used to geocode them. It appears that some precincts of the NYPD do a better job at recording a crash location than others. Even more of the data lacks values for "contributing factors", so that field seems difficult to use as a metric for analysis. Often there is a mismatch between the total number of persons injured or killed and the number of pedestrians, cyclists, or motorists injured or killed. Furthermore, whoever maintains this dataset will periodically go back in time and update it seemingly at random, editing existing data or adding new data, potentially months or years back in time. Often it appears that the data maintainer is changing values for fields such as the number of pedestrians, cyclists, or motorists injured or killed. Presumably this is because more information surfaced about an incident at a later point in time and the city must go back and update it. However, this can result in stats from the data not aligning with the NYPD's or DOT's official stats from a previous year. I would advise anyone to keep these facts in mind if trying to use the data for analysis and policy recommendations; such is open data.
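
For anyone wanting to check these issues on their own copy of the data, here is a minimal audit sketch. The record keys (`latitude`, `street`, `persons_injured`, and so on) are hypothetical stand-ins for whatever column names your export uses, and the three sample records are made up for illustration.

```python
def audit(records):
    """Count records with no usable location and records whose injury totals disagree."""
    missing_location = 0
    injury_mismatch = 0
    for r in records:
        has_coords = r.get("latitude") is not None and r.get("longitude") is not None
        has_address = bool(r.get("street")) or bool(r.get("zip_code"))
        if not has_coords and not has_address:
            missing_location += 1  # cannot be mapped or geocoded later
        by_mode = (
            r["pedestrians_injured"] + r["cyclists_injured"] + r["motorists_injured"]
        )
        if by_mode != r["persons_injured"]:
            injury_mismatch += 1  # total does not equal the per-mode sum
    total = len(records)
    return {
        "total": total,
        "missing_location": missing_location,
        "missing_location_share": missing_location / total if total else 0.0,
        "injury_mismatch": injury_mismatch,
    }


# Made-up sample records (hypothetical field names).
records = [
    {"latitude": 40.76, "longitude": -73.99, "street": "9 AVENUE", "zip_code": "10036",
     "persons_injured": 1, "pedestrians_injured": 1, "cyclists_injured": 0, "motorists_injured": 0},
    {"latitude": None, "longitude": None, "street": "", "zip_code": "",
     "persons_injured": 0, "pedestrians_injured": 0, "cyclists_injured": 0, "motorists_injured": 0},
    {"latitude": 40.75, "longitude": -73.87, "street": "ROOSEVELT AVENUE", "zip_code": "11372",
     "persons_injured": 2, "pedestrians_injured": 0, "cyclists_injured": 0, "motorists_injured": 1},
]
report = audit(records)
print(report)
```

Because the dataset is edited retroactively, it is also worth saving the download date with any report like this so that later discrepancies can be traced to a specific snapshot.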

xyzwave 2688 days ago

Having done something similar for the Long Beach, CA area in college, one of the most interesting takeaways was the relative spatial distribution between fatal and non-fatal accidents.

Non-fatal accidents clearly clustered around high traffic areas, but fatal accidents didn’t reveal the same clustering. Instead they appeared to be uniformly distributed across the city.

I’m sure there is an explanation for this, and this was only 10 years of data for a single city, but it always felt a little spooky that these accidents were equally likely to happen anywhere (though most likely later in the night).

icsllaf 2688 days ago

High traffic areas tend to move traffic much slower than lower density areas. Getting into a fatal traffic crash when going 15 miles per hour in stop-and-go traffic is much harder than when you lose control of a car going 50 on an empty street.

toast0 2688 days ago

I suspect that fatal accidents that are due to roadway design issues are more likely to result in the roadway being changed quickly; so it doesn't happen again. Especially if the fatalities are at the scene.

Minor accidents can happen for years at the same intersection, because of the same design issues, without triggering urgent followup, if it doesn't somehow trigger a response from officials.

jermaustin1 2688 days ago

I'm not sure what constitutes a "collision", but in 2015, I lived on Lexington between 121 and 122 and saw the investigation of a Hit and Run of a homeless man. I talked to a couple of the witnesses who saw it happen.

This incident was at Lexington and 123rd. In the data, I do not see this incident.

karussell 2688 days ago

The question is whether the highlighted areas are really more dangerous or whether there are just more visitors. Shouldn't one take into account the traffic counts?

BTW: there is similar (open) data for Germany: https://unfallatlas.statistikportal.de/ (It clearly shows the problem I mentioned)

Update: sorry, it seems that this issue is already discussed in this thread

djtriptych 2688 days ago

Yup. This is pretty much a map of population density in NYC.

jdlyga 2688 days ago

Lots of crashes in Hell's Kitchen. That area is full of people going out to bars and restaurants, tiny sidewalks, and lots of impatient drivers trying to get through Manhattan to New Jersey.

bonyt 2688 days ago

The map of total deaths includes a significant blip on the west side near Pier 40 and the Holland Tunnel, which I think is from the 2017 truck attack.

https://en.wikipedia.org/wiki/2017_New_York_City_truck_attac...

Map: https://imgur.com/a/jNbOv7W

kevin_thibedeau 2688 days ago

That area has a higher rate of incidents in general because of Brooklynites trying to avoid the excessive toll on Verrazano when leaving the city.

ryeguy_24 2688 days ago

I would bet that the shadow/light patterns on Roosevelt Avenue & 94th Street, Queens cause significant visual distractions to drivers and pedestrians.

dsfyu404ed 2688 days ago

Drivers mostly hit other things when there's too many things demanding their attention (poor visibility + difficult left turn + busy traffic + bikes + pedestrians = high risk of accidents) so this is probably just a heat map of intersections that are the busiest (in terms of things going on, not necessarily throughput).

I'd like to see a month by month heat map.

magduf 2688 days ago

>Drivers mostly hit other things when there's too many things demanding their attention

And this is exactly why humans shouldn't be driving. Hopefully human driving will be banned before too long, as machines can do it so much better.

brianbreslin 2688 days ago

I would love to pay the author to do this for my city. I'm fairly certain I could get the local govt to pay up for this.

slowhand09 2688 days ago

Very impressive!

skizm 2688 days ago

https://xkcd.com/1138/

InitialLastName 2688 days ago

To be fair, they mention that in the first paragraph after they introduce what the data is actually doing.

There is still a value to looking at a population-correlated heatmap in order to draw conclusions from the discrepancies between the two.