by yellowbkpk 3327 days ago
US Census can't really work with anyone when it comes to their data. They take their responsibilities under Title 13 [0] (which prohibits them from sharing information they collect) very seriously.

They're trying various options to share some limited data (like address data) [1], but haven't gotten very far. They've done some work on "Community TIGER" [2], which aims to give validation information back to local governments for geographic data, but not the improvements that Census generates as part of the decennial census.

The US Postal Service has no motive to share with the Census Bureau. For one thing, the USPS makes all of its money selling limited access to its address list to advertisers. Additionally, the USPS's address list (or delivery points) doesn't necessarily correspond with people and where they live.

PS: I'm interested in this sort of thing because I help run OpenAddresses [3], a community-built list of authoritative address data sources from around the world. There's a lot of data out there!

[0] https://www.law.cornell.edu/uscode/text/13/9
[1] https://fcw.com/articles/2011/09/14/census-bureau-title-13.a...
[2] https://www2.census.gov/geo/pdfs/gssi/Community_TIGER.pdf
[3] https://openaddresses.io/


I used to be GIS Coordinator for a medium-sized municipality. The Census Bureau sent out DVDs of address points for the 2010 count for cities to validate. There was some collaboration between cities, USPS, and Census on it, but as you likely know, address standards are a mess in the US. Many states have efforts to standardize, but it's often difficult because addressing is left to the municipality and often falls to a non-GIS person (city planner, building inspector, etc.) who has little concept of normalization or why it's a bad idea to create addresses that are difficult for a computer to understand.

Openaddresses.io is an awesome project. As a GIS professional, thank you! This type of data can be immensely useful!

Out of curiosity, what sorts of addresses are ones that are difficult for a computer to understand?
It's almost always just a geocoding/data format issue. It's machine readable, but you can get quirks when an address is something like "1 1/2 Hacker Street": is that "1.5 Hacker Street", "Unit 1 1/2 Hacker Street", "1 Hacker Street, Unit 1 of 2", etc.? It might not be a problem if you are dealing with a small discrete area, but when you are trying to merge multiple areas it can be a challenge.
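To make the "1 1/2" ambiguity concrete, here is a rough Python sketch that lists a few readings of a leading "N A/B" token instead of picking one. The function name `fractional_readings`, the field names, and the three readings are my own assumptions for illustration, loosely mirroring the interpretations above; no real geocoder is being quoted here.

```python
import re

# Hypothetical pattern: whole number, then a fraction, then the street text.
FRACTION_RE = re.compile(r"^(\d+)\s+(\d+)/(\d+)\s+(.+)$")

def fractional_readings(address):
    """Return the plausible readings of an address that starts with 'N A/B'.

    Returns an empty list when the address has no such leading token.
    """
    match = FRACTION_RE.match(address.strip())
    if match is None:
        return []
    whole, num, den, street = match.groups()
    return [
        # Reading 1: the house number itself is fractional ("1 1/2", i.e. 1.5).
        {"number": f"{whole} {num}/{den}", "unit": None, "street": street},
        # Reading 2: the first token is a unit and the fraction is the number.
        {"number": f"{num}/{den}", "unit": whole, "street": street},
        # Reading 3: house number N, with the fraction meaning "unit A of B".
        {"number": whole, "unit": f"{num} of {den}", "street": street},
    ]

for reading in fractional_readings("1 1/2 Hacker Street"):
    print(reading)
```

Keeping all of the readings around until some other signal (the source's own conventions, neighbouring addresses) settles it is one way to avoid baking a wrong guess into a merged dataset.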

Then you have situations where a street name is the same as the region/state (is the source data just missing attributes?), confusion about the direction prefix/suffix ("1 W Hacker Street" vs. "1 Hacker Street W", which can vary within the same city on the same street), poorly named streets like "1 W Hacker Street East" or "1 Hacker Street East W", or street name types that are not commonly used ("1 Hacker Launch Pad", etc.).
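The prefix/suffix confusion can be sketched in a few lines of Python. This is a naive illustration, not how any particular geocoder works: `split_directionals` and the `DIRECTIONS` table are hypothetical names, and the input is assumed to be the street part with the house number already removed.

```python
# Hypothetical lookup of direction words and their abbreviations.
DIRECTIONS = {
    "N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
    "E": "E", "EAST": "E", "W": "W", "WEST": "W",
}

def split_directionals(street):
    """Naively split a street string into (prefix, name, suffix)."""
    tokens = street.split()
    prefix = suffix = None
    # A leading direction word counts as a prefix only if other tokens remain.
    if len(tokens) > 1 and tokens[0].upper() in DIRECTIONS:
        prefix = DIRECTIONS[tokens.pop(0).upper()]
    # Likewise, a trailing direction word counts as a suffix.
    if len(tokens) > 1 and tokens[-1].upper() in DIRECTIONS:
        suffix = DIRECTIONS[tokens.pop().upper()]
    return prefix, " ".join(tokens), suffix

for street in ("W Hacker Street", "Hacker Street W",
               "W Hacker Street East", "Hacker Street East W"):
    print(street, "->", split_directionals(street))
```

The first two inputs come out with the same name and direction in different slots, so a merge has to decide whether they are the same street. For the last two, the code cannot tell whether "East" is a suffix or part of the street's name, which is exactly the kind of case that needs local knowledge.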

It's even more fun when you start talking about countries that either use older addressing formats (I've seen addresses in Ireland that are just "Old Blue Cottage, Some Town") or don't write addresses in English.

There are many more situations; these are just the ones that immediately come to mind.

All addresses are pretty hard for computers to understand, so the people building the software that interprets them (geocoders) usually make it work for the most 'in-demand' address formats first. Just like with every other bit of software, finding the special cases and figuring out how to handle them is tricky.

http://www.columbia.edu/~fdc/postal/ is a pretty good overview of most of the postal addressing systems around the world.