by timita 1882 days ago
The guys at OpenCage Geocoder[0] are doing a great job using only open data. But they are a team with over a decade of experience parsing and deduplicating address data from dozens of countries.

That said, their jobs page doesn't have much for now, but you may want to keep an eye on it.

Disclaimer: the founder of the company and I are acquaintances, but my assessment is only based on the quality of their service. I've been using it in production for reverse geocoding for a few years now.

[0] https://opencagedata.com/


Hi,

Ed from OpenCage here, thanks for the kind words! It's true we don't have any open positions right now. But anyone who is into geo stuff in general and geocoding specifically can dive into OpenStreetMap and the open source libraries we (and many others) rely on and contribute to. Most notably Nominatim: https://nominatim.org

Here's a podcast interview I did last summer with Sarah Hoffmann, the lead maintainer. https://thegeomob.com/podcast/episode-35

Hi! Shame you're not hiring.

While your API service is stellar, if not the best with open data, unfortunately the data quality is always the limit and one can only extract so much from it. While I noticed a lot of sanitization when running some queries, it didn't take a long time to find hiccups, mainly because I know the types of warts open geo data has.

But from my quick tests, there are two issues.

1. Spain. Like, the whole of it. OSM Spain is lacking a lot of number information. Even Madrid (city) alone is missing a lot, and some reasonably large towns are basically unnumbered. E.g. 40.309452, -3.730451, the whole of Getafe (180k people) lacks numbering.

All that information is available in the catastro, but names are often shortened, missing prepositions, and lacking accents ("Calle de la Pasión" becomes "CL PASION" in the catastro). It is a horrible mess overall, with no foolproof way to cross-correlate the data, but here I don't see any cross-correlation happening at all.
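To make the matching problem concrete, here is a rough sketch of the kind of normalization a cross-correlation would need. The abbreviation table and stopword list are my own illustrative assumptions, not the catastro's actual code list, and `street_key` is a made-up helper name.

```python
import unicodedata

# Hypothetical abbreviation table: only "CL" comes from the example above,
# the other entries are assumptions for illustration.
STREET_TYPES = {"CL": "CALLE", "AV": "AVENIDA", "PZ": "PLAZA"}

# Prepositions and articles that tend to be dropped in the shortened form.
STOPWORDS = {"DE", "DEL", "LA", "LAS", "EL", "LOS"}


def strip_accents(text):
    """Remove combining marks, so 'Pasión' becomes 'Pasion' (and 'ñ' becomes 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def street_key(name):
    """Reduce a street name to a crude comparison key."""
    tokens = strip_accents(name).upper().split()
    # Expand a leading street-type abbreviation, if we know it.
    if tokens and tokens[0] in STREET_TYPES:
        tokens[0] = STREET_TYPES[tokens[0]]
    # Drop prepositions and articles from both forms so they compare equal.
    return " ".join(t for t in tokens if t not in STOPWORDS)


print(street_key("Calle de la Pasión"))  # CALLE PASION
print(street_key("CL PASION"))           # CALLE PASION
```

This only handles the easy cases. Truncated names and streets that differ only by an article would still need fuzzy matching or manual review.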

2. Searching for "Place de Gaulle", because it's a solid no-strange-characters way to obtain an endless supply of points within France, shows a mysterious result at rank 10: 47.63341, -83.04979, in the middle of nowhere, ON, Canada. No info whatsoever. Why would that rank that high, vs thousands of French counterparts? It doesn't appear in Nominatim either, nor in any of the datasets I've worked with; not sure where that comes from. Now I am curious, what's that?
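For what it's worth, this is roughly how such strays can be spotted: a crude bounding-box check over the result coordinates. The box for metropolitan France is a rough assumption of mine, the first coordinate is an illustrative point in Paris rather than an actual result, and the function names are made up.

```python
# Rough bounding box for metropolitan France (assumed values, not authoritative):
# (min_lat, max_lat, min_lng, max_lng)
FRANCE_BBOX = (41.0, 51.5, -5.5, 10.0)


def in_bbox(lat, lng, bbox=FRANCE_BBOX):
    """Return True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lng, max_lng = bbox
    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng


def outliers(results, bbox=FRANCE_BBOX):
    """Return (rank, lat, lng) for every result outside the box, ranks starting at 1."""
    return [
        (rank, lat, lng)
        for rank, (lat, lng) in enumerate(results, start=1)
        if not in_bbox(lat, lng, bbox)
    ]


# Illustrative input: a point in Paris, then the Ontario coordinate from above.
sample = [(48.8738, 2.2950), (47.63341, -83.04979)]
print(outliers(sample))  # [(2, 47.63341, -83.04979)]
```

Note that the Ontario latitude falls inside the French range. Only the longitude gives it away.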

Hi,

thanks for the kind words.

You are right that a geocoder is only as good as the data available to it. Happily OSM is great for many use cases and getting better literally every day.

Whether it is good enough now for your use case will depend ... on your use case. Not everyone needs comprehensive house numbering in Getafe. Until the local OSM community decides to add those numbers we do the best we can for the use cases where open data is a viable option today. As an aside, I am not sure the catastro qualifies as "open" data (even if it may be public), and even if so, as you correctly note, someone with local familiarity for all the abbreviations and common usages will need to help with adding it. Local knowledge is key.

re: "Place de Gaulle", off the top of my head I couldn't say, I would have to take a detailed look. It's complicated, which is what makes geo fun.

Catastro doesn't have a clear license, but the spirit is certainly open[0]:

"It's worth noting the mass download service of cadastral information, available since 2011, that makes it free for companies and individuals said information, including the possibility of it being reused."

Translation mine.

I'd love to hear about the origins of that mysterious Ontario spot!

[0] http://www.catastro.minhap.gob.es/esp/usos_utilidades.asp

Without having looked in detail, I would guess this is a situation where the lack of a license just causes confusion, since it's unclear what is allowed. Ideally they would be explicit about it. Anyway, if it is allowed, using that data is a decision for the local OSM community. If you live there or have a local connection, please get involved, either with this or just with mapping generally. It's good fun. Here's a tutorial on how to add house numbers to OSM; it really is pretty simple:

https://opencagedata.com/tutorials/adding-an-address-to-open...

re: Ontario, I will eventually have a look, but the list of projects is long and priority goes to bugs reported by customers.

If you're ever hiring folks to work on open source geo stuff, please post the positions on FOSSjobs and the other aggregators linked from the wiki:

https://www.fossjobs.net/ https://github.com/fossjobs/fossjobs/wiki/resources