Hacker News
by jillesvangurp 2890 days ago
I built something like this a few years ago. We ultimately did not succeed, but I think this could still be very interesting.

Location is interesting when you combine it with time. News archives contain a lot of valuable content that becomes interesting in the context of a location. For example, I live in Berlin, and when we started digging around in newspaper archives, we found all these gems about David Bowie visiting certain bars, being on certain streets, etc. This is interesting to people in that area years after the fact, but not necessarily to people outside it. Just having a historical view of a place through the things people published about it is interesting.

Our problem at the time was coming up with enough of an MVP to convince users and investors. One thing we explored was using NLP to extract clues about location references from the text. This is surprisingly hard but not impossible. People use a lot of ambiguous language to refer to locations, but taken together the clues sometimes let you deduce correctly that people are referring to a street in Prenzlauer Berg (a neighborhood) in Berlin (the capital of Germany, not the village near Bremen). This is of course flaky. The good news is that some content is actually geotagged, which makes this easier. However, we also found a lot of low-quality geotagging.
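The idea of adding up individually ambiguous clues can be sketched as a small voting scheme. This is a minimal, hypothetical illustration, not the system described in the comment: it assumes a tiny hand-written gazetteer (`Place`, `GAZETTEER` and `resolve_mention` are made-up names; a real system would use a proper place database) and treats only containment (street inside neighborhood inside city) as evidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    id: str
    name: str
    parent: Optional[str]  # id of the enclosing place; None at the top level


# Hypothetical toy gazetteer. Both "Berlin" entries share a name,
# which is exactly the ambiguity the resolver has to break.
GAZETTEER = [
    Place("de", "Germany", None),
    Place("berlin", "Berlin", "de"),          # the capital
    Place("berlin_village", "Berlin", "de"),  # the village near Bremen
    Place("prenzlauer_berg", "Prenzlauer Berg", "berlin"),
    Place("kastanienallee", "Kastanienallee", "prenzlauer_berg"),
]
BY_ID = {p.id: p for p in GAZETTEER}


def ancestors(place):
    """Ids of every place that encloses `place`, nearest first."""
    chain = []
    while place.parent is not None:
        chain.append(place.parent)
        place = BY_ID[place.parent]
    return chain


def related(a, b):
    """True if a and b are the same place or one encloses the other."""
    return a.id == b.id or a.id in ancestors(b) or b.id in ancestors(a)


def candidates(name):
    """All gazetteer entries whose name matches the mention exactly."""
    return [p for p in GAZETTEER if p.name == name]


def resolve_mention(mention, context):
    """Return the id of the best-supported candidate for `mention`, or None.

    Each other mention in `context` gives one vote to a candidate if any of
    its own candidates is related to it. A tie or zero support means the
    mention stays ambiguous.
    """
    scores = {}
    for cand in candidates(mention):
        scores[cand.id] = sum(
            1
            for other in context
            if other != mention and any(related(cand, o) for o in candidates(other))
        )
    if not scores:
        return None
    best = max(scores.values())
    winners = [pid for pid, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


print(resolve_mention("Berlin", ["Berlin", "Prenzlauer Berg", "Kastanienallee"]))  # berlin
print(resolve_mention("Berlin", ["Berlin", "Germany"]))  # None: both Berlins are in Germany
```

The second call shows why this stays flaky: a clue that fits every candidate equally (here "Germany") adds no information, so the resolver has to report the mention as ambiguous rather than guess.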


Combining time and location is definitely something I will explore if I continue the project. I agree it is difficult to extract locations from text, but NLP algorithms are more powerful nowadays than ever before, so it is feasible :)
The problem is that references to locations in text are ambiguous. There are many places called Paris (most of them outside France), many streets called Main Street, etc. Also, lots of articles mention several locations. Then there are lots of informal names for neighborhoods, people being a bit loose with boundaries, etc. You can usually guess the city, but getting from there to, e.g., street or neighborhood level is a lot harder. Anyway, good luck.
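The gap between "you can usually guess the city" and "street level is a lot harder" can be sketched by resolving a mention only as deep as every remaining candidate agrees. This is a minimal sketch under assumptions: the gazetteer paths, the town names (`Anytown`, `Othertown`) and the `city` hint parameter are all hypothetical, with the hint standing in for whatever other clues in the article pointed to a city.

```python
# Hypothetical toy gazetteer: each entry is a path from country down to street.
PLACES = [
    ("US", "Anytown", "Downtown", "Main Street"),
    ("US", "Othertown", "Old Town", "Main Street"),
    ("US", "Othertown", "Riverside", "Elm Street"),
]


def common_prefix(paths):
    """Longest leading run of components shared by every path."""
    prefix = []
    for parts in zip(*paths):
        if all(part == parts[0] for part in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def resolve_path(street, city=None):
    """Return the most specific location every matching candidate agrees on.

    `city` is an optional hint (e.g. guessed from other clues in the article)
    that drops candidates in other cities before the paths are compared.
    An unknown street yields an empty tuple.
    """
    matches = [p for p in PLACES if p[-1] == street]
    if city is not None:
        matches = [p for p in matches if p[1] == city]
    return common_prefix(matches)


print(resolve_path("Main Street"))                    # ('US',)
print(resolve_path("Main Street", city="Othertown"))  # ('US', 'Othertown', 'Old Town', 'Main Street')
```

Falling back to the shared prefix means an ambiguous mention still yields a usable coarse location (here only the country) instead of a confidently wrong street.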