Depending on your use case, you can get pretty good results by using spaCy for named entity recognition, then matching the entities against the titles of Wikipedia articles that have coördinates.
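A minimal sketch of that approach, under some loud assumptions: `TITLE_COORDS` is a toy, hand-written stand-in for a table of Wikipedia article titles with coordinates (a real one would be built from a dump), and `place_entities` and `match_titles` are hypothetical helper names. The entity labels are the ones spaCy's English models use for places.

```python
# Toy stand-in for a table of Wikipedia article titles that carry coordinates.
# These rows are illustrative; a real table would come from a Wikipedia dump.
TITLE_COORDS = {
    "New York City": (40.7128, -74.0060),
    "Paris": (48.8566, 2.3522),
}

# Place-like entity labels in spaCy's English models.
PLACE_LABELS = {"GPE", "LOC", "FAC"}


def place_entities(text, nlp):
    """Return the text of each place-like entity found by a loaded spaCy pipeline."""
    return [ent.text for ent in nlp(text).ents if ent.label_ in PLACE_LABELS]


def match_titles(names, title_coords=TITLE_COORDS):
    """Keep only the names that exactly match an article title with coordinates."""
    return {name: title_coords[name] for name in names if name in title_coords}


# Usage, assuming spaCy and an English model are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   match_titles(place_entities("She flew from Paris to New York City.", nlp))
```

Exact title matching keeps precision high at the cost of recall: anything that is not spelled like an article title is silently dropped.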
Agreed. That said, as mentioned in the comment above (the COVID use case), more often than not we'd want higher recall in the predictions, and there NER, although helpful, wouldn't be our go-to solution. This is exactly why we open-sourced the infrastructure and are rolling out the data.
Tried this in the past; it's too limited. There are too many ways a given location can be referred to. Take: New York City, NYC, NY, New York, NYCity, and so on.
Wikipedia handles “New York City” and “NYC” as intended. “NY” and “New York” are ambiguous to both machines and humans (are you referring to the city or the state?) and if you have a resolution strategy for this then Wikipedia gives you the options to disambiguate. I’ve never seen “NYCity” used by anybody.
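To make the distinction concrete, here is a rough sketch of alias handling versus disambiguation. `REDIRECTS` and `DISAMBIGUATION` are hypothetical, hand-written tables standing in for whatever you extract from Wikipedia (they are not a copy of its actual pages), and `candidates`, `resolve` and the `prefer` callback are names made up for illustration.

```python
# Hypothetical stand-ins for data extracted from Wikipedia:
# a redirect maps an alias to one article; a disambiguation entry lists options.
REDIRECTS = {"NYC": "New York City"}
DISAMBIGUATION = {
    "New York": ["New York City", "New York (state)"],
    "NY": ["New York City", "New York (state)"],
}


def candidates(name):
    """Return the article titles a mention could refer to."""
    if name in DISAMBIGUATION:
        return list(DISAMBIGUATION[name])
    # Unambiguous: follow a redirect if there is one, else treat the name as a title.
    return [REDIRECTS.get(name, name)]


def resolve(name, prefer=None):
    """Pick one title; `prefer` is a caller-supplied resolution strategy."""
    options = candidates(name)
    if len(options) == 1:
        return options[0]
    # Without a strategy the mention is left unresolved rather than guessed.
    return prefer(options) if prefer else None
```

With this, `resolve("NYC")` follows the redirect, while `resolve("NY")` stays unresolved until you pass a strategy such as "prefer the city" or "prefer whatever matches other places in the same article".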
If you start processing web articles on the scale of millions, you'll be surprised by how creative people can be. I'm not talking about tweets, just news and blog articles.
Not surprised, just not relevant. The criterion here is “you can get pretty good results”, not “you must be able to process millions of articles without failure”.