by rmbyrro 1301 days ago
Tried this in the past, it's too limited... There are too many ways certain locations can be referred to. Take: New York City, NYC, NY, New York, NYCity, and so on...

Wikipedia handles “New York City” and “NYC” as intended. “NY” and “New York” are ambiguous to both machines and humans (are you referring to the city or the state?) and if you have a resolution strategy for this then Wikipedia gives you the options to disambiguate. I’ve never seen “NYCity” used by anybody.
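To make that concrete, here's a minimal sketch of the lookup I have in mind. The two tables are hand-written stand-ins for redirect and disambiguation data, not anything pulled from Wikipedia, and `resolve` and its `prefer` argument are hypothetical names made up for illustration:

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for redirect data: unambiguous surface forms
# mapped to one canonical title. Not real Wikipedia output.
REDIRECTS: Dict[str, str] = {
    "new york city": "New York City",
    "nyc": "New York City",
}

# Hypothetical stand-in for disambiguation data: surface forms that
# could mean more than one place, with their candidate titles.
DISAMBIGUATION: Dict[str, List[str]] = {
    "ny": ["New York City", "New York (state)"],
    "new york": ["New York City", "New York (state)"],
}


def resolve(mention: str, prefer: Optional[str] = None) -> Optional[str]:
    """Return a canonical title for a location mention, or None.

    `prefer` is the caller's resolution strategy for ambiguous mentions:
    it is honoured only if it is one of the listed candidates.
    """
    # Normalise case and collapse runs of whitespace before lookup.
    key = " ".join(mention.lower().split())

    # Unambiguous alias: return the canonical title directly.
    if key in REDIRECTS:
        return REDIRECTS[key]

    # Ambiguous alias: only resolve if the caller picked a valid candidate.
    candidates = DISAMBIGUATION.get(key, [])
    if prefer is not None and prefer in candidates:
        return prefer

    # Unknown spelling (e.g. "NYCity") or unresolved ambiguity.
    return None


print(resolve("NYC"))
print(resolve("NY"))
print(resolve("NY", prefer="New York (state)"))
print(resolve("NYCity"))
```

Anything that comes back as `None` is either ambiguous without a strategy or a spelling the tables don't know about, so you can count those to see how far "pretty good" gets you for your use-case.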
If you start processing web articles on the scale of millions you'll be surprised by how creative people can be. Not talking about tweets, just news and blog articles.
Not surprised, just not relevant. The criterion here is “you can get pretty good results”, not “you must be able to process millions of articles without failure”.
If a method is not generalizable to the entire dataset, it's not that useful.

When processing text at large scale, the usefulness of heuristic approaches like the one we're discussing diminishes rapidly.

> If a method is not generalizable to the entire dataset, it's not that useful.

No, in many situations, something doesn’t have to be perfect to be useful.

Again, I think you are missing the original point being made:

> Depending upon your use-case, you can get pretty good results by…

You seem to be responding as if I said:

> For all use-cases, you can get flawless results by…

Pointing out that this is not perfect is irrelevant to the point I was making. “Good enough” is usually good enough.