| HN Mirror


by Animats 4088 days ago

Didn't they announce this on Hacker News a few weeks ago?

This might work better if you had a database of almost every street name and almost every place name. Then you could take in an address and classify each word as one or more of [StreetName, PlaceName, StreetType, etc.]. Some words can appear in more than one of those categories, which is exactly where a deterministic parser without a full database fails. Then let the learning algorithm deal with ambiguities such as "1 Park Lane" versus "1 Lane Park". You'd have a better chance of dealing with the hard cases. Expecting this to recognize street words on its own is a reach.
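
Roughly what I mean by the classification step, as a minimal Python sketch. The three word sets are tiny made-up stand-ins for the real databases, and `candidate_labels`, `featurize`, and the extra `Number` and `Unknown` labels are names invented here for illustration. It only produces the per-word candidate sets; the learning algorithm that picks among them is not shown.

```python
# Toy stand-ins for the databases; real ones would hold
# almost every street name and place name.
STREET_NAMES = {"park", "lane", "main", "oak"}
PLACE_NAMES = {"park", "lane", "springfield"}
STREET_TYPES = {"park", "lane", "street", "st", "avenue", "ave", "road", "rd"}


def candidate_labels(token):
    """Return every category this word could belong to."""
    word = token.lower().strip(".,")
    labels = set()
    if word.isdigit():
        labels.add("Number")
    if word in STREET_NAMES:
        labels.add("StreetName")
    if word in PLACE_NAMES:
        labels.add("PlaceName")
    if word in STREET_TYPES:
        labels.add("StreetType")
    # Words found in no database are flagged rather than guessed at.
    return labels or {"Unknown"}


def featurize(address):
    """Pair each word with its candidate categories, in order."""
    return [(tok, candidate_labels(tok)) for tok in address.split()]


if __name__ == "__main__":
    for addr in ("1 Park Lane", "1 Lane Park"):
        print(addr, "->", featurize(addr))
```

With these toy sets, "Park" and "Lane" each come back with all three categories, so the database lookup alone can't tell the two addresses apart. Only the word order differs, and that is the part left to the learner.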

You can get about 95% successful parsing of US business addresses with a relatively simple parser that lacks a name database. (I have one running right now on 20 million addresses.) Then it gets hard. Are they doing better than that?

The commercial parsers with full address databases do much better.

"Duzbuns Hopsit pfarmerrsc"