How to parse postal addresses from documents
1 point by khichi 5455 days ago
I have seen that Google and Yahoo do a neat job of parsing postal address information from free-form text. What technique/API do they use? What can I use to parse all postal addresses from a PDF, Word, text, or HTML document?
1 comment

The process is, more or less: parsing, tokenizing, NER (named entity recognition), and matching the extracted tokens against a list of geographical names (gazetteer matching). For tools, you can google the terms above and you will find many. Hope this helps.
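
To make the steps above a bit more concrete, here is a rough Python sketch, not what Google or Yahoo actually do. It assumes US-style addresses, and it uses a regular expression as a stand-in for a trained NER model. The names GAZETTEER, STATE_CODES, ADDRESS_RE and extract_addresses are made up for illustration, and the tiny place-name list is a toy; a real gazetteer would be loaded from a place-name database. It also assumes the text has already been pulled out of the PDF/Word/HTML file.

```python
import re

# Hypothetical toy gazetteer; a real one would hold many thousands of place names.
GAZETTEER = {"springfield", "portland", "austin"}
STATE_CODES = {"IL", "OR", "TX"}

# Assumed US-style layout: number, street words, street type, city, state, ZIP.
# This regex stands in for the tokenizer + NER step.
ADDRESS_RE = re.compile(
    r"(?P<number>\d+)\s+"
    r"(?P<street>(?:[A-Z][a-z]+\s+)+(?:St|Ave|Rd|Blvd|Street|Avenue|Road)\.?)"
    r",\s*(?P<city>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"
    r",\s*(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})\b"
)


def extract_addresses(text):
    """Return candidate addresses whose city and state pass the gazetteer check."""
    found = []
    for match in ADDRESS_RE.finditer(text):
        city_known = match.group("city").lower() in GAZETTEER
        state_known = match.group("state") in STATE_CODES
        # Gazetteer matching: keep only candidates that name a known place.
        if city_known and state_known:
            found.append(match.groupdict())
    return found


if __name__ == "__main__":
    sample = (
        "Please ship to 123 Main St, Springfield, IL 62701 by Friday. "
        "The old office at 999 Fake Rd, Atlantis, ZZ 00000 is closed."
    )
    for address in extract_addresses(sample):
        print(address)
```

The gazetteer check is what throws out strings that merely look like an address. For anything beyond a toy you would swap the regex for a real NER model or a dedicated address parser, since real-world address formats vary far more than one pattern can cover.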