Hacker News new | ask | show | jobs
by tastyminerals 2292 days ago
NLP algorithms are just fine. It is the combination of regexes, NLP and deep learning that allows you to achieve good extraction results. So, basically OCR / pdf parser -> jpeg/xml/json -> regexes + NLP / DL extractor.