by tastyminerals, 2292 days ago
NLP algorithms are just fine. It is the combination of regexes, NLP, and deep learning that gets you good extraction results. So, basically: OCR / PDF parser -> JPEG/XML/JSON -> regexes + NLP / DL extractor.
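To make the last stage of that pipeline concrete, here is a rough sketch in Python, assuming the OCR / PDF parser step has already produced JSON with a single `text` field. The field names, the regex patterns, and the `model_extract` callback are all hypothetical stand-ins for illustration, not anything from a specific library: regexes handle the fields with a predictable shape, and an NLP / DL model (stubbed here as a plain callable) is only asked about whatever the regexes missed.

```python
import json
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns for an invoice-like document; real layouts need their own.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

# Assumed signature of the NLP / DL stage: (field name, full text) -> value or None.
ModelExtract = Callable[[str, str], Optional[str]]


def regex_extract(text: str) -> Dict[str, Optional[str]]:
    """Run every pattern once; fields with no match are left as None."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def extract(
    parsed_json: str, model_extract: Optional[ModelExtract] = None
) -> Dict[str, Optional[str]]:
    """Regex pass first, then fall back to the model for fields still missing."""
    text = json.loads(parsed_json)["text"]  # assumed parser output shape
    fields = regex_extract(text)
    if model_extract is not None:
        for name, value in fields.items():
            if value is None:
                fields[name] = model_extract(name, text)
    return fields


if __name__ == "__main__":
    parsed = json.dumps(
        {"text": "Invoice No: INV-1042\nDate 2019-03-14\nTotal: $1,250.00"}
    )
    print(extract(parsed))
```

Running the regexes first keeps the cheap, easy-to-debug rules in front, so the model only has to cover the messy cases. In a real setup `model_extract` would wrap whatever NER or sequence model you actually use.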