Hacker News
by dav43 2531 days ago
Have you tried Tabula? It's a free program; I use it for some finance work, reading filings.

http://tabula.technology/

Yep! It's great, but it only gets me maybe 60% of the way there, so I'm looking for something that can extract much more structure from a document. I doubt what I'm looking for will exist for another 10 years, though.
Is it feasible to create loose templates describing where the data sits on the page, and extract that way? I have a mothballed project that did pretty well at this; it was able to discern different templates from a mass of documents.
I'm curious: if you email me a sample, I can tell you what's possible.
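For anyone wondering what "loose templates" might look like in practice, here is a minimal sketch. It is not the commenter's project. It assumes the words and their bounding boxes (in PDF points, y growing downward) have already been pulled out of each page by some PDF text extractor. A template is a list of named boxes, and a word counts toward a field if its centre falls inside the box grown by a few points of slack. To discern templates across many documents, each document is fingerprinted by its static label words snapped to a coarse grid, and documents are greedily grouped by Jaccard similarity. All names (`Word`, `Region`, `extract`, `signature`, `group_by_layout`) and the `slack`, `grid`, and `threshold` values are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word and its bounding box, in PDF points with y growing downward."""
    text: str
    x0: float
    top: float
    x1: float
    bottom: float


@dataclass(frozen=True)
class Region:
    """A named field in a (hypothetical) template: the rough box where a value is expected."""
    name: str
    x0: float
    top: float
    x1: float
    bottom: float

    def contains(self, w: Word, slack: float) -> bool:
        # "Loose" match: accept a word whose centre lies inside the box
        # after growing it by `slack` points on every side.
        cx = (w.x0 + w.x1) / 2
        cy = (w.top + w.bottom) / 2
        return (self.x0 - slack <= cx <= self.x1 + slack
                and self.top - slack <= cy <= self.bottom + slack)


def extract(words, regions, slack=5.0):
    """Apply one template to one page: map each field name to its text."""
    out = {}
    for r in regions:
        hits = [w for w in words if r.contains(w, slack)]
        hits.sort(key=lambda w: (round(w.top), w.x0))  # rough reading order
        out[r.name] = " ".join(w.text for w in hits)
    return out


def signature(words, grid=20.0):
    """Layout fingerprint: alphabetic label words snapped to a coarse grid.

    Numbers and other variable values are skipped, so documents that share
    static label text in roughly the same places get similar fingerprints.
    """
    sig = set()
    for w in words:
        label = w.text.rstrip(":").lower()
        if label.isalpha():
            sig.add((label, int(w.x0 // grid), int(w.top // grid)))
    return frozenset(sig)


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_layout(docs, threshold=0.6):
    """Greedily cluster {doc_id: words} into groups that likely share a template."""
    groups = []  # list of (representative fingerprint, [doc ids])
    for doc_id, words in docs.items():
        sig = signature(words)
        for rep, members in groups:
            if jaccard(sig, rep) >= threshold:
                members.append(doc_id)
                break
        else:  # no existing group was similar enough: start a new one
            groups.append((sig, [doc_id]))
    return [members for _, members in groups]


if __name__ == "__main__":
    page = [Word("Total:", 50, 700, 90, 712), Word("1,234.00", 95, 700, 150, 712)]
    print(extract(page, [Region("total", 90, 695, 200, 715)]))  # {'total': '1,234.00'}
```

Snapping to a fixed grid is the crudest possible fingerprint: a label sitting right on a cell boundary can land in different cells on near-identical documents, so a real system would probably want a more tolerant layout comparison. Once documents are grouped, a template's regions only need to be defined once per group rather than once per document.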