Hacker News
by RandomBookmarks 2761 days ago
How about https://ocr.space/tablerecognition

It returns table data line by line.

It handled the non-printed whitespace but butchered the multi-line table headers, so rebuilding the headers is rough: the output is line by line, you need to know which words go together, and you have lost the structure.
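For what it's worth, here is a rough sketch of one way to rebuild multi-line headers, assuming your OCR tool can report a horizontal position for each word (not every tool does). The `Word` type, the `merge_header_lines` function, and the column spans are all hypothetical names made up for illustration and are not part of any particular OCR API. Each header word is assigned to the column whose center is closest, then the words are joined top to bottom.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical OCR word: text plus its horizontal extent on the page."""
    text: str
    left: float   # x-coordinate of the word's left edge
    right: float  # x-coordinate of the word's right edge


def merge_header_lines(header_lines, column_spans):
    """Join words from several header lines into one label per column.

    header_lines: list of lines (top to bottom), each a list of Word.
    column_spans: list of (left, right) x-ranges, one per column. These are
        assumed to be known already, e.g. taken from the first data row.
    """
    centers = [(left + right) / 2 for left, right in column_spans]
    labels = [[] for _ in column_spans]
    for line in header_lines:
        # Read each line left to right so words keep their natural order.
        for word in sorted(line, key=lambda w: w.left):
            word_center = (word.left + word.right) / 2
            # Assign the word to the column with the nearest center.
            idx = min(range(len(centers)),
                      key=lambda i: abs(centers[i] - word_center))
            labels[idx].append(word.text)
    return [" ".join(parts) for parts in labels]


header = [
    [Word("Invoice", 0, 50), Word("Amount", 100, 150)],
    [Word("Number", 0, 48), Word("Due", 110, 140)],
]
print(merge_header_lines(header, [(0, 60), (100, 160)]))
```

Nearest-center matching is the simplest thing that could work; it will misplace words in headers that span several columns, so treat it as a starting point only.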
Can you send me a copy of what you are trying to extract? We use proprietary stuff (we're in the business of extracting data and performing analysis on invoices for waste, recycling, cellular, etc., the stuff that gets "lost" in the AP department).

Happy to see if our tools can help. I've tried everything on the market (DocParser, MediusFlow, KOFAX, Ephesoft, etc.) and none work well enough, in my opinion.

I should be able to get you some files, getting approval now; can you let me know how to contact you?
I changed my "about" to have a phonetic spelling of my email address, hosted on a very popular domain name. Feel free to toss me an email.