by janderson215 971 days ago
Are you able to highlight the text in the PDF? If so, I highly recommend PDF2TXT for extracting it. It would take some parsing work on your part to turn the output back into a table, but there’s zero chance of inference errors, since it extracts the embedded text rather than guessing at it.

If you can’t highlight the text, it won’t work.
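
For the "parsing work" part, here is a minimal sketch of one way to turn extracted text back into rows. It assumes the extracted text keeps each table row on its own line with columns separated by two or more spaces, which depends on the extraction tool and the PDF's layout. The function name `rows_from_text` and the sample text are made up for illustration and are not part of any tool.

```python
import re

def rows_from_text(text, min_gap=2):
    """Split each non-empty line into cells on runs of `min_gap` or more spaces.

    Hypothetical helper: assumes columns are separated by at least
    `min_gap` spaces and that single spaces only occur inside a cell.
    """
    splitter = re.compile(r" {%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append(splitter.split(line))
    return rows

# Made-up sample standing in for text extracted from a PDF table.
sample = (
    "Item        Qty   Price\n"
    "Widget A    2     9.99\n"
    "\n"
    "Gadget      10    24.50\n"
)

table = rows_from_text(sample)
for row in table:
    print(row)
```

Splitting on wide gaps keeps a cell like "Widget A" intact, but it will break if a column is empty or if the extractor collapses the spacing, so real tables may need column-position logic instead.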

1 comment

You can make any PDF 'highlightable' with GitHub.com/ocrmypdf
The OCR isn’t perfect, unfortunately.