by ademup 1352 days ago
For my use case, I ended up converting the PDF to a single PNG image and then running Amazon Textract on it. This lets me easily convert PDF tables into CSV files, all from within PHP. Would love to find a cheaper (local) option vs AWS, but this works.
1 comment

> Would love to find a cheaper (local) option vs AWS

How about Tesseract (https://github.com/tesseract-ocr/tesseract)?

There’s even a library for PHP (https://github.com/thiagoalessio/tesseract-ocr-for-php). I haven’t used it, but I have used pytesseract in Python and it works fairly well.
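
For anyone curious what the local route might look like, here is a rough Python sketch, assuming pytesseract, Pillow, and the Tesseract binary are installed and the PDF page has already been rendered to a PNG. The helper names (rows_from_ocr_data, rows_to_csv, ocr_image_to_csv) and the min_conf cutoff are made up for illustration. It groups the words Tesseract finds by line and writes each word as its own CSV cell, so multi-word cells get split; a real table would likely need proper column detection on top of this.

```python
import csv
import io
from collections import defaultdict


def rows_from_ocr_data(data, min_conf=0.0):
    """Group OCR words into rows, one row per detected text line.

    `data` is a dict of parallel lists, the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Skip layout-only entries (empty text, conf of -1) and weak guesses.
        if not text.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["left"][i], text))
    # Lines in reading order, words left to right by x position.
    return [[t for _, t in sorted(words)] for _, words in sorted(lines.items())]


def rows_to_csv(rows):
    """Serialize rows to a CSV string (naive: one word per cell)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def ocr_image_to_csv(image_path):
    """Hypothetical end-to-end helper: PNG path in, CSV text out."""
    # Imported lazily so the helpers above work without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return rows_to_csv(rows_from_ocr_data(data))
```

Calling ocr_image_to_csv("page.png") (the filename is a placeholder) would return the CSV text, which you could then write to a file. No idea how the accuracy compares with Textract on your tables, so treat it as a starting point.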

Haven't seen this one, thanks!