by mvcatsifma 1711 days ago
I believe such an API already exists: https://pdftables.com/ (no affiliation).

Went to a presentation by the guys behind this company at a Golang meetup in Amsterdam. They seemed to know their stuff, but I have no real-world experience using it.


I had a look. From their FAQ [0]:

However, some PDFs are scanned documents, or only contain images. PDFTables doesn't perform Optical Character Recognition (OCR) to turn these images into text.

To process these kinds of documents, you will need to either enable OCR in your scanning software, or run the PDF through specialist OCR software before using PDFTables.

[0] https://pdftables.com/faq
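If you want to know up front whether a PDF needs OCR at all, one crude check is to look for text-drawing operators in its content streams. Here's a rough Python sketch of that idea. It is my own heuristic, nothing PDFTables provides, and `has_text_layer` is a made-up name. It assumes streams are either uncompressed or FlateDecode-compressed, skips streams whose dictionary declares `/Subtype /Image`, and reports text if any remaining stream contains a `BT` ... `Tj`/`TJ` ... `ET` text object. It is not a real PDF parser: other filters (LZW, ASCII85, etc.) are ignored, and binary data such as embedded fonts could in principle produce a false positive.

```python
import re
import zlib

# Heuristic sketch only, NOT a PDF parser. It scans raw PDF bytes for
# non-image streams that draw text, to guess whether the file has a text
# layer (born-digital or already OCR'd) or is image-only (e.g. a scan).

# Raw bytes between a "stream" keyword and the next "endstream".
# The \b keeps "endstream" itself from matching as a stream start.
_STREAM = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)
# Indirect object header keyword, as in "12 0 obj" (but not "endobj").
_OBJ = re.compile(rb"\bobj\b")
# Image XObjects declare /Subtype /Image in their stream dictionary.
_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# A text object: BT ... (Tj or TJ) ... ET, in that order.
_TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def _inflate(raw: bytes) -> bytes:
    """Try FlateDecode (zlib); fall back to the raw bytes if that fails."""
    try:
        # decompressobj tolerates trailing bytes such as the EOL
        # before "endstream" instead of raising on them.
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw  # probably uncompressed (or a filter we don't handle)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Guess whether any non-image stream in the PDF draws text."""
    prev_end = 0
    for match in _STREAM.finditer(pdf_bytes):
        # The stream's own dictionary sits after the last "obj" keyword
        # between the previous stream and this one.
        header = _OBJ.split(pdf_bytes[prev_end:match.start()])[-1]
        prev_end = match.end()
        if _IMAGE.search(header):
            continue  # image data is irrelevant (and noisy) for this check
        if _TEXT_OBJECT.search(_inflate(match.group(1))):
            return True
    return False
```

Call it as `has_text_layer(open(path, "rb").read())`. A `False` result suggests the file is image-only and should go through OCR before PDFTables. PDFs that scanning software has already OCR'd usually carry an invisible text layer drawn with the same operators, so they should pass the check.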