by lou1306 691 days ago
If the PDFs are textual or already have an OCR text layer, then pdftotext from the Poppler suite ought to be enough? If not, add Tesseract/ocrmypdf to the pipeline?
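
For what it's worth, here is a rough Python sketch of that pipeline, assuming pdftotext and ocrmypdf are both installed and on the PATH. The helper names and the 20-character threshold for "this PDF has no usable text" are made up for illustration; tune them for your documents.

```python
import subprocess
from pathlib import Path

# Hypothetical threshold: fewer non-whitespace-trimmed characters than this
# is treated as "no text layer". Not a value from either tool.
MIN_CHARS = 20


def pdftotext_cmd(pdf, txt):
    """Build the Poppler command; -layout keeps the physical page layout."""
    return ["pdftotext", "-layout", str(pdf), str(txt)]


def ocrmypdf_cmd(src, dst):
    """Build the OCR command; --skip-text leaves pages that already have text alone."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def needs_ocr(text, min_chars=MIN_CHARS):
    """Guess whether extraction came back (nearly) empty.

    strip() also removes the form feeds pdftotext emits between pages.
    """
    return len(text.strip()) < min_chars


def extract_text(pdf):
    """Try pdftotext first; fall back to ocrmypdf and retry if almost nothing came out."""
    pdf = Path(pdf)
    txt = pdf.with_suffix(".txt")
    subprocess.run(pdftotext_cmd(pdf, txt), check=True)
    text = txt.read_text(encoding="utf-8", errors="replace")
    if needs_ocr(text):
        ocr_pdf = pdf.with_name(pdf.stem + ".ocr.pdf")
        subprocess.run(ocrmypdf_cmd(pdf, ocr_pdf), check=True)
        subprocess.run(pdftotext_cmd(ocr_pdf, txt), check=True)
        text = txt.read_text(encoding="utf-8", errors="replace")
    return text
```

Calling `extract_text("scan.pdf")` (a placeholder filename) writes `scan.txt` next to the input and returns its contents. The length check is a crude heuristic: a scanned PDF with a short text watermark could slip past it, so a per-page check may be safer for mixed documents.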