|
|
|
|
|
by undebuggable
2219 days ago
|
|
> For the generated pdfs, I found pdftotext could pull with 100% fidelity, and so that was 'option #1'. for scanned-images-saved-as-pdfs, then tesseract could sometimes extract with 90+% accuracy. Arrived to a similar conclusion although never have bothered with DB or any web interface running locally. Simply grepping the text files works flawlessly for me. |
|