by pc86
4012 days ago
How does the PDF OCR process compare to images? I uploaded a sample PDF with very clear sans-serif text (printed to PDF from a webpage) and there seem to be some odd substitutions: "prohibitecL" instead of "prohibited", "ac" instead of "QC" (as part of an address), random clipping of the first letter in a few lines, and random use of a capital I instead of 1. Overall very good; I'm just wondering whether the library is better with image files than PDFs.

The OCR library itself supports only image formats as input and is "innocent" with regard to this issue ;)