Hacker News new | ask | show | jobs
by mimim1mi 145 days ago
By definition, OCR means optical character recognition. It depends on the contents of the PDF what kind of extraction methodology can work. Often some available PDFs are just scans of printed documents or handwritten notes. If machine readable text is available your approach is great.