Hacker News
by troysk 671 days ago
Maybe you could also extract the text with a PDF text-extraction tool and compare it against the OCR output. That might help fix the numbers that Tesseract sometimes gets wrong.
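A rough sketch of what that comparison could look like, assuming you already have both strings in hand (the OCR output and the text layer pulled from the PDF by whatever extractor you use). The names `fix_numbers` and `looks_numeric` are made up for illustration, and the alignment is just token-level diffing with Python's standard `difflib`, not anything specific to Tesseract:

```python
import re
from difflib import SequenceMatcher

# A token counts as "numeric" if it has at least one digit and otherwise
# only digits and common number punctuation (assumption: adjust as needed).
NUMERIC = re.compile(r"^[\d.,%$-]*\d[\d.,%$-]*$")


def looks_numeric(token):
    return bool(NUMERIC.match(token))


def fix_numbers(ocr_text, pdf_text):
    """Replace OCR tokens with the PDF-layer token where the two disagree
    and the PDF-layer token looks like a number.

    Hypothetical helper: whitespace is normalized to single spaces.
    """
    ocr = ocr_text.split()
    pdf = pdf_text.split()
    fixed = list(ocr)

    matcher = SequenceMatcher(a=ocr, b=pdf, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only touch one-to-one substitutions; insertions, deletions and
        # uneven replacements are left alone because the pairing is unclear.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for k in range(i2 - i1):
                if looks_numeric(pdf[j1 + k]):
                    fixed[i1 + k] = pdf[j1 + k]
    return " ".join(fixed)


if __name__ == "__main__":
    ocr_out = "Total due: 1,2O4.50 by March 3l"
    pdf_out = "Total due: 1,204.50 by March 31"
    print(fix_numbers(ocr_out, pdf_out))
```

This only helps when the PDF actually has an embedded text layer; for pure scans there is nothing to compare against. It also trusts the text layer over the OCR for numbers, which may not always be the right call.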