by davedx 1182 days ago
It depends. There are PDFs with rasterized images of text (like in the article, when it's a scan or photo of a document), and there are PDFs with vector-positioned text runs (usually the result of some digital process). The latter are way easier to process than the former.
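For what it's worth, a rough way to tell the two kinds apart is to look at how much text comes out of each page when you run a normal text extractor over it. This is only a sketch under my own assumptions: the function names and the 20-character threshold are made up for illustration, and the per-page strings are assumed to come from whatever extraction library you already use.

```python
from typing import List


def needs_ocr(page_text: str, min_chars: int = 20) -> bool:
    """Guess whether a page is a rasterized scan.

    Heuristic (an assumption, not a rule): a page with almost no
    extractable text is probably an image of text and needs OCR.
    """
    # Count only non-whitespace characters so blank lines don't count as text.
    visible = sum(1 for ch in page_text if not ch.isspace())
    return visible < min_chars


def classify_pdf(page_texts: List[str], min_chars: int = 20) -> str:
    """Label a document as 'raster', 'text', 'mixed', or 'empty'."""
    if not page_texts:
        return "empty"
    flags = [needs_ocr(text, min_chars) for text in page_texts]
    if all(flags):
        return "raster"  # every page looks like a scan or photo
    if not any(flags):
        return "text"    # every page has real text runs
    return "mixed"       # e.g. a digital document with scanned appendices
```

You would feed it one string per page, for example the output of pypdf's `page.extract_text()`. It will misjudge scans that already carry a hidden OCR text layer, so treat the result as a hint for routing pages to OCR, not as ground truth.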