by gettalong 1346 days ago
If the PDF creation software supports tagged PDF, text extraction gets much easier because all the text and structural information is preserved. This also allows "reflowing" of the content, similar to how an HTML page reflows on smaller screens.
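
To make the reflow point concrete, here is a rough sketch in Python. It does not read a real PDF: the `TAGGED_CONTENT` list and the `reflow` helper are made-up stand-ins for the logical structure a tagged PDF carries (headings and paragraphs in reading order). The idea is only that once you have structure instead of positioned glyphs, re-wrapping for a different width is trivial.

```python
import textwrap

# Hypothetical stand-in for a tagged PDF's logical structure:
# (tag, text) pairs in reading order. A real tagged PDF stores this
# in its structure tree; nothing here parses an actual file.
TAGGED_CONTENT = [
    ("H1", "Tagged PDF"),
    ("P", "Text extraction gets much easier because the text and "
          "structural information is preserved."),
]


def reflow(content, width):
    """Re-wrap each structural element to the given column width."""
    lines = []
    for tag, text in content:
        if tag == "H1":
            text = text.upper()  # crude way to mark a heading in plain text
        lines.extend(textwrap.wrap(text, width=width))
        lines.append("")  # blank line between elements
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    print(reflow(TAGGED_CONTENT, 60))  # wide "screen"
    print("-" * 20)
    print(reflow(TAGGED_CONTENT, 20))  # narrow "screen"
```

Without tags, an extractor would first have to guess paragraph boundaries and reading order from glyph positions, which is the hard part this sketch skips.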