Hacker News
by qubex 2064 days ago
pdftotext -layout
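
For anyone who wants to try this from a script, here is a minimal sketch. It assumes the `pdftotext` binary (shipped with Poppler or Xpdf) is on your PATH. The function names and `report.pdf` are made up for illustration. Passing `-` as the output file sends the text to stdout.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_cmd(pdf_path: str, out_path: str = "-") -> List[str]:
    """Build the argument list for `pdftotext -layout`.

    -layout asks pdftotext to keep the physical page layout as best it can.
    An out_path of "-" makes pdftotext write the text to stdout.
    """
    return ["pdftotext", "-layout", pdf_path, out_path]


def extract_text(pdf_path: str) -> Optional[str]:
    """Return the extracted text, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",  # pdftotext emits UTF-8 by default
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input file; substitute a real PDF.
    print(build_pdftotext_cmd("report.pdf"))
```

Whether the output is usable still depends entirely on the PDF, so check the result by eye before feeding it to anything downstream.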
2 comments

Sometimes works well, depending on the structure and content of the PDF. Other times it's hopeless.

Certainly not a general solution. Indeed, there isn't one, because the design of PDF allows far too many things that can't be reliably deciphered back to the source data.

That's why Adobe is throwing all their ML at it, to try and come up with something that guesses near enough right more of the time.

As with the hundreds of other converters, it will probably produce varying results.