by PhasmaFelis 3338 days ago
Off-topic, but you seem like you might know: why does text copied from PDFs sometimes have messed-up spaces? It seems to guess where the spaces should go based on kerning, so with justified text, a widely-spaced line may come out with a space between each letter, while a narrowly-spaced one has no spaces at all.

(Also, the thing where it inserts a line break at the end of every printed line is maddening.)

1 comment

That's often caused by the font specified in the PDF not being available on the platform where the PDF viewer is running, so a different font has been used instead.
Hmm. I may have been unclear--the PDF reads fine, but if I copy and paste some text into a text editor, I get the messed-up spaces. It seems as if the PDF doesn't encode text as text but just as a series of characters and locations, leaving spaces unrecorded, so when copy-pasting the reader has to guess from the distance between letters.
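
To make that guess concrete, here is a minimal sketch of the kind of heuristic a reader might use if the page really does store only characters and positions. This is an illustration, not how any particular PDF viewer works: the `Glyph` tuple, `layout`, `rebuild_text`, and the `gap_threshold` value are all made-up names and numbers for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (character, x position, glyph width).
# No space characters are stored, only where each visible glyph sits.
Glyph = Tuple[str, float, float]


def layout(text: str, glyph_width: float, letter_gap: float, word_gap: float) -> List[Glyph]:
    """Place glyphs left to right. A space emits no glyph, only extra distance."""
    glyphs: List[Glyph] = []
    x = 0.0
    for ch in text:
        if ch == " ":
            x += word_gap  # the space exists only as a wider gap
            continue
        glyphs.append((ch, x, glyph_width))
        x += glyph_width + letter_gap
    return glyphs


def rebuild_text(glyphs: List[Glyph], gap_threshold: float = 2.0) -> str:
    """Guess spaces: insert one wherever the gap between glyphs exceeds a fixed threshold."""
    out: List[str] = []
    prev_end = None
    for ch, x, width in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_threshold:
            out.append(" ")
        out.append(ch)
        prev_end = x + width
    return "".join(out)


# Same words, three different amounts of justification (assumed numbers).
normal = layout("to be", glyph_width=5.0, letter_gap=0.5, word_gap=3.0)
wide = layout("to be", glyph_width=5.0, letter_gap=2.5, word_gap=6.0)
tight = layout("to be", glyph_width=5.0, letter_gap=0.0, word_gap=1.5)

print(repr(rebuild_text(normal)))
print(repr(rebuild_text(wide)))
print(repr(rebuild_text(tight)))
```

With a fixed threshold, the normally spaced line comes back intact, the stretched line gets a space between every letter, and the squeezed line loses its space entirely, which matches the symptoms described above. A smarter heuristic would presumably scale the threshold to the font size or the typical gap on that line.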