by tripzilch
3463 days ago
Yup. For instance, Word's PDF output has an absolutely positioned textbox for every word (and sometimes sub-word). This is for kerning purposes. If you want your original text back, you're going to need some OCR-like preprocessing and heuristics to guess which textboxes belong to the same line. If you have multiple columns, good luck distinguishing them from accidental rivers. It's not impossible, but I wouldn't know offhand which tools get this most right. And it's always a lossy operation going back and forth.
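To make the "guess which textboxes belong to the same line" step concrete, here is a minimal Python sketch of that kind of heuristic. Everything in it is an assumption for illustration: the `Box` type, the `group_into_lines` function, the tolerance values, and a coordinate system where `y` grows down the page (native PDF coordinates grow upward, so you would flip them first). It is not what any particular tool does, and it makes no attempt at the multi-column problem.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical text fragment with absolute position (units: points)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline, assumed to grow DOWN the page


def group_into_lines(boxes, y_tol=2.0, gap=1.0):
    """Cluster boxes by baseline, then join each cluster left to right.

    y_tol: max baseline difference between neighbours on the same line.
    gap:   horizontal gaps wider than this become a space; narrower gaps
           are treated as sub-word fragments and glued together.
    Both thresholds are guesses and would need tuning per document.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.y):
        if lines and abs(lines[-1][-1].y - box.y) <= y_tol:
            lines[-1].append(box)   # close enough to the previous baseline
        else:
            lines.append([box])     # start a new line

    result = []
    for line in lines:
        line.sort(key=lambda b: b.x0)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            sep = " " if cur.x0 - prev.x1 > gap else ""
            parts.append(sep + cur.text)
        result.append("".join(parts))
    return result


boxes = [
    Box("world", 40.0, 70.0, 100.3),
    Box("Hel", 10.0, 22.0, 100.0),
    Box("lo", 22.2, 34.0, 100.1),
    Box("Second", 10.0, 50.0, 114.0),
    Box("line", 54.0, 75.0, 114.2),
]
print(group_into_lines(boxes))
```

With two columns, boxes from both columns share a baseline and get merged into one line, which is exactly where a single gap threshold stops being enough: a column gutter and a wide inter-word gap look the same to this code.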