by svennek 1009 days ago
That is really hard, as there are no such things as columns in PDFs, only text starting at different (x,y) positions.

Hence most (if not all) programs export the text in the order it appears in the file, which need not match the reading order on the page.
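
To make that concrete, here is a rough sketch of how a tool might try to guess columns from positions alone. It is only a heuristic under stated assumptions: the `Fragment` type, the `reading_order` function and the `column_gap` threshold are all made up for illustration, the sample coordinates are invented, and y is assumed to grow upward from the bottom of the page (the usual PDF convention). Real pages with indents, full-width headings or tables would break it quickly.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical text run as an extractor might report it."""
    x: float   # left edge of the run
    y: float   # baseline; assumed to grow upward from the page bottom
    text: str


def reading_order(fragments, column_gap=50.0):
    """Guess columns from x positions, then read each column top to bottom.

    column_gap is an assumed threshold: a fragment whose x is more than
    this far right of the current column's first fragment starts a new column.
    """
    columns = []
    for frag in sorted(fragments, key=lambda f: f.x):
        if columns and frag.x - columns[-1][0].x <= column_gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])

    ordered = []
    for col in columns:
        # Larger y is nearer the top of the page, so sort descending.
        ordered.extend(sorted(col, key=lambda f: -f.y))
    return [f.text for f in ordered]


# Invented two-column page, listed in the order a file might store it.
page = [
    Fragment(72, 700, "left 1"),
    Fragment(320, 700, "right 1"),
    Fragment(72, 680, "left 2"),
    Fragment(320, 680, "right 2"),
]

print([f.text for f in page])  # file order: the columns are interleaved
print(reading_order(page))     # ['left 1', 'left 2', 'right 1', 'right 2']
```

Note that nothing in the file says where a column starts; the threshold is a guess, and a different layout would need a different one.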

And if the PDF is a scan, there is no text at all, only page images (but you could OCR it).