by trez 4653 days ago
You can also use pdf2html with the -x option (to get XML). That way you also get the position of each text token.
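
For what it's worth, here is a minimal Python sketch of reading those positions back out of the XML. It assumes the output looks roughly like what I believe poppler's pdftohtml produces in XML mode: `page` elements containing `text` elements with `top`, `left`, `width` and `height` attributes. The sample document, the `extract_tokens` helper, and the exact element and attribute names are assumptions for illustration, so check them against what your tool actually writes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the XML output. The element and attribute names
# are assumptions; adjust them to match the real file.
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="100" left="72" width="40" height="12" font="0">Hello</text>
    <text top="100" left="118" width="44" height="12" font="0"><b>world</b></text>
  </page>
</pdf2xml>"""


def extract_tokens(xml_string):
    """Return one dict per text element with its page number and bounding box."""
    root = ET.fromstring(xml_string)
    tokens = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for text in page.iter("text"):
            tokens.append({
                "page": page_no,
                "left": int(text.get("left")),
                "top": int(text.get("top")),
                "width": int(text.get("width")),
                "height": int(text.get("height")),
                # itertext() also collects text nested in inline tags like <b>
                "text": "".join(text.itertext()).strip(),
            })
    return tokens


if __name__ == "__main__":
    for token in extract_tokens(SAMPLE_XML):
        print(token)
```

For a real file you would read it from disk and pass the contents to `extract_tokens` instead of the inline sample.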