by trez 4653 days ago
You can also use pdf2html with the -x option (to get XML). That output would also give you the position of each text token.
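For what it's worth, here is a rough sketch of reading the positions back out of such an XML file. I'm assuming the output looks like what poppler's pdftohtml produces in XML mode (`<page>` elements containing `<text>` elements with `top` and `left` attributes); the sample string and the `extract_tokens` helper are made up for illustration, so adjust the tag and attribute names to whatever your tool actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, modelled on the XML layout assumed above.
SAMPLE_XML = """<pdf2xml>
  <page number="1" height="1188" width="918">
    <text top="100" left="72" width="40" height="12" font="0">Hello</text>
    <text top="100" left="118" width="44" height="12" font="0"><b>world</b></text>
  </page>
</pdf2xml>"""


def extract_tokens(xml_text):
    """Return (page, left, top, text) tuples for every <text> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for node in page.iter("text"):
            # itertext() also picks up text nested in <b>/<i> children.
            text = "".join(node.itertext()).strip()
            tokens.append((page_no, int(node.get("left")), int(node.get("top")), text))
    return tokens


tokens = extract_tokens(SAMPLE_XML)
for page_no, left, top, text in tokens:
    print(page_no, left, top, text)
```

From there you can sort or group the tuples by `top` and `left` to rebuild lines or columns.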