by MaDeuce
3933 days ago
Here are a couple of ideas. Neither does exactly what you want, but they may point you in a useful direction.

PDFMiner[1] is a Python toolkit for PDF. Among other things, it extracts text from PDF files. It also has a tool that lets you find objects and their coordinates in a PDF file. I have not looked at the latter functionality, but it may get you your words and locations.

I've used Tesseract[2] to convert scanned documents into searchable PDF files. Since a search of the PDF file highlights matching words in the scanned document, the output clearly records where the words are and the letters that comprise them. This might be another approach.

[1] https://code.google.com/p/tesseract-ocr/wiki/ReadMe
[2] https://code.google.com/p/tesseract-ocr/wiki/ReadMe
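
For the Tesseract route, here is a minimal sketch of pulling words and their positions out of OCR output. It assumes a Tesseract build that supports the `tsv` output config (for example, `tesseract page.png stdout tsv`), where word rows have `level` 5 and carry `left`, `top`, `width`, `height`, and `text` columns. The `Word` type, the `words_from_tsv` helper, and the `SAMPLE_TSV` data are made up for illustration and are not part of Tesseract.

```python
import csv
import io
from typing import List, NamedTuple


class Word(NamedTuple):
    """One recognized word and its pixel bounding box (hypothetical helper type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def words_from_tsv(tsv_text: str) -> List[Word]:
    """Parse Tesseract-style TSV text and keep only the word-level rows."""
    # QUOTE_NONE keeps quote characters in recognized text from confusing the parser.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Assumption: level 5 marks a word; other levels are pages, blocks, lines.
        if row.get("level") != "5" or not text:
            continue
        words.append(
            Word(
                text,
                int(row["left"]),
                int(row["top"]),
                int(row["width"]),
                int(row["height"]),
            )
        )
    return words


# Invented sample in the assumed TSV layout, only to show the shape of the data.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t14\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t104\t92\t58\t14\t95\tworld\n"
)

if __name__ == "__main__":
    for word in words_from_tsv(SAMPLE_TSV):
        print(word)
```

In practice the TSV text would come from running Tesseract on a page image rather than from a hard-coded string. The coordinates are in image pixels with the origin at the top-left, so they would still need mapping if you want positions in PDF units.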