Ask HN: Good tools for text extraction from PDF
3 points
by lucasrp
4432 days ago
Hi guys, I need a tool that converts PDF files to HTML. Since I work with public documents, the layout of the PDFs can be pretty nasty (I've attached a link at the end of this post). We have an in-house solution forked several years ago from Apache PDFBox. After a while we realized that forking an open source project isn't the best answer, but we kept going because it worked. Does anyone have suggestions? We are willing to contribute to the open source project we choose :) Many thanks! https://www.evernote.com/shard/s226/sh/17b87c1f-8f18-4b23-96ac-a9fbc2ac8502/ea5618043f3a9c818071bd93df9f74c3
http://www.foolabs.com/xpdf/about.html
But some of that is because the source I was pulling text from didn't change the document format much from month to month.
I guess it is the library underneath jeffmould's link.
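For anyone who wants to try the xpdf route linked above, here is a minimal sketch of calling its pdftotext command-line tool from Python. It assumes pdftotext is installed and on the PATH; the helper names build_pdftotext_cmd and extract_text are made up for illustration and are not part of xpdf. Note that this produces plain text, not HTML, so it only covers the extraction half of the question.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, keep_layout=True):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout,
        # which tends to help with multi-column documents.
        cmd.append("-layout")
    # "-" as the output file sends the extracted text to stdout.
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=True):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like extract_text("document.pdf") would then return the text of the file. Whether -layout helps or hurts probably depends on how nasty the layout is, so it may be worth trying both settings on a few sample documents.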