Hacker News
Ask HN: What's the best document parsing tool/SDK that you've heard of?
1 point by voiceclonr 2791 days ago
I'm looking to parse various document types (docx, ppt, pdf, pst, etc.) and extract metadata and text for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better in the long term. Can anyone recommend tools or SDKs they've used, or heard are successful?
1 comment

Tika is what we use. It's not perfect, but it works pretty well for our purposes.