by whyleyc 1155 days ago
If you just need to convert the files, have you thought about using Zamzar (https://dev.zamzar.com/)?

We have a file conversion API that supports DOC/DOCX/ODT/PDF/TEX to Markdown conversion in one line of cURL (or your programming language of choice).
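As a rough illustration of what "one line of cURL" amounts to, here is a minimal Python sketch that builds (but does not send) the equivalent request using only the standard library. It assumes the job-creation endpoint, HTTP Basic auth with the API key as the username, and the `source_file` / `target_format` form fields as I understand them from Zamzar's public v1 docs. The target-format string `"md"` and the helper name `build_conversion_request` are assumptions made for this example, not part of any official client.

```python
import base64
import uuid
import urllib.request

# Assumption: host, path and form-field names are modelled on Zamzar's public
# v1 REST docs (sandbox host shown); "md" as the target-format name is a guess.
ZAMZAR_JOBS_URL = "https://sandbox.zamzar.com/v1/jobs"


def build_conversion_request(api_key: str, filename: str, data: bytes,
                             target_format: str = "md") -> urllib.request.Request:
    """Build (but do not send) a multipart POST that starts a conversion job."""
    boundary = uuid.uuid4().hex
    # Plain form field naming the desired output format.
    format_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="target_format"\r\n\r\n'
        f"{target_format}\r\n"
    ).encode()
    # File field carrying the raw bytes of the document to convert.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="source_file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = format_part + file_header + data + b"\r\n" + closing

    # HTTP Basic auth with the API key as username and an empty password,
    # the equivalent of `curl -u API_KEY:`.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        ZAMZAR_JOBS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen(req)` would actually submit the job. The response should describe the job, which you would then poll and download from; check Zamzar's docs for the exact response fields.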

(Disclaimer: I'm the product lead for the Zamzar API).


Thanks, I'll check it out. What do you do with PDFs that lock text in images? Are you using ML/OCR? And, as mentioned, what about tables?
Currently, OCR support is limited to PDF-to-TXT conversion, but we're hoping to add support for other output formats at some point. Feel free to shoot me an email at chris [at] zamzar [dot] com if you'd like to chat further.