by kjhughes 1241 days ago
Thanks for checking it out for us.

So, it's a wrapper around not pandoc but pdf2docx,

https://github.com/dothinking/pdf2docx

which parses PDF via PyMuPDF,

https://github.com/pymupdf/PyMuPDF

which is a wrapper around MuPDF (which does the heavy lifting parsing PDF),

https://mupdf.com/

and writes DOCX via python-docx,

https://github.com/python-openxml/python-docx
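For anyone curious what that stack looks like when driven from Python directly, here is a minimal sketch. The `Converter` class with its `convert` and `close` methods is what the pdf2docx README documents. The helper names `default_docx_path` and `convert_pdf_to_docx` are made up for illustration, and this is not necessarily how the app itself calls the library.

```python
from pathlib import Path
from typing import Optional


def default_docx_path(pdf_path: str) -> str:
    """Return the input path with its suffix swapped for .docx (hypothetical helper)."""
    return str(Path(pdf_path).with_suffix(".docx"))


def convert_pdf_to_docx(pdf_path: str, docx_path: Optional[str] = None) -> str:
    """Convert a PDF to DOCX with pdf2docx and return the output path."""
    # Imported lazily so the path helper above works without pdf2docx installed.
    from pdf2docx import Converter  # pip install pdf2docx

    out = docx_path or default_docx_path(pdf_path)
    cv = Converter(pdf_path)  # parsing goes through PyMuPDF
    try:
        cv.convert(out)  # converts all pages by default; writes via python-docx
    finally:
        cv.close()
    return out
```

Calling `convert_pdf_to_docx("report.pdf")` would write `report.docx` next to the input, entirely on the local machine.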


Yes, it does indeed use pdf2docx under the hood. From a technical point of view, it doesn't do anything new aside from bundling Python and Electron into one app.

However, from an everyday user's point of view, it does make it rather simple to convert a PDF to a Word document. An everyday user won't be up for doing that via CLI commands. And every alternative user-friendly solution requires uploading your documents to servers (which could spark privacy concerns).

> And every alternative user-friendly solution requires uploading your documents to servers (which could spark privacy concerns)

If the customer base is less technically adept, wouldn't most of them not care and just upload it to a cloud service? I ask sincerely: recently I've realized I don't have as firm a grip on the 'average consumer' as I thought.

I just started testing fixpdfs.com, and some of the first feedback I heard when I asked users about pricing was "I'd rather download an app that does this than pay a subscription."
I would only trust my PDFs to Adobe, Microsoft, AWS, etc.: the big, well-known players that are not going to use the content of the PDFs against me. And of course I'd rather use something that runs completely on my laptop.
Do we know how this would compare to using libpdf?
Hmm, I haven't used libpdf enough to know, but from a glance through its documentation, it seems libpdf is better suited to creating and reading PDF files. If that's correct, then it's missing the bridge from the content read out of the PDF file to a Word document.
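To make "the bridge" concrete, here is a rough sketch of the naive version using the two libraries named upthread: read text blocks with PyMuPDF and write them out as paragraphs with python-docx. The function names are hypothetical. This deliberately drops layout, fonts, tables and images, which is the part pdf2docx actually works to reconstruct.

```python
from typing import Iterable, List, Sequence


def blocks_to_paragraphs(blocks: Iterable[Sequence]) -> List[str]:
    """Keep text blocks only and collapse each one into a single paragraph string."""
    paragraphs = []
    for block in blocks:
        # PyMuPDF "blocks" tuples: (x0, y0, x1, y1, text, block_no, block_type)
        text, block_type = block[4], block[6]
        if block_type != 0:  # 0 = text block, 1 = image block
            continue
        cleaned = " ".join(text.split())  # join wrapped lines, trim whitespace
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs


def naive_pdf_to_docx(pdf_path: str, docx_path: str) -> None:
    """Copy plain text from a PDF into a DOCX, one paragraph per text block."""
    # Imported lazily so blocks_to_paragraphs can be used without either package.
    import fitz  # PyMuPDF
    from docx import Document  # python-docx

    out = Document()
    with fitz.open(pdf_path) as pdf:
        for page in pdf:
            for para in blocks_to_paragraphs(page.get_text("blocks")):
                out.add_paragraph(para)
    out.save(docx_path)
```

A library that only reads or only writes PDFs gives you at most the first half of this. The second half has to come from something that knows the DOCX format.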
I see a couple of things called libpdf: lib-pdf and libpdf++. One generates PDFs programmatically. The other parses PDFs, but generates only images. Maybe you meant something else?