by WorldMaker 2121 days ago
I built a similar tool in Python years back:

https://github.com/WorldMaker/musdex https://pythonhosted.org/musdex/

Because I built it to be extensible and support plugins, I've used it for all sorts of interesting file types beyond DOCX, too: CELTX, a screenwriting format from years back; prettier diffs for Inform 7 source text; an experimental SQLite deconstructor; ...

It looks like I take a slightly different approach, too: I store a lot more metadata about the deconstructed contents rather than relying on directory listings alone. As a result I trust my reconstruction tool a bit more, and I mostly don't store the binary blobs in git, since I assume I can reconstruct them quickly enough.
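
To make the metadata idea concrete, here is a rough sketch of the general technique for a zip-based format like DOCX. This is not musdex's actual code or index format: the function names, the `members/` directory, and the `manifest.json` layout are all made up for illustration, and it assumes the archive is trusted (no path sanitizing).

```python
import json
import zipfile
from pathlib import Path


def deconstruct(archive_path, out_dir):
    """Extract each member and record its metadata in a JSON manifest.

    Hypothetical layout: members go under out_dir/members, and the
    manifest (member order, timestamps, compression) sits beside them.
    """
    out_dir = Path(out_dir)
    members_dir = out_dir / "members"
    manifest = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Assumes a trusted archive: member names are used as paths as-is.
            target = members_dir / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            manifest.append({
                "name": info.filename,
                "date_time": list(info.date_time),
                "compress_type": info.compress_type,
            })
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def reconstruct(src_dir, archive_path):
    """Rebuild the archive from the manifest, not from a directory walk."""
    src_dir = Path(src_dir)
    manifest = json.loads((src_dir / "manifest.json").read_text())
    with zipfile.ZipFile(archive_path, "w") as zf:
        for entry in manifest:
            info = zipfile.ZipInfo(entry["name"],
                                   date_time=tuple(entry["date_time"]))
            info.compress_type = entry["compress_type"]
            data = (src_dir / "members" / entry["name"]).read_bytes()
            zf.writestr(info, data)
```

With something like this, only `members/` and `manifest.json` would need to go into git. The rebuilt file keeps the original member order and timestamps, which a plain directory listing would lose, though it is not guaranteed to be byte-identical to the original.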