Hacker News
by tinus_hn 2124 days ago
It’s a zip; that’s not the hard part.

Apart from attachments and metadata, the actual document is some kind of XML monstrosity that contains both the text and the markup. It’s not very useful to create diffs from that directly; it looks a bit like the HTML FrontPage used to generate, if you remember that.

You can just rename a docx file to .zip, unpack it and peek around.
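If you'd rather script the peeking, a minimal Python sketch using the standard `zipfile` module might look like this. `zipfile` doesn't care about the extension, so the rename is optional. The helper names are hypothetical; `word/document.xml` is where Word keeps the main body text.

```python
import zipfile

def list_docx_parts(path):
    """Return the names of all parts stored inside a .docx (a ZIP archive)."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()

def read_document_xml(path):
    """Return the main document body (word/document.xml) as a string."""
    with zipfile.ZipFile(path) as zf:
        return zf.read("word/document.xml").decode("utf-8")
```

For example, `for name in list_docx_parts("report.docx"): print(name)` (with `report.docx` being any file you have handy) lists parts such as `[Content_Types].xml`, `word/document.xml`, and any embedded media.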


The XML might be awful for viewing, but I do wonder whether it would diff better for storage. Git is awfully inefficient at storing binary data.
Not really, because the XML isn't linear and doesn't read like markup. Apart from straight text changes, edits to formatting, rearranged content, and internal markup can't be diffed "visually" in the XML.
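To illustrate that limitation, here is a rough Python sketch that pulls only the paragraph text out of `word/document.xml` and diffs it line by line. It assumes the standard WordprocessingML namespace and the `w:p`/`w:t` elements; the function names are hypothetical, and it ignores edge cases such as text boxes or nested paragraphs. Plain text edits show up, but a formatting-only change such as making a word bold produces an empty diff.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_texts(document_xml):
    """Return the plain text of each <w:p> paragraph, dropping all formatting."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Join the text of every <w:t> run inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs

def text_diff(old_xml, new_xml):
    """Unified diff of the paragraph text of two document.xml payloads."""
    return list(difflib.unified_diff(paragraph_texts(old_xml),
                                     paragraph_texts(new_xml),
                                     lineterm=""))

def load_document_xml(path):
    """Read word/document.xml (as bytes) from a .docx file."""
    with zipfile.ZipFile(path) as zf:
        return zf.read("word/document.xml")
```

Usage would be something like `text_diff(load_document_xml("old.docx"), load_document_xml("new.docx"))`. Anything that lives outside `w:t` text, such as bold, styles, or comments, simply disappears from the comparison.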