by cpach 2324 days ago
This! If anyone thinks that it’s trivial to diff Microsoft’s XML formats then I urge you to please try this:

• Create a simple Excel document.

• Clone the document and change the text value of one cell.

• Unzip both .xlsx files into two different directories.

• Now launch Meld/WinMerge or similar and diff the directories.

Now tell me if you still think diffing this format is trivial.
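For anyone who would rather script the experiment than click through it, here is a rough sketch of the same steps in Python. It opens both .xlsx files as ZIP archives and produces a plain line diff for every part that differs. The function name `diff_xlsx_parts` and the file names in the usage note are made up for illustration; only the standard-library `zipfile` and `difflib` modules are used.

```python
import difflib
import zipfile


def diff_xlsx_parts(path_a, path_b):
    """Return {part_name: unified-diff lines} for every archive part that differs.

    Hypothetical helper: it treats each .xlsx as a plain ZIP archive and
    compares the raw XML parts, which is what a directory diff tool sees.
    """
    result = {}
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        names_a, names_b = set(za.namelist()), set(zb.namelist())
        for name in sorted(names_a | names_b):
            # A part missing from one archive is compared against empty content.
            data_a = za.read(name) if name in names_a else b""
            data_b = zb.read(name) if name in names_b else b""
            if data_a == data_b:
                continue
            lines_a = data_a.decode("utf-8", errors="replace").splitlines()
            lines_b = data_b.decode("utf-8", errors="replace").splitlines()
            result[name] = list(
                difflib.unified_diff(lines_a, lines_b, name, name, lineterm="")
            )
    return result
```

Calling `diff_xlsx_parts("original.xlsx", "clone.xlsx")` and printing the keys shows which parts changed. The XML inside each part is often written on very few lines, so even a small edit can show up as a huge changed line, which is part of why the raw diff is so hard to read.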

1 comments

If you only want a content-aware diff (ignoring formatting), it's not actually that difficult: read the stylesheet so you can resolve the style refs, then parse the workbook's sheets and look up style refs on demand.

(I have written a streaming XLSX parser in the past.)
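To make the idea concrete, here is a minimal sketch of a value-only cell diff in Python. It is a simplification of what is described above, not that parser: it skips the stylesheet entirely, loads each sheet in one go instead of streaming, and only resolves shared and inline strings. The names `read_cells` and `diff_cells` are hypothetical, and the default sheet path `xl/worksheets/sheet1.xml` is an assumption (a real tool would look the sheet parts up in the workbook relationships).

```python
import xml.etree.ElementTree as ET
import zipfile

# Main SpreadsheetML namespace used by the sheet and shared-string parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def read_cells(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: value} for one sheet, with string refs resolved."""
    with zipfile.ZipFile(path) as z:
        shared = []
        if "xl/sharedStrings.xml" in z.namelist():
            root = ET.fromstring(z.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # Join all <t> runs so rich-text entries become one string.
                shared.append("".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t")))
        sheet = ET.fromstring(z.read(sheet_part))

    cells = {}
    for c in sheet.iter(f"{{{MAIN}}}c"):
        v = c.find("m:v", NS)
        if c.get("t") == "s" and v is not None:
            value = shared[int(v.text)]  # index into the shared-string table
        elif c.get("t") == "inlineStr":
            value = "".join(t.text or "" for t in c.iter(f"{{{MAIN}}}t"))
        else:
            value = v.text if v is not None else None  # raw number, bool, etc.
        cells[c.get("r")] = value
    return cells


def diff_cells(path_a, path_b, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: (old, new)} for cells whose values differ."""
    a = read_cells(path_a, sheet_part)
    b = read_cells(path_b, sheet_part)
    return {
        ref: (a.get(ref), b.get(ref))
        for ref in sorted(set(a) | set(b))
        if a.get(ref) != b.get(ref)
    }
```

Numbers come back as the raw stored text, so dates show up as serial numbers; turning those into something readable is where the style refs would come in.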

Cool. Someone should do that :) [1]

AFAIK there are no ready-made solutions for that so far. It would be very useful! [2]

[1] It would be interesting to dive further into this subject, but personally I can’t currently find the time for that.

[2] Now that I think of it, this might be an interesting project for someone participating in Google Summer of Code. Not sure if the Git project will participate this year or not.