by rperez333 2123 days ago
This looks very interesting. Do you think it can be applied to other kinds of XML files? I'm interested in using Git with a VFX package (The Foundry Nuke) that writes XML projects, and it would be great to have some versioning system for it.

I've tried Git's patience diff algorithm, but it didn't work well: frequently, the diff would remove every single line and add them all back to the XML file.


As with source code, if you can get a consistent linter/formatter run on the file before commit you should see less "jitter" in the diffs those commits produce.

I got some decent results with `xmllint --format`, the linter/formatter from libxml2 (so it's available in most Linux distros and ported to most platforms).
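If xmllint isn't handy, a rough stand-in can be sketched with Python's standard library, assuming Python 3.9+ for `ET.indent`. The function name `normalize_xml` is illustrative, and this is not equivalent to `xmllint --format`: ElementTree drops comments and the XML declaration and may rewrite namespace prefixes (e.g. to `ns0`), so check it against your project files before relying on it. The point it shows is that re-indenting from a parsed tree makes the layout the same no matter how the application happened to write the file.

```python
import xml.etree.ElementTree as ET


def normalize_xml(text: str) -> str:
    """Re-indent XML so every save of the same tree produces the same layout.

    Sketch only: unlike `xmllint --format`, ElementTree drops comments and
    the XML declaration, and may rename namespace prefixes.
    """
    root = ET.fromstring(text)
    # ET.indent (Python 3.9+) overwrites whitespace-only text/tails with
    # fresh indentation, so differently formatted inputs converge.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode") + "\n"
```

A thin wrapper that reads stdin and writes `normalize_xml(...)` to stdout could then be registered as a Git clean filter through `.gitattributes`, so files are normalized as they are staged; whether that suits a given project format is an assumption to test.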

(I was using xmllint as a formatting step when unpacking ODT files in my tool similar to the one directly above, mentioned in a sibling comment. I found the XML files in ODT files were much more prone to being minified and reformatted/reordered on every save than those in DOCX, which were surprisingly stable in their XML formatting.)

In your situation, I'd just whip together a quick PowerShell script like the one I have here, but tailor it to the structure of your file format: traverse the XML tree, use a few if-else statements to filter out any noisy metadata you don't need to see in the diff, and save the collected text node contents as a text file alongside the XML files. Thanks to the Git hook, each commit that changes the XML will also include a corresponding TXT file, so you can easily view the changes in a skimmable way, unlike the potentially huge and messy XML diff you'd get if you versioned only the original.
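The same idea can be sketched in Python rather than PowerShell. This is not the commenter's script: `NOISY_TAGS`, `collect_text`, `text_lines` and `write_sidecar` are hypothetical names, and the tag names in `NOISY_TAGS` are placeholders you would replace after looking at your own files. It walks the tree depth-first, skips noisy subtrees, keeps every non-blank text node (including tail text that follows a skipped element), and writes one line per text node to a `.txt` file next to the XML.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hypothetical: tag names whose whole subtree is noise in a diff.
NOISY_TAGS = {"metadata", "timestamp"}


def collect_text(elem: ET.Element, out: list) -> None:
    """Depth-first walk that keeps text content and skips noisy subtrees."""
    if elem.tag in NOISY_TAGS:
        return  # skip this element and everything below it
    if elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        collect_text(child, out)
        # A child's tail text belongs to the parent, so keep it even
        # when the child itself was skipped.
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())


def text_lines(root: ET.Element) -> list:
    """Return the kept text nodes of a parsed tree, in document order."""
    out = []
    collect_text(root, out)
    return out


def write_sidecar(xml_path: Path) -> Path:
    """Write foo.txt next to foo.xml with one text node per line."""
    root = ET.parse(xml_path).getroot()
    txt_path = xml_path.with_suffix(".txt")
    txt_path.write_text("\n".join(text_lines(root)) + "\n", encoding="utf-8")
    return txt_path
```

A pre-commit hook could call `write_sidecar` on each staged XML file and `git add` the resulting TXT files; how the hook finds those files is left out here.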
Thank you, guys, for these ideas. Both sound great (the PowerShell script and the linter), and I'm confident I'll get something working now!