Hacker News
by palmtree3000 1750 days ago
JSON diffing.

I haven't found any implementations I'd consider good. The problem as I see it is that there are tree-based algorithms like https://webspace.science.uu.nl/~swier004/publications/2019-i... and array-based algorithms like, well, text diffing, but nothing good that does both. The tree-based approach struggles because there's no obvious way to choose a "good" tree encoding of an array.

I've currently settled on flattening the document into an array of tokens such as String(...) or ArrayStart, and running an array-based diffing algorithm on those, but it seems like one could do better.
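
For concreteness, here is a rough sketch of what that flattening could look like in Python. This is not my actual code: the token names (ObjectStart, Key, ArrayEnd and so on), the choice to sort object keys, and the use of difflib.SequenceMatcher as the array differ are all assumptions made for illustration.

```python
import difflib


def flatten(value):
    """Flatten a parsed JSON value into a flat list of hashable tokens."""
    if isinstance(value, dict):
        tokens = [("ObjectStart",)]
        # Assumption: key order is not significant, so sort for stability.
        for key in sorted(value):
            tokens.append(("Key", key))
            tokens.extend(flatten(value[key]))
        tokens.append(("ObjectEnd",))
        return tokens
    if isinstance(value, list):
        tokens = [("ArrayStart",)]
        for item in value:
            tokens.extend(flatten(item))
        tokens.append(("ArrayEnd",))
        return tokens
    # Scalars are tagged with their type name so that 1, 1.0, True and "1"
    # produce distinct tokens.
    return [(type(value).__name__, value)]


def diff_tokens(old, new):
    """Return the non-equal opcodes between the two flattened token lists."""
    a, b = flatten(old), flatten(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling diff_tokens({"xs": [1, 2, 3]}, {"xs": [1, 3]}) reports a single deleted token for the 2. The weakness is the one mentioned above: the differ knows nothing about nesting, so an edit can pair an ArrayStart from one array with an ArrayEnd from another.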

2 comments

At the risk of not being helpful: I have some JSON files that are updated weekly and that I keep under source control in git. The week-to-week updates are often fairly simple, but git was showing some crazy diffs that I knew were far more complicated than the actual update. I soon realized that the data provider was not sorting the JSON arrays consistently; once I began sorting the arrays by rowid every time before writing them, the diffs were as straightforward as expected. I think I don't understand the problem you're encountering, because this solution seems too obvious.
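
Roughly what that normalization step could look like, as a minimal sketch. It assumes the file is a top-level array of objects that each carry a rowid field; the function name dump_sorted and the indent and sort_keys settings are illustrative choices, not details from the setup described above.

```python
import json


def dump_sorted(rows, key="rowid"):
    """Serialize rows in a stable order so line-based diffs stay small."""
    # Assumption: every row is a dict that contains the sort key.
    ordered = sorted(rows, key=lambda row: row[key])
    # sort_keys also fixes the field order inside each object.
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"
```

With this, two exports that contain the same rows in a different order serialize to identical text, so git only shows rows that actually changed.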
The problem is in the syntax of JSON itself. Use JSONL instead.
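
A small sketch of one reading of that suggestion: write one compact JSON record per line, so that a line-based diff lines up with records. The helper name to_jsonl and the compact separators are assumptions for illustration.

```python
import json


def to_jsonl(rows):
    """Write each record as one compact JSON document on its own line."""
    return "".join(
        json.dumps(row, sort_keys=True, separators=(",", ":")) + "\n"
        for row in rows
    )
```

Adding or removing a record then touches exactly one line, although changes nested deep inside a record still show up as a whole-line change.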