by chockchocschoir 1537 days ago
Indeed. One could just do `diff $(jq . $fileOne) $(jq . $fileTwo)` and you'll end up with a "nice enough" diff even if $fileOne and $fileTwo were very differently formatted.

The problem is when a file also needs to be normalized, e.g. object keys in a different order, YAML syntax expansion. It can be very useful to indicate when a JSON file is identical to another JSON file but some of the properties or array items are out of order, and that requires more in-depth knowledge of the data format. Not to mention that you could UTF-8 encode characters or write out the same character using backslash notation, numeric or boolean data that might be wrapped in a string in one file but not in another, etc. There can still be a lot of modelling and interpretation to consider when comparing data files rather than code files.
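For the key-order part, jq can already give a yes/no answer, because its `==` compares parsed values rather than text. A minimal sketch, assuming jq 1.5 or later is on the PATH; `json_same` is a hypothetical helper name, and array order still counts as a difference:

```shell
#!/usr/bin/env bash
# Sketch (assumes jq 1.5+): report whether two JSON files hold the same data.
# jq's == compares parsed values, so object key order and whitespace are
# ignored, while array order still matters.

# json_same is a hypothetical helper name chosen for this example.
json_same() {
  # --slurp wraps both files' contents in one array; --exit-status makes jq
  # exit 0 when the result is true and 1 when it is false (or null).
  jq --slurp --exit-status '.[0] == .[1]' "$1" "$2" > /dev/null
}

tmp=$(mktemp -d)
printf '{"a": "b", "c": [1, 2]}\n' > "$tmp/one.json"
printf '{\n  "c": [1, 2],\n  "a": "b"\n}\n' > "$tmp/two.json"

if json_same "$tmp/one.json" "$tmp/two.json"; then
  echo "same data"        # printed here: only key order and layout differ
else
  echo "different data"
fi
rm -r "$tmp"
```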
I wrote a tool that tidies JSON and can do things like re-order keys into a fixed order - https://github.com/ActiveState/json-ordered-tidy
I'm not too familiar with YAML, so I can't speak to that.

But re JSON:

> object keys in a different order

They can't be "in a different order" as JSON keys are not ordered. They can be whatever order, and would still be considered the same.

> array items are out of order

Then it's different, as JSON arrays are ordered. ["a", "b"] is not the same as ["b", "a"], while {"a": 1, "b": 1} and {"b": 1, "a": 1} are the same.
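jq's own equality operator follows exactly these rules, which makes for a quick sanity check (a sketch assuming jq is installed; the variable names are made up):

```shell
#!/usr/bin/env bash
# Sketch (assumes jq is installed): jq's == treats objects as unordered
# and arrays as ordered, matching JSON semantics.

# Same members, different order: objects compare equal.
obj_result=$(jq -n '{"a": 1, "b": 1} == {"b": 1, "a": 1}')

# Same elements, different order: arrays compare unequal.
arr_result=$(jq -n '["a", "b"] == ["b", "a"]')

echo "objects: $obj_result"   # objects: true
echo "arrays:  $arr_result"   # arrays:  false
```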

> you could UTF-8 encode characters or write out the same character using backslash notation, numeric or boolean data that might be wrapped in a string in one file but not in another

Then again, they are different. If the data inside is different, it's different.

I understand that logically they are the same, but not syntax-wise, which is why I included the "differently formatted" disclaimer. It obviously wouldn't understand that "one" and "1" are the same, but then again, should you? It depends on the use case, I'd say; hard to generalize.
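For the quoted cases, a quick jq check (a sketch, assuming jq 1.5+ for `--argjson`) shows where parsing already normalizes things and where it doesn't: a backslash escape and the literal character decode to the same string, but a quoted number stays a string:

```shell
#!/usr/bin/env bash
# Sketch (assumes jq 1.5+ for --argjson): each --argjson value is parsed as
# JSON text before the comparison runs.

# A \u escape and the literal UTF-8 character decode to the same string.
escape_vs_literal=$(jq -n --argjson a '"\u00e9"' --argjson b '"é"' '$a == $b')

# A number wrapped in quotes stays a string, so it differs from the number.
quoted_vs_number=$(jq -n --argjson a '"1"' --argjson b '1' '$a == $b')

echo "escape vs literal: $escape_vs_literal"   # true
echo "quoted vs bare 1:  $quoted_vs_number"    # false
```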

> They can't be "in a different order" as JSON keys are not ordered. They can be whatever order, and would still be considered the same.

This is what GP is saying, I'm pretty sure. Object member order is non-semantic in json, so in order to do a semantic diff (one that understands structure), you need to canonicalize the order of the two sides. Simply diffing the output of jq doesn't do that, because (afaik) jq doesn't alter the order.

Basically, if you want this to come up the same:

    {"a":"b","c":"d"}
    {"c":"d","a":"b"}
you need more than just `diff $(jq) $(jq)`.
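A sketch of that with the two objects above, assuming jq is installed: by default jq keeps members in input order, and its `-S` (`--sort-keys`) option is what canonicalizes them. Variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash for <<< here-strings, plus jq): compact each object
# with and without -S (--sort-keys) and compare the resulting lines.
first='{"a":"b","c":"d"}'
second='{"c":"d","a":"b"}'

# Without -S, jq keeps members in input order, so the lines still differ.
plain_first=$(jq -c . <<< "$first")
plain_second=$(jq -c . <<< "$second")

# With -S, keys are sorted, giving one canonical line for both objects.
sorted_first=$(jq -S -c . <<< "$first")
sorted_second=$(jq -S -c . <<< "$second")

if [ "$plain_first" = "$plain_second" ]; then echo "plain: same"; else echo "plain: differ"; fi
if [ "$sorted_first" = "$sorted_second" ]; then echo "sorted: same"; else echo "sorted: differ"; fi
```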

One can argue about whether a tool like difftastic should do that, I guess, but I personally lean towards it being smart enough to see this, because it's precisely the sort of thing that both humans and line-based diffs can be awful at seeing.

Just an FYI, jq has a flag to sort objects by key name: `-S` (`--sort-keys`).
Fair enough! I should just never assume jq doesn't have a feature.
Nitpick: diff takes filenames as arguments, so comparing the output of two commands would need the `<()` expansion. So the command would be `diff <(jq . $fileOne) <(jq . $fileTwo)`
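Putting that together with the `-S` (`--sort-keys`) flag gives a diff that ignores both layout and key order. A sketch assuming bash (process substitution is not POSIX sh) and jq; the sample file contents are made up:

```shell
#!/usr/bin/env bash
# Sketch (assumes bash for <( ) process substitution, plus jq): the
# corrected diff invocation, run with and without -S (--sort-keys).
fileOne=$(mktemp)
fileTwo=$(mktemp)
printf '{"a":"b","c":"d"}\n' > "$fileOne"
printf '{\n    "c": "d",\n    "a": "b"\n}\n' > "$fileTwo"

# Each <( ) is replaced by a path (such as /dev/fd/63) that diff reads like a file.
plain_diff=$(diff <(jq . "$fileOne") <(jq . "$fileTwo"))
plain_status=$?     # 1: jq fixes the layout, but key order still differs
sorted_diff=$(diff <(jq -S . "$fileOne") <(jq -S . "$fileTwo"))
sorted_status=$?    # 0: identical once keys are sorted

echo "without -S: exit $plain_status"
echo "with -S:    exit $sorted_status"
rm -f "$fileOne" "$fileTwo"
```

diff exits 0 when its inputs match and 1 when they differ, so the same command also works as a check in scripts.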