by jrockway 1937 days ago
This is neat. I decided to diff two pods in a replicated Kubernetes service. It seemed that it was going to take forever to run, so I just wrote a short Go program to do the same thing (load two JSON files into a map[string]interface{}, cmp.Diff them) while it was going:

https://gist.github.com/jrockway/73982949b3d2ce9b443528042c4...
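For readers who don't want to open the gist, here is a rough sketch of the same idea: load two JSON documents into plain maps and walk them together, reporting only the leaves that differ. It is written in Python rather than Go and is not the program from the gist. The `diff` helper, the `MISSING` marker, and the two tiny pod documents are made up for illustration; only the IP values come from the output quoted further down.

```python
import json


class _Missing:
    """Marker for a key that exists on only one side (distinct from JSON null)."""

    def __repr__(self):
        return "<missing>"


MISSING = _Missing()


def diff(a, b, path=""):
    """Yield (path, old, new) for every place where a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Walk the union of keys so additions and removals are both reported.
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}.{key}" if path else key
            if key not in a:
                yield (child, MISSING, b[key])
            elif key not in b:
                yield (child, a[key], MISSING)
            else:
                yield from diff(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        # Lists are compared position by position; no attempt to detect moves.
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff(x, y, f"{path}[{i}]")
    elif a != b:
        yield (path, a, b)


# Hypothetical, heavily trimmed pod documents.
pod1 = json.loads(
    '{"status": {"hostIP": "10.136.121.131", "phase": "Running", "podIP": "10.244.1.17"}}'
)
pod2 = json.loads(
    '{"status": {"hostIP": "10.136.139.139", "phase": "Running", "podIP": "10.244.1.186"}}'
)

changes = list(diff(pod1, pod2))
for path, old, new in changes:
    print(f"- {path}: {old!r}")
    print(f"+ {path}: {new!r}")
```

Because the walk follows the object structure, each change is reported with its full path (`status.podIP` here), and key order in the input files does not matter.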

My program runs in less than 10 milliseconds (/usr/bin/time reports 0.00 seconds), and graphtage takes 5 minutes and 17 seconds. I'm 317,000x faster! (Not including the time to write the program; if you do that, then it's about even assuming graphtage took 0 seconds to write.)

Graphtage prints the entire file in JQ colors, with diffs inside fields colored red and green, which I love:

    ...
    "hostIP": "10.136.13̟9̟2̶1̶.139̟1̶",
    "podIP": "10.244.1.18̟6̟7̶",
    ...
(My terminal can't display the dots under the numbers and the strikethrough, but it looks great on HN! I really love it.)

My program produces relatively boring line-by-line diffs:

    -               "hostIP":    string("10.136.121.131"),
    +               "hostIP":    string("10.136.139.139"),
                    "phase":     string("Running"),
    -               "podIP":     string("10.244.1.17"),
    +               "podIP":     string("10.244.1.186"),
Honestly, I get what I want out of mine, and wait 317,000x less time, so... I probably won't be using this on a daily basis. But I will be stealing those dots and Unicode strikethroughs.

It's neat that you built what you needed in a few lines of code. I must say, I don't quite like the output of Graphtage; I like yours a little more, but without context it's not easy to see how podIP is nested.

Our use cases might be a little different, but please allow me to share my solution.

The problem with diffing JSON and YAML is that these formats aren't line-based and hashes don't need to be ordered. But there is gron, which turns JSON into a greppable, line-based format [1]. Then you can sort. The sorted output can now be diffed, and you can color the diff output with delta or a similar tool [2].

    diff -u <(kubectl get pod pod1 -o json | gron | sort) <(kubectl get pod pod2 -o json | gron | sort) | delta --light --word-diff-regex="\W+"
This output provides a lot of context for me to see and understand the differences.
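
If gron and delta aren't installed, the flatten, sort, and diff steps of the pipeline above can be approximated in a few lines of Python. This is only a sketch: `flatten` loosely imitates gron's `path = value;` lines (it skips the lines real gron emits for objects and arrays themselves, and does not quote unusual keys), and `sorted_diff` plus the sample pods are hypothetical names and data for illustration.

```python
import difflib
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per leaf, loosely imitating gron."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from flatten(child, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"


def sorted_diff(a, b):
    """Unified diff of the sorted, flattened forms of two parsed JSON values."""
    left = sorted(flatten(a))
    right = sorted(flatten(b))
    return list(difflib.unified_diff(left, right, "pod1", "pod2", lineterm=""))


# Hypothetical trimmed pods; note the different key order in pod2.
pod1 = {"status": {"phase": "Running", "podIP": "10.244.1.17"}}
pod2 = {"status": {"podIP": "10.244.1.186", "phase": "Running"}}

lines = sorted_diff(pod1, pod2)
print("\n".join(lines))
```

Every line carries its full path, which is where the context comes from, and sorting removes differences that are only due to key order.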

[1] gron https://github.com/tomnomnom/gron

[2] delta https://github.com/dandavison/delta

The diff I did is aware of the structure of the object. It's not just sorted lines.
Yes, I understood this. I was just trying to say that it's not easy for me to recognise the context of a diff in deeply nested YAML or JSON.
Apart from the interesting conversation here, just to make sure...

You are all aware of kubectl diff[1], right? I understand that sometimes you just want to diff two k8s objects, and kubectl diff is not a tool for that.

[1]: https://www.mankier.com/1/kubectl-diff

Maybe a bit of a red herring, but I just used Kubernetes as a cheap source of mildly interesting JSON to test a diffing tool with. You'll see that the manpage for Graphtage just uses things like '{"foo":["bar"]}' in their examples... and those run fast. But the second you get some real-world piece of data, it takes 5 minutes to run. That's why I tested on some real-world data first.
Well, you are kinda comparing apples and oranges here.

According to their README, they don't just match on keys but also try to detect changed keys for the same content, even when the two files order their elements differently.

Your diff is probably equivalent to a pretty print and then running regular diff on it, i.e. not even sorting the file.

Having said that, and assuming your file wasn't extraordinarily large, a 5-minute runtime makes this tool kinda unusable.

Your approach works less well if someone re-orders one of the arrays, like `containers` or `env`.
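
To make the array point concrete, here is a small Python sketch. It flattens two documents whose `env` lists hold the same entries in a different order: with positional paths the sorted lines differ, even though nothing really changed. Labelling list items by a field such as `name` is one possible workaround, shown here purely as an assumption (the `key_field` parameter is hypothetical and not a feature of gron or of any tool in this thread), and it only helps when the items actually have such a field.

```python
import json


def flatten(value, path="json", key_field=None):
    """Yield 'path = value;' lines; optionally label list items by a field."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}", key_field)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            # Use the item's key_field (e.g. "name") as its label when present,
            # otherwise fall back to the position in the list.
            if key_field and isinstance(child, dict) and key_field in child:
                label = json.dumps(child[key_field])
            else:
                label = str(i)
            yield from flatten(child, f"{path}[{label}]", key_field)
    else:
        yield f"{path} = {json.dumps(value)};"


# Same entries, different order (hypothetical data).
a = {"env": [{"name": "A", "value": "1"}, {"name": "B", "value": "2"}]}
b = {"env": [{"name": "B", "value": "2"}, {"name": "A", "value": "1"}]}

by_index_a = sorted(flatten(a))
by_index_b = sorted(flatten(b))
by_name_a = sorted(flatten(a, key_field="name"))
by_name_b = sorted(flatten(b, key_field="name"))

print(by_index_a == by_index_b)  # False: positions leak into the paths
print(by_name_a == by_name_b)    # True: paths no longer depend on order
```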
Yes, but does your code actually do tree diffing, or does it just do line-based diffing?

Proper tree diffing is a really hard (I would say unsolved) problem. The "standard" algorithm is O(N^4)!