by jimrandomh 3777 days ago
A personal pet peeve of mine when reading diffs is when a file has some functions, you insert a new one, and instead of looking like this:

     int someOldFunction()
     {
         // Function body
     }
    +
    +int newFunction()
    +{
    +    // New function body
    +}
It looks like this:

     int someOldFunction()
     {
         // Function body
    +}
    +
    +int newFunction()
    +{
    +    // New function body
     }
It's a small thing, but given that these diffs are equivalent, the one that balances the curly braces within added blocks should be favored. But diff utilities seem to get this pretty consistently wrong.
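To make the "these diffs are equivalent" point concrete, here is a minimal Python sketch of sliding an insertion hunk up or down without changing the result, then preferring a position where the added block has balanced braces. The function names are made up for illustration, the brace check is deliberately naive (it ignores strings and comments), and this is not how any particular diff tool is implemented.

```python
def equivalent_insertions(new_lines, start, length):
    """Return every start index where an insertion of `length` lines
    yields the same old file as the insertion at `start`."""
    lo = start
    # Sliding up by one is safe when the line above the hunk equals its last line.
    while lo > 0 and new_lines[lo - 1] == new_lines[lo + length - 1]:
        lo -= 1
    hi = start
    # Sliding down by one is safe when the hunk's first line equals the line after it.
    while hi + length < len(new_lines) and new_lines[hi] == new_lines[hi + length]:
        hi += 1
    return list(range(lo, hi + 1))


def braces_balanced(block):
    """Naive check: every '}' closes a '{' opened earlier in the same block."""
    depth = 0
    for line in block:
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0


def pick_balanced_insertion(new_lines, start, length):
    """Prefer an equivalent hunk position whose added lines balance their braces."""
    for candidate in equivalent_insertions(new_lines, start, length):
        if braces_balanced(new_lines[candidate:candidate + length]):
            return candidate
    return start  # no balanced position exists, keep the original hunk


new_file = [
    "int someOldFunction()",
    "{",
    "    // Function body",
    "}",
    "",
    "int newFunction()",
    "{",
    "    // New function body",
    "}",
]
# The "bad" diff marks lines 3..7 as added; the balanced one marks lines 4..8.
print(pick_balanced_insertion(new_file, 3, 5))
```

Both positions remove to the same old file, so the choice between them is purely cosmetic, which is the point of the complaint.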
6 comments

It is a small thing, but it throws me off every time I see it, and then it takes a few seconds of looking around before it dawns on me what happened.

The "user" in me would love a language aware diffing (and merging) system, but the developer in me is already groaning about how much work that would end up taking for arguably not that much benefit.

Maybe this will help: https://www.semanticmerge.com/
I really like SemanticMerge, it makes total sense, although the diff experience is unlike other diff tools in terms of immediacy. But I think it will be another great tool to add to the arsenal when you need to pull out the big guns on crazy diffs.
Absolutely. I had it installed for some time and mostly thought "this is really neat but I could live without it." And then there was a monster merge with significant conflicts all over the place. I don't think it would have been otherwise possible to pull it off with as little damage as there was if not for this tool.
I really like SemanticMerge, but in its current shape it is little more than a fancy proof of concept.

Its biggest limitation on real projects is that it works on a single-file level, while all the interesting stuff happens on patch level. You may browse their forums to get an idea of what else is missing.

That said, I wish all the best to Codice and I really hope that they continue to invest in this tool.

Looks a lot like Eclipse's semantic diffing. I haven't used Eclipse since 2006, but it was better than anything I have seen since.
I would love to press that "Buy Now" button, but I just can't seem to get past the emboss effect on that CTA.
I think it might be easier to get to a syntax-aware diff by reusing the language-specific syntax highlighting specs used in various editors. I've almost sat down to start that myself a few times.
I don't think it has to be truly language aware. A diff tool that looks for matched braces, quotes, and indents to figure out where the blocks are would do better most of the time.
It doesn't have to be fully language aware, but it has to understand most, if not all, of the syntax. As soon as you start trying to match braces, you need to handle strings, comments, and probably a whole bunch of stuff I'm not thinking of.
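As a rough illustration of why plain brace matching is not enough, here is a small Python sketch of a brace counter for C-like source that skips string literals, character literals, and comments. The function name is hypothetical, and it still ignores plenty (preprocessor directives, raw strings, and so on), which rather supports the point.

```python
def brace_depth(source):
    """Net '{' minus '}' in C-like source, ignoring braces that sit
    inside string/char literals or comments."""
    depth = 0
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if two == "//":
            # Line comment: skip to the end of the line.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            # Block comment: skip past the closing marker.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in "\"'":
            # String or char literal: skip to the matching quote,
            # stepping over backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                if source[i] == "\\":
                    i += 1
                i += 1
            i += 1
        else:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            i += 1
    return depth


snippet = 'int f() { puts("}"); }'
print(snippet.count("{") - snippet.count("}"))  # naive count is thrown off by the string
print(brace_depth(snippet))                     # literal-aware count
```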

Here's an idea: wouldn't it be trivial for <insert your favourite language compiler here> to expose the AST of your code, and solve this problem the easy way?
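For a language whose toolchain already exposes its AST, a function-level comparison is short. The sketch below uses Python's standard `ast` module purely as one example of such a toolchain; the helper names are invented for illustration, and it only looks at top-level functions.

```python
import ast


def top_level_functions(source):
    """Map each top-level function name to a position-independent dump of its AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # ast.dump omits line/column info by default
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def structural_diff(old_source, new_source):
    """Report which top-level functions were added, removed, or changed."""
    old = top_level_functions(old_source)
    new = top_level_functions(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


old_version = "def some_old_function():\n    return 1\n"
new_version = old_version + "\n\ndef new_function():\n    return 2\n"
print(structural_diff(old_version, new_version))
```

Because the comparison works on whole function nodes, there is no question of which closing line belongs to the insertion.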

    git diff --patience
might give you better results? I've seen this pattern too, but I can't find a reproduction of normal git diff giving it to me at the moment.
I use this in my ~/.gitconfig

   [diff]
   algorithm = patience
I've experimented with patience diff, but have not seen it deliver reliably better results than Myers (the current default).
I saw it deliver superior results enough that I spent a while figuring out how to get vimdiff to use patience.
And? What do I need to do to get that? :)
Oh, just saw this. It involves invoking the proper git tools to get the diff, and then converting the diff format from unified to ed format. The latter is actually easier than it might seem, as unified diffs start all of their special information in column 0; IIRC I wrote an awk script to do this. It's on my work machine though, so I don't have it handy.
I tried GP's example, and in this particular case both --patience and the default (Myers) work the same, both doing the thing you want them to. Which perplexes me, because I know I've seen the bad case too, but can't seem to find a minimal example of it (I tried a couple variations on the example; they all did the 'right' thing).
That's because they all use greedy O(ND) algorithms or equivalent.

But conceptually, no matter what the algorithm, the greediness is usually a requirement for maintaining the theoretical time bound of LCS-based algorithms.

Patience trades the time bound for "better" results (patience is worst case O(ND^2)).

Histogram is a neatly engineered and extended version of patience with an O(ND) time bound (and in fact, is faster than both Myers and patience while providing good patience-like output).
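For anyone unfamiliar with how patience picks its matches, here is a minimal Python sketch of the anchor-finding step as it is commonly described: take the lines that occur exactly once in both files, then keep the longest run of them that appears in the same order on both sides. The function name is made up, and this is an illustration of the idea, not git's implementation.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for lines unique to both
    files, restricted to the longest subsequence in increasing order."""
    count_a, count_b = Counter(a), Counter(b)
    index_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate matches, already sorted by their position in `a`.
    pairs = [
        (i, index_in_b[line])
        for i, line in enumerate(a)
        if count_a[line] == 1 and line in index_in_b
    ]

    # Longest increasing subsequence over the `b` positions (patience sorting).
    tails = []       # tails[k]: smallest final b-index of a run of length k + 1
    tail_pair = []   # index into `pairs` of the element stored in tails[k]
    prev = [None] * len(pairs)
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_pair.append(p)
        else:
            tails[k] = j
            tail_pair[k] = p
        prev[p] = tail_pair[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    p = tail_pair[-1] if tail_pair else None
    while p is not None:
        anchors.append(pairs[p])
        p = prev[p]
    return anchors[::-1]


old = ["int f()", "{", "}", "", "int g()", "{", "}"]
new = ["int f()", "{", "}", "", "int h()", "{", "}", "", "int g()", "{", "}"]
print(patience_anchors(old, new))
```

Lines such as `{` and `}` repeat, so they never become anchors; only the unique function signatures do, which is why patience tends to line up on function boundaries.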

Clojure (lisp) is even more fucky because adding a single outer form can re-indent (and therefore modify) an entire huge block of code.

At least on GitHub we have the ?w=1 URL parameter on PRs which helps a bit.

Protip: ?w=1 on GitHub is synonymous with git-diff's -w flag.
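To show what ignoring whitespace buys you on a re-indented block, here is a minimal Python sketch that compares lines after removing all whitespace, in the spirit of `-w`. It uses the standard `difflib` module, the helper names are made up, and it is only an approximation of the idea, not git's code.

```python
import difflib


def strip_all_whitespace(line):
    """Comparison key with every whitespace character removed."""
    return "".join(line.split())


def changes_ignoring_whitespace(old_lines, new_lines):
    """Return the non-equal opcodes when lines are compared by their
    whitespace-free keys, reporting the original lines."""
    old_keys = [strip_all_whitespace(line) for line in old_lines]
    new_keys = [strip_all_whitespace(line) for line in new_lines]
    matcher = difflib.SequenceMatcher(None, old_keys, new_keys, autojunk=False)
    return [
        (tag, old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


before = ["foo();", "bar();"]
after = ["if (x) {", "    foo();", "    bar();", "}"]
# Only the wrapper lines show up; the re-indented body is treated as unchanged.
for change in changes_ignoring_whitespace(before, after):
    print(change)
```

Note that this does not help when the wrapping form's closing delimiter lands on an existing line, as often happens in Lisp, because that line then differs in more than whitespace.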
I find Araxis Merge especially useful in these times. This application has a feature to "set the synchronization link" at any place you want in the code. It is not automatic (or language aware), but once you realize the difference, you can 'fix' the diff in real time. That helped me a lot for big diff files (unfortunately in my current company we don't use Araxis :( )
diff --patience uses a different algorithm that tries to minimise the number of changed lines (it's less efficient, though), which will probably solve your issue. The standard algorithm essentially matches common features on a first-come-first-served basis.
The two diffs have exactly the same number of lines here, the difference is which lines are selected as part of the insertion.