Hacker News

by jimrandomh 3824 days ago

A personal pet peeve of mine when reading diffs is when a file has some functions, you insert one, and instead of looking like this:

     int someOldFunction()
     {
         // Function body
     }
    +
    +int newFunction()
    +{
    +    // New function body
    +}

It looks like this:

     int someOldFunction()
     {
         // Function body
    +}
    +
    +int newFunction()
    +{
    +    // New function body
     }

It's a small thing, but given that these diffs are equivalent, the one that balances the curly braces within added blocks should be favored. But diff utilities seem to get this pretty consistently wrong.
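A minimal sketch of the tie-breaking rule described above, in Python. It is not how any particular diff tool does it, and the helper names are hypothetical. It enumerates every position where the same inserted block could sit without changing the resulting file, then prefers one whose braces balance. The brace check is deliberately naive: it does not skip strings or comments.

```python
def equivalent_placements(before, inserted, pos):
    """Every (index, block) placement of a pure insertion that yields the same new file."""
    after = before[:pos] + inserted + before[pos:]
    n = len(inserted)
    return [
        (p, after[p:p + n])
        for p in range(len(after) - n + 1)
        if after[:p] + after[p + n:] == before
    ]


def braces_balanced(lines):
    """Naive check: every '}' closes a '{' opened inside the block and none stay open.
    Braces inside strings or comments are NOT ignored."""
    depth = 0
    for line in lines:
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0


def pick_placement(before, inserted, pos):
    """Prefer the first equivalent placement whose braces balance; else keep the original."""
    for p, block in equivalent_placements(before, inserted, pos):
        if braces_balanced(block):
            return p, block
    return pos, inserted


old = ["int someOldFunction()", "{", "    // Function body", "}"]
bad_hunk = ["}", "", "int newFunction()", "{", "    // New function body"]
print(pick_placement(old, bad_hunk, 3))
```

With the example from the post, the "bad" hunk inserted before index 3 has exactly one alternative placement (before index 4), and that one is the balanced hunk the author wants.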

6 comments

Klathmon 3824 days ago

It is a small thing, but it throws me off every time I see it, and then it takes a few seconds of looking around before it dawns on me what happened.

The "user" in me would love a language-aware diffing (and merging) system, but the developer in me is already groaning about how much work that would take for arguably not that much benefit.

alexvoda 3824 days ago

Maybe this will help: https://www.semanticmerge.com/

jxramos 3824 days ago

I really like SemanticMerge; it makes total sense, although the diff experience is unlike other diff tools in terms of immediacy. But I think it will be another great tool to add to the arsenal when you need to pull out the big guns on crazy diffs.

m_fayer 3824 days ago

Absolutely. I had it installed for some time and mostly thought, "this is really neat, but I could live without it." Then there was a monster merge with significant conflicts all over the place. Without this tool, I don't think it would have been possible to pull it off with as little damage as there was.

ddimitrov 3821 days ago

I really like SemanticMerge, but in its current shape it is little more than a fancy proof of concept.

Its biggest limitation on real projects is that it works at the single-file level, while all the interesting stuff happens at the patch level. You can browse their forums to get an idea of what else is missing.

That said, I wish all the best to Codice and I really hope that they continue to invest in this tool.

lobster_johnson 3824 days ago

Looks a lot like Eclipse's semantic diffing. I haven't used Eclipse since 2006, but it was better than anything I have seen since.

phaed 3824 days ago

I would love to press that "Buy Now" button, but I just can't seem to get past the emboss effect on that CTA.

digikata 3824 days ago

I think it might be easier to get to a syntax-aware diff by reusing the language-specific syntax-highlighting specs used in various editors. I've almost sat down to start that myself a few times.
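A rough sketch of the idea, assuming a handful of regex token rules in the spirit of an editor's highlighting grammar. The rules below are made up for illustration and do not follow any real editor's spec format. Both versions are tokenized and Python's `difflib` is run over token sequences instead of lines, so whitespace and line breaks stop mattering.

```python
import difflib
import re

# Hypothetical token rules, loosely in the style of a regex-based
# highlighting grammar. Order matters: earlier rules win at a given position.
TOKEN_RULES = [
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("number", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("space", r"\s+"),
    ("other", r"."),  # any other single character (punctuation, operators)
]
LEXER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES), re.S)


def tokenize(src):
    """Split source into (kind, text) tokens, dropping whitespace."""
    return [(m.lastgroup, m.group()) for m in LEXER.finditer(src) if m.lastgroup != "space"]


def token_diff(old_src, new_src):
    """Non-equal opcodes of a token-level diff, as (tag, old_tokens, new_tokens)."""
    a, b = tokenize(old_src), tokenize(new_src)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, [t for _, t in a[i1:i2]], [t for _, t in b[j1:j2]])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


print(token_diff("int f() { return 1; }", "int f() { return 2; }"))
```

Because comments and strings come out as single tokens, a brace inside them can never be mistaken for a block delimiter.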

skybrian 3824 days ago

I don't think it has to be truly language-aware. A diff tool that looks for matched braces, quotes, and indents to figure out where the blocks are would do better most of the time.

oneeyedpigeon 3824 days ago

It doesn't have to be fully language-aware, but it has to understand most, if not all, of the syntax. As soon as you start trying to match braces, you need to handle strings, comments, and probably a whole bunch of stuff I'm not thinking of.
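To illustrate the point about strings and comments, here is a hypothetical brace matcher for C-like source that skips `//` and `/* */` comments and quoted literals. It is only a sketch. Raw strings, preprocessor tricks, C++ digit separators and the like are not handled, which is exactly the long tail described above.

```python
def brace_blocks(src):
    """Return (open_line, close_line) pairs, 1-based, for matched { ... } in C-like source.

    Skips // and /* */ comments and "..." / '...' literals with backslash
    escapes. Raw strings, preprocessor tricks, digit separators, etc. are
    not handled."""
    blocks, stack = [], []
    line, i, n = 1, 0, len(src)
    while i < n:
        c = src[i]
        if src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end  # leave the newline for the next pass
            continue
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            line += src.count("\n", i, end)  # block comments may span lines
            i = end
            continue
        if c in "\"'":
            j = i + 1
            # Stop at the closing quote, or at an unescaped newline (unterminated literal).
            while j < n and src[j] != c and src[j] != "\n":
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            end = j + 1 if j < n and src[j] == c else j
            line += src.count("\n", i, end)
            i = end
            continue
        if c == "\n":
            line += 1
        elif c == "{":
            stack.append(line)
        elif c == "}" and stack:
            blocks.append((stack.pop(), line))
        i += 1
    return blocks


src = 'int f() {\n  puts("}");  // }\n  /* {\n  */\n}\n'
print(brace_blocks(src))
```

A naive character count on that snippet would see three `{`/`}` that are really inside a string or comment. The scanner only pairs the brace on line 1 with the one on line 5.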

Here's an idea: wouldn't it be trivial for <insert your favourite language compiler here> to expose the AST of your code, and solve this problem the easy way?
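Python is one language that does expose its AST in the standard library, through the `ast` module. This sketch (helper names are hypothetical) compares top-level functions by name. Since `ast.dump` omits line and column attributes by default and comments never reach the AST, formatting-only edits and moved functions do not count as changes.

```python
import ast


def top_level_functions(src):
    """Map each top-level function name to a dump of its AST (no line/column info)."""
    tree = ast.parse(src)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def function_level_diff(old_src, new_src):
    """Return (added, removed, changed) function names between two versions."""
    old, new = top_level_functions(old_src), top_level_functions(new_src)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return added, removed, changed


old_src = "def some_old_function():\n    return 1\n"
new_src = old_src + "\n\ndef new_function():\n    return 2\n"
print(function_level_diff(old_src, new_src))
```

Reported this way, the inserted function is simply "added", with no question of which closing brace or line belongs to which hunk.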

stormbrew 3824 days ago

    git diff --patience

might give you better results? I've seen this pattern too, but I can't reproduce it with a normal git diff at the moment.

stock_toaster 3824 days ago

I use this in my ~/.gitconfig

    [diff]
        algorithm = patience

paulirish 3824 days ago

I've experimented with patience diff, but I haven't seen it deliver reliably better results than Myers (the current default).

aidenn0 3824 days ago

I saw it deliver superior results enough that I spent a while figuring out how to get vimdiff to use patience.

fredmorcos 3824 days ago

And? What do I need to do to get that? :)

aidenn0 3820 days ago

Oh, I just saw this. It involves invoking the proper git tools to get the diff, and then converting the diff format from unified to ed format. The latter is actually easier than it might seem, as unified diffs start all of their special information in column 0; IIRC I wrote an awk script to do this. It's on my work machine, though, so I don't have it handy.
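The awk script isn't available, but the conversion can be sketched in Python. This assumes "ed format" means the classic normal diff output (`1a2`, `3,4c3,4`, `<`/`>` lines with `---` between them), handles a single file's well-formed unified hunks only, and uses a hypothetical function name. As described above, everything hinges on the first column: `@@`, `-`, `+` or a space.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def _range(start, count):
    """Normal-format line range: 'N' for one line, 'N,M' for several."""
    return str(start) if count == 1 else f"{start},{start + count - 1}"


def unified_to_normal(diff_text):
    """Convert a single-file unified diff into normal ("ed-style") diff output."""
    out, dels, adds = [], [], []
    old_no = new_no = old_left = new_left = 0
    block_old = block_new = 0

    def flush():
        # Emit the pending run of '-'/'+' lines as one a/d/c command.
        if dels and adds:
            out.append(f"{_range(block_old, len(dels))}c{_range(block_new, len(adds))}")
        elif dels:
            out.append(f"{_range(block_old, len(dels))}d{block_new - 1}")
        elif adds:
            out.append(f"{block_old - 1}a{_range(block_new, len(adds))}")
        out.extend("< " + text for text in dels)
        if dels and adds:
            out.append("---")
        out.extend("> " + text for text in adds)
        dels.clear()
        adds.clear()

    for line in diff_text.splitlines():
        m = HUNK_HEADER.match(line)
        if m:
            flush()
            old_start, old_count, new_start, new_count = m.groups()
            old_left = 1 if old_count is None else int(old_count)
            new_left = 1 if new_count is None else int(new_count)
            # An empty range names the line *before* the hunk, so step past it.
            old_no = int(old_start) + (1 if old_left == 0 else 0)
            new_no = int(new_start) + (1 if new_left == 0 else 0)
            continue
        if (old_left == 0 and new_left == 0) or line.startswith("\\"):
            continue  # file headers, trailing text, "\ No newline at end of file"
        tag, text = line[:1], line[1:]
        if tag in ("-", "+") and not dels and not adds:
            block_old, block_new = old_no, new_no  # a new change block starts here
        if tag == "-":
            dels.append(text)
            old_no += 1
            old_left -= 1
        elif tag == "+":
            adds.append(text)
            new_no += 1
            new_left -= 1
        else:  # context line ends any pending change block
            flush()
            old_no += 1
            new_no += 1
            old_left -= 1
            new_left -= 1
    flush()
    return "\n".join(out)


sample = "--- a/f.c\n+++ b/f.c\n@@ -1,3 +1,4 @@\n a\n+x\n b\n c\n@@ -10,2 +11,1 @@\n p\n-q\n"
print(unified_to_normal(sample))
```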

FaceKicker 3824 days ago

I tried GP's example, and in this particular case both --patience and the default (Myers) behave the same, both doing what you want them to. This perplexes me, because I know I've seen the bad case too, but I can't find a minimal example of it (I tried a couple of variations on the example; they all did the 'right' thing).

DannyBee 3824 days ago

That's because they all use greedy O(ND) algorithms or equivalent.

But conceptually, whatever the algorithm, greediness is usually required to maintain the theoretical time bound of LCS-based algorithms.
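For reference, a sketch of the greedy step in the textbook formulation of Myers' O(ND) algorithm, computing only the edit distance D (this is not git's actual code). On each diagonal the path follows matching elements, the "snake", as far as it can before spending another edit; which of several equally short scripts you end up with largely falls out of that greedy choice plus the tie-break in the `if`.

```python
def myers_distance(a, b):
    """Length of the shortest edit script (insertions + deletions) turning a into b."""
    n, m = len(a), len(b)
    v = {1: 0}  # v[k] = furthest x reached so far on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) from diagonal k+1 or right (deletion) from k-1,
            # whichever previously reached further; ties go to the deletion.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Greedy snake: follow matching elements as far as possible.
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[k] = x
            if x >= n and y >= m:
                return d


print(myers_distance("ABCABBA", "CBABAC"))
```

The same function works on lists of lines, which is how a line-based diff would call it.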

Patience trades the time bound for "better" results (patience is worst-case O(ND^2)).
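A sketch of the step that distinguishes patience diff: anchor on lines that occur exactly once in each file, and keep the longest run of those anchors that appears in the same order in both, found here with patience sorting. A full implementation would then recurse between anchors and fall back to an ordinary diff where there are none; that part is omitted, and the function name is hypothetical. Note that a lone `{` or `}` can never be an anchor because it repeats.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """(i, j) pairs of lines unique in both a and b, forming the longest run
    that appears in the same order in both sequences."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Patience sorting on the b-indices finds the longest increasing subsequence.
    tops = []      # tops[k]: smallest b-index ending an increasing run of length k + 1
    top_idx = []   # index into pairs of the card currently on top of pile k
    back = [None] * len(pairs)  # predecessor links for reconstructing the run
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tops, j)
        back[idx] = top_idx[k - 1] if k > 0 else None
        if k == len(tops):
            tops.append(j)
            top_idx.append(idx)
        else:
            tops[k] = j
            top_idx[k] = idx

    run, idx = [], (top_idx[-1] if top_idx else None)
    while idx is not None:
        run.append(pairs[idx])
        idx = back[idx]
    return run[::-1]


old = ["int f()", "{", "}", "", "int g()", "{", "}"]
new = ["int f()", "{", "}", "", "int h()", "{", "}", "", "int g()", "{", "}"]
print(patience_anchors(old, new))
```

Here only the two function signatures survive as anchors; the braces and blank lines are left for the recursive step.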

Histogram is a neatly engineered and extended version of patience with an O(ND) time bound (and, in fact, it is faster than both Myers and patience while producing good, patience-like output).

whalesalad 3824 days ago

Clojure (lisp) is even more fucky because adding a single outer form can re-indent (and therefore modify) an entire huge block of code.

At least on GitHub we have the ?w=1 URL parameter on PRs, which helps a bit.

wldlyinaccurate 3824 days ago

Protip: ?w=1 on GitHub is synonymous with git-diff's -w flag.
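What ignoring whitespace amounts to can be sketched with `difflib`: compare lines with all whitespace stripped, which is how git documents `-w` (`--ignore-all-space`). The helper name is hypothetical and this is a model of the behaviour, not git's implementation. With a re-indented Lisp form like the one described above, only the new outer line and the line whose closing parentheses changed show up.

```python
import difflib


def diff_ignoring_all_whitespace(old, new):
    """Non-equal difflib opcodes for two line lists, comparing lines with all
    whitespace removed (modelled on git's documented -w / --ignore-all-space)."""
    def key(line):
        return "".join(line.split())

    sm = difflib.SequenceMatcher(
        a=[key(line) for line in old], b=[key(line) for line in new], autojunk=False
    )
    return [op for op in sm.get_opcodes() if op[0] != "equal"]


old = ["(defn f [x]", "  (inc x))"]
new = ["(when enabled?", "  (defn f [x]", "    (inc x)))"]
print(diff_ignoring_all_whitespace(old, new))
```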

piyush_soni 3824 days ago

I find Araxis Merge especially useful in these situations. It has a feature to "set the synchronization link" at any point you want in the code. It is not automatic (or language-aware), but once you realize what the difference is, you can 'fix' the diff in real time. That helped me a lot with big diff files (unfortunately, my current company doesn't use Araxis :( ).
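A hypothetical sketch of the general idea, not Araxis's implementation: treat each user-chosen pair of lines as a forced match, then diff the stretches between links independently with `difflib`. The function name and the `links` format are assumptions for illustration.

```python
import difflib


def diff_with_links(a, b, links):
    """Opcodes for line lists a and b where each (i, j) in `links` forces a[i]
    to align with b[j]. Links must be strictly increasing in both i and j.
    Hypothetical helper illustrating the idea, not Araxis's algorithm."""
    ops = []

    def diff_segment(i0, i1, j0, j1):
        sm = difflib.SequenceMatcher(a=a[i0:i1], b=b[j0:j1], autojunk=False)
        for tag, x1, x2, y1, y2 in sm.get_opcodes():
            ops.append((tag, x1 + i0, x2 + i0, y1 + j0, y2 + j0))

    prev_i = prev_j = 0
    for i, j in links:
        diff_segment(prev_i, i, prev_j, j)
        # The linked lines are paired up regardless of what the matcher would pick.
        ops.append(("equal" if a[i] == b[j] else "replace", i, i + 1, j, j + 1))
        prev_i, prev_j = i + 1, j + 1
    diff_segment(prev_i, len(a), prev_j, len(b))
    return ops


a = ["x", "y", "x"]
b = ["x"]
print(diff_with_links(a, b, []))        # difflib pairs b[0] with a[0]
print(diff_with_links(a, b, [(2, 0)]))  # the link pairs it with a[2] instead
```

This is the same kind of ambiguity as in the original post: two equally short diffs, and a human who knows which pairing is meaningful.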

cyphar 3824 days ago

diff --patience uses a different algorithm that works to optimise the diffed number of lines (it's less efficient, though), which will probably solve your issue (the standard algorithm essentially finds common features on a first-come-first-served basis).

masklinn 3824 days ago

The two diffs have exactly the same number of lines here, the difference is which lines are selected as part of the insertion.