Hacker News new | ask | show | jobs
by dan-robertson 1807 days ago
I think the diffing is the “obvious” graph search algorithm between trees, where a “tree” is a list of atoms or trees (think lisp lists).

Basically to diff a tree of n top-level elements against one of m elements, construct a graph where nodes lie on an (n+1)x(m+1) grid. Each node (a,b) corresponds to having looked at a elements of the first and matched them to b elements of the second list. Add edges (a,b)->(a+1,b) for deletion; (a,b)->(a,b+1) for insertion; and (a,b)->(a+1,b+1) for an inner diff (ie basically this graph search problem again). Choose weights to apply to node and now find the shortest path from (0,0) to (n,m).

1 comments

From you description it seems like we're just computing the standard insert-delete tree-edit-distance. These tend to be slow.

This implies that the patch language only supports insertion, deletion and modification of nodes, which is a shame since refactorings, moves and duplications are also common operations in the source-code domain. Additionally, if the patch language only supports insertion, deletion and modification, the merging algorithm will perform poorly.

Yep, that's a fair description. Note that I'm not providing a merge algorithm, just a pretty way of viewing changes.

I did look at modelling moves in an earlier prototype, but it's incredibly hard to display the result in a coherent way when there are also insertions. It was also easier to drop it when I moved to Dijkstra.

As you can see in the screenshot in the readme, it does support inserting tree nodes whilst preserving children, which covers a ton of cases.