|
|
|
|
|
by vcmiraldo
1807 days ago
|
|
That is a really difficult problem for more reasons than what fits in this comment :)
In fact, I got my PhD studying this very problem (https://victorcmiraldo.github.io/data/MiraldoPhD.pdf). I did not find any description of how your diffing algorithm works nor how you represent a patch. I'd be really curious to know more. |
|
Difftastic does not create a patch or worry about merging. That's a hard problem that I'm not trying to solve. Instead, it builds two ASTs, then marks each node as unchanged or novel.
To compute the diff, I use a graph search. Each vertex represents a position in both the left and right ASTs.
Suppose you're comparing A with X A.
Start node:
The possible next nodes are:(1) Treat A on the left as novel.
(2) Treat X on the right as novel. Both (1) and (2) are the same 'distance', but (2) is closer to the end node, because there's a edge from (2) to the end that marks A as unchanged.I've implemented this using Dijkstra's algorithm. My graph is directed and acyclic, so there are faster algorithms like topological sort. However, I don't construct the whole graph in advance (that would take O(N^2) memory) so instead I construct the graph nodes as necessary.
(This is very similar to Autochrome, which I've linked in the README. Autochrome has a worked example which is really helpful.)
At some point I think I'll have to use A* search instead. If there are more than 500 lines of code with lots of changes, difftastic can take a few seconds to terminate due to the naive graph search.