by lichtenberger 1421 days ago

Really awesome work :-)

I implemented the Fast Match / Simple Edit Script algorithm almost 10 years ago for my Master's thesis[1] and for my database project[2][3], in order to import revisions of files with a hopefully minimal number of edit operations between the stored revision and a new one (back then it was for XML databases).

Diffing was only one aspect of the visual analytics approach for comparing the revisions (tree structures) visually[4]. Internally, the nodes are addressed through dense, ascending 64-bit ints stored in a special trie index. Furthermore, during the import, changes can optionally be tracked and a rolling hash can optionally be stored for each node. After the import you can easily query the changes or execute time travel queries.
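To illustrate why a per-node hash helps when looking for changes, here is a minimal sketch in Python. It is not the project's actual hashing scheme: the `Node` layout, `subtree_hash`, and `changed_keys` are hypothetical names, the hash is recomputed on every call instead of being stored and updated incrementally, and both trees are assumed to have the same shape.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    key: int                                  # dense, ascending node ID (hypothetical layout)
    label: str
    children: list[Node] = field(default_factory=list)


def subtree_hash(node: Node) -> str:
    """Hash the node's label together with the hashes of its children, in order."""
    h = hashlib.sha256(node.label.encode("utf-8"))
    for child in node.children:
        h.update(subtree_hash(child).encode("ascii"))
    return h.hexdigest()


def changed_keys(old: Node, new: Node) -> list[int]:
    """Return keys of nodes whose subtree hash differs.

    Subtrees with equal hashes are skipped entirely, which is the point
    of keeping a hash per node. Assumes both trees have the same shape.
    """
    if subtree_hash(old) == subtree_hash(new):
        return []
    changed = [new.key]
    for old_child, new_child in zip(old.children, new.children):
        changed.extend(changed_keys(old_child, new_child))
    return changed


old_rev = Node(1, "a", [Node(2, "b"), Node(3, "c")])
new_rev = Node(1, "a", [Node(2, "b"), Node(3, "d")])
print(changed_keys(old_rev, new_rev))
```

A real store would persist the hash with each node and adjust it along the ancestor path on every update, so that the comparison never has to re-hash a whole subtree.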

Technically, a tree of tries is mapped to an append-only data file using a persistent data structure (in the functional sense), COW with path copying and a novel sliding snapshot algorithm for the leaf data pages themselves.
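As a rough illustration of copy-on-write with path copying, the following Python sketch uses a small binary trie keyed by the bits of an integer. It is only a sketch under stated assumptions: `Page`, `put`, and `get` are hypothetical names, the trie is in memory rather than in an append-only file, and the sliding snapshot algorithm is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    left: Optional[Page] = None
    right: Optional[Page] = None
    value: Optional[str] = None   # only set on leaf pages


def put(root: Optional[Page], key: int, value: str, depth: int) -> Page:
    """Return a new root. Only the pages on the path to `key` are copied."""
    if depth == 0:
        return Page(value=value)
    node = root or Page()
    if (key >> (depth - 1)) & 1 == 0:
        # Copy this page, replace the left child, share the right child.
        return Page(left=put(node.left, key, value, depth - 1), right=node.right)
    return Page(left=node.left, right=put(node.right, key, value, depth - 1))


def get(root: Optional[Page], key: int, depth: int) -> Optional[str]:
    """Walk from the root to the leaf for `key`, following its bits."""
    node = root
    for level in range(depth, 0, -1):
        if node is None:
            return None
        node = node.right if (key >> (level - 1)) & 1 else node.left
    return node.value if node else None


DEPTH = 4                          # 4-bit keys, just for the example
rev1 = put(None, 5, "a", DEPTH)
rev2 = put(rev1, 5, "b", DEPTH)    # rev1 stays readable and unchanged
rev3 = put(rev2, 6, "c", DEPTH)
revisions = [rev1, rev2, rev3]     # one root per revision

print(get(revisions[0], 5, DEPTH), get(revisions[1], 5, DEPTH))
```

Because old roots are never modified, a time travel query is just a lookup that starts from an older root, and unchanged subtrees are shared between revisions instead of being copied.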

I've always had the vision of implementing different visualizations to compare the revisions in a web frontend, but I'm currently spending my time on improving the latency of both writes and reads.

Thus, if someone would like to help, that would be awesome :-)

Kind regards

Johannes

[1] https://github.com/JohannesLichtenberger/master-thesis/blob/...

[2] https://github.com/sirixdb/sirix

[3] https://github.com/sirixdb/sirix/tree/master/bundles/sirix-c...

[4] https://youtube.com/watch?v=l9CXXBkl5vI

1 comment

samokhvalov 1421 days ago

Comparing tree structures can be used to diff EXPLAIN plans. During query optimization, that might make a lot of sense.
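A minimal sketch of that idea in Python, assuming the plans are available as nested dicts in the shape of PostgreSQL's `EXPLAIN (FORMAT JSON)` output (a `"Node Type"` per node and child nodes under `"Plans"`). The `diff_plans` function is hypothetical and does a naive positional comparison, not the Fast Match algorithm mentioned above, so a moved subtree would show up as a delete plus an insert.

```python
def diff_plans(old, new, path="Plan"):
    """Naive positional diff of two plan trees; returns readable change strings."""
    changes = []
    if old["Node Type"] != new["Node Type"]:
        changes.append(f"{path}: {old['Node Type']} -> {new['Node Type']}")
    old_kids = old.get("Plans", [])
    new_kids = new.get("Plans", [])
    for i in range(max(len(old_kids), len(new_kids))):
        child_path = f"{path}/{i}"
        if i >= len(old_kids):
            changes.append(f"{child_path}: inserted {new_kids[i]['Node Type']}")
        elif i >= len(new_kids):
            changes.append(f"{child_path}: deleted {old_kids[i]['Node Type']}")
        else:
            changes.extend(diff_plans(old_kids[i], new_kids[i], child_path))
    return changes


# Made-up example plans, before and after adding an index.
before = {"Node Type": "Sort", "Plans": [{"Node Type": "Seq Scan"}]}
after = {"Node Type": "Sort", "Plans": [{"Node Type": "Index Scan"}]}

for change in diff_plans(before, after):
    print(change)
```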
