Hacker News
by thefurman 2052 days ago
Taking into account the history of how lines have changed isn't much better, sorry Anu. (Or if you think it is, please give some very compelling real world examples).

I believe that you need to understand the semantics of the code to truly do what you are trying to do well. For all other cases the snapshot model is more than good enough, and given how we structure and modify code, it works out really well in practice. Code dealing with a single aspect should be, and almost always is, co-located, so a conflict of intention in a merge is very rare. There are other human aspects, like code ownership and collaborating teams, which make the issue even less of a problem.

2 comments

I don’t think there are any open implementations of data type aware DVCS yet (would be glad to be proved wrong). However, I believe a reliable file/line DVCS based on sound patch theory would be a step in the right direction. A type-aware DVCS not based on sound patch theory would probably be a disaster.
I don't know about Anu (haven't looked at it yet), but with Pijul it would be perfectly possible to take advantage of semantic knowledge. Line-based changes are the default, but you could certainly apply file deltas based on a richer understanding of the underlying filetype.
I’m not convinced by this, but I’m also not convinced by the argument of the comment you’re replying to. The theoretical foundation of Pijul/Anu works by starting with files as lists of lines (or some other unit) and patches as (injective) mappings from one list of lines to another which preserve the relative order between lines, then constructing the smallest generalisation of this structure to one where all merges exist and are, in some sense, well behaved. This generalisation is from lists of lines to partial orders of lines, where “B is preceded by A” becomes “A<B”.
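
A minimal sketch of the "lists of lines become partial orders of lines" idea may help here. It is only an illustration under loud assumptions: the LineGraph class and its method names are hypothetical, lines are identified by their content (a real system would need unique line identities), deletions are ignored, and none of this is taken from the Pijul or Anu code.

```python
from itertools import combinations


class LineGraph:
    """A file as a set of lines plus 'comes before' edges (a partial order)."""

    def __init__(self, lines=(), edges=()):
        self.lines = set(lines)
        self.edges = set(edges)  # (a, b) means a < b

    @classmethod
    def from_list(cls, lines):
        # An ordinary file is a total order: each line precedes the next.
        return cls(lines, zip(lines, lines[1:]))

    def insert(self, new, after=None, before=None):
        # A patch that adds one line and keeps the order of existing lines.
        g = LineGraph(self.lines | {new}, self.edges)
        if after is not None:
            g.edges.add((after, new))
        if before is not None:
            g.edges.add((new, before))
        return g

    def merge(self, other):
        # In this toy model, merging is the union of lines and ordering facts.
        return LineGraph(self.lines | other.lines, self.edges | other.edges)

    def precedes(self, a, b):
        # True if a < b follows from the edges (graph reachability).
        stack, seen = [a], {a}
        while stack:
            x = stack.pop()
            for u, v in self.edges:
                if u == x and v not in seen:
                    if v == b:
                        return True
                    seen.add(v)
                    stack.append(v)
        return False

    def unordered_pairs(self):
        # Pairs of lines whose relative order no patch has decided.
        return {
            frozenset((a, b))
            for a, b in combinations(self.lines, 2)
            if not self.precedes(a, b) and not self.precedes(b, a)
        }


base = LineGraph.from_list(["A", "B"])
alice = base.insert("X", after="A", before="B")
bob = base.insert("Y", after="A", before="B")
carol = base.insert("Z", after="B")

clean = alice.merge(carol)  # A < X < B < Z, still an ordinary list
clash = alice.merge(bob)    # X and Y both sit between A and B

print(clean.unordered_pairs())                       # set()
print([sorted(p) for p in clash.unordered_pairs()])  # [['X', 'Y']]
```

The merge of Alice's and Bob's patches always exists and does not depend on the order in which it is taken; the conflict is simply a pair of lines with no order between them, which is a state the plain "list of lines" model cannot represent.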

To do something similar with more structured files, one must find the corresponding idea to “a list of lines”, and this must work in a good way (e.g. changes like x -> (x); [a; b] -> [a] foo [b]; [[p, q], [r, s]] -> [p, q, r, s] must in some sense be natural operations in your structure (and diffs need to be reasonably easy to compute)). And of course it still needs to work in a sane way for unstructured data in big comments. Therefore I don’t agree that Anu would be easily generalised to this.

I think this is basically impossible to do for situations where you want to capture all the structure (such that a patch to rename something merges well with other patches). I think even a partial solution is likely to be extremely hard.

Finally I’m not convinced that the change would be that useful. Much of the structure of computer programs is implicit in the scoping rules in such a way that the “move blocks around” changes that line-based VCSes often struggle with will still be invalid with structural diffs.
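
As a small illustration of that scoping point (the function names are hypothetical, and Python is used only because it is compact): both functions below are well-formed trees containing the same two statements, so a structural diff would see a legitimate reordering of siblings, yet the reordered version is broken.

```python
def report(prices):
    total = sum(prices)
    label = f"total={total}"
    return label


def report_after_move(prices):
    # Same two statements, reordered: it still parses and is still a valid tree.
    label = f"total={total}"  # 'total' is read before it is bound
    total = sum(prices)
    return label


print(report([1, 2, 3]))  # total=6
try:
    report_after_move([1, 2, 3])
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print(type(exc).__name__)  # UnboundLocalError
```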

This is the same underlying theory as the “operational semantics” that is used by Google docs to merge out-of-order changes by simultaneous editors and resolve into a single consistent shared global state. So take that as a proof of principle that it works for more complex structured information.
The underlying theory is not really the same. The practice is also not the same.

Google doesn’t need a different representation where all pushouts exist because they rely on a centralised server, low latency, and arbitrarily choosing how to resolve conflicts. In a DVCS, you can rely on none of these.

I'm not sure if Google still uses operational semantics for Docs, but that is not how operational semantics works. The theory allows you to take two quite different stacks of changes and interleave them in a consistent way. It does not rely on low latency or a centralized server. The choice of arbitrary tie breaker vs. manual resolution in the case of conflicts is an application domain choice not mandated by the theory. Obviously in the case of Docs the tie breaker makes more sense.
I think this is getting off topic as Anu/Pijul is not doing operational transformations (I assume this is what you meant when you wrote operational semantics).

I still claim that the reason OT works well with google docs is that it can rely on a centralised server, low latency and tie breaking.

Tie breaking means one doesn’t need to worry about representations of conflicts (and allowing changes to merge in sound ways) which is in some sense the main thing pijul does.

Low latency means that users are able to cope with the tie-breaking rules doing the wrong thing.

A centralised server means that there is less need for the merges to work in the sound way that pijul aims to make them work.
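
To make the tie-breaking point concrete, here is a rough sketch of transforming two concurrent inserts at the same position. It is a textbook-style toy, not how Google Docs is implemented; Insert, transform, apply_op and the site field are all names assumed for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical replica id, used only to break ties


def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has been applied."""
    earlier = against.pos < op.pos
    tie_lost = against.pos == op.pos and against.site < op.site
    if earlier or tie_lost:
        # `against` put text at or before our position, so shift right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply_op(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


doc = "ac"
a = Insert(1, "b", site=1)
b = Insert(1, "X", site=2)

# Each site applies its own edit first, then the transformed remote edit.
site1 = apply_op(apply_op(doc, a), transform(b, a))
site2 = apply_op(apply_op(doc, b), transform(a, b))

print(site1, site2)  # abXc abXc
```

Both sites converge, but only because the lower site id silently wins: the result never records that "b" and "X" competed for the same spot, so there is no conflict left for anyone to inspect or resolve later.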

Therefore I put it to you that Google Docs is neither an example of the same theory that Pijul is based on nor evidence that OT would work for some kind of well-behaved structure-aware DVCS.