
by feanaro 2687 days ago

Sorry, I was in a hurry and I think I mixed up articles: another one recently appeared on HN, and it pointed to a page with a much better explanation.

The main idea behind Pijul (as I understand it) is that it makes merging divergent branches correct and predictable by making patch application an associative operation.[1] What this means is that, given patches A, B and C, it should be irrelevant whether you are:

1. starting with A, applying B on top, followed by applying C
2. starting with A, applying the result of C applied on top of B, all at once

In other words,

    (AB)C = A(BC)

This isn't always what happens in git because it doesn't work with patches on an abstract level. Instead, it always works with states of the branch heads (and sometimes the state of their branching point, i.e. BASE, in case of a 3-way merge).
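A minimal Python sketch of what the associativity property says. It is a toy model, not Pijul's: a patch is assumed to be a plain function from a file (a list of lines) to a new file, and the helpers `compose` and `insert_line` are hypothetical, made up for illustration. Applying A, then B, then C gives the same file as applying A followed by the pre-combined "B then C" patch.

```python
from typing import Callable, List

State = List[str]                   # a file as a list of lines
Patch = Callable[[State], State]    # toy assumption: a patch is a function on files


def compose(p: Patch, q: Patch) -> Patch:
    """Hypothetical helper: the patch that applies p first, then q."""
    return lambda s: q(p(s))


def insert_line(index: int, text: str) -> Patch:
    """Hypothetical helper: a patch inserting one line at a fixed position."""
    return lambda s: s[:index] + [text] + s[index:]


A = insert_line(0, "int main() {")
B = insert_line(1, "}")
C = insert_line(1, "  return 0;")

empty: State = []
left = compose(compose(A, B), C)(empty)    # (AB)C
right = compose(A, compose(B, C))(empty)   # A(BC)
print(left == right, left)
```

In this toy model associativity holds trivially, because function composition is associative. Per the comment above, the hard part that Pijul's theory addresses is keeping the property once merges of divergent branches are involved.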

As to why this matters, [2] and [3] have practical test cases where the difference between these two approaches is observable. [3] is an example of plausible C code where the non-patch approach may produce an incorrect result. It also demonstrates how the patch-algebraic approach is able to take individual changes into account and apply edits from the second branch in the correct place in the first branch, even though the affected code has moved in the first branch in the meantime.
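To make the moved-code case from [3] concrete, here is a deliberately simplified Python sketch. It assumes every line carries a hypothetical unique id that survives moves, loosely in the spirit of tracking changes by identity; it is not Pijul's actual data structure. The "positional" edit is a caricature of a state-based merge: real git uses a 3-way merge with context lines, not bare line numbers, so this exaggerates the failure mode.

```python
from typing import List, Tuple

Line = Tuple[str, str]  # (hypothetical unique line id, text)

base: List[Line] = [
    ("l1", "int f() {"),
    ("l2", "  return 1;"),
    ("l3", "}"),
    ("l4", "int g() {"),
    ("l5", "  return 2;"),
    ("l6", "}"),
]

# Branch 1 moves f below g: the lines are reordered but keep their ids.
branch1 = base[3:] + base[:3]

# Branch 2 changes the body of f. The same change, recorded two ways:
positional_edit = (1, "  return 42;")   # "replace the line at index 1"
identity_edit = ("l2", "  return 42;")  # "replace the line whose id is l2"


def apply_positional(lines: List[Line], edit: Tuple[int, str]) -> List[Line]:
    index, text = edit
    out = list(lines)
    out[index] = (out[index][0], text)
    return out


def apply_identity(lines: List[Line], edit: Tuple[str, str]) -> List[Line]:
    line_id, text = edit
    return [(lid, text if lid == line_id else old) for lid, old in lines]


def render(lines: List[Line]) -> List[str]:
    return [text for _, text in lines]


print(render(apply_positional(branch1, positional_edit)))  # edits g: wrong place
print(render(apply_identity(branch1, identity_edit)))      # edits f: right place
```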

The focus on patches has other important implications, such as the fact that branches then simply become sets of patches and cherry-picking retains the identity of patches instead of creating new, unique commits.
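A rough Python illustration of branches as sets of patches. The assumption, made up for this example, is that a patch is identified by a hash of its own content only, so cherry-picking just adds the same id to another set. For contrast, the hypothetical `commit_id` mimics the fact that a git commit id also covers its parent, so replaying the same change on another branch yields a new id.

```python
import hashlib


def patch_id(change: str) -> str:
    # Hypothetical: a patch's identity depends only on its own content.
    return hashlib.sha256(change.encode()).hexdigest()


def commit_id(change: str, parent: str) -> str:
    # Git-like only in that the id also covers the parent.
    return hashlib.sha256((parent + "\n" + change).encode()).hexdigest()


fix = "fix off-by-one in parser"

main = {patch_id("initial import")}
feature = {patch_id("initial import"), patch_id("add feature"), patch_id(fix)}

# Cherry-picking the fix into main is a set union; the patch keeps its id.
main_after_pick = main | {patch_id(fix)}
print(patch_id(fix) in main_after_pick and patch_id(fix) in feature)

# Replaying the same change on a different parent gives a different commit id.
print(commit_id(fix, parent="feature-head") == commit_id(fix, parent="main-head"))
```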

The other important aspect of Pijul is use of efficient data structures that are naturally suited to the problem in order to avoid suboptimal algorithmic complexity. This is explained in more detail here[4].

[1]: https://pijul.org/manual/why_pijul.html

[2]: https://tahoe-lafs.org/%7Ezooko/badmerge/simple.html

[3]: https://tahoe-lafs.org/%7Ezooko/badmerge/concrete-good-seman...

[4]: https://pijul.org/model/


runyor 2687 days ago

I disagree with some of that. But first, thanks for explaining in more depth what is behind Pijul. I started the main subthread because I didn't understand what it provides. If it can combine easier UX with more efficient diffing, then I think it is actually a very valuable contribution.

Now some more in-depth bla bla if interesting:

> (AB)C = A(BC)

great feature request, I agree.

> This isn't always what happens in git

correct.

> because it doesn't work with patches on an abstract level. Instead, it always works with states

Incorrect though. States, patches, these are just trade-offs. You can represent either in the other completely. Like you can build a list using a tree structure if you just allow one branch. Or you can also build a tree on top of a list structure, if your traversal algorithm knows which item-index to pick for a certain subtree's children. All trade-offs.

That doesn't mean "Pijul does better diffs" would be wrong, though. It can still be the case. But it doesn't mean that git would need huge refactoring to also implement that better diff algorithm. In the end, implementing this better algorithm in git might be trivial for a git core developer if you can explain to him how it works.
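runyor's point that either representation can be recovered from the other can be sketched with Python's standard `difflib`: a patch is derived from each pair of consecutive states, and replaying the patches from the first state reconstructs the last one. The patch format `Op` and the helpers `diff` and `apply` are assumptions for illustration, not how git stores or exchanges changes.

```python
import difflib
from typing import List, Tuple

Op = Tuple[str, int, int, List[str]]  # (tag, start in old, end in old, new lines)


def diff(old: List[str], new: List[str]) -> List[Op]:
    """Derive a patch (states -> patch) from two consecutive states."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply(old: List[str], patch: List[Op]) -> List[str]:
    """Rebuild the new state (patch -> state) from the old state plus a patch."""
    out: List[str] = []
    cursor = 0
    for _tag, i1, i2, new_lines in patch:
        out.extend(old[cursor:i1])  # unchanged lines before this hunk
        out.extend(new_lines)       # empty for pure deletions
        cursor = i2                 # skip the old lines this hunk replaces
    out.extend(old[cursor:])
    return out


history = [["a"], ["a", "b"], ["a", "c", "b"], ["c", "b"]]
patches = [diff(x, y) for x, y in zip(history, history[1:])]

state = history[0]
for p in patches:
    state = apply(state, p)
print(state == history[-1])
```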

If you think about it, a diff between two states can be represented in an unlimited number of ways. And if you consider the minimal steps that generate a diff by adding lines to it and removing lines from it, the whole thing is an abstract tree. Basically, to achieve associative patches one needs to make sure to always traverse this tree in the same order. Git traverses it greedily, though, using the very first diff that is good enough as the final result. Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.
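The non-uniqueness described above already shows up in a three-line file. This Python sketch uses deletion-only edit scripts given as sets of line indices (a simplification chosen for brevity; `delete_indices` is a hypothetical helper) to exhibit two different minimal scripts that turn the same old state into the same new state, then checks that `difflib` settles on one of them. It only illustrates that some tie-breaking rule is needed; it does not show that a fixed traversal order is enough to get associativity.

```python
import difflib
from typing import List, Set


def delete_indices(old: List[str], indices: Set[int]) -> List[str]:
    """Apply a deletion-only edit script given as a set of line indices."""
    return [line for i, line in enumerate(old) if i not in indices]


old = ["x", "y", "x"]
new = ["x"]

# Two distinct minimal scripts (two deletions each) for the same pair of states.
keep_first = {1, 2}
keep_last = {0, 1}
print(delete_indices(old, keep_first) == new, delete_indices(old, keep_last) == new)

# A real diff algorithm has to break the tie somehow; see which script difflib picks.
matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
chosen = {i for tag, i1, i2, _, _ in matcher.get_opcodes()
          if tag == "delete" for i in range(i1, i2)}
print(chosen in (keep_first, keep_last))
```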

feanaro 2687 days ago

> Incorrect though. States, patches, these are just trade-offs.

I still maintain it is correct to say that git at present does not work with patches on an abstract level. I do not mean to imply by this that patches cannot be recovered from states (they obviously can) nor that it would take a great refactoring of git in order to implement it in git.

On the contrary, now that Pijul has done the hard part of thinking about it and developing it into a theory, it would probably be a very useful addition to git, as you have noticed, if it can get mind share among git developers.

> Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.

The issue is of course whether the property of associativity is useful or not in a correctness sort of sense. IMO, the answer is yes, and I would gladly take a small performance hit in order to have this result.

Also, I've seen one of Pijul's authors claim that it could be made competitive with git with regard to merging performance (in terms of time). It is already quite quick. We'll see.

Another point made by one of the Pijul authors elsewhere in the thread is that Pijul uses novel data structures which enable it to also have commutativity of patches. I'm less clear on how this works, but it essentially means that branches in Pijul become sets of changes, not sequences of changes. In other words,

    ABC = ACB = CAB = <any other permutation>

This is what I hinted at when I said commits retain their identity across rebases and cherry-picking. At that point, history stops being important and you only deal with changes as first-class entities. I feel this is enough to justify the claim that Pijul is fundamentally different from git by being patch-centric. I'm also not sure that this could be retrofitted to git as easily.
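A toy Python model of the commutativity claim. It assumes, purely for illustration, that a file is a mapping from hypothetical line ids to text and that each patch sets one line; line order and conflict handling are ignored entirely, so this is not how Pijul achieves it. Patches that touch different lines give the same result in every order, while two patches that write the same line do not commute.

```python
from itertools import permutations
from typing import Callable, Dict

State = Dict[str, str]              # hypothetical: line id -> text
Patch = Callable[[State], State]


def set_line(line_id: str, text: str) -> Patch:
    """Hypothetical patch constructor: set the text of one line id."""
    def patch(state: State) -> State:
        new_state = dict(state)
        new_state[line_id] = text
        return new_state
    return patch


A = set_line("l1", "import os")
B = set_line("l2", "print('hi')")
C = set_line("l3", "# comment")

# Apply the independent patches in every order and collect the distinct results.
results = set()
for order in permutations([A, B, C]):
    state: State = {}
    for patch in order:
        state = patch(state)
    results.add(tuple(sorted(state.items())))
print(len(results))

# A patch writing the same line as A does not commute with it.
X = set_line("l1", "import sys")
print(X(A({})) == A(X({})))
```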