by runyor 2688 days ago
This is unreasonable. First of all, you often don't need patches, so it's premature optimization. Second, git specifically chose this key-object store for the actual-state version, which means you throw away not just the disadvantages but the advantages as well. Last but not least, git ALSO STORES DIFFS when it packs stuff up, which is what happens when large numbers of objects need to be transferred or large amounts of old data need to be stored.

That doesn't mean pijul is a bad tool. I didn't check it out. And I really think for everyday coders who don't want to become git-gurus it would be nice to have a simpler git-like VCS, for instance.

But the marketing must be updated according to the facts. Try to find features that people really care about, e.g. ease of use, e.g. integration with build tooling and docker, e.g. better federation through more automation which makes centralized servers like github go away.

2 comments

The primary novelty of Pijul is its sound patch theory, not any kind of technical achievement like smaller repository size. I do think it's premature to call it unreasonable without understanding this aspect.
Why not summarize it a little if you feel that info got lost?

Technically, if you need a patch you can generate it on the fly by comparing both objects (you can even do that in a bash script with `diff`, if you are willing to type in the logic to look up commit->tree->file->object-name first). So nothing should be lost. The only trade-off I can see between storing diffs and storing immutable objects is disk space vs processing time, which git itself demonstrates by not keeping old immutable objects and instead storing diffs for old stuff (reducing space by increasing processing time).
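As a rough sketch of what "generate it on the fly" means, here is the same idea in Python using the standard `difflib` module instead of the `diff` binary. The two strings stand in for blob contents that would already have been looked up; `patch_between`, `old_blob` and `new_blob` are made-up names for illustration, and this is not how git produces its own diffs.

```python
import difflib


def patch_between(old_text: str, new_text: str, name: str = "file.txt") -> str:
    """Return a unified diff that turns old_text into new_text."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


# Hypothetical contents of two versions of the same file.
old_blob = "alpha\nbeta\ngamma\n"
new_blob = "alpha\nbeta2\ngamma\n"

patch = patch_between(old_blob, new_blob)
print(patch)
```

Nothing is stored besides the two states; the patch is recomputed every time it is needed, which is the processing-time side of the trade-off.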

No, it's actually related to the ease of use and correctness. By design, Git lacks two fundamental properties that would make working with repositories much easier:

1. Git merge is not associative. This results in cases where Alice and Bob work together, Bob pulls Alice's work, and her lines get merged into places she's never seen (blocks of text newly introduced by Bob).

2. More importantly, Git is not commutative. This means that cherry-picking and rebasing are complex operations that change commits' ids. This forces Git users to branch (an extra step that needs to be done before starting to write), but more painfully, it sometimes forces them to do multiple steps of rebasing before being able to merge what they want, or to solve the same conflict again and again after an unlucky cherry-picking.

Of course, all this could maybe be prevented by stricter planning and a more vertical organisation. But this is not how many people write code in 2019. Continuous delivery, for instance, means that teams no longer know long in advance what they are going to do over the course of the project. Also, the best developers don't work in vertical teams where they get continuously told what to work on.

Sorry, I was in a hurry and I think I mixed up articles since I think another one recently appeared on HN which pointed to a page with a much better explanation.

The main idea behind Pijul (as I understand it) is that it makes merging divergent branches correct and predictable by making patch application an associative operation.[1] What this means is that, given patches A, B and C, it should be irrelevant whether you are:

1. starting with A, applying B on top, followed by applying C

2. starting with A, applying the result of C applied on top of B all at once

In other words,

    (AB)C = A(BC)
This isn't always what happens in git because it doesn't work with patches on an abstract level. Instead, it always works with states of the branch heads (and sometimes the state of their branching point, i.e. BASE, in case of a 3-way merge).

As to why this matters, [2] and [3] have practical test cases where the difference between these two approaches is observable. [3] is an example of plausible C code where the non-patch approach may produce an incorrect result. It also demonstrates the way the patch algebraic approach is able to take individual changes into account and apply edits from the second branch in the correct place in the first branch, even though the affected code has moved in the first branch in the meantime.

The focus on patches has other important implications, such as the fact that branches then simply become sets of patches and cherry-picking retains the identity of patches instead of creating new, unique commits.

The other important aspect of Pijul is use of efficient data structures that are naturally suited to the problem in order to avoid suboptimal algorithmic complexity. This is explained in more detail here[4].

[1]: https://pijul.org/manual/why_pijul.html

[2]: https://tahoe-lafs.org/%7Ezooko/badmerge/simple.html

[3]: https://tahoe-lafs.org/%7Ezooko/badmerge/concrete-good-seman...

[4]: https://pijul.org/model/

I disagree with some of that. But first, thanks for explaining in more depth what is behind pijul. I started this subthread mainly because I didn't understand what it provides. If it can combine easier UX with more efficient diffing, then I think it is a very valuable contribution, actually.

Now some more in-depth bla bla if interesting:

> (AB)C = A(BC)

great feature request, I agree.

> This isn't always what happens in git

correct.

> because it doesn't work with patches on an abstract level. Instead, it always works with states

Incorrect though. States, patches, these are just trade-offs. You can represent either in the other completely. Like you can build a list using a tree structure if you just allow one branch. Or you can also build a tree on top of a list structure, if your traversal algorithm knows which item-index to pick for a certain subtree's children. All trade-offs.
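To illustrate the "either can be represented in the other" point, here is a minimal Python sketch that turns a list of states into a first state plus per-step deltas and back again. It uses `difflib.ndiff` and `difflib.restore` from the standard library; the function names are made up for this example, and it says nothing about how git or Pijul store things internally.

```python
import difflib


def states_to_patches(states):
    """Keep the first state and one ndiff delta per later step."""
    deltas = [
        list(difflib.ndiff(before, after))
        for before, after in zip(states, states[1:])
    ]
    return states[0], deltas


def patches_to_states(first, deltas):
    """Rebuild every state from the first state and the deltas."""
    states = [first]
    for delta in deltas:
        # restore(delta, 2) yields the "after" side of an ndiff delta.
        states.append(list(difflib.restore(delta, 2)))
    return states


history = [
    ["a", "b"],
    ["a", "c", "b"],
    ["c", "b"],
]
first, deltas = states_to_patches(history)
rebuilt = patches_to_states(first, deltas)
```

Note that an `ndiff` delta also carries the unchanged lines, so this shows only that the round trip is lossless, not that it saves space.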

That doesn't mean "Pijul does better diffs" would be wrong, though. It can still be the case. But it doesn't mean that git would need huge refactoring to also implement that better-diff-algorithm. In the end implementing this better algorithm in git might be trivial for a git core developer if you can explain to him how it works.

If you think about it, a diff between two states can be represented in an unlimited number of ways. And if you consider the minimal steps that generate a diff by adding lines to it and removing lines from it, the whole thing is an abstract tree. Basically, to achieve associative patches one needs to make sure to always traverse this tree in the same order. Git traverses greedily though, using the very first diff that is good enough as the final result. Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.
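A tiny Python illustration of why one pair of states admits more than one diff: when lines repeat, different deletion sets give the same result. `apply_deletions` is a made-up helper for this example only, not a model of git's diff algorithm.

```python
def apply_deletions(lines, indices):
    """Return lines with the entries at the given indices removed."""
    return [line for i, line in enumerate(lines) if i not in indices]


old = ["}", "foo();", "}"]
new = ["}"]

# Two different edit scripts, both of minimal size, with the same outcome.
delete_leading = {0, 1}   # drop the first "}" and "foo();"
delete_trailing = {1, 2}  # drop "foo();" and the last "}"

result_a = apply_deletions(old, delete_leading)
result_b = apply_deletions(old, delete_trailing)
```

Both scripts produce `new`, so a tool has to pick one, and which closing brace it treats as "kept" is exactly the kind of choice that matters once later changes are merged on top.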

> Incorrect though. States, patches, these are just trade-offs.

I still maintain it is correct to say that git at present does not work with patches on an abstract level. I do not mean to imply by this that patches cannot be recovered from states (they obviously can) nor that it would take a great refactoring of git in order to implement it in git.

On the contrary, now that Pijul has done the hard part of thinking about it and developing it into a theory, it would probably be a very useful addition to git, as you have noticed, if it can get mind share among git developers.

> Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.

The issue is of course whether the property of associativity is useful or not in a correctness sort of sense. IMO, the answer is yes, and I would gladly take a small performance hit in order to have this result.

Also, I've seen one of Pijul's authors claim that it could be made competitive with git with regards to merging performance (in terms of time). It is already quite quick. We'll see.

Another point made by one of the Pijul authors elsewhere in the thread is that Pijul uses novel data structures which enable it to also have commutativity of patches. I'm less clear on how this works, but it essentially means that branches in Pijul become sets of changes, not sequences of changes. In other words,

ABC = ACB = CAB = <any other permutation>

This is what I hinted at when I said commits retain their identity across rebases and cherry-picking. At that point, history stops being important and you only deal with changes as first-class entities. I feel this is enough to justify the claim that Pijul is fundamentally different from git by being patch-centric. I'm also not sure that this could be retrofitted to git as easily.
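Here is a toy Python model of the identity difference, under heavy assumptions: a "snapshot-style" id hashes the change together with its parent id (a git commit id does depend on its parents, among other things), while a "patch-style" id hashes the change alone. The functions and strings are hypothetical, and this is not how Pijul actually computes patch identifiers or tracks dependencies between patches.

```python
import hashlib


def digest(*parts: str) -> str:
    """Short hex digest of the given parts, for illustration only."""
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()[:12]


def snapshot_style_ids(changes):
    """Each id depends on the parent id, so order is part of identity."""
    parent, ids = "", []
    for change in changes:
        parent = digest(parent, change)
        ids.append(parent)
    return ids


def patch_style_ids(changes):
    """Each id depends on the change alone, so a branch is just a set."""
    return frozenset(digest(change) for change in changes)


a, b, c = "add README", "fix typo", "add tests"

same_set = patch_style_ids([a, b, c]) == patch_style_ids([c, a, b])
same_head = snapshot_style_ids([a, b, c])[-1] == snapshot_style_ids([a, c, b])[-1]
print(same_set, same_head)
```

In the toy model, reordering leaves the patch-style branch unchanged while the snapshot-style head id differs, which mirrors why a rebase or cherry-pick produces new commits in git.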

> Technically if you need a patch you can generate it on the fly by comparing both objects

Automatically generated patches might not work in the long run in certain situations (e.g. when standard 3-line context is repetitive, making the patch applicable at multiple places in the same file). Those patches also suck at conveying actual file changes (how many actual changes start with "- }" or contain lots of context braces?); that eventually prompted me to do lots of split commits in git whose sole purpose is to make automatic diffs more readable (e.g. separate commit for indentation fix).
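The repetitive-context problem can be shown with a short Python sketch that counts where a hunk's context lines match in a file. `match_positions` and the sample C-like lines are invented for illustration; real patch tools also use line numbers and fuzz, which this ignores.

```python
def match_positions(file_lines, context):
    """Return every index where the context lines occur contiguously."""
    n = len(context)
    return [
        i
        for i in range(len(file_lines) - n + 1)
        if file_lines[i:i + n] == context
    ]


source = [
    "void f() {", "    if (x) {", "        g();", "    }", "}",
    "void h() {", "    if (x) {", "        g();", "    }", "}",
]

# Three lines of context, as in a default unified diff.
context = ["        g();", "    }", "}"]

positions = match_positions(source, context)
print(positions)
```

With two matches, the context alone cannot tell which function the change was meant for.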

That brings the question: can Pijul store user-formatted patches?

I'd say maybe check it out first, especially the parts about patch commutation and the problems with associativity in git merges and rebases.