by James_K 493 days ago
It seems odd that they boast about being patch-based, given that the original advantage of Git was that it uses snapshots instead of the diffs that previous VCSes used.

Even if previous VCSes (edit: other than Darcs) used diffs internally to store changes efficiently, they were still inherently snapshot based.

I think Git's advantage was mainly from using Merkle trees rather than specifically that it was snapshot based.
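
To make the Merkle-tree point concrete, here is a minimal Python sketch of content addressing. It assumes a toy layout in which a tree is a dict mapping names to blobs or subtrees; the names `hash_blob` and `hash_tree` are made up for illustration, and this is not Git's actual object format.

```python
import hashlib

def hash_blob(data: bytes) -> str:
    """Hash file contents (toy format, not Git's real object encoding)."""
    return hashlib.sha256(b"blob:" + data).hexdigest()

def hash_tree(tree: dict) -> str:
    """Hash a directory: the digest covers each entry's name and child hash."""
    h = hashlib.sha256(b"tree:")
    for name in sorted(tree):
        child = tree[name]
        child_hash = hash_tree(child) if isinstance(child, dict) else hash_blob(child)
        h.update(f"{name}={child_hash};".encode())
    return h.hexdigest()

v1 = {"src": {"main.c": b"int main(){}"}, "README": b"hello"}
v2 = {"src": {"main.c": b"int main(){}"}, "README": b"hello, world"}

# The unchanged subtree keeps its hash, so it can be stored once and shared.
same_subtree = hash_tree(v1["src"]) == hash_tree(v2["src"])
# A change to any file below the root changes the root hash.
root_changed = hash_tree(v1) != hash_tree(v2)
```

Because every hash covers the hashes beneath it, two snapshots can be compared or deduplicated by hash alone, independent of whether the storage layer uses diffs.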

To be clear, darcs is neither diff nor snapshot based.
What is it based on then?
In Git (and nearly every other version control system), the state of the repository is an ordered list of changes. This is massively oversimplified, of course: in Git it is a tree rather than a list as in CVS/SVN, and Git actually stores snapshots rather than diffs, but whatever. That doesn't matter here. Traditional version control is an ordered list of changes, and that ordering is what matters.

In darcs/pijul, the state of the repository is a *set* of order-independent changes/patches. I don't know about darcs, but pijul even uses clever homomorphic hashing so that the commit hash after applying patches A, B, and C (in that order) is the same as if you did C -> A -> B, or any other permutation.

In Linux kernel dev, for example, developers think of a particular build as "upstream 6.13.2 with zfs and realtime patchsets applied" but in reality in any given instance it is actually "6.13.2 -> zfs -> realtime" or "6.13.2 -> realtime -> zfs" depending on which order you cherry-pick/rebase the patches. Both have different git hashes and are properly speaking different things. In pijul it would be "6.13.2 + zfs + realtime" and the order doesn't matter.
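
The difference between an ordered chain and an order-independent set can be sketched in a few lines of Python. This is a toy under loud assumptions: `chained_id` and `set_id` are hypothetical names, the additive combination is just one simple commutative operation chosen for illustration, and it is neither Pijul's actual construction nor meant to be secure.

```python
import hashlib

MOD = 2**256

def h(data: bytes) -> int:
    """SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def chained_id(base: bytes, patches: list[bytes]) -> int:
    """Git-like: each state's ID covers the previous ID, so order matters."""
    state = h(base)
    for p in patches:
        state = h(state.to_bytes(32, "big") + p)
    return state

def set_id(base: bytes, patches: list[bytes]) -> int:
    """Toy commutative ID: add the patch hashes modulo 2**256."""
    return (h(base) + sum(h(p) for p in patches)) % MOD

base, zfs, rt = b"6.13.2", b"zfs", b"realtime"
# Chained IDs differ between "zfs then realtime" and "realtime then zfs".
chained_differs = chained_id(base, [zfs, rt]) != chained_id(base, [rt, zfs])
# The commutative ID is the same for both orders.
set_matches = set_id(base, [zfs, rt]) == set_id(base, [rt, zfs])
```

Addition commutes, so any permutation of the same patches yields the same `set_id`, which is the property the "6.13.2 + zfs + realtime" notation is describing.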

It's based on "patches" (which was what the original comment was referring to).

A Darcs patch is somewhat like a diff, in that most patches will describe a sequence of text to be removed and added from specific points in each file.

But: Darcs patches add:

- The ability to work out precisely how to apply the same patch in a different context in a reproducible way (or to refuse to do so if it can't do it safely)
- More "semantic" kinds of change such as replacing all occurrences of a token in a file, or renaming a file. In both cases this will merge cleanly and reproducibly with other changes to the same file.
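
As a rough illustration of "applying the same patch in a different context", here is a toy Python sketch of commuting a rename with an edit. Everything in it (`Rename`, `AddLine`, `apply`, `commute`, and the refusal rule) is hypothetical and far simpler than Darcs's real patch theory; it only shows the shape of the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rename:
    old: str
    new: str

@dataclass(frozen=True)
class AddLine:
    path: str
    index: int
    text: str

def apply(patch, repo: dict) -> dict:
    """Return a new repo state (path -> list of lines) with the patch applied."""
    repo = {path: list(lines) for path, lines in repo.items()}
    if isinstance(patch, Rename):
        repo[patch.new] = repo.pop(patch.old)
    else:
        repo[patch.path].insert(patch.index, patch.text)
    return repo

def commute(first: Rename, second: AddLine):
    """Reorder (rename, edit) into (edit', rename) with the same net effect."""
    if second.path == first.new:
        # The edit targeted the new name; before the rename it must use the old one.
        second = AddLine(first.old, second.index, second.text)
    elif second.path == first.old:
        return None  # toy refusal rule: the target is ambiguous, so do not guess
    return second, first

repo = {"a.txt": ["one", "two"]}
rename, edit = Rename("a.txt", "b.txt"), AddLine("b.txt", 1, "inserted")
edit2, rename2 = commute(rename, edit)
original_order = apply(edit, apply(rename, repo))
commuted_order = apply(rename2, apply(edit2, repo))
```

Both orders end with `b.txt` containing the inserted line, because the edit was rewritten to fit the context it is applied in rather than replayed blindly.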

To me this has always sounded like the kind of error-prone excessive cleverness that I wouldn't want anywhere near my version control system.
You could always go back to emailing around file changes :-)

Most VCSes including git have some level of complicated algorithms behind them. Darcs does have significant weaknesses when it comes to handling conflicts but the core logic of figuring out how to apply changes to files has been pretty solid for its entire existence.

It’s the same thing that happens every time you rebase or cherry pick commits in git.
It's a thing that nobody is smart enough to use and didn't actually work, but hey at least Darcs is also 1000x slower than git and will lose data!
This isn’t about snapshot vs diff. It is about tree of changes vs. bag of patches.
"In our dependency graph you can also see that all our changes concerning the files A and B are totally independent from each other. So I could pull only the changes concerning B from the repository while ignoring the changes regarding A altogether. This is an incredibly powerful mechanism that snapshot based version control systems do not have. Now you can pull in a set of patches and ignore all those that don’t depend on the change you are currently interested in. Other version control systems like git for example call this cherry picking, but compared to darcs they are suffering from some short comings. In those version control system you re-record the patch when you cherry pick it. That means that its whole identity changes, one way in which this manifests is that it will get a different hash. So even though it’s the exact same change the patches are now different. This can become quite annoying when working in a distributed setting, with darcs this is a complete non-issue."
The probability of working software resulting from this operation is approximately 0.
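
For reference, the selective pull described in the quoted passage amounts to taking one patch plus its transitive dependencies. A minimal Python sketch, assuming a hypothetical `depends_on` mapping and made-up patch names (this is not how Darcs represents its dependency graph):

```python
def patches_to_pull(wanted: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return `wanted` plus everything it transitively depends on."""
    needed, stack = set(), [wanted]
    while stack:
        patch = stack.pop()
        if patch not in needed:
            needed.add(patch)
            stack.extend(depends_on.get(patch, ()))
    return needed

# Hypothetical history: changes to file A and file B are independent.
deps = {
    "add-A": set(), "edit-A": {"add-A"},
    "add-B": set(), "edit-B": {"add-B"}, "fix-B": {"edit-B"},
}
pulled = patches_to_pull("fix-B", deps)
```

Here `pulled` contains only the three B patches. Whether the resulting tree still builds is a separate question that the dependency graph does not answer, since it tracks textual dependencies only.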
No. The advantage of Git (and Darcs, Hg, and others) over SVN/TFS is twofold:

1. A much better merging strategy which remembers previous conflict resolutions. This is possible in both diff-based and snapshot-based VCSes.

2. Distributed repositories. Instead of storing history in one centralized server, which is prone to several kinds of failure, Git/Darcs/Hg/Fossil etc. all replicate history to each client by default. This makes it very difficult to actually lose code or history, which definitely happened in older centralized systems.

> which remembers previous conflict resolutions

IMHO this is far from the truth. git is a mess with conflicts and nothing is remembered. When you rebase an old branch a few times, you stumble upon merge conflicts every time you have to rebase that old code. It's really infuriating.

I think git became popular for 2 reasons: cheap branches, and Linus Torvalds.

> The git rerere functionality is a bit of a hidden feature. The name stands for “reuse recorded resolution” and, as the name implies, it allows you to ask Git to remember how you’ve resolved a hunk conflict so that the next time it sees the same conflict, Git can resolve it for you automatically.

https://git-scm.com/book/en/v2/Git-Tools-Rerere
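
In Git itself the feature is switched on with the `rerere.enabled` config setting. The idea in the quoted description can be sketched as a small cache keyed by the conflict's content. This is a toy stand-in: `ResolutionCache` and its methods are hypothetical, and Git's real implementation differs in how it identifies and stores conflicts.

```python
import hashlib

class ResolutionCache:
    """Toy stand-in for the idea behind rerere; not Git's implementation."""

    def __init__(self):
        self._resolutions: dict[str, str] = {}

    @staticmethod
    def _key(ours: str, theirs: str) -> str:
        # Identify a conflict by the content of its two sides.
        return hashlib.sha256(f"{ours}\0{theirs}".encode()).hexdigest()

    def record(self, ours: str, theirs: str, resolved: str) -> None:
        """Remember how a conflict between `ours` and `theirs` was resolved."""
        self._resolutions[self._key(ours, theirs)] = resolved

    def replay(self, ours: str, theirs: str):
        """Return the remembered resolution, or None if this conflict is new."""
        return self._resolutions.get(self._key(ours, theirs))

cache = ResolutionCache()
cache.record("timeout = 30", "timeout = 60", "timeout = 60")
seen_again = cache.replay("timeout = 30", "timeout = 60")
new_conflict = cache.replay("timeout = 30", "timeout = 90")
```

The sketch also hints at why the feature helps only sometimes: a resolution is reused only when the very same conflict reappears, so a slightly different hunk on the next rebase is treated as new.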

Git does indeed remember conflict resolutions (if the rerere feature is enabled), but in practice it's only helped me a handful of times.

(And I am big on obsessively rebasing and cleaning up history before pushing a change.)

Ah, you're a rebaser. That's why you have this issue. As someone who prefers merge commits, this has been entirely a non-issue for me.
Rebasing is very important for people who don't understand how git works.
Merging works fine, as long as Git knows how to resolve the ancestry. Rebasing breaks that (save for hacks like rerere).
Cheap branches are important. Git is no worse and in most cases far better than any other VCS when it comes to conflicts.

If you are getting conflicts over and over, you should change your pattern of use to stop causing them; it's quite possible to do.

You say that as if using git isn't a team sport. I only have control over my behavior, and that goes double for what happens after $(git pull) or its $(git rebase) friend.
If only there were a way to influence the behavior of other humans...

I've never had any major frustrations with 'lots of conflicts'. There's something peculiar about what you are doing.

I would sit down and write down what the 'features/benefits' of your current git workflow are. Then you can brainstorm about things you can change to get those benefits without the frustrations. Maybe you don't need to rebase as much, or at all, or you can avoid rebasing work branches and just squash the final commit on top of main. I don't know what you are doing, but my point is that there are endless variations of how to collaborate on git, and lots of them don't cause 'lots of conflicts'. Then work with your team to adopt the changes. When you are successful they will thank you for making their lives less frustrating. Or you can be secure in the belief that there's nothing you can do, either way.

Being able to merge code is important; distributed repositories are nice, but nobody switched to get that.

Apart from making merging much much less painful, the huge selling point was that SVN was extremely slow. Changing branches required chatting with the server in most cases, and could take a minute or two. You couldn't just switch back and forth between branches.

"Changing branches required chatting with the server in most cases, and could take a minute or two."

Which is directly solved by being a DVCS, yes. Your second paragraph contradicts your first.

Darcs and Pijul can change branches very rapidly despite being diff-based, because they too are DVCS (and the inverse patches basically already got computed upon commit, so it's easy on the CPU).

But Darcs replaces slow branch changes with something that no other VCS has (that I'm aware of): slow commits! Making a commit can take 5 minutes!