Hacker News new | ask | show | jobs
by MrJohz 1204 days ago
I wonder if this has more to do with how people visualise commits than branches per se. I think the git UI by default often encourages users to think of commits as diffs. A branch then is a bundle of diffs that, when stacked together, produces the current state of the repository. Internally, git uses some sort of optimisation to make calculating that current state quicker. In this mental model, it doesn't really make sense to talk about a pointer to a specific commit, because a commit without the rest of the information that makes up its branch is useless.

The problem is that this mental model isn't that useful in the first place, and often leads to confusion. Instead, it's usually easier to think of each commit as a complete snapshot of the entire codebase, that includes a link to the previous complete commit that it was made from (which in turn contains a link to the previous commit, and so on). In this scenario, a branch is just a pointer to a given commit - that's pretty much the easiest way to think about it - and the commit itself is a stack of history. Internally, git optimises for compression by removing redundant information between different snapshots.

In this mental model, thinking about branches as just a different type of tag is easier than thinking about it as a stack of commits, because each commit is already a stack of commits. Moreover, I think the snapshot model often ends up being clearer and easier to use overall. All you need is the basic concepts of snapshots, a linked list, and pointers, and the whole thing kind of just falls into place.

2 comments

A ‘series of snapshots’ vs a ‘series of diffs’ are just duals of one another.

Since you can go back and forth between them at will, it seems odd to claim that one perspective is inherently superior. Like insisting that a chess board is actually a white board with 32 black squares on it.

It doesn't seem odd to claim that one of them is on average easier for beginners to use for intuition than the other, and that introducing both models simultaneously may bring more confusion than clarity.

I don't see claims of inherent superiority or correctness. It's about what's useful for education.

Nobody is arguing that you need to adopt a new mental model if what you have works for you.

There are some subtle differences in whether the canonical representation is a snapshot of the current state of the repo or a patch applied to the current state. One simple example is reordering; assuming your modifications don't change the same line, reordering patches arbitrarily won't change the final result. If you instead store snapshots of the state at each point, then reordering the snapshots won't necessarily result in the same final state, since you might have moved a different state to the final position.

You're correct that the two models are equivalent, but version control is about operations that you perform on the models, and those operations will not be the same for both models. You can reason about your git history as if its a series of patches, but git itself doesn't know how to deal with any model other than snapshots.

Given that git commits are immutable, reordering doesn’t matter. Any history rewriting involves creating new commits - whether those are new snapshots or new diffs.
The snapshot model is the correct model. How data is optimized via compression techniques is secondary. Thinking in terms of "diffs" is incorrect.