by gwright 3589 days ago
Opinions differ on this matter because of different concepts of what 'history' is appropriate to maintain.

At one extreme, you could keep track of all your keystrokes in the editor so that you could have a full history of your work including backspaces to correct typos.

At the other extreme is the mythical programmer who crafts perfect commits in exactly the correct order on the first attempt.

Most mortal programmers need the ability to iterate over their code to get it to a reasonable place before they want it enshrined in the blessed commit history that is shared with others or otherwise retained over time. The intermediary states, with false starts, poor implementations, hand crafted functions instead of standard functions, poorly named classes, and so on are all part of development but aren't particularly interesting to keep in the permanent history of a project.

Git's ability to manipulate the existing commit tree (amend, reset, rebase, etc.) is extremely useful for this normal 'exploratory' development. Once a stable point has been reached though (often because the tree has been published or shared with others), these commands do become inappropriate and a different set of tools becomes relevant (revert, merge, etc.).
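One way to see why the rewriting commands stop being appropriate after publishing: a shared branch is normally updated by fast-forward, meaning the new tip must have the old tip as an ancestor. Below is a minimal Python sketch of that check. The `parents` map, the commit names, and `is_ancestor` are all made up for illustration; this is a toy model, not how Git implements it.

```python
# Toy commit graph: each commit id maps to its parent id (None for the root).
# The ids and the helper name are illustrative, not Git internals.
parents = {
    "M": None,
    "C1": "M",          # commit that has already been published
    "C1_amended": "M",  # what `commit --amend` would create locally
    "R1": "C1",         # what `revert` would create: a new commit on top of C1
}

def is_ancestor(ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    while commit is not None:
        if commit == ancestor:
            return True
        commit = parents[commit]
    return False

published_tip = "C1"

# Amending replaces the tip with a sibling, so the update is not a fast-forward.
amend_is_fast_forward = is_ancestor(published_tip, "C1_amended")

# Reverting adds a descendant, so the update is a fast-forward.
revert_is_fast_forward = is_ancestor(published_tip, "R1")

print(amend_is_fast_forward, revert_is_fast_forward)
```

In this model the amended tip no longer contains the published commit, which is roughly why everyone else's copy of the branch stops lining up with yours.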


My two main arguments for "cleaned up history" are

1) Reviews are much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.

2) Looking back through history is much more enjoyable when the commits reflect the final understanding of the problem rather than false starts etc.

I agree 100%. In my mind the 'stable point' I mention above isn't reached until after the review process is complete and any recommended changes are applied, which often involves fixups, merging commits, splitting commits, re-arranging commits, and so on.
Couldn't there be an immutable approach to this, a folded view of history? Mark commits of value and hide the iteration ones. The log reflects the folded history first; if needed, you can unfold it for full details.
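A rough Python sketch of what such a folded view might look like. Everything here is hypothetical: the `keep` marker, the `Commit` class, and `fold` are invented for illustration, and Git has no such built-in feature that I'm aware of.

```python
from dataclasses import dataclass, field

@dataclass
class Commit:
    message: str
    keep: bool = False                           # hypothetical "commit of value" marker
    folded: list = field(default_factory=list)   # iteration commits hidden under this one

def fold(history):
    """Collapse unmarked commits into the next marked commit that follows them."""
    result, pending = [], []
    for commit in history:           # history is ordered oldest to newest
        if commit.keep:
            result.append(Commit(commit.message, True, pending))
            pending = []
        else:
            pending.append(commit)
    # Trailing unmarked work has no marked commit to fold into; show it as is.
    result.extend(pending)
    return result

history = [
    Commit("wip: try regex parser"),
    Commit("wip: switch to tokenizer"),
    Commit("Add config parser", keep=True),
    Commit("fix typo"),
    Commit("Document config format", keep=True),
]

folded = fold(history)
for c in folded:
    print(c.message, f"({len(c.folded)} folded)")
```

The original `history` list is left untouched, so the full detail is still there to unfold.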
One clarification: amend, reset, rebase and their ilk don't 'manipulate the commit tree' other than adding commits. The manipulation is with the branch names associated with the commits.

I've always hated the common description of 'rebase' as 'rewriting history'. None of the existing commits are modified by rebase; new commits are added and the branch names are shuffled around.
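To make that concrete, here is a toy Python model of an append-only commit store with movable branch labels, showing what an amend does in those terms. The `Repo` class and its counter-based ids are made up for illustration; real Git identifies commits by hash and stores much more than a parent and a message.

```python
class Repo:
    """Toy model: an append-only commit store plus movable branch labels."""

    def __init__(self):
        self.commits = {}    # commit id -> (parent id, message); entries are never edited
        self.branches = {}   # branch name -> commit id; freely reassigned
        self._next = 0

    def _add(self, parent, message):
        self._next += 1
        cid = f"c{self._next}"       # stand-in for a SHA
        self.commits[cid] = (parent, message)
        return cid

    def commit(self, branch, message):
        cid = self._add(self.branches.get(branch), message)
        self.branches[branch] = cid
        return cid

    def amend(self, branch, message):
        # Add a new commit with the same parent as the old tip, then move the label.
        old_parent, _ = self.commits[self.branches[branch]]
        cid = self._add(old_parent, message)
        self.branches[branch] = cid
        return cid

repo = Repo()
first = repo.commit("master", "Initial commit")
typo = repo.commit("master", "Add paser")
fixed = repo.amend("master", "Add parser")

print(repo.commits[typo])        # the "amended" commit is still in the store
print(repo.branches["master"])   # only the label moved
```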

I think this is pretty pedantic. I count that shuffling as rewriting history - that's not what happens in the background but that's what appears to happen, and that is what matters. What would you term it instead?
What actually happens does matter. You can't understand how lots of git commands work if you don't understand that the git commit tree is an append-only data structure and that branches are just labels of leaves in the tree.
It does rewrite history in the sense of which events followed which events.

Imagine the following sequence of events:

I make a commit on my local master

Someone else makes a commit on their master

They push

I 'pull --rebase'

The history now shows their commits before mine, even though I made my commits first, directly on top of master.

Let me see if I can clarify. Here is a summary of the situation you described before any push or pull.

    origin repo: M (master)
    your repo:   M---C1 (master)
    other repo:  M---C2 (master)
    
If the other developer pushes `master` to origin, we have:
 
    origin repo: M---C2 (master)
    your repo:   M---C1 (master)
    other repo:  M---C2 (master)

If you then run `pull --rebase` from master, we have:

    origin repo: M---C2 (master)
    your repo:   M---C1
                  \
                   C2---C1' (master)
    other repo:  M---C2 (master)
 
Your master branch will be positioned at C1'. As you can see from the diagram, the `pull --rebase` didn't change any existing commits. It just added C2 (same SHA as in the origin and other repos) and added C1', which is the set of changes in C1 applied to C2 instead of M. If those changes can't be applied automatically, you'll get a conflict that has to be resolved before C1' can be created.

I don't think it is helpful to describe this as adjusting the order of commits or re-writing history or any similar language that suggests some sort of mutation to the commit tree. The only thing that has happened here is that additional commits have been added to the tree and the label `master` has been moved to a new leaf commit.
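The same walkthrough as a small Python sketch. It assumes a heavily simplified commit id, a hash of just the parent id and a description of the change; real Git hashes the tree, author, dates, and message as well. `store` and `add_commit` are invented names.

```python
import hashlib

store = {}  # commit id -> (parent id, change); entries are only ever added

def add_commit(parent, change):
    """Store a commit under an id derived from its parent and its change."""
    cid = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]
    store[cid] = (parent, change)
    return cid

# State before the pull: M---C1 locally, M---C2 on origin.
m = add_commit(None, "initial")
c1 = add_commit(m, "my change")
master = c1
c2 = add_commit(m, "their change")   # fetched from origin, same id everywhere
origin_master = c2

# Rebase step: replay C1's change on top of C2, then move the label.
_, c1_change = store[c1]
c1_prime = add_commit(origin_master, c1_change)
master = c1_prime

print(store[c1])        # the original C1 is still there, with parent M
print(c1 != c1_prime)   # same change, different parent, so a different id
```

Nothing in `store` is overwritten at any point; the only reassignment is the `master` variable, which plays the role of the branch label.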

I realize some other commenters have said I'm being pedantic but I would instead say that I'm being accurate. You can't really understand how rebase, rebase -i, rebase --onto, fixups, reflog manipulations, and so on work if you don't have the correct mental model of the git commit tree.

Given that commits are immutable objects, the only sane interpretation of "rewriting history" is that it rewrites your view of the history rather than somehow rewriting immutable content-addressed objects.
Sure, but lots of people using git don't really understand that commits are immutable or that the commit tree is an append-only data structure. The pervasive use of the phrase 'rewrite history' hinders this understanding.
I disagree. I've used Git for a long time, and talked to a lot of Git users, and I've never seen anyone say something that implied they thought they were literally modifying the commit objects, as opposed to rewriting the history of a branch.
Rebasing also rewrites all your commits to have a different parent commit.
Nope, parent is correct. If you use hg's changeset evolution, run rebase, and then run `hg log --graph --hidden`, you can see that your original commits have not been touched, other than to mark them as hidden and obsolete.
This is a discussion about git.
They both work the same here, I just used hg because it illustrates the inner workings nicely. With git you don't get the nice hidden commits view, just the reflog (which is trying to show you the same thing).
You can see hidden commits with `git log --reflog`.
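Conceptually, the "hidden" commits are the ones reachable from reflog entries but not from any branch. A toy Python sketch of that idea follows, with a made-up graph and a made-up reflog list; it handles single-parent commits only and is not Git's actual traversal.

```python
# Toy graph after a rebase: C1 was replayed as C1p on top of C2.
parents = {"M": None, "C1": "M", "C2": "M", "C1p": "C2"}

branches = {"master": "C1p"}
# Hypothetical reflog: every commit the `master` label has pointed at, newest first.
reflog = ["C1p", "C1", "M"]

def reachable(tips):
    """Collect every commit reachable from `tips` by following parent links."""
    seen = set()
    for commit in tips:
        while commit is not None and commit not in seen:
            seen.add(commit)
            commit = parents[commit]
    return seen

visible = reachable(branches.values())   # what a walk from the branch tips visits
with_reflog = reachable(reflog)          # also start from reflog entries
hidden = with_reflog - visible

print(sorted(hidden))
```

Here the pre-rebase C1 is the only commit that shows up once the reflog entries are used as extra starting points.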