by fulafel 3203 days ago
What's even the point of using rebase? Merging the development branch into your feature branch periodically is the obvious history-preserving thing.

Git already has merge commits, which can be used to label and describe bigger sets of changes in retrospect. There is no need to rewrite the commit history with the benefit of hindsight; doing so only erases the record of how changes were arrived at, losing the opportunity to revisit conclusions from debugging or experiments.

You can also use merge commits to describe sub-units of work in your feature branch. Just rename your branch to some subtask and merge that into your feature branch.

edit: towards the middle of the article, the author also opines "What motivates people to rebase branches? ... I’ve come to the conclusion that it’s about vanity. Rebasing is a purely aesthetic operation. The apparently clean history appeals to us as developers, but it can’t be justified, from a technical nor functional standpoint."

6 comments

There's no such thing as "real" history - you don't commit every line of code you add or remove, or even every character. Rather, you choose some points in time to commit.

For me, those are often arbitrary - I can't get something to work at a certain point, so I make a WIP commit with the buggy work at that point, and will come back to it the next day.

Before I merge my branch back into master, though, I want my commit history to be useful. "This is the point where I went home or was disturbed that day" is not useful to future developers. "This is the work I did on this individual feature and everything that's needed to run it and to have the tests succeed is in this commit, and this was the reasoning behind what I did", however, is.

In other words, I rebase to divide my codebase into non-arbitrary units of code, not based on chronology, but on what is useful together.

Well, assuming that you are adding features in parallel, you do have a history. You have a branch with feature A and are about to build feature B, which is now based on A; someone else is building feature C, which is also based on A. Building B and C might require different or identical changes to A.

There's no guarantee that any of these features will be built in order, since they might have different priorities, difficulties or levels of acceptance, so it's hard to tell which one must or will be done first. Rebasing pretty much settles that order, while merging is a much saner fit for that workflow. It's harder to keep it functional, yes, but it enables parallel development. Choosing between rebase and merge isn't a source control problem but a feature management problem.

Yeah I think my point was more that a blanket ban on rebasing is too rigorous. I agree that once you want to integrate B and C (or both into A), a merge is usually the best way to do it - and I think that's what TFA was actually referring to.

It fails to consider the case, however, of rewriting B's history internally, not as a way of integrating with A or C, but as a way of making its commits clear. Afterwards, you'd still merge A into B and then merge B back once you see that's successful.

This makes a lot of sense to me. If you're maintaining a large project, having lots of non-cohesive commits makes it much harder to figure out what the logical changes were. I care about the set of code changes that were required to add a feature or fix a bug. I don't care about the set of changes that were required to get halfway to working code.

This is also potentially a big deal if you're doing maintenance bugfix releases for a project - that's way easier if porting the bugfix to an older branch just requires cherry-picking a single commit.
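A minimal sketch of that backport, run in a throwaway repository. The branch name `release-1.0` and the file names are hypothetical, chosen only for illustration. Because the fix lives in one self-contained commit, a single `git cherry-pick` carries it to the maintenance branch without dragging the unrelated feature along.

```shell
set -eu
# Throwaway repository; all names below are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"

echo "buggy" > lib.txt
git add lib.txt
git commit -q -m "Base"
git branch -M main

# Maintenance branch forked from the released state.
git branch release-1.0

# master/main moves on: an unrelated feature, then the bug fix as one commit.
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "fixed" > lib.txt
git commit -q -a -m "Fix bug in lib"
fix=$(git rev-parse HEAD)

# Port only the fix; -x records the original commit id in the new message.
git checkout -q release-1.0
git cherry-pick -x "$fix" > /dev/null
```

After this, `release-1.0` contains the fix but not `feature.txt`.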

This is precisely why we rebase and squash. We maintain about 4 release branches at any given point and having everything squashed into one or two commits makes a bug fix on master simple and straightforward to downstream.

Although more frequently what we do is make the bug fix on the furthest downstream branch and, because we tag our branches with a semver scheme, we just have our prep-for-deploy build automatically walk back up the branches to master and attempt to merge the feature in along the way.

The major use case I have for rebase is code review. If your code review stage happens at the point of merge/rebase/whatever into master, then rebase allows you to present the feature changes in a digestible way for the reviewers (split up into nice individual commits that make sense individually and are small enough to read and review without too much effort). The classic open source "send patches by email" workflow works this way.

The author is correct that a rebase may require resolving conflicts; but then those conflicts need resolving anyway if you choose to merge instead. It is also possible to miss a "semantic conflict" that doesn't make git complain but produces a commit that doesn't build -- you can put in tooling that checks that each commit builds individually to avoid the "oops, bisect on master isn't much use" issues.
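One way to get that per-commit check is `git rebase --exec`, which runs a command after each replayed commit and stops at the first failure. The sketch below assumes a throwaway repository; `test -s feature.txt` is a hypothetical stand-in for a real build step such as `make test`, and the log file exists only so the result can be inspected.

```shell
set -eu
# Throwaway repository and log file; all names are illustrative assumptions.
repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"

echo base > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

# A feature branch with two commits on top of main.
git checkout -q -b feature
echo one > feature.txt
git add feature.txt
git commit -q -m "Add feature, part 1"
echo two >> feature.txt
git commit -q -a -m "Add feature, part 2"

# Run the check after every commit being replayed onto main.
# A non-zero exit status would stop the rebase at the offending commit.
git rebase -q --exec "test -s feature.txt && echo ok >> '$log'" main
```

Here the check runs once per feature commit, so the log ends up with two `ok` lines.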

Ok, if your set of changes is too large to review in one piece, and you put in the work of refactoring your changes into a speedrun-style best possible history so they are cognitively review-friendly, I grant that this is a valid use case for rebase. Though erasing the real history is still a serious drawback, and the unit of code review is still too large. Not many projects work like this, however.

You don't have to erase the whole history, just duplicate it.

What we do is keep a development branch with the real history. When it's ready for merge/review, we duplicate that branch and rebase -i (on the same origin) to clean it all up without changing any content: changes aren't allowed, only history splits. Then we merge the clean branch into the dev one (no conflicts: no changes!) and THEN ask for reviews on that clean branch.

If there are further changes, park the 'clean' branch, continue working on dev branch as before, and make any changes needed (typically we use the 'autosquash' naming convention), and re-do the cleanup before re-submitting.

Once the review is complete, just merge dev<->clean again, and merge that into the trunk.

That way you have the whole history complete, and you also have a set of 'public' patches that have been reviewed and potentially can be upstreamed.
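A rough sketch of that dev/clean workflow, under stated assumptions: the branch names `dev`, `clean` and `main` are hypothetical, the cleanup is reduced to a single autosquash, and `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list unedited so the script runs without opening an editor. In real use you would edit the list interactively.

```shell
set -eu
# Throwaway repository; all names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Example Dev"

echo base > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

# dev keeps the real, messy history.
git checkout -q -b dev
echo "feature v1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo "feature v2" > feature.txt
git commit -q -a --fixup=HEAD    # message becomes "fixup! Add feature"

# clean starts as a copy of dev; only its history is rewritten.
git checkout -q -b clean
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

# The rewrite must not change content: both branches have identical trees.
git diff --quiet dev clean

# Merge clean back into dev, so dev records both histories.
git checkout -q dev
git merge -q --no-edit clean
```

The `git diff --quiet dev clean` step is what enforces the "no changes" rule: it exits non-zero, and the script stops, if the cleaned-up branch differs in content from the real one.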

My definition of "too large to review in one piece" is "more than 200-250 lines", so most non-trivial changes benefit in my view from being split into clean patchsets. It is extra effort that's purely to reduce the code review workload, so it makes most sense when the project is very short of code review resources and the reviewers aren't all the same small set of people as the coders (i.e. not all working for the same company). Keeping a clean set of commits gets easier with practice though, especially if you do it as you go along rather than trying to do it all at the end. I like stgit for tooling that allows you to think of your branch as a stack of patches and avoid the horrible UI of raw git rebase.

On a given feature branch the history is only important if there is more than one review before the work is complete. What the developer does in between these visible/public events is almost always not important. For example, in a "make work, make right" situation where the original work is thrown away and rewritten, you definitely are not interested in the history.

Have you ever gone back to your (or someone else's) code from 6 months ago and needed to refresh your memory about what you tried and what didn't work, or why some of your test cases/data are what they are? Or why you commented out some test? It can be very helpful; I think this history is often important. And you won't know if a given bit of code history is important until you need it.

The main use case for rebase for me is to add automatic rebases on pull to the git config.

Normally if you pull from the remote and have local commits not in the remote, you'll have an "extra" merge. Automatic rebase on pull takes care of this, to avoid those useless merges.
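For reference, a minimal sketch of that setting, applied to a throwaway repository so it can be tried safely. The per-repository scope is an assumption made here; adding `--global` would apply it to every repository, and `git pull --rebase` does the same for a single pull.

```shell
set -eu
# Throwaway repository so the setting does not touch your real config.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Replay local commits on top of the fetched branch on every `git pull`,
# instead of creating a merge commit.
git config pull.rebase true

# Show the value that is now in effect for this repository.
git config --get pull.rebase
```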

They are not useless: they record when git used its fallible automatic merging logic to reconcile changes in two different branches. It works most of the time, but you really want to keep a record of it for later troubleshooting.

Have a look at https://github.com/git/git/blob/master/Documentation/merge-s... to get some idea what is going on behind the scenes in automatic merges.

git rebase gives that just fine, even demonstrated here: https://git-scm.com/blog/2010/03/08/rerere.html.

Surely our tools should simply have an option of hiding useless merges. It's a display problem, not a reason to change history.

They do. git log --no-merges

Rebase is often required by open source projects. It makes a few commits, once integrated, look as if they were made directly on top of master instead of merged.

Periodic merges in which nothing interesting is happening are just useless cruft. They just spam the Git history.

Gee, sounds like a fun PR to review.