Hacker News
by gpspake 1041 days ago
I think this is a good guide. Git tends to be an emotional topic for a lot of people, myself included, and the hill I dramatically die on time and time again is that, tragically, this is about where developers stop learning git. I think operations like rebase, cherry-pick, and squash are just as important as the ones you first encounter, especially when you're working with other people. I use them every day, and I see the spaghetti that even experienced developers pile onto the graph because they only know merge. I guess my point is "don't sleep on rebasing" :)

Git is one of those tools that exposes so much of the underlying infrastructure that people just can't help diving in and making their own lives so much more difficult.

After using git for well over a decade, I'm completely convinced that if you find yourself frequently rebasing/cherry-picking/reflogging you're using git wrong.

Rebase and cherry-picking are both cornerstones of trunk-based development workflows, and in my experience those have proven extremely successful compared to other methods (like Git Flow, or GitHub's overly simplistic branch-per-feature-and-merge approach, which feels like trunk-based development but isn't).

Rebase makes rollbacks extremely easy if you need to roll back specific commits because of bugs, makes releases easier via cherry-picking (so you don't slow down trunk merges just to do a release), and allows for fine-grained continuous deployment that is harder to achieve without it.

In my experience, however, either everyone needs to rebase or you eventually end up with issues when some developers do and others don't.

I don't care as much for squashing as a general case, though, since you lose fine-grained, per-commit rollback.

Blindly squashing every branch into one commit is lame and done by people who have either never had to bisect a bug or are too lazy to figure out how to properly rebase. Where squashing matters is in turning a work-in-progress branch into a series of commits for the master branch. There shouldn't be any commits fixing your own shit, for example. I don't want to see one commit where you do the work, then five commits fixing your own work. Nobody needs to see that. It only makes things worse.
> It is my experience however, that either everyone needs to rebase or you end up with issues eventually when only some developers are and other ones aren't.

The only time I merge is when I'm working on a shared remote branch. I haven't found a better workflow (although I'm all ears if you have any suggestions).

Here's my current workflow:

1. write some code on a local branch

2. upstream has new revisions? rebase my branch on top

3. if not finished with my task yet, go to 1

4. if ready for review, open PR

5. if accepted, squash and merge

6. if changes are requested, write more code

7. upstream got more commits causing a conflict? don't rebase! it will screw up the PR history on GitHub and can cause issues for reviewers who might've checked out your branch locally and maybe done some experiments. merge upstream into your local branch. then you can push fast-forwardable commits.

8. push new commits to PR and go to 5
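A minimal sketch of that workflow in a throwaway sandbox repo, assuming git 2.28+ (for `git init -b`) and a default git config. The helper `mk`, the file names, and the branch name `feature` are hypothetical; pushing and the PR itself are out of scope, so local `main` stands in for upstream:

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo; nothing touches a real project
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }  # one file per commit

mk base

# 1. Write some code on a local branch.
git checkout -q -b feature
mk first-pass

# 2. Upstream has new revisions and nobody has reviewed yet: rebase on top.
git checkout -q main && mk upstream-1
git checkout -q feature && git rebase -q main

# 6-7. Review has started and upstream moved again: merge instead of
# rebasing, so the commits reviewers already saw keep their IDs.
git checkout -q main && mk upstream-2
git checkout -q feature && mk review-fix
git merge -q --no-edit main

# 5. Accepted: squash-merge the whole branch into one commit on main.
git checkout -q main
git merge -q --squash feature
git commit -qm "Add feature (squashed PR)"
git log --oneline main
```

The branch ends up with one real merge commit, but `main` stays linear: base, the two upstream commits, and the single squashed commit.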

I used to think of rebasing as just rewriting commit history. But now I also think of it as altering the history of collaboration that is captured in a PR. So I switched from rebasing onto new upstream base branch commits and force pushing to PRs that already had reviews, to merging in new upstream base branch changes. I only do this after someone else has done anything on my PR; if I open it but nobody has reviewed yet, I'll do the rebase/forcepush to keep it current until someone does.

I prefer squashing to merge because I prefer the default branch to have one commit per unit of collaborative work. The way different people split up commits on a branch is arbitrary and varies widely; you'll never get more than 2 engineers to agree on a convention here. Keep all the messy stuff in the PR, and you can always revert one of those individual commits if you want finer-grained rollback. If you want a PR to produce more than one commit, it should be more than one PR.

> I only do this after someone else has done anything on my PR; if I open it but nobody has reviewed yet, I'll do the rebase/forcepush to keep it current until someone does.

I believe the only reason to do so is GitHub's lackluster PR UI. Force-pushing with an updated version of a branch after a review works reasonably well with GitLab's MRs.

> I prefer squashing to merge because I prefer the default branch to have one commit per unit of collaborative work.

There's no reason to squash when you can create merge commits from fast-forwardable state instead (again, one of the easily achievable options in GitLab's UI; GitHub doesn't make it easy AFAIK). This way you don't lose commit granularity while you can still obtain the "one commit per unit of work" view with simple `git log --first-parent` (or do the opposite and skip the merge commits with `git log --no-merges`).
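A sketch of "merge commits from a fast-forwardable state" in a sandbox repo (git 2.28+, default config; `mk` and all names are hypothetical): rebase the branch onto `main`, then merge with `--no-ff` so a merge commit is recorded even though a fast-forward was possible.

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }  # one file per commit

mk base
git checkout -q -b feature
mk step-1
mk step-2
git checkout -q main
mk upstream

# Make the branch fast-forwardable by rebasing it onto main...
git checkout -q feature
git rebase -q main
# ...then record an explicit merge commit anyway.
git checkout -q main
git merge -q --no-ff -m "Merge branch 'feature'" feature

# One line per unit of work: follow only the first parent of each merge.
git log --oneline --first-parent main
# Every individual commit, with the merge commits hidden.
git log --oneline --no-merges main
```

Here `--first-parent` lists three commits (the merge, `upstream`, `base`) while `--no-merges` lists the four individual ones, so neither view loses the other's granularity.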

> Force-pushing with an updated version of a branch after a review works reasonably well with GitLab's MRs.

The problem I run into is that other people have different workflows.

If they `git checkout remote/branch`, then everything's fine. But if they want to make a local copy of the branch, it'll get all messed up if I force-push. And I only want to adopt practices that are as robust as possible in the face of the possible ways other people could work.

> There's no reason to squash when you can create merge commits from fast-forwardable state instead

Sure there is. Less noise commits.

My workflow is almost identical to yours except step #2. Why rebase on main when you can just merge from main? It's much simpler and less likely to produce hairy merge conflicts. If you're going to squash your PR anyway, the end result is identical.
I really don't get all these people who insist on using rebase instead of merge. Who wants to spend time resolving meaningless conflicts?! Every time I try it, I instantly regret it.
I like being able to see the graph with my commits lined up on top of the latest main revision. It helps me order my work in my head. There’s probably a bunch of other ways to visualize that, I just haven’t learned them yet.
I agree that if you're having to use the reflog frequently, you're using git wrong (not least of which because the reflog is not designed for readability and understanding the context where it came from).

But for the rest? If you're working in a repo with more than 5 people, rebase, cherry-pick, and squash are necessary to keep your sanity. Merge nodes are awful once you get beyond maybe 3 developers.

Someone else pointed out that it's probably confusing that I didn't mention that I do religiously squash my branches before committing, so we still have multiple developers with a clean main branch and no merge commits.
I usually use reflog to answer the question "What was the name of the branch I was just working on again?"

What's supposed to be wrong with that?

That's perfectly reasonable IMO.

If you just need to checkout the last branch you can also `git checkout -`
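For the "which branch was I just on" question, a sandbox sketch (git 2.28+, hypothetical branch name `topic`): `git log -g` walks the HEAD reflog, `%gs` prints each entry's subject, and `@{-1}` resolves to the previously checked-out branch, which is what `git checkout -` switches to.

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
git commit -q --allow-empty -m "initial"
git checkout -q -b topic
git checkout -q main

# Branch switches are logged in the reflog as "checkout: moving from A to B".
git log -g --format=%gs HEAD | grep '^checkout:'

# @{-1} names the previously checked-out branch...
prev=$(git rev-parse --abbrev-ref '@{-1}')
echo "previous branch: $prev"
# ...and `git checkout -` is shorthand for checking it out.
git checkout -q -
```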

    git config --global alias.recent "for-each-ref --count=20 --sort=-committerdate --format='%(committerdate:short): %(refname:short)' refs/heads/"

use it like

    > git recent
    2023-08-07: jimkubicek/add-journal-table-creator
    2023-08-07: main
    2023-07-28: backup/git-squash-to-main
    2023-07-28: backup/cleanup
If you’re doing collaborative trunk based development then you’re only cherry-picking. So far Dave Farley is the only person I’ve ever heard advocate for this but it does have its place in the universe. Cherry picking is not destructive to history fwiw.

There’s absolutely nothing wrong with rebasing/squashing/amending/resetting heads on personal feature branches. In fact, it’s a pretty good practice if you make messy history, and it can make PRs less of an eyesore. I think the confusion about when destructive history operations are appropriate comes from the git CLI not having a concept of protected (shared) branches vs. feature branches.

As long as you keep history-destructive operations away from shared branches, you’re good.

In some cases, you're still good even when rewriting shared branches.

At work we're maintaining a downstream Linux tree with a few hundred patches on top of mainline. The tree gets frequently rebased on top of new upstream releases, and some changes are being progressively upstreamed. It's much easier to reason about the remaining downstream changes and deal with conflicts when rebasing than when merging upstream releases back into the downstream tree. Of course you can't expect to be able to carelessly `git pull` in such workflow, but if you're working with people who actually know how to use git it's not really a big deal.

Naturally, this particular project uses a special workflow that fits its needs. It doesn't usually make sense to rewrite shared branches in projects where you're the upstream.
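A sketch of rebasing a downstream patch stack onto a new upstream release, in a sandbox repo with hypothetical tags `v1.0`/`v1.1` and a hypothetical branch `downstream` (git 2.28+ for `git init -b`). `git rebase --onto <new-base> <old-base> <branch>` replays exactly the commits in `<old-base>..<branch>`:

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }  # one file per commit

mk upstream-a
git tag v1.0                      # hypothetical upstream release
git checkout -q -b downstream     # downstream patches carried on top of v1.0
mk patch-1
mk patch-2

git checkout -q main
mk upstream-b
git tag v1.1                      # next hypothetical upstream release

# Move only the downstream patches from v1.0 onto v1.1.
git rebase -q --onto v1.1 v1.0 downstream
git log --oneline v1.1..downstream   # the remaining downstream patches
```

Because the rewritten `downstream` no longer descends from its old tip, anyone tracking it has to reset to the new version rather than merge it with a plain `git pull`, which is the caveat in the comment.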

I don't understand this:

> If you’re doing collaborative trunk based development then you’re only cherry-picking.

All my work is collaborative trunk based development, and I never cherry-pick.

> There’s absolutely nothing wrong with rebasing/squashing/amending/resetting heads on personal feature branches.

I agree that there's nothing _wrong_ with it, just that it's unnecessary. If your branches are focused on a single feature and you're always squashing your PRs to main, the cleanliness of the branch while you're working on it is unimportant.

> always squashing your PRs to main, the cleanliness of the branch while you're working on it is unimportant.

I'm personally not a fan of always squashing; for large features you lose a lot of history. I like a merge commit in some cases: you can still undo everything easily, and most git commands support --first-parent, so you can "pretend" everything was squashed when that's useful. But when you're running git blame, you have a lot more context to go off.

I use rebase multiple times per day. Mostly for putting my changes on top of the latest development branch, but also for squashing commits.

I'm curious why you don't like it?

My branches are always focused on a single atomic change†, so if I want the tip of my branch to be up-to-date with main (or the dev branch or whatever), merging from that branch accomplishes the same thing with a lower likelihood of conflicts.

I always squash‡ before pushing a PR, so the end result is identical to a carefully rebased PR.

† occasionally branches will need to be split into separate commits, but that's not my default working style

‡ I know `squash` is a rebase under the hood, but it won't ever result in conflicts, so I'm happy to use it with every PR
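For comparison, one way to squash a branch without replaying anything commit by commit (so no conflicts can arise) is a soft reset to the merge base. This is a swapped-in technique, not necessarily what the commenter runs. Sandbox sketch, git 2.28+, hypothetical names:

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }  # one file per commit

mk base
git checkout -q -b feature
mk wip-1
git checkout -q main && mk upstream
git checkout -q feature
git merge -q --no-edit main       # merge from main instead of rebasing
mk wip-2

# Squash: move the branch pointer back to its merge base with main while
# keeping the index and working tree, then record everything as one commit.
git reset -q --soft "$(git merge-base main HEAD)"
git commit -qm "Add feature"
git log --oneline main..feature
```

Since `main` was merged in last, the merge base is `main`'s tip, so the branch ends up as exactly one non-merge commit on top of `main`.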

I think you'd get a lot less pushback if you mentioned that you squash every branch before merging in your original comment. That actually seems like a pretty good policy if you can keep your branches relatively small.
How/why do you use rebase for squashing?
Rebase is pretty much just an automation for cherry-picking and squashing, and interactive rebase is the primary and most convenient way to put the branch you have worked on into shape before presenting it to someone else.

When I hear about people not using rebase in their daily workflow, I imagine myself 10 years ago when I barely knew git and couldn't really use it as a helpful tool like I do today. It's almost like looking back to before I started using VCS in the first place - somehow I did manage to not use one for years (even collaborated via FTP!), but now it seems impossible. Usually most of the useful magic with git happens before anything gets pushed out, and `git rebase` has a huge part in it.

Use the interactive rebase. It shows a list of commits which you can reorder, squash, remove or edit.

Reordering is pretty powerful. If you made a mistake, commit the fix, then move it to the commit where you introduced the mistake, and squash. Removing broken commits makes `bisect` nicer to use when you're desperate enough to use it.

Obviously don't do this on commits you've already published.
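Git can automate that reorder-and-squash step: `git commit --fixup=<commit>` marks a fix for its target, and `git rebase -i --autosquash` moves it into place. Sandbox sketch (git 2.28+, hypothetical file and branch names); `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the example runs unattended:

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
printf 'hello\n' > greet.txt
git add greet.txt && git commit -qm "base"

git checkout -q -b feature
printf 'helo world\n' > greet.txt && git commit -qam "Add world"   # typo slips in
printf 'bye\n' > bye.txt && git add bye.txt && git commit -qm "Add bye"

# Commit the fix, marked as a fixup of the commit that introduced the typo.
printf 'hello world\n' > greet.txt
git commit -q -a --fixup=HEAD~1

# --autosquash moves "fixup! Add world" right after its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main
git log --oneline main..feature
```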

> I'm completely convinced that if you find yourself frequently [cherry-picking] you're using git wrong.

A previous employer had a multi-tenant application that was deployed as a client-specific application which loaded the "core" as a dependency. They didn't really know how to do versioning and most version changes were just arbitrary "I feel like we should call it 1.8 now".

At one point I ended up maintaining a client-specific branch of the core dependency on version 1.10 (the branch was 1.10-$CLIENT) while the "main" branch was 2.3 or something. For context, it went from 1.10 to 2.0 because of general cognitive dissonance.

This meant any change that needed to be made in the application core for this particular client also needed to be cherry-picked in some direction, usually by making the change on the client branch and cherry-picking it back as necessary. In some cases another client -- naturally, they would be on a separate branch like 2.3-$CLIENT -- also wanted that change so it needed to be cherry-picked again to that branch.

The result was a minimum of two PRs for every change, one of them a cherry-pick of the other's commits (one commit, unless I felt like spending my time in self-loathing). Not knocking cherry-pick at all; it's wonderfully useful when used correctly. That's just the result of non-technical decision-makers making decisions about technical tools.
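A sketch of carrying one change between client branches with `git cherry-pick -x`, which appends a "(cherry picked from commit ...)" line recording where the copy came from. The branch names `1.10-acme` and `2.3-globex` are hypothetical stand-ins for the `1.10-$CLIENT`-style branches; sandbox repo, git 2.28+:

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
echo "core" > core.txt && git add core.txt && git commit -qm "Core 1.10"
git branch 1.10-acme              # hypothetical client branch on the old line
echo "v2" > v2.txt && git add v2.txt && git commit -qm "Core 2.3"
git branch 2.3-globex             # hypothetical client branch on the new line

# Make the change on one client branch...
git checkout -q 1.10-acme
echo "rounding fix" >> core.txt && git commit -qam "Fix rounding for client"
fix=$(git rev-parse HEAD)

# ...then carry the same change to the other client branch.
git checkout -q 2.3-globex
git cherry-pick -x "$fix"
git log -1 --format=%B            # message ends with the provenance line
```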

On the plus side, I learned a ton about git in that job.

.... Why not rebase before merging into main?
Resolving rebase conflicts is technically and conceptually much more difficult than resolving merge conflicts, with the added bonus that rebasing can sometimes force you to resolve conflicts for each commit in your branch.

Here's how I think everyone should use git:

1. Create a new branch for your changes

2. Make commits and merge from main with wild abandon

3. One final merge from main

4. Squash everything into a single commit, push a PR

If you keep your branch focused on only a single change, the end result is a tight, focused, single commit PR that merges cleanly into main and didn't involve any complex or error-prone shenanigans.

> Resolving rebase conflicts is technically and conceptually much more difficult than resolving merge conflicts

The opposite is true: resolving a conflict during a rebase is much easier, as you get to resolve the conflict in the context of a single commit and its parent. In some cases it may end up being more work than resolving a whole merge, but it's much easier to reason about.

This is a very common workflow for larger OS projects, and I think it translates really well to corporate environments too. It reinforces some work/feature discipline and gives you a nice clean history.
So, I get the "squash your feature branch into a single commit before merging upstream", but what does doing "git merge main" instead of "git rebase -i main" give you? (Assume I have my global git config to remember conflict resolutions via rerere)
This is how I do it. Almost never touch rebase. Also, I hate git :-)
I love git, but I also never touch rebase.

It's there when I need it, but I also work in such a way to never need it.

I'm a big fan of that practice but I get the impression that rebasing scares a lot of devs that either didn't take the time to learn git or are still recovering from that one time that their change got too far away from mainline. That latter reason is why I prefer the practice actually...
Or they were taught Git the wrong way, by memorizing a bunch of commands, as in TFA.
When in doubt, git diff main>~/patch.out

(... && git checkout main && git pull --rebase && git checkout -B clean_branch && git apply ~/patch.out)

(I like the light rhyming of the first part)

I can’t imagine using VC for exploratory programming without rebase or something equivalent. I don’t want to bother writing a meaningful commit message for a change I’m probably going to throw away. I also don’t want to push a history like “WIP, WIP, works now, broke again, WIP” and that’s what it looks like at a first pass when I’m moving quickly.

Instead I squash away the garbage and push out a reasonable looking chain of commits with nice descriptions.

> I'm completely convinced that if you find yourself frequently rebasing/cherry-picking/reflogging you're using git wrong.

A lot of people want to use git as a checkpoint/backup system, and commits and associated changes reflect that. The rebasing/cherry-picking/reflogging is one way to update the set of commits on the branch in order to make a set of meaningful commits for the feature branch they're working on.

I've been using Git for the same length of time, but I have not reached this conclusion. That's the problem with teaching someone how to use a very powerful flexible tool that accommodates a variety of workflows and styles: different people use it differently.
As long as you never ever cherry-pick from one branch to another when the source branch is intended to eventually be actually merged (directly or indirectly) into the destination branch I think it has its use cases.

If you break this rule you could be in for dealing with some atrocious merge conflicts though, so I try not to do it unless the branch I'm cherry picking from is a definite actual dead end (e.g. the change was an urgent hotfix against an old release branch and your workflow doesn't involve merging those back into main/master).

Is that really an issue? When you rebase, git automatically figures out that you've cherry-picked something and will skip it.

I will occasionally cherry-pick something from master, do my work, etc. Before making my PR, I'll rebase against master and potentially squash/reorganize my commits. When the PR eventually gets merged to master there aren't any problems.

I don't think I ever merge without rebasing though, so maybe rebase has been saving me from any potential problems.
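That skip-on-rebase behavior can be checked in a sandbox (git 2.28+, hypothetical names): rebase leaves out commits whose patch is already in the upstream, so the cherry-picked copy of the hotfix disappears and only `my-work` remains on top of `main`.

```shell
set -eu
cd "$(mktemp -d)"   # throwaway sandbox repo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q -b main
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }  # one file per commit

mk base
git branch feature                # feature forks from base
mk hotfix                         # urgent fix lands on main
hotfix=$(git rev-parse HEAD)

git checkout -q feature
git cherry-pick "$hotfix"         # feature needs the fix right away
mk my-work
git checkout -q main && mk later  # main keeps moving

# The rebase sees the cherry-picked copy is already in main and drops it.
git checkout -q feature
git rebase -q main
git log --oneline main..feature
```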

Yeah, I very rarely rebase; I just autosquash PR commits into one on merge to master (and also delete the source branch to avoid similar headaches), plus make sure PRs are fairly small and focused. Regular merges, where commits have been cherry-picked from one side to the other and unmerged changes have later touched those same files, tend to result in a lot of spurious merge conflicts.
How do you pull a critical fix across branches w/o the occasional cherry pick? What do you do instead?
If you use those commands in your local repo to keep the central shared repo clean then yeah, rebase and friends are great.