by andrewvc 1310 days ago
Rebase should never be used. Or, if it is used, it should be treated as a dangerous thing to do that’s well outside the norm.

Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so. It’s not worth the headache and effort. PRs are a better unit of work than commits in practice.

Configure GitHub or whatever you use to squash merge only and you’ll be good.

Since moving to this workflow I’ve had zero issues losing data due to a confusing git situation.

12 comments

So you're the one making me code-review 10000-line PRs because you just dumped your WIP branch — with three PRs' worth of code, plus formatting changes — directly into a PR, rather than factoring apart said WIP branch either during or after the fact.

The designed unit of a (distributed) git workflow is a patch — i.e. a locally rebase-squashed set of cherry-picked commits from development branches, with `git reset --soft` + `git add -p` (or even `git format-patch` + manual editing) used to prune the patch to a minimal size. Everything you do in your local repo should be with the intention of producing readable patches for code review (whether that patch is then done via PR or mailing list.) It's not about creating a pretty history (retroactive); it's about making it easy on the people who will discuss and reformulate your changes, one at a time, before accepting them upstream.
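A minimal sketch of that patch-building flow, run in a throwaway repository. The branch names, file names, and the `commit_file` helper are made up for illustration, and staging is done per file so the script runs unattended (`git add -p` would be the interactive, hunk-level equivalent).

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

# A messy WIP branch: two commits for the feature, one unrelated.
git checkout -q -b wip
commit_file feature.txt one "wip: feature part 1"
commit_file unrelated.txt noise "wip: unrelated"
commit_file feature.txt two "wip: feature part 2"

# Build the patch on a fresh branch cut from main.
git checkout -q -b feature-patch main
git cherry-pick wip~2 >/dev/null   # feature part 1
git cherry-pick wip >/dev/null     # feature part 2, skipping the unrelated commit
git reset -q --soft main           # drop the two commits, keep their changes staged
git commit -q -m "Add feature"     # one reviewable commit
git format-patch -1 -o "$repo/patches" >/dev/null
```

The `wip` branch is left exactly as it was; only `feature-patch` carries the squashed commit, and the file under `patches/` is what would go to a mailing list.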

To be clear, you can do whatever you want when your git workflow isn't distributed (i.e. if you're committing to only your own private projects, not proposing changes to other people's projects.) But if your workflow isn't distributed, then why be opinionated about git? You can simplify your life at that point by using something with central-repo-oriented semantics, e.g. Subversion. There's no rebasing in Subversion. :)

If you regularly need to review 10000 lines of code per PR, your dev workflow is seriously broken. It’s got nothing to do with git and its implementation’s complexity.

Sometimes features do require large changes. But usually you can break a feature into different parts (e.g. database, backend, frontend) and merge them in separately.

Not regularly, no. But sometimes a feature change requires a dependent architectural change — a refactoring of the internal library code that the feature will be implemented into. Or sometimes the language of choice doesn't have a pre-commit-hookable CLI auto-formatter, only an IDE auto-formatter, and the dev editing a file triggers formatting changes to be applied that should have been done in a previous change. And sometimes, a dev thinks it's a good idea to change the representation and decoding logic for a data file or embedded data-structure literal at the same time that they're adding an entry to it (usually because they can't represent the added item's additional semantics without said change.)

> But usually you can break a feature into different parts (e.g. database, backend, frontend) and merge them in separately.

So you've already written all that code, because you couldn't get anything to "work" for end-to-end testing until you wrote all parts of it. The patchset as a whole is inherently large.

Now what? How do you "break [the] feature into different parts" when it's already all written and committed on a WIP branch?

That's right: cherry-picking and rebasing.

The GP is arguing against bothering with this process. Presumably because git-rebase(1) is unintuitive to them, and they don't realize that you should start this workflow with a copy of your branch, or a new branch with cherry-picked commits from your WIP branch, to guarantee non-destructive rebasing. Like making a copy of a layer in Photoshop. (Yes, you can always restore your branch from the reflog, so it's technically always non-destructive; but `git checkout -b foo` is something you learn in Chapter 1 of the Git book.)
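A rough sketch of the "copy the layer first" idea, in a throwaway repository. The branch names and the `commit_file` helper are assumptions for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

git checkout -q -b wip
commit_file feature.txt one "wip: feature"

git checkout -q main
commit_file README more "main moves on"

before=$(git rev-parse wip)
git checkout -q -b wip-rebased wip   # the copy; wip itself stays put
git rebase -q main                   # rewrites only wip-rebased
```

If the rebase goes badly, deleting `wip-rebased` and starting over costs nothing, because `wip` still points at the original commits.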

Nobody should have long running WIP branches with these massive commits full of unrelated stuff. That’s an antipattern.
Disagree.

Most of the time, no one maintains change logs in their repos. A linear git history, where you rebase branches on top of their base branches, gives new features a clean commit history, which can then be squashed down to keep the trunk branches linear.

Then you can use things like bisect, and just... ya know, read through your change log when you need to.

Shoot, you can even add a few details in the notes while you're at it. How about a link to the ticket and PR at the very least with some notes on implementation.

That's my approach, it really doesn't take much time.

But hey, if civil engineers had the level of rigor software engineers do we'd all be dead.

So you live with what you've got, do what you can on your own branches, and just accept that no one cares about having a clean git history cause we can't be bothered as a profession to spend a couple hours learning how one of the most important tools we use every day works.

I think you misunderstood my post. If you squash merge as I suggested, your main branch is linear, as with a rebase. Your PRs and the working branches behind them should just use merges, however. Come merge time, the diff is turned into a single commit.
Well of course it's as good as a rebase -- it is a rebase.
If you squash into a single commit upon merge, ignoring for the moment the fact that as a blanket rule that's a bad pattern, you've now eliminated one of the core arguments against rebasing. The merge commit adds no value if the branch itself is a single commit. Just rebase your squashed-into-one-commit branch on top of latest master and push that to master instead. Now you have one commit representing your whole PR, with no pointless merge commit.
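To make the equivalence concrete, here is a small sketch comparing the two routes in a throwaway repository. It uses `main` rather than `master`; the branch names and the `commit_file` helper are invented, and `git merge --squash` only approximates what a hosting service's squash-merge button does.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

git checkout -q -b feature
commit_file a.txt 1 "wip 1"
commit_file a.txt 2 "wip 2"

git checkout -q main
commit_file README more "main moves on"

# Route 1: squash merge.
git checkout -q -b via-squash main
git merge -q --squash feature >/dev/null
git commit -q -m "Add feature"

# Route 2: squash locally, rebase onto main, fast-forward.
git checkout -q -b feature-squashed feature
git reset -q --soft "$(git merge-base main feature)"
git commit -q -m "Add feature"
git rebase -q main
git checkout -q -b via-rebase main
git merge -q --ff-only feature-squashed
```

Both `via-squash` and `via-rebase` end up one ordinary single-parent commit ahead of `main`, with identical file contents.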

I really discourage the squashing upon merge approach entirely though, because that's just a bandaid for lazy and/or misinformed developers to cover up the fact that their whole git workflow is completely borked.

Seems you don't understand merge commits, they are nothing special.

Just don't: https://news.ycombinator.com/item?id=33518496

Your perspective is one I've only recently come to understand after migrating a team to git and being the "source control guy."

The lesson I learned was: Prescribe everything about the workflow because nobody is going to learn git.

All the nice flexibility of git just becomes risk. By the time you have enough structure in place, you're back where you started: rigid source control, and you're using git locally on the sly.

The only person who bothered learning git well was a summer intern. And he mastered it, so I remain frustrated.

For what it’s worth I know git fairly well and have used more git strategies than most. I just happen to have found that simple usage actually works better for me personally and most teams I’ve been on.

Knowing a tool also means knowing what not to use:

Like an acceptable subset of C++.
> Since moving to this workflow I’ve had zero issues losing data due to a confusing git situation.

Nobody who understands git will ever lose data, because once committed you can never lose it (it's in the reflog). Indeed, even just adding a file means you will never lose it, although it's not as convenient as having an actual commit.

So yeah, you kind of revealed the anti-rebase case quite tellingly there. It's for people that understand git so poorly that they regularly shoot themselves in the foot and lose work or make other similar mistakes.

> Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so. It’s not worth the headache and effort. PRs are a better unit of work than commits in practice.

PRs and good git commit history are not mutually exclusive. But there are many drawbacks of trying to make PRs themselves your source of truth. A big one being that it's not actually stored in git, so if you ever migrate from github to gitlab or some other system, that context is gone.

> Configure GitHub or whatever you use to squash merge only and you’ll be good.

See, now this becomes even more absurd. What's your fear of rebasing if you're going to do the equivalent of a `git rebase -i` upon every merge anyway?

This is a very confusing and nonsensical ideology.

For those who want to improve their grasp of git, I highly recommend https://git-scm.com/book/en/v2. That book changed the game for me, because I finally understood how to visualize git history in terms of the DAG, and furthermore learned about how git actually works under the hood (blobs and the like) which made me confident I would never lose anything I've ever added/committed ever again.

> A big one being that it's not actually stored in git, so if you ever migrate from github to gitlab or some other system, that context is gone.

The request is committed to the repo on acceptance. Closed ones are typically useless.

> What's your fear of rebasing if you're going to do the equivalent...

The system takes care of the details without incidental complexity or errors.

> This is a very confusing and nonsensical ideology.

Pretty simple, folks are trying to get shit done. Not screw around with tools. One or two clicks where someone else did the hard work correctly wins every time.

From today, design is important:

- https://www.ncsc.gov.uk/blog-post/so-long-thanks-for-all-the...

- https://news.ycombinator.com/item?id=33531560

> Pretty simple, folks are trying to get shit done. Not screw around with tools.

I'm trying to get stuff done, not screw around with the GitHub UI. `git pull --rebase origin main` beats clicking around in a browser.

I use the cli as well, the clicks above refer to clicking the squash checkbox on a merge request in gitlab. This is 10x faster than hand-crafting an artisanal one to tell a story.
This is like throwing away 90% of usefulness that git provides you. That's what you get if you don't wish to spend some time learning one of the most important tools in your career.
People don't want to learn git because it's a bad tool. There are better source control systems, that are far easier to reason about, but they don't have the proliferation that git does.
That's nonsense. Git is the best version control system of all time. It has some regrettable UX difficulties, but as far as the system itself, there is no better decentralized development tool.

There's a reason git came into existence for linux kernel development. The linux kernel is a project so massive and so decentralized that it needed a fitting tool to be able to tame the chaos. And git did that perfectly.

Out of curiosity though, what to you is a better source control system?

Can you expand on what that 90% is? I'd guess more the other way around.

Squash merges are perfect to me for the bulk of PRs: atomic, test-passing iterations on the working product. Exactly what I want to see in my history. Useful for bisection. Good for reviewing line-based code changes as I can find all the related work for that feature.

They don't seem appropriate for long lived feature branches, or merging into release branches, but those aren't really being discussed here.

Forcibly squashing PRs just loses information and doesn't bring any benefits in return.

In my experience, only the simplest PRs boil down into what's logically a single commit. Many PRs are simple, sure, but often you end up with a bunch of logically connected atomic changes instead.

Let's take Mesa, an established and fairly high quality project, as an example. Look at its open MRs.

You can find a bunch of single-commit MRs, but some of them consist of approx. 2-4 commits, all of them with proper commit messages. See for example https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19... or https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19.... It wouldn't make any sense to squash them when merging.

You can also get monster MRs consisting of 10-20 commits, like https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19... or https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19.... Splitting them out into separate MRs would serve nobody, only increasing noise and review turnaround time. Squashing them would lose a lot of useful information that you definitely want to retain for things like blame or bisect.

Now, Mesa actually doesn't utilize git as well as it could - it doesn't encode the commit's relation to merge request in any way other than commit message (`Part-of:` tag). Personally, I would merge a rebased branch using `git merge --no-ff` option, which would create a merge commit. This way, you get best of both worlds: by using tools like `git log` or `git bisect` with `--first-parent` flag you get what's essentially a list of merged MRs, filtering individual commits out; and if you don't add that flag, you get every single individual commit considered, useful for stuff like `blame` or single file `log`.
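A small sketch of the `--no-ff` plus `--first-parent` combination, in a throwaway repository. The MR branch names, commit messages, and the `commit_file` helper are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

git checkout -q -b mr-1
commit_file a.txt 1 "a: step 1"
commit_file a.txt 2 "a: step 2"
git checkout -q main
git merge -q --no-ff -m "Merge MR 1" mr-1   # forces a merge commit

git checkout -q -b mr-2
commit_file b.txt 1 "b: step 1"
git checkout -q main
git merge -q --no-ff -m "Merge MR 2" mr-2

echo "--- first-parent view (merged MRs only) ---"
git log --first-parent --format=%s main
echo "--- full view (every commit) ---"
git log --format=%s main
```

The first listing shows only the two merge commits and the base commit; the second also includes each individual commit from the MR branches.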

Also, before pushing an MR for review, my work branch is usually a mess: lots of poorly divided commits without proper commit messages, sometimes undoing each other. `git rebase -i`, with squashing and rewording, is part of my everyday workflow. It allows me to use git as my personal undo and "let's-try-it-in-CI" tool without putting that baggage onto the reviewer. I get to be as messy as is useful during my work, and the reviewer gets a properly curated list of commits that's ready to be merged into the repository as-is. It's a win-win.
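One possible way to script that cleanup is sketched below, using `git commit --fixup` together with `git rebase -i --autosquash`. Setting `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so the example runs unattended; in day-to-day use you would leave the editor open and reorder or reword by hand. The file names and the `commit_file` helper are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

git checkout -q -b work
commit_file parser.txt draft "Add parser"
target=$(git rev-parse HEAD)
commit_file notes.txt todo "Add notes"

# A late correction that logically belongs to "Add parser".
echo fixed >> parser.txt
git add parser.txt
git commit -q --fixup="$target"

# Fold the fixup into its target; the todo list is accepted unedited.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --format=%s main..work
```

After the rebase the branch holds two commits, "Add parser" and "Add notes", and the correction is folded into the first one.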

Not using rebases when working with git is fine when you work alone or when you're just learning how to use git, say, during a university project or internship. Otherwise, you're doing yourself a big disservice if you don't put that tiny effort into getting comfortable with tools you're using every day in your work.

I really like squash merges because then you know tests passed at every commit. Makes bisect easier, and thus more likely to be used. And no headaches when you can't rebase or you screw up a rebase, which will happen.
> I really like squash merges because then you know tests passed at every commit. Makes bisect easier

Here is a script (just 3 lines) that tells git bisect to ignore commits in the feature branches, so you can bisect only the commits (usually the merge commits) in the main branch. Best of both worlds.

https://quantic.edu/blog/2015/02/03/git-bisect-debugging-wit...
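The linked script is not reproduced here. As a different route to a similar result, newer Git releases (2.29 or later, as far as I know) accept `--first-parent` on `git bisect start`. A rough sketch in a throwaway repository, with invented branch names and hypothetical helpers:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

# Hypothetical helper: a one-commit branch merged into main with --no-ff.
merge_branch() {
  git checkout -q -b "$1"
  commit_file "$2" x "$1: work"
  git checkout -q main
  git merge -q --no-ff -m "Merge $1" "$1"
}

commit_file README base "base"
git branch -M main
good=$(git rev-parse HEAD)

merge_branch feat-a a.txt
merge_branch feat-b bug.txt   # the "bug" is the mere presence of bug.txt
merge_branch feat-c c.txt

# Only commits on main's first-parent chain are tested.
git bisect start --first-parent HEAD "$good" >/dev/null
git bisect run sh -c 'test ! -e bug.txt' >/dev/null
culprit=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$culprit"
git bisect reset >/dev/null 2>&1
```

The bisect stops at the merge commit for `feat-b` and never checks out the commits inside the feature branches.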

That opinion is likely not shared by something like half of all git users... as with any absolute opinion. I can only counter the insults by responding: you are a fanatic for using git for backup and work tracking, which, in contrast to you, I wouldn't condemn you for ;) Every person or team can use a tool for whatever fits their use case best.

So far I have never lost data with git. I have only seen confusing git situations from people who never rebased but still shot themselves in the foot with duplicate commits, added by helplessly merging around multiple branches too much, all with far too many "fix typo" commits and then unresolvable conflicts :D

(Oh, and btw, when people end up there and I'm asked to get them back to sanity, it usually involves rebasing or cherry-picking them out of their messes.)

...here we go again.
> Most of the arguments in favor of rebase are by people fanatical about having a git history organized just so.

Seems to be a bit of an OCD compulsion.

Rebase is fine as long as it's your own unshared work. The alternative is https://xkcd.com/1296/
Don't rebase main.

Don't rebase shared branches.

It's amazingly powerful at making clear annotated changes. And removing small fixup commits for work in progress.

When someone says "never do x" there's probably a missing understanding of nuance.

THIS. Rebasing and squashing local/private changes allows for easier experimentation/rollback while you're implementing. BUT, rebasing anything shared will cause pain to others and should be avoided.
If having a nice history isn't important, and if PRs are a better unit than commits, why squash?
How do you lose data from a rebase?
A rebase creates new commits from old commits semi-automatically. Git then has no permanent record of the old commits, and even if you want to get back to them right away it requires some delicate git surgery.

This is why you can't generally share work using a rebase workflow.

It is not a big deal in practice in most every case, but in a version control system it is a little bit odd that rolling back such a fundamental operation isn't a first class feature.

> even if you want to get back to them right away it requires some delicate git surgery.

The reflog tracks rebase commit history. No surgery required.

> This is why you can't generally share work using a rebase workflow.

Rebase of public facing commits is not discouraged due to data loss. It is discouraged due to the possibility of someone creating change sets off the published work, and then the update re-writing the history to change the merge-base, requiring a re-merge of their changes.

They might be exaggerating a bit the amount of work to recover it immediately, but resetting a tag to a commit in the reflog might be considered at least a minor git surgery. And that’s assuming no public pushes have been made. Then all bets are off and it’s surgery time.
> and even if you want to get back to them right away it requires some delicate git surgery.

`git reflog` to get the old commit ID, and then `git reset --hard <commit>`. Seems more like "basic everyday git operations" than "some delicate git surgery".
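For anyone who wants to try it safely, here is a sketch of that undo in a throwaway repository. The branch and file names and the `commit_file` helper are invented; `wip@{1}` is the branch's previous position according to its reflog, which in this script is the tip from just before the rebase.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README base "base"
git branch -M main

git checkout -q -b wip
commit_file feature.txt one "wip: feature"

git checkout -q main
commit_file README more "main moves on"

git checkout -q wip
before=$(git rev-parse HEAD)
git rebase -q main                 # wip now points at a rewritten commit
after=$(git rev-parse HEAD)

git reflog show -n 2 wip           # newest entry is the rebase, the one below is the old tip
git reset -q --hard 'wip@{1}'      # put the branch back where it was
```

Looking up the old commit ID in `git reflog` by eye and passing it to `git reset --hard <commit>` does the same thing.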

Many past confused teammates of mine I've dropped in to help would disagree. reflog and reset, for better or worse, require what seems to be above-average comfort with git.
I don't think you can call yourself "using git" if you're not comfortable with such simple concepts. Pointing a branch to a specified ref is one of the most basic operations you can reason about when using git!

I'm perfectly aware that many people don't think when using git at all and instead merely copy'n'paste memorized commands hoping that they'll do what they want to accomplish, but this is something you should move past when you want to stop calling yourself a "junior developer". This is one of the most important and helpful tools in your field of work, you can either take advantage of it or suffer.