Hacker News new | ask | show | jobs
by cballard 3730 days ago
This is a bad idea masquerading as a good idea. Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to). However, keeping distinct atomic commits that change one and only one small thing, when possible, is much preferable if bisect or blame is used. If you have broken or poorly written commits, use fixup, reword, squash, etc. in rebase -i.

Using fast-forward (and possibly only allowing fast-forward) is a good idea. Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

14 comments

If someone prepares a pull request with a well-structured series of commits, making a logical series of changes, where the project builds and passes tests after each commit, then those commits shouldn't get squashed.

However, I frequently see people adding more commits on top of a pull request to fix typos, or do incremental development, where only the final result builds and passes, but not the intermediate stages, and where the changes are scattered among the commits with no logical grouping. In that case, I'd rather see them squashed and merged than merged in their existing form, and having a button to do that makes it more likely to happen.

The trick is to not squash everything into one giant commit, but to use rebase -i liberally to squash/fixup those typo fix commits where they belong.
That's what the author of the pull request should do. But this provides a potentially useful alternative when that doesn't happen.
Plain squashing commits, while still a valid option in very few cases, will likely lead to gigantic commits that are hard to reason about.

I've seen projects where maintainers clean up poor commits before merging them: rebase/squash/reword only what's appropriate.

It's also the case that you lose the code review if you force push to a PR's branch after adding in a typo fix and squashing locally, right?

That's a pretty good reason not to squash till the review is done.

> It's also the case that you lose the code review if you force push to a PR's branch after adding in a typo fix and squashing locally, right?

Not as far as I can tell; I've force-pushed pull request branches many times, and the code reviews seem to stick around. (Perhaps they wouldn't if the code changed more drastically, like files disappearing; I haven't tried that.)

I used to feel the same way (and still do to some degree). However I think the issue is more nuanced. I agree that rebasing beforehand is a good idea. But I can see the value in keeping commits on the master branch corresponding to specific features or bug fixes (which presumably map to PRs).

I think the argument can be made that if you don't feel comfortable performing a squashed merge of a PR, then that PR contains too much work and should be split up. However, I don't think there's an easy rule to decide in either case.

Small PRs are an issue because PRs are dependent on other users and can't be dependent on a prior PR.

Let's say we're adding an interface/typeclass/protocol and a concrete implementation. I'd say these should be two separate commits, as they're adding two different things. An interface doesn't require a provided implementation to work. But, if we were to create those as two separate pull requests, it would be more work for the project maintainers, and the initiator wouldn't be able to create the PR for the concrete implementation until the interface PR was merged - the concrete PR can't be added as a dependent PR of the interface one, or something to that effect.

Since you can "compare" almost anything on Github, small commits aren't really an issue, just view a larger-scope comparison to get an idea of the whole PR.

Another way to put this might be that commits are for individual code changes that build up to a pull request, which is a conceptual change?

> and can't be dependent on a prior PR.

This pinpoints the major problem exactly. Without dependencies between PRs there's really no sane way (with this feature enabled) to submit a series of commits while expecting those commits to remain separate.

Oh, and I object to the general sentiment in the responses to your post that seem to value drive-by/inexperienced contributors over the "experts". Yes, we definitely should make things as easy/simple as possible for new contributors, but NOT at the expense of adding a gotcha for expert contributors. The experts are what keep a project going over many years instead of just releasing version upon version of trivial spelling fixes.

(And, btw, the default "merge" option for GitHub PRs also sucks. It should be possible to simply disallow non-FF merges and to force all merges to be FF. EDIT: Interestingly, this seems to be about the only workflow explicitly forbidden by the new rules... unless, of course, you're willing to merge everything manually using your local copy of the repo and pushing from that to GH.)

Yeah, it is also the obvious thing that is missing in the review stage of a pull request - viewing all the content and all the diffs separately but on one page, in a serial way that corresponds to the actual order they will show up when you do git log, all on the same page.
This is something that Gerrit supports natively: you can have a Gerrit CL that depends on another CL. It's unfortunate that Github doesn't support any equivalent.
Yup. Happy user of Gerrit here. :)
How does not squashing your commits help the protocol/implementation scenario you described?
You can merge the interface PR into the concrete implementation's PR. You don't have to work off of just one remote branch.
> Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to)

See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at east 3 articles about it so I don't screw something up, I attempt to do it and ultimately I lose a bunch of work.

I just don't get it. I can write web, mobile and desktop apps and I like to think I'm pretty good at it. But I'm one of those people who constantly have commits of merges in their code because for whatever reason I just can't get my head around making rebasing work correctly.

Am I the only one? Sorry for the derail but it's bothering me that I've never gotten this to work correctly and I feel otherwise normally smart. ¯\_(ツ)_/¯

A few tips!

1. Always use the "upstream" branch as your rebase target - "git rebase -i master", or " git rebase -i origin/master". This is almost always what you want, and picking the wrong base is the most common error I've seen when teaching people rebase -i

2. Use autosquash! https://robots.thoughtbot.com/autosquashing-git-commits. If you have trouble with the text-editor interface you get when you run rebase -i, this will both handle its usage, and in the long run give you some visual examples of how the interface is supposed to be used. If you're really into this, set the config option "rebase.autoSquash true" to avoid the extra command-line flag.

3. If you mess up and realize in the middle, git rebase --abort.

4. Use the reflog after the fact for both finding and undoing mistakes: git diff branchname branchname@{1} to check for unintended code differences, and git reset --hard branchname@{1} to undo the rebase.

Thanks! I get the feeling I should give up on using a GUI for most of my git usage as doing many of these seems awkward or impossible with the GUI. That's probably part of my problem.
For what it's worth, I pretty much exclusively use git through Eclipse's EGit UI. I do a lot of rebasing, as we use Atlassian's "centralized" workflow. [0] It's interactive rebase interface is pretty good. If you haven't used it, maybe try loading a repo into it and see how you do.

[0] https://www.atlassian.com/git/tutorials/comparing-workflows/...

Interactive rebase works fine for me in Eclipse (eGit). At least on wiindows that's preferable to me over the command line editor.

However, I am squashing very rarely, mainly for commits which correct typos.

Yeah. I'd recommend that. Personally, I found that I didn't really understand git until I did it all from the command line. YMMV, but that's what made it all click for me.
I find SourceTree to be surprisingly effective. Also, try setting Sumblkme Text to be your core.editor in git

  $ git checkout master
  $ git pull
  $ git checkout branch-name
  $ git rebase master
If there are merge conflicts, open the affected file(s) and resolve them. Then:

  $ git add filename.ext
  $ git rebase --continue
Finally:

  $ git push origin branch-name
If you've already pushed the branch, use -f. Make sure to always specify the branch name when using that flag!
For those cases where you have created a fork of a project and are preparing a pull request, would that be something like:

    $ git checkout master
    $ git fetch upstream  # https://help.github.com/articles/syncing-a-fork/
    $ # git merge upstream/master  # <- leaves merge commits in your fork
    $ git checkout branch-name
    $ git rebase upstream/master  # Use rebase instead of merge?
    $ git push -f origin branch-name
I think the advice to rebase runs up against the business pattern of pushing your branch as soon as you create it (git-flow and a lot of jira/stash integrations work like this). Also some teams want to see evidence of your commits as you make them, which means pushing as you commit.

If you have a branch and it's already pushed, rebasing just feels kind of funny and can sometimes cause a lot of problems if anyone else has checked it out.

If you have a branch and it's local only, then merging from mainline into your branch and selecting rebase instead of merge is relatively painless.

> ultimately I lose a bunch of work.

One trick that's worked ok for me in a private repo is, before starting to edit the fix-spline-reticulation branch (which has a handful of separate logical changes, fixes discovered midway through a later change that really belong in an earlier change, and temporary debug code that was never meant to go into the product) for publication, to do

    git branch fix-spline-reticulation.0
(or .the-next-sequential-number). Then no matter how badly the "rebase -i master" goes, there's a branch tag pointing at the original state, and

    git branch -D fix-spline-reticulation
    git checkout fix-spline-reticulation.0
    git branch fix-spline-reticulation
will destroy the failed attempt and restore the branch to its earlier state. (Note that if you decide in the middle of the rebase that you're losing, "git rebase --abort" will undo anything you've done so far; you need the backup only if you regret the rebase after you're finished). It also makes it easy to "git diff my-feature.0..my-feature" and confirm that all the changes in the edited history add up to the same as the real history.

Sometimes I do this in the middle of development to move all the changes intended for the product ahead of the temp debug stuff in case I suspect the debug code is causing problems. Keeping the debug code in the dev branch even after the cleanup rebase makes the diff to check the rebase easier (then, of course, the merge should take the commit just before the debug).

Best never to do let anything but the cleaned-up branch hit a shared repo.

> See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at east 3 articles about it so I don't screw something up, I attempt to do it and ultimately I lose a bunch of work.

Rebase takes a little bit of practice, but everyone who's using git owes it to themselves to learn it by heart. It's almost like having superpowers compared to any VCS which doesn't have rebase.

My advice[1] would be to simply create some dummy repository (perhaps just copy an existing repository with some real code) and going through various scenarios described in the git-rebase man page (using some trivial changes). If something blows up, don't worry, you can always just start from scratch.

The key to making rebase work for you is: 1) understanding the underlying model of git[2], and 2) practice, practice, practice. With enough practice you'll get a good feeling for which "type" of rebase works best in a given situation.

[1] In addition to the excellent advice given by others in this thread.

[2] It may look like it's really all based on snapshots of files, but the workflows are definitely mostly centered around patch-based thinking.

I know that I probably swear in church since git is the current de facto standard for version control but this shows that gits usability is way too low. Why do I have to invest so much time to understand the inner workings of a tool that should just help me collaborate with my coworkers? I've given up on understanding git and use gitflow and the built in tools in Intellij Idea for all my branching/merging/committing needs.
While I like the git flow model, I find the git flow plugin really useless.

You lose so much of the power that makes git such an awesome tool.

Same with every single front end to git I've ever tried, free or purchased. I always come back to the CLI because it's so much more powerful.

The IntellJ merge tool is pretty nice though.

Because this is a quite advanced feature, and creating an interface for it is inherently hard. IIRC, the IDEA interface for interactive rebase ends up looking exactly the same as the text-mode one.
Version control is hard, simple as that. Git actually makes a great job at keeping simple stuff simple, but if you need some of the more complex stuff... well, I guess you could always go back to pen, paper and a secretary (aka: make someone else do it for you).

  ultimately I lose a bunch of work.
Take a copy of the entire repository before attempting anything potentially destructive.
This is completely unnecessary. Everything git does, git can undo as long as your working tree is clean when you start the rebase. git reflog and git reset are your friends, if you want to "get back to where I was before I started this awful merge".
Or just write down the latest commit (or use git reflog to find it post facto) and if you mess up, do "git reset --hard <commit>" to get back to it.
In tricky situations, I always commit work done. Then I attempt to do potentially harmful work. Note that afaik you can't lose commits in your history (they may be hidden, but reflog to the rescue). If I am very unsure whether something will work as intended, I place a dummy branch (a tag will do as well) onto that safety commit which will make it easier to find it back (in that case you don't need to resort to reflog). I never lost work once committed even when I painted myself into a corner. Note as well that rebase -i will always create a new commit rebased onto the entry commit. Going back to where you started is always possible.
Use `rebase --interactive` so you can have a better idea of what is going on.
I don't blame you. Git has terrible usability.
I think that GitHub's pull-request-based model is fundamentally broken. Gerrit's model, where every commit is quasi-independent (and hence must pass tests) and you can easily edit without force-pushing anything or losing review history, is superior (though not perfect) in almost all cases. (Exception: merging a long-running feature branch where all the commits in the branch have already been reviewed.)

This is GitHub's attempt to solve the problem without really changing anything. It won't really change anything. Since pull requests routinely contain a mixture of both changes that should be squashed (fixups) and changes that should not be squashed (independent changes), this just means that you get to pick your poison.

I used to strongly believe what you do until my company started using Phabricator, which forces the squash workflow on you. It makes your history more useful, not less. The pull request is the appropriate unit of change for software. Make small commits as you develop, then squash them down into a single meaningful change to the behavior of your software.
As a git novice I wonder, doesn't a proper workflow do the same thing? When I submit a feature branch it might have a lot of ugly commits. However, once I merge it to an integration branch there is one nice commit explaining what I did.

When coworkers create Pull requests I don't go through all of their commits and changes along the way. I just look at the diff so, I don't see the need for them to squash it first.

Once merged, the history of your feature branch becomes part of the history of the integration branch.

Sounds like you're using GitHub (Enterprise) or something similar where the pull request view shows you all of the changes in a "squashed" fashion.

Yes Bitbucket so, maybe that explains it.
Sometimes yes, sometimes no. Merge commits are nasty imo and I'm glad we can now forbid them outright, but a full squash isn't always the solution either.

Take a look at this PR for example:

https://github.com/HearthSim/python-unitypack/pull/4

Lots of back & forth. All the commits are related, and the PR is there to land all of those commits at once. I could land some of them right now (as they're safe to land), but keeping them in the PR keeps everything related in the same place (and none of them are required until that last commit lands).

A PR mirrors a "patchset" on mailing lists. You don't always want to squash all of it.

What you do want to avoid is a situation like this:

https://github.com/jleclanche/django-push-notifications/pull...

Where the original author creates their original commit and doesn't know how to --amend + push --force to the PR, and you end up with a ton of commits which you don't want to land all at once.

To clarify, pull requests shouldn't be squashed before review. They should be squashed when they get merged in, like this GitHub feature does. The granular commit history is useful during code review. It is not useful as a future developer trying to evaluate how the behavior of the program changed over time.
Right, sometimes. Pull requests are not necessarily a "unit of change" like you mentioned, though. For example, the first link I gave should not be squashed. But I don't want it to create a merge commit either.

I'm a little underwhelmed with the feature, it looks like it's either "squash everything" or "make a merge commit". There's no option to rebase & merge and/or selectively squash.

Exactly. In order for always-squash to work, then every PR must be the smallest possible atomic change. But often times certain features don't degrade to nice small atomic changes. It's sometimes useful to see that as a development process smell and consider using feature flags and other things to incrementalize the development, but sometimes the cleanest thing to do is just have a series of commits. I don't see a good reason to take this option off the table since source code history management is truly a craft in its own right.

FWIW, I prefer a carefully curated rebase and then merge --no-ff so that you can still see the branch and get at the branch diff trivially, but the history is still linear for practical purposes so bisecting is clean, etc.

Interesting. It's the first time I hear a fairly solid argument in favour of --no-ff. My main argument against it is keeping a clean commit log on github itself (and the default git log), as it's an entry point for new contributors.

Example:

https://github.com/lxde/pcmanfm-qt/commits/master

vs

https://github.com/facebook/react/commits/master

I think your first link should be squashed once the code has been reviewed. When looking back on history, commits are most useful when they're a list of behavior changes in the software. That pull request "exports meshes as OBJ." That's what's useful for future developers, not "add Transform fields." Leaving those in your history makes it harder to work with, not easier. If someone cares about the back and forth that happened within that behavior change, the pull request is always available.
"Add transform fields" may not be a very descriptive commit message, but those fields are fully independent from the OBJ exporter. They are the implementation of unity's Transform class. This implementation was stubbed before, it happens to be needed for an OBJ exporter to go through, but has nothing to do with the exporter itself.
It seems like it'd be nice to have two levels of granularity exposed in views of a source control system's history, basically corresponding to pull requests and commits. So you could drill-down to individual commits as needed, but would normally be able to work at the PR level.
Does "git log --merges" get us there?
I suppose it does, though at least the tool my org uses doesn't write good descriptions on the merge commits that are created upon merging in a PR — they're just like "Merge pull request 190 from …" when I'd prefer it to be named for the changes that are actually in that PR. That's good to know that exists, it could be useful. Thanks.
That's the `diff` tab on the PR
But the PR isn't retained (afaik) in the repo's history as a distinct entity (I'm talking about git proper here, not extra tools like GitHub, etc). In the end, git just has commits. [Edit: as prodigal_erik points out, perhaps merge commits are really what I need to be looking at.]
Nope, it does this by default, but doesn't force it.

$ cat ~/git/ATLAS/.arcconfig { "project_id" : "ATLAS", "repository.callsign" : "ATLAS", "conduit_uri" : "https://phabricator.$MYCOMPANY/", "arc.land.onto.default" : "develop", "immutable_history" : true }

Sure, it's a terrible feature to always use. And it's likely to be of little use to contributors who know how to use Git well. But in large open-source projects, often new contributors make a small change that needs a few minor corrections. Eliminating that final back-and-forth ("squash please") is a huge win for maintainers.
Not only maintainers - anyone tracing back through the history to find out what broke their use case.
There's no guarantee that every individual commit of a feature branch is meaningful, or even builds. It also makes the history of the master branch a lot harder to read when it has tons of commits representing the minutiae of the feature's development.
It really depends on each individual's workflow. I tend to use lots of "in progress" commits (each time things are "green"), and as I go, I regularly squash the commits, so in the final pull request I typically have several commits (and if I wasn't squashing it would have been a dozen). If I do feature and a refactor, they are always separate commits, it's easier to review these and bisect if something turns out to go wrong.

Some people might do similar things but they might not assure each commit is green, and they never squash anything (so you end up with non-meaningful commits).

As @3JPLW said, I see when it can be useful for opensource maintainers to have the option to squash someone's commits, when the change is small, but there are many commits (due to a review ping-pong etc)

There's no guarantee, but there are many benefits to striving for this ("git bisect run", CI test results).
If it isn't meaningful, then that is something the review stage should catch.
I use this workflow:

  1. branch off master
  2. work, commit, push, test (on CI server)
  3. decide it's time to ship
  4. rebase -i, push, test (again)
  5. git checkout master && git merge --no-ff feature_branch 
(make the merge commit message a summary of the feature)

master ends up being a list of feature branch commits, bookended by the merge commit which introduced the feature. Getting the squash commit diff is as easy as 'git diff feature_branch_merge^..feature_branch_merge'.

Squashing entire pull requests that change multiple things into a single commit is a bad idea, yes. But uploading and asking for review on such wide-reaching pull requests is a bad idea in the first place.

Using fast-forward without squash is also a bad idea in many cases: the string of commits may contain multiple points that don't actually build or pass tests, even if the final commit in the chain fixes all that. There's no point in landing those broken commits, and doing so will confuse bisection tools.

Fast-forward with squash, and enforcing reasonably sized code reviews as a matter of culture, is the best of all worlds in my opinion.

Why would you want to rewrite your whole work history and change the actual state of the repository at each of your commits? Why don't just merge?
Rebasing seems to clutter the Github PR's commit history and diff with all the commits to master that were made between the time the branch was cut and the time the rebase happens. But it doesn't do that if you merge in master. I never understood this.
It's a bad idea because it's a bad implementation. If it allowed you to select what to squash, defaulting to the behaviour of git rebase -i --autosquash master then it would be a clearly good feature.
> Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

If changes are too large/complex/disjoint to fix in a single commit then why have them in one PR?

I wonder why they did not add `--ff-only` as an option, like GitLab has.