Hacker News new | ask | show | jobs
by Olshansky 617 days ago
Is there anyone else that enforces a simple "just squash & merge everything from PRs into main" across the entire team?

I'm comfortable git fooing w/e is necessary, but ever since we adopted this, git related conversations went to almost zero. It's great.

7 comments

I’ve recently spent time truly learning git, and I’m realizing how much better it is to take care of your commits in a PR even if you squash to main.

My reviewers love when I write “review commit by commit” in the PR description. Then each individual commit has its own reasoning, and you can mentally switch into reviewing if that commit does it’s one thing it’s supposed to do correctly. I will accept the argument that each commit should be its own PR though :)

This is so puzzling to me. It’s like people[1] get so stuck in the GitHub PR framework/mindset that they then, after realizing that GitHub PRs suck for reviewing individual commits, discard commits as the reviewable unit and then shoehorn one-commit-per-PR as a replacement.

The PR is just the best GitHub had to offer. There are other approaches to code review.

[1] Here we are generalizing.

I treat PRs like the email workflow. You send a diff my way for a particular changes, I either accept it or reject it. Or I suggest modifications. Recursively. It’s the whole patch I’m interested in, not the implementation history (I’m not your tutor). Once approved, I make a commit in main for this diff.
The classic email workflow is either one patch or a patch series. Where each patch becomes a commit. And each patch can be reviewed in isolation (like a commit).

It is not anemic like the squash workflow.

There’s nothing stopping anyone from creating a PR series. My reasoning for the squash workflow is described here[0]. I just equate a PR to a patch. And it becomes a commit in the main branch. I don’t really care about the commits in the PR, just like no one cares about the commits that produced the patch in the email workflow.

[0]: https://news.ycombinator.com/item?id=41839282

> I don’t really care about the commits in the PR, just like no one cares about the commits that produced the patch in the email workflow.

They do care. They go through the trouble of reviewing it so that the resulting commit[1] that lands in the upstream repository is good.

[1] Presumably you don’t mean “they don’t care about the commit that produced the patch”… since the patch is just a transport format for the commit.

Exactly! Stacked diffs/changes are the way, with one commit per PR. https://www.stacking.dev/
Yep, IMO this is always the best of the 3 strategies.

The PR is the unit of work, people get too hung up on the PR's individual commits.

Worst: rebase and merge - you end up with your coworker's broken WIP commits all over master, and have a terrible git revert story

OK: merge commit - you can revert, but there is a less intuitive `-m 1` flag (IIRC?) you have to pass into revert, and IMO you rarely need the intermediate history.

Best: squash & merge - you get one single commit representing the unit of work merging in, git revert is dead easy

Also setting the commit message in main to the `{title} (#{prNum})\n\n{prDescription}` format preserves all the good context from your PR and lets you get back to it if you need.

Three options? The best alternative to squash-merge is to freely rebase before you merge. Or for that matter squash if you just end up with one commit (because rebase subsumes squash).

1. You get rid of WIP commits, typo fix commits, all kinds of transient changes (for review and so on)

2. You keep the substantive ones

3. All changes are logically separated

I don’t see where you cover this option.

> Also setting the commit message in main to the `{title} (#{prNum})\n\n{prDescription}` format preserves all the good context from your PR and lets you get back to it if you need.

Again. It is ironic that people are so comfortable with dumping the history in a webapp when you are working with a version control system.

I am fine with a lot of the history being on GitHub or whatever other webapp. The review history, that is. But I’m not comfortable with leaving the history of the pre-squashed commits on GitHub if those pre-squashed commits were useful for the long-term history.

If the code author deems the pre-squashed commits useful, they would've split their PR into multiple PRs. I personally do this all the time: each PR has only one commit, but PRs can depend on each other.

And if the PR author deems their commits insignificant, they can feel free to make one PR and they squash & merge.

This is like a game of tag. We move on from the squash policy to a different weird requirement.

Vanilla tools make PRs have a certain overhead. Dependent branches and flipping from the terminal or IDE to the webapp and so on.

And these vanilla tools are often a team requirement. But with git(1) you can work around all that overhead.

Am I gonna stop weirdly insisting on using the version control system itself for version control instead of latching onto whatever passes for “PR”? Apparently not.

Only people who aren't good at git will think dependent branches are overhead.

The version control system itself doesn't have enough features for collaborative coding. It doesn't have all the discussions and messages exchanged between authors and reviewers. The main task here isn't version control itself but a collaborative social process. The kernel has LKML. The rest of us mostly use GitHub. By making one large PR with distinct commits you force that discussion to be intertwined together. That's why I continue to believe the best way is to make multiple smaller PRs each with one commit (or multiple commits that are squashed upon merge).

> Only people who aren't good at git will think dependent branches are overhead.

[redacted]

You have to create branch names and keep them in synch. with `git rebase --update-refs` constantly. But worst of all: vanilla forge PRs don’t support it. So you have to thread these dependencies manually instead. What magic eliminates this overhead?

I think "every PR should be a single commit when merged" is a bit too dogmatic. If the PR is logically split up into bits of work that conceptually separate and that compile/run independently, then it makes sense to have those as separate commits. You might argue that those should then be separate PRs, but that's going a bit too far, I think. You can still revert it, it's just have to revert the individual commits and squash the revert commits.

IMHO the guideline should be "clean up your branch and rebase it before merging. Usually that means a single commit, but it can be multiple if that makes more sense to <future person> reading the history".

> IMHO the guideline should be "clean up your branch and rebase it before merging

In my experience, engineers tend to fall into 1 of 2 camps: 'Deep' Git knowledge who routinely dig through reflog and keep backup branches, commit early/often and autosquash logcal chunks until their PRs tell a Story through their commit history. The other side pretends git is p4; and has no concept of fetch vs pull vs rebase. A base assumption that branches are expensive and to be avoided.

I'd like to think probably fall in the middle, but nearly every engineer I've worked with falls on those edges based on the number of DMs i get asking for help after after the rote `git stash; git pull; git stash pop` throws conflict.

My local git copy can be a mess of branch and commits. But I want the main copy to have a clean history that embodies the project. Which means having only the main branch and a few others that represent main activities in flight, each changes can be tied to a ticket, a task, and people, and easily manage releases and reversions.
What’s that got to do with this squash sub-topic?
Individuals who don’t care and prefer squash can still use the squash option for their own work. The option is there in the $forge PR menu. No git(1) required.

The problem with the squash strategy is when it is used as a team-wide policy. Like it was described here. I can’t understand why we have to collectively limit ourselves like that.

I don’t see it as too dogmatic. I see it as taking the ambiguity out of the decision whether to squash or not. Just always squash into master. There are plenty of options to make creating PRs more lightweight.

Another nice thing with squashing is that merges into master always look the same regardless of individual engineer workflows.

Any dogmatic decision could be described as taking the ambiguity out of decisions.
Sure. And that’s positive in the context of this thread.
We’ve discussed the downsides of having such a policy in this subthread.

I want to have the choice to do a true merge or some other strategy. It’s a downside for me personally. Not having a choice doesn’t help me since it’s a meaningful choice in my book.

As a policy. I leave this decision for others. People who prefer can squash-merge all the time if they want to.

That’s when you introduce feature branch if the work is going to take days or months to get right. That feature branch will be the target of these PRs. Then you either merge (if you want the history) or squash if you don’t
Everyone knows what a feature branch is. Even people who choose to not use them.
I generally dislike being told how to handle my commit history as 'i can do it better' - but this is really true.

Commit history on a (larger?) PR tends to be most useful during the PR itself; I tend try and make my commits tell a story I can walk people through (on the CLI during a call) moreso than anything that will be useful in 6mo/year.

I've been convinced on Squash/Merge. If the PR needs more granular commits; maybe it should be 2 independent PRs.

I do want something that will be intelligible for as many years from now as possible.

We constantly use that on the OSS project I’m occasionally involved with. “Well why is it like that…” and the project has enforced a good history so this can often be answered.

And that’s the primary benefit of Git. With some discipline the history becomes legible.

On the other hand it doesn’t give you much out of the box for code review. Unless the code review you use is covered by the email tooling.

Enforcing a good history can be as simple as writing a long commit message explaining things in detail. A good mental model is to default to a two-section message, explaining why a change is made and how it is made. And then when you squash & merge, include all of the detailed commit messages. Fun fact: at a previous company I'm in the top 1% for writing long commit messages.
Your commit message can be as long as it wants. If you squash twelve distinct changes you’re not going to see what part of the message corresponds to what change. Unless you invent some markup which points at the lines in the diff (but the diffs are dynamic so not something static you can count on). But at that point you’re doing so much work that you might as well use a regular merge.
Please if you have twelve distinct changes you are making twelve separate PRs.
My job follows this, and "rebase" isn't even in the vocabulary at work. It's just so simple- our developers spend zero time with git issues because there aren't any. The master branch history ends up being a sequence of pull requests that all have a corresponding code review. Everything stays nice and tidy.
Rebase isn’t in our vocabulary either. Because those of us who use it just do it. And those that don’t, don’t.

And so there is no reason to interfere with anyone’s way of working.

We do this. We deploy on commit to main and use the commit SHA for versioning. So squash merges makes it easier to know which commit caused a bug or bisect. This work well for our PR based workflows. If required the PR can be used as reference for un-squashed commits.
If you deploy on merge (squash or regular) to main then having the commits squashed makes no difference with regards to finding out what was deployed.
But that way you make stacked diffs/changes/PRs really hard, so is it really worth it?

I firmly believe GitHub makes reviewing individual commits in a PR so painful that I'd rather not do that.

> Is there anyone else that enforces a simple "just squash & merge everything from PRs into main" across the entire team?

No. This kind of policy has plenty of downsides[1] and the upsides are relatively minor.

[1] https://news.ycombinator.com/item?id=41653892

The upsides are great. PRs should be atomic, introducing changes as a set. I don’t care about the history of the branch, I only care that the changes in the PR are one unit that can be reverted as a whole. And each commit in the main branch passes the tests.
I get those so-called upsides with `git log --first-parent`. I can view the changes as one set of changes or all of the individual ones.

But I can’t relate in general to those workflows where PRs have to be tiny and revertable (why need to revert so often).

It’s mostly about project management. In a project, the unit is a task. These tasks will then have discussion around then and then get assigned to the people that are going to do the implementation. After the latter is done and approved, it will be deployed and the task will be marked as done.

Having your git history reflecting this is great when you need to take decisions. Because no one will care about your individual commits. Every discussion will revolve about the task. So you make the PRs a bijective relation with it, and by extension the commits on the main branch. Then for special group of tasks like features, you have a specific branch you manage the same way. The other branches you create PRs from can be as messy as you like. And your local copy is yours to do whatever you want.

I know what project management and tasks are.

Commits are in practice a superset of all of that. Some commits will correspond to a task. But many commits will just do the work without having to go through the beaurocracy of “project management”. Having to “project manage” each and every commit if you want a useful and fine-grained (but not micromanaged) version control history is impractical.

There’s a lot of small tasks in the concrete code. It doesn’t make sense to give every little change a task.

And the flipside of that is that you can give up on making commits that are fine-grained enough. Now you’re back in squash town where, sure, your task list is just right. But your commits are sometimes a jumble of different concerns in order to stick to one-task-on-commit.

> Because no one will care about your individual commits.

Will someone care about your tasks?

You don’t pre-empt what commits you care about when you are trying to find the reason for some change in the Git history. You don’t find the exact commit that explains it, clear as day, and then throw it away because there is no associated task.

> So you make the PRs a bijective relation with it, and by extension the commits on the main branch.

It sounds useful with a mathematical name I guess?

Insisting on one-to-one mappings for such small things like commits smells of micromanagement. Yes, either people-management or your own time management.

> Having to “project manage” each and every commit if you want a useful and fine-grained (but not micromanaged) version control history is impractical.

You're misunderstand me there. I'm talking about the main branch's commits, on the main repository, not every branch and every commits, and not the local repos. And it's not dogmatic, just very nice to have. If you want a hotfix, it's very easy to create a new branch, commit the fix, push it, create a PR, and squash merge it. Unless you like to edit "main" directly on a collaborative project.

Or you can then instruct everyone how to cleanup their commits and make sure that each one is atomic. And you can merge your usual way. I have no dog in this fight. It's just that the above is easier to do. It's pretty much the same result.

Yes after I recently discovered that even with squash & merge, the intermediate history isn't lost: they are not in the git history but they can still be viewed on GitHub.
But then git isn't your source of truth anymore. GitHub still references these commits until they're garbage collected one day.
My understanding is that these commits don't get garbage collected ever. After all they still need to support `gh pr checkout` so they need to store them forever as part of the GitHub repo. Sure they won't be included in `git clone` but that's kind of the point.
That’s correct. They will not ge GCd.

And yet that’s not usually what the problem is with having your canonical history “over there” in a proprietary webapp.

But if the history is easily available on those PR refs then I guess it is (under doubt) fine.

If you really care, you can use the API to extract the PR’s data in a local archive.
If I really care about the version control history? Yeah.

Alternatively I could stick with git(1) which does all of this stuff out of the box. Instead of having to learn a superflouous (for the job) API.