by globular-toast 248 days ago
Indiscriminate squashing sucks. Atomic commits are great if you want the git history to actually represent a logical changelog for a project, as opposed to a pointless literal keylog of what changes each developer made and when. It will help you if you need to bisect a regression later. It sucks if you bisect and find the change happened in some enormous incohesive commit. Squashing should be done carefully, to reform WIP and fix-type commits into proper commits that are ready for sharing.
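`git bisect` is, at heart, a binary search over history. Here is a minimal Python sketch of the idea. It assumes a linear list of commit IDs and a hypothetical `is_bad` callback that builds and tests one commit. Real `git bisect` also walks merge graphs and supports `git bisect skip`.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are oldest-first, commits[0] is known good, commits[-1] is
    known bad, and once the regression appears it stays in every later commit.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2  # test the midpoint of the remaining range
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


commits = ["a1", "b2", "c3", "d4", "e5", "f6"]
# Pretend the regression was introduced in "d4" (hypothetical history).
print(first_bad_commit(commits, lambda c: commits.index(c) >= 3))
```

Each test halves the remaining range. However, the search can only ever narrow a regression down to one whole commit, so the size of that commit determines how much code you are left to read.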

> It sucks if you bisect and find the change happened in some enormous incohesive commit.

But why are any PRs like this? Each PR should represent an atomic action against the codebase - implementing feature 1234, fixing bug 4567. The project's changelog should only be updated at the end of each PR. The fact that I went down the wrong path three times doesn't need to be documented.

> Each PR should represent an atomic action against the codebase

We can bikeshed about this for days. Not every feature can be made in an atomic way.

That's true, some are big and messy, or the change has to be created across a couple of PRs, but I don't think that the answer to "some PRs are messy" is "let's include all the mess". I don't think the job is made easier by having to dig through a half dozen messy commits to find where the bug is as opposed to one or two large ones.
> I don't think that the answer to "some PRs are messy" is "let's include all the mess"

Hey, look at us, two like-minded people! I never said "let's include all the mess".

Looking at the other extreme, someone in this thread said they didn't want other people to see the 3 attempts it took to get it right. Sure, if it's just a mess (or, since this is 2025, AI slop), squash it away. But in some situations you want to keep a history of the failed attempts. Maybe one of them was actually the better solution but you were just short of making it work, or maybe someone in the future will be able to see that method X didn't work and won't have to find out for himself.

I can see the intent, but how often do people look through commit history to learn anything beside "when did this break and why"? If you want lessons learned put it in a wiki or a special branch.

Main should be a clear, concise log of changes. Code is already hard enough to parse, and it gets even harder when you also have to parse its versions throughout history. We should try to minimize the cognitive load of tracking things that are added and then immediately removed, because there's going to be enough of that already in the finished merges.

> If you want lessons learned put it in a wiki or a special branch.

You already have the information in a commit. Moving that to another database like a wiki or markdown file is work and it is lossy. If you create branches to archive history you end up with branches that stick around indefinitely which I think most would feel is worse.

> Main should be a clear, concise log of changes.

No, that's what a changelog is for.

You can already view a range of commits as one diff in git. You don't need to squash them in the history to do that.
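`git diff A B` (equivalently `git diff A..B`) compares two snapshots directly. The following is a rough Python model of that. It assumes a single file and a hypothetical list holding that file's contents after each commit. It shows that changes made and undone inside the range never appear in the combined diff.

```python
import difflib


def range_diff(history: list[str], start: int, end: int) -> str:
    """Unified diff between two snapshots of one file.

    The result depends only on the two endpoint snapshots; whatever the
    intermediate commits did is invisible, much like diffing two commits in git.
    """
    return "".join(
        difflib.unified_diff(
            history[start].splitlines(keepends=True),
            history[end].splitlines(keepends=True),
            fromfile=f"commit{start}",
            tofile=f"commit{end}",
        )
    )


# Each entry is the file's contents after one commit (hypothetical history).
history = [
    "def f():\n    return 1\n",
    "def f():\n    print('debug')\n    return 1\n",  # WIP: debug line added
    "def f():\n    return 2\n",  # debug line removed, return value changed
]
print(range_diff(history, 0, 2))
```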

I am beginning to think that the people who advocate for squashing everything have `git commit` bound to ctrl+s and smash that every couple minutes with an auto-generated commit message. The characterization that commits are necessarily messy and need to be squashed so as to "minimize the cognitive load" is just not my experience.

Nobody who advocates for squashing even talks about how they reason about squashing the commit messages. Like it doesn't come into their calculation. Why is that? My guess is, they don't write commit messages. And that's a big reason why they think that commits have high "cognitive load".

Some of my commit messages are longer than the code diffs. Other times, the code diffs are substantial and there is a paragraph or three explaining them in the commit message.

Having to squash commits with paragraphs of commit messages always loses resolution and specificity. It removes context and creates more work for me to try to figure out how to squash it in a way where the messages can be understood with the context removed by the squash. I don't know why you would do that to yourself.

If you have a totally different workflow where your commits are not deliberate, then maybe squashing every merge as a matter of policy makes sense there. But don't advocate that as a general rule for everyone.

I can see your point, and sometimes I myself include PoC code as a commented-out block that I clean up in the next PR in case it proves to be useful.

But the fact is that your complete PR commit history gives most people a headache, unless it's multiple important fixes in one PR for convenience's sake, which, for me at least, happens very rarely. Important things should be documented in, say, a separate markdown file.

This simply isn’t true unless you have to put everything in one commit?

To be honest, I usually get this with people who have never realized that you can merge dead code (code that is never called). You can basically merge an entire feature this way, with the last PR “turning it on” or adding a feature flag — optionally removing the old code at this point as well.
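The following is a minimal sketch of the dead-code-plus-flag approach. It assumes a hypothetical in-process `FLAGS` dict and made-up `checkout_v1`/`checkout_v2` functions. Real projects usually read flags from config or a flag service.

```python
# Hypothetical flag store; real projects often read this from config or a flag service.
FLAGS = {"new_checkout": False}


def checkout_v1(cart: list[float]) -> float:
    """Existing code path: what everyone runs today."""
    return sum(cart)


def checkout_v2(cart: list[float]) -> float:
    """New code path: merged in earlier PRs but unreachable until the flag flips."""
    total = sum(cart)
    return round(total * 0.9, 2) if total > 100 else total  # made-up discount rule


def checkout(cart: list[float]) -> float:
    # The final PR only flips the flag (and may later delete v1 and this branch).
    if FLAGS["new_checkout"]:
        return checkout_v2(cart)
    return checkout_v1(cart)


print(checkout([60.0, 50.0]))
```

While the flag is off, `checkout_v2` ships but is never called. Once v2 is on everywhere, v1 and the `if` can be deleted and `checkout_v2` renamed to drop its `_v2` suffix.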

So maintaining old and new code for X amounts of time? That sounds acceptable in some limited cases, and terrible in many others. If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.

My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.

> So maintaining old and new code for X amounts of time?

No more than normal? Generally speaking, the author working on the feature is the only one who’s working on the new code, right? The whole team can see it, but generally isn’t using it.

> If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.

If you have people good at what they do... maybe. I’ve seen this end very badly due to merge artefacts, so I wouldn’t recommend doing any merges, but rebasing instead. In any case, you can always copy a function to another function: do_something_v2(). Then, after you remove v1, drop the v2 suffix. It isn’t rocket science.

> My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.

I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes. The only thing I can think of is your own company’s policies in relation to those regulations; in which case, you can change your own policies.

Medical industry: code that gets shipped has to be documented, even if it's not used. That doesn't mean we can't ship unused code; it just means it's generally a pretty bad idea to do it. The feature's requirements might change during implementation, or you might want to do a minor version release while that dead code belongs to a feature that needs to go into a major version (because of regulations).

> I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes

https://blog.johner-institute.com/regulatory-affairs/design-...

Our regulatory compliance regime hates it when we run non-main branches in production and specifically requires us to use feature flagging in order to delay rollouts of new code paths to higher-risk markets. YMMV.
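Here is one hypothetical shape for a per-market rollout flag. The same build ships everywhere, but the new code path is only enabled in the listed markets. The class and market names are placeholders, not any particular flagging product.

```python
from dataclasses import dataclass, field


@dataclass
class RolloutFlag:
    """A feature flag switched on per market (hypothetical structure)."""

    name: str
    enabled_markets: set[str] = field(default_factory=set)

    def is_enabled(self, market: str) -> bool:
        return market in self.enabled_markets


# Placeholder names: the new path is live only in the lower-risk market for now.
new_path = RolloutFlag("new_code_path", enabled_markets={"market_low_risk"})


def handle(market: str) -> str:
    # Both code paths are in the shipped build; the flag picks one per market.
    return "new" if new_path.is_enabled(market) else "old"


print(handle("market_low_risk"), handle("market_high_risk"))
```

Widening the rollout then becomes a configuration change (adding a market to `enabled_markets`) rather than deploying a different branch.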
> Each [X] should represent an atomic action against the codebase

That's called a commit. Not sure why some insist on replacing commits with vendor lock-in with less tooling and calling it progress.

Yes, that would be ideal. But especially in a world where infrastructure is tied so closely to the application, many teams cannot always meet this standard.
Yeah "should" is often not reality, BUT I'm arguing that not squashing doesn't make things better.
I so miss Bazaar's UI around merges/commits/branches. I feel like most of the push for squashing is a result of people trying to work around git's poor UI here.
The alternative to squashing is not a series of beautiful atomic commits. It is a series of commits where commit #5 fixes commit #2 and introduces a bug to be fixed in commit #7, and where commit #3 introduces a new class that is going to be removed in commits #6 and #7.
Yeah, I don't see the value in looking through that. At best I'll solve the problem, commit because the code works now, create unit tests, commit them, and then refactor one or both in another commit. That first commit is just ugly, and the second holds no additional information that the end product won't have.
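The middle ground from the top of the thread is to fold fix commits into the commits they fix instead of squashing everything. Git automates this with `git commit --fixup=<commit>` and `git rebase -i --autosquash`. Below is a simplified Python model of the fixup step. It assumes each commit records only the full contents of the files it touched, and that nothing between a commit and its fixup touches the same files. The `Commit` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    subject: str
    files: dict[str, str]  # path -> full contents after this commit (simplified)


def autosquash(commits: list[Commit]) -> list[Commit]:
    """Fold each "fixup! <subject>" commit into the latest earlier commit with that subject.

    Like git's fixup, the target keeps its own subject and the fixup's message is dropped.
    Assumes no commit between the target and its fixup touches the same files.
    """
    result: list[Commit] = []
    for c in commits:
        if c.subject.startswith("fixup! "):
            target_subject = c.subject[len("fixup! "):]
            for t in reversed(result):
                if t.subject == target_subject:
                    t.files.update(c.files)  # target now carries the fixed contents
                    break
            else:
                result.append(Commit(c.subject, dict(c.files)))  # no target: keep as-is
        else:
            result.append(Commit(c.subject, dict(c.files)))  # copy so input is untouched
    return result


history = [
    Commit("Add parser", {"parser.py": "v1"}),
    Commit("Add CLI", {"cli.py": "c1"}),
    Commit("fixup! Add parser", {"parser.py": "v2"}),
]
for c in autosquash(history):
    print(c.subject, c.files)
```

Unlike squashing a whole PR, this keeps "Add parser" and "Add CLI" as separate commits with their own messages, while the fix disappears into the commit it belongs to.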