Hacker News
by SketchySeaBeast 252 days ago
> It sucks if you bisect and find the change happened in some enormous incohesive commit.

But why are any PRs like this? Each PR should represent an atomic action against the codebase - implementing feature 1234, fixing bug 4567. The project's changelog should only be updated at the end of each PR. The fact that I went down the wrong path three times doesn't need to be documented.

3 comments

> Each PR should represent an atomic action against the codebase

We can bikeshed about this for days. Not every feature can be made in an atomic way.

That's true, some are big and messy, or the change has to be created across a couple of PRs, but I don't think that the answer to "some PRs are messy" is "let's include all the mess". I don't think the job is made easier by having to dig through a half dozen messy commits to find where the bug is as opposed to one or two large ones.

> I don't think that the answer to "some PRs are messy" is "let's include all the mess"

Hey, look at us, two like-minded people! I never said "let's include all the mess".

Looking at the other extreme, someone in this thread said they didn't want other people to see the 3 attempts it took to get it right. Sure, if it's just a mess (or, since this is 2025, AI slop), squash it away. But in some situations you want to keep a history of the failed attempts. Maybe one of them was actually the better solution but you were just short of making it work, or maybe someone in the future will be able to see that method X didn't work and won't have to find out himself.

I can see the intent, but how often do people look through commit history to learn anything besides "when did this break and why"? If you want lessons learned put it in a wiki or a special branch.

Main should be a clear, concise log of changes. It's already hard enough to parse code, and it's made even harder by then parsing versions throughout the code's history. We should try to minimize the cognitive load required to track the number of times something is added and then immediately removed, because there's going to be enough of that already in the finished merges.

> If you want lessons learned put it in a wiki or a special branch.

You already have the information in a commit. Moving that to another database like a wiki or markdown file is work and it is lossy. If you create branches to archive history you end up with branches that stick around indefinitely which I think most would feel is worse.

> Main should be a clear, concise log of changes.

No, that's what a changelog is for.

You can already view a range of commits as one diff in git. You don't need to squash them in the history to do that.
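As a rough sketch of that point: `git diff <base> <head>` compares two endpoints directly, so the intermediate commits never appear in the output, and `git diff <base>...<head>` does the same starting from the merge base. The ref names and the two helper functions below are placeholders for illustration, not anything from a real project.

```python
import subprocess
from typing import List


def range_diff_command(base: str, head: str, from_merge_base: bool = False) -> List[str]:
    """Build the git command that shows everything between two refs as one diff.

    `base` and `head` are hypothetical ref names supplied by the caller.
    """
    if from_merge_base:
        # Three-dot form: diff from the merge base of base and head up to head.
        return ["git", "diff", f"{base}...{head}"]
    # Two-argument form: diff between the two endpoints, ignoring the steps in between.
    return ["git", "diff", base, head]


def range_diff(base: str, head: str, repo_dir: str = ".") -> str:
    """Run the command inside repo_dir and return the combined patch text."""
    result = subprocess.run(
        range_diff_command(base, head),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `range_diff("main", "feature")` inside a repository that has both refs would return a single patch, however many commits sit between them.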

I am beginning to think that the people who advocate for squashing everything have `git commit` bound to ctrl+s and smash that every couple of minutes with an auto-generated commit message. The characterization that commits are necessarily messy and need to be squashed so as to "minimize the cognitive load" is just not my experience.

Nobody who advocates for squashing even talks about how they reason about squashing the commit messages. Like it doesn't come into their calculation. Why is that? My guess is, they don't write commit messages. And that's a big reason why they think that commits have high "cognitive load".

Some of my commit messages are longer than the code diffs. Other times, the code diffs are substantial and there is a paragraph or three explaining it in the commit message.

Having to squash commits with paragraphs of commit messages always loses resolution and specificity. It removes context and creates more work for me to try to figure out how to squash it in a way where the messages can be understood with the context removed by the squash. I don't know why you would do that to yourself?

If you have a totally different workflow where your commits are not deliberate, then maybe squashing every merge as a matter of policy makes sense there. But don't advocate that as a general rule for everyone.

Commits aren't necessarily messy, but they're also not supposed to be necessarily clean. There are clearly two different workflows here.

It seems some people treat every commit like it's its own little tidy PR, while others do not. For me, a commit is a place to save my WIP when I'm context switching, or to create a save point when I know my code works so that I can revert to it if something goes awry during refactoring. It's a step on the way to completing my task. The PR is the final product to be reviewed; it's where you get the explanation. The commits are imperfect steps along the way.

For others, every commit is the equivalent of a PR. To me that doesn't make a lot of sense - now the PR isn't an (ideal world) atomic update leading to a single goal, it's a digest of changes, some of which require paragraphs of explanation to understand the reasoning behind. What happens if you realize that your last commit was the incorrect approach? Are you constantly rebasing? Is that the reader's problem? Sure, that happens with PRs as well, but again, that's the difference in process - raising a PR requires a much higher standard of completion than a commit.

I can see your point, and sometimes I myself include PoC code as a commented-out block that I clean up in the next PR, in case it proves to be useful.

But the fact is your complete PR commit history gives most people a headache unless it's multiple important fixes in one PR for convenience's sake. That happens very rarely, at least for me. Important things should be documented in, say, a separate markdown file.

This simply isn’t true unless you have to put everything in one commit?

To be honest, I usually get this with people who have never realized that you can merge dead code (code that is never called). You can basically merge an entire feature this way, with the last PR “turning it on” or adding a feature flag — optionally removing the old code at this point as well.
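A minimal sketch of what that can look like, assuming a flag read from an environment variable. The flag name `NEW_CHECKOUT`, the function names, and the discount logic are all made up for illustration:

```python
import os


def legacy_checkout_total(prices_cents):
    """Existing behaviour that callers rely on today."""
    return sum(prices_cents)


def new_checkout_total(prices_cents, discount_percent=10):
    """New code path, merged early but unreachable while the flag is off."""
    return sum(prices_cents) * (100 - discount_percent) // 100


def flag_enabled(name, env=None):
    """Read a flag from the environment; anything other than "1" counts as off."""
    env = os.environ if env is None else env
    return env.get(name) == "1"


def checkout_total(prices_cents, env=None):
    # The final "turn it on" PR only has to flip this flag (or delete the branch).
    if flag_enabled("NEW_CHECKOUT", env):
        return new_checkout_total(prices_cents)
    return legacy_checkout_total(prices_cents)
```

Every PR before the last one can add to `new_checkout_total` without changing what users see, because nothing reaches it until the flag is set.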

So maintaining old and new code for X amounts of time? That sounds acceptable in some limited cases, and terrible in many others. If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.

My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.

> So maintaining old and new code for X amounts of time?

No more than normal? Generally speaking, the author working on the feature is the only one who’s working on the new code, right? The whole team can see it, but generally isn’t using it.

> If the code is being changed for another reason, or the new feature needs to update code used in many places, etc. It can be much more practical to just have a long-lived branch, merge changes from upstream yourself, and merge when it's ready.

If you have people good at what they do ... maybe. I’ve seen this end very badly due to merge artefacts, so I wouldn’t recommend doing any merges, but rebasing instead. In any case, you can always copy a function to another function: do_something_v2(). Then after you remove the v1, remove the v2 suffix. It isn’t rocket science.
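Roughly, the copy-then-rename approach could look like the sketch below. The function bodies (an order-preserving de-duplication) and the `outputs_match` helper are invented for the example; only the `do_something_v2` name comes from the comment above.

```python
def do_something(values):
    """v1: the implementation every existing caller uses."""
    result = []
    for v in values:
        if v not in result:
            result.append(v)
    return result


def do_something_v2(values):
    """v2: the rewrite, living beside v1 until every caller has moved over."""
    # dict keys keep insertion order, so this de-duplicates without reordering.
    return list(dict.fromkeys(values))


def outputs_match(samples):
    """Check that v2 agrees with v1 on sample inputs before v1 is deleted."""
    return all(do_something(s) == do_something_v2(s) for s in samples)
```

Once nothing calls v1 any more, a final small PR deletes it and renames `do_something_v2` back to `do_something`.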

> My industry is also fairly strictly regulated and we plainly cannot do that even if we wanted to, but that's admittedly a niche case.

I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes. The only thing I can think of is your own company’s policies in relation to those regulations; in which case, you can change your own policies.

Medical industry, code that gets shipped has to be documented, even if it's not used. It doesn't mean we can't ship unused code, it just means it's generally a pretty bad idea to do it. Maybe the feature's requirement might change during implementation, or you wanted to do a minor version release but that dead code is for a feature that needs to go into a major version (because of regulations).

> I can’t think of any regulations in any country (and I know of a lot of them) that dictate how you do code changes

https://blog.johner-institute.com/regulatory-affairs/design-...

That document doesn’t say that, as far as I can tell. If you’re using a compiled language, the dead code likely gets removed anyway, it is never shipped.

Our regulatory compliance regime hates it when we run non-main branches in production and specifically requires us to use feature flagging in order to delay rollouts of new code paths to higher-risk markets. YMMV.

> Each [X] should represent an atomic action against the codebase

That's called a commit. Not sure why some insist on replacing commits with vendor lock-in with less tooling and calling it progress.

Yes, that would be ideal. But especially in a world with infrastructure tied so closely to the application, this standard cannot always be met for many teams.

Yeah, "should" is often not reality, BUT I'm arguing that not squashing doesn't make things better.