Hacker News new | ask | show | jobs
by cornstalks 405 days ago
> The idea, particularly as realized in the GitHub pull request workflow, is that the real “unit of change” is a pull request, and the individual commits making up a PR are essentially irrelevant.

I loathe GitHub PRs because of this. Working at $dayjob the unit of change is the commit, and every commit is reviewed and signed off by at least 1 peer.

And you know what? I love it. Yes, there's some overhead. But I can understand each commit in its entirety. I and my coworkers have caught numerous issues in code with these single-purpose commits of digestible size.

Compare this to GitHub PRs, which tend to be beastly things that are poorly structured (not to mention GitHub's UI only adding to the review problems...) and multipurpose. Reviewing these big PRs with care is just so much harder. People don't care about the commit message, so looking at the git log it's just a mess that's hard to navigate.

7 comments

My ideal workflow is commits are as small as possible and PRs "tell a story", meaning that they provide the context for a commit.

I will split up a PR into

- Individual steps a a refactor, especially making any moves their own commits

- I add tests *before* the feature (passing, showing the old behavior)

- The actual fix or feature commit is tiny, with the diff of the tests just demontstrating how behavior changed

This makes it really fast to review a PR because you can see the motivation while only looking at a small change. No jumping around trying to figure out how the pieces fit together.

The main reason I might split up a PR like that is if one piece is likely to generate a discussion before its merged. I then make that a separate PR and as small as possible so we can have a focused discussion.

I hate squash merges.

As someone who has often had to dig into the history to figure out what happened, I always want to see at least this. And I wouldn't be opposed to seeing it broken down even more as it was worked on. Not one big squash merge that hides what really happened.

I'll also add one more to your list: Any improvements that came out of the review but stayed in that merge should each be individual commits. I've seen hard-to-trigger bugs get introduced from what should have been just a style improvement.

One of the problems is that GitHub's UI and workflow isn't very good for this in various ways (can't review commits, can't really diff with previous after amending commit).

So as a rule, I tend to stick with "1 PR == 1 commit", except when there's a compelling reason not to.

Worrying about individual commits is also what makes it possible for you to later use git bisect without going crazy.
And to have a useful "git blame". My editor setup shows me a subtle git blame on each line of code, and I find it quite helpful to know who changed what last and why. Both when coding, and when debugging.

This is why, contra the linked article about commit messages, I strive to make minimal and cohesive commits with good messages. Commits are for future archaeology, not just a way to save my work every minute in case my hard drive dies.

How often do you find that command useful?

In ~18 years of git use, I have never needed it, but I see it mentioned often as an important reason to handle commits in some certain way. I wonder what accounts for the difference.

I used it a few weeks ago to track down a weird regression/bug in Firefox. A few years ago I used it to track down a regression in Wine.

That's probably the most important case: large complex codebases with lots of changes where "wtf is going on" isn't so obvious from just the code.

I've never used it for any of my personal projects or even at my dayjob, because they've much smaller with far fewer changes (relatively speaking).

It's useful when the codebase is difficult to debug directly. Eg, your users have a bug that maybe appears on specific hardware, which the developers don't have. The users can't be expected to comprehend the code base enough to debug, but bisect is a mechanical process that they are capable of.

Having said that, bisect is also an O(log N) method and it's useful where otherwise you might end up spending O(N) time debugging something. I have myself split a configuration change into many stupidly-small commits (locally, without the intention to push that) purely so I could run bisect instead of manually reviewing the change to figure out which part broke stuff.

Git bisect is one of those tools that - once you learn how to use it effetively - fundamentally changes how you think about your git repos. I have had people tell me that a clean git history isn't worth the effort but once they really grok what you can do with that solid foundation really come around.

One case where git bisect really saved me was when we discovered a really subtle bug in an embedded project I was working on. I was able to write a 20 line shell script that built the code, flashed the device, ran 100 iterations of a check, and do some basic stats. I left the bisect chugging away over the weekend and came back to an innocuous-looking commit from 6 months earlier. We could've spent weeks trying to find the root cause without it.

whats N here?
In the case of git bisect, the number of commits. In the alternate case, it depends what debugging strategy you decided to use.
I think it’s not that you couldn’t have used it but because you discount it it wasn’t something you reached for. If you flip the script as something that’s out there and explicitly look for opportunities to use it it’s there. Alternatively, you don’t structure your commits carefully and thus git bisect for you is a mess that would pull up a giant amount of code anyway.

Heck, I used it yesterday because I had a PR where I was cleaning things in the C++ build system where things stopped working in CI in weird ways I couldn’t figure out but was fine locally. I used bisect locally to figure out which commits to test. You just have to think that a blind bisect search is going to be more effective than trying to spot check the commits that are a problem (and for tricky bugs this is often the case because your intuition can mislead you).

I’ve also used it to find weird performance regressions I couldn’t figure out.

Occasionally, but when it's useful, it's very useful. But generally only if most commits are buildable and roughly functional, otherwise it becomes a big pain (as does any manual process of finding what change introduced a regression).
Same. I've only done bisect debugging a few times. I'm almost always able to use more traditional debugging especially if I have a good idea about where the bug must be from behavior.

Bisects are good when the bug is reproducible, you have a "this used to work and now it doesnt" situatiuon, and the code base is too big or too unfamiliar for you to have a good intuition about where to look. You can pretty quickly get to the commit where the bug first appears, and then look at what changed in that commit.

How do you trace the origin of breaking changes, especially those arising from integration problems? For fairly busy codebases (>10 commits per day), and a certain subset of regressions, bisect is invaluable in finding the root cause. you can always do it the "hard way", so it's not the only way
I don't really ever find myself having to do that. I guess it's been a long time since I worked in an environment which did not use an "only merge to main after passing CI" workflow, and back then we weren't using git, anyway.

There was one git-using startup I worked for which had a merge-recklessly development style, and there was one occasion when I could have used `git bisect` to disprove my coworker's accusation that I had thoughtlessly broken the build (it was him, actually, and, yuck - what a toxic work environment!), but the commit in question was like HEAD~3 so it would probably have taken me longer to use git bisect than to just test it by hand.

> I don't really ever find myself having to do that. I guess it's been a long time since I worked in an environment which did not use an "only merge to main after passing CI" workflow, and back then we weren't using git, anyway.

You _never_ have bugs that slip through the test suite? That is extremely impressive / borderline impossible. Even highly tested gold-standard projects like SQLite see bugs in production.

And once you find a regression that wasn't covered by your test suite, the fastest way to find where it originated is often git bisect.

Bugs that slip through the test suite and it's important to trace the origin and it requires building more than a couple best-guess versions to find it. And even then, if it wastes an hour or two once in a blue moon that's not a big motivator for workflow changes.

You're skeptical of a far stronger claim than the one they actually made.

> I guess it's been a long time since I worked in an environment which did not use an "only merge to main after passing CI"

It's the same for me, but some integration bugs still escape the notice of unit tests. Examples from memory: a specific subset users being thrown into an endless redirect due to a cookie rename that wasn't propagated across all sub-systems on the backend, multiple instances of run-time errors that resulted from dependency version mismatches (dynamic loading), and a new notification banner element covering critical UI elements on an infrequently used page - effectively conflicting CSS position. In all these cases, the CI tests were passing, but passing tests don't mean your software is working as expected in all cases.

I also find git bisect to be useful, but very rarely and never for personal projets.

In the cases you mentioned, robust e2e and integration tests would ideally be able to catch the bugs. And for the UI issue in particular, I wouldn't think to track down the commit that caused it, but just fix it and move on.

Honestly if you haven't ever used got bisect I'd say you're missing out on a very powerful tool. To be able to, without any knowledge of the code base, isolate down to the exact commit that introduced a big is incredibly powerful
A coworker taught me how to use it long ago, else I would never have known it was there to reach for.

And the few times I've reached for it, I was really thankful it was there.

I’ve just used it two times in the last few months. One was to track down a commit which triggered a bug I found in Git. I wouldn’t be able to troubleshoot it myself. And I couldn’t send the whole repository because it’s not OSS. But with a script to reproduce the bug and half an hour I was able to find the problematic change.

I also tried to make a minimal reproduction but wasn’t able to.

I don't use git much. But at my job, where we use mercurial, where the unit of work is commit, I use bisect frequently. When one of our automated tests starts failing, I can run bisect and easily find the commit that caused it.
Often enough that I am truly shocked to hear someone say they have never needed it.
I used it multiple times to track down which commit introduced confusing bug.
I personally bisect regularly to find when issues were introduced.
Squash merge gives you the same thing.
Squash merges simply guarantee that git bisect will not be able to pinpoint a breaking change, because that history is gone.
If you treat a PR as a unit of work, then there is nothing to bisect. If you don't treat it as a unit of work, then people just edit their git history to merge commits just like a squash.
> If you treat a PR as a unit of work, then there is nothing to bisect.

You're bisecting the history of PR merges.

You ignored the part where I claimed that without squash merging, people will just do it manually with git rebasing or amending.
Here’s the most relevant (to me) difference:

- The real unit of change lives in Git

- The real unit of change lives on some forge

I want it to live in Git.

They are the same if you use squash merge.
modulo the commit message, which github apparently takes a lot of effort to not surface when needed; the most egregious example is that it's a complete afterthought to fill out right before the 'squash and merge' button becomes green.
You can make the PR description the commit message through configuration.
>Working at $dayjob the unit of change is the commit, and every commit is reviewed and signed off by at least 1 peer.

Respectfully, that's the dumbest thing I've ever heard.

Work on your feature until it's done, on a branch. And then when you're ready, have the branch reviewed. Then squash the branch when merging it, so it becomes 1 commit.

Commit and push often, on branches, and squash when merging. Don't review commits, review the PR.

I've had people at various jobs accidentally delete a directory, effectively losing all their progress, sometimes weeks worth of work. I've experienced laptops being stolen.

If I used your system, over the years me and various colleagues would have lost work irretrievably a few times now, potentially sinking a startup due to not hitting a deadline.

I feel your approach shows a very "Nothing bad will ever happen" attitude.

Yes, of course you should have a backup. Most of those don't run every few minutes, though. Or even every few hours.

"Just trust the backup" feels like a really overkill solution for a system that has, as a core feature, pushing to a remote server. And frankly, a way to justify not using the feature.

What's the difference between this and squash merging PRs? A commit or a PR can be large. I don't see the difference.
> A commit or a PR can be large. I don't see the difference.

They made it pretty clear they're talking about not-large commits. And they're contrasting that with any-size PRs.

That's a false dichotomy, lurker. A PR or a commit can be large or small.
It's not a false dichotomy. They're just using different terms than you would, based on their experience with how the people around them use those systems.
It's comparing carefully manicured git history hacking to unregulated PRs. It's pure dogma.
You agree that they are making a real comparison between two different things.

But you also say it's pure dogma?

I'm confused.

I think it's the review process. Do you review 1 commit or multiple?
Is this what Gerrit does?
Pretty much, yes. I've only used Gerrit a few times so my direct experience is limited.
This encourages commit size to grow drastically.