Hacker News new | ask | show | jobs
by gorset 631 days ago
The use case is when you look at a branch as a series of patches.

Reviewing a clean set of commits is much easier than a branch full of mistakes and wrong paths taken.

Useful when we optimize for reviewing and good history for future maintenance. This has been important and useful when I’ve worked on big mission critical backend system, but I also understand it might not be the most important factor for a new project’s success.

2 comments

A branch with some base is already a series of commits. I don’t get where the conceptual re-imagining is here.
Patch series comes from the linux kernel workflow, which git was developed to support. https://kernelnewbies.org/PatchSeries

In this workflow you review every commit and not just the branch diff. Each commit is crafted carefully, and a well crafter series of commits can make even very large changes a brief to review.

It takes a certain skill to do this well. As the page above says > Crafting patches is one of the core activities in contributing code to the kernel, it takes practice and thought.

This is in contrast to using git more as a distributed filesystem where you don't care particularly much about the history, and you typically squash commits before merging etc. It's simpler and easier to work this way, but you lose some of the nice attributes of the linux kernel workflow.

That’s a nice summary.

What I don’t like about the Git documentation as I’ve read it is that they go between “patch” and “commit” in some places without stopping and explaining what the difference is. It makes sense to them. It’s obvious. But it isn’t necessarily obvious to most people.

A patch is a patch proper plus a commit message encoded in a format that git am understands. That’s fine. And the core developers understand that you cannot transmit a commit snapshot via email (or you shouldn’t). But I prefer to mostly stick to “commit” in the abstract sense, whether that to-be-commit is from a pull or from an email (or: it’s in the form of an email and it could be applied as a commit).

git rebase talks about “patch series” I think. Without explaining it. Why not “commit series”?

Sometimes it seems like talking about your changes by the way it happened to be transmitted. It’s like talking about “attachments” instead of commits because you happened to send them via email as an attachment (instead of inline).

Then you now have “stacked diffs” or “stacked commits”. Which are just a series of commits. Or a branch of commits (implicitly grounded by a base commit). For a while I was wondering what stacked diffs/stacked PRs/stacked patches and if I was missing out. When it just turned out to be, as you explain, essentially the Linux Kernel style of being able to review a commit in isolation. But in a sort of context that pull request inhabitants can understand.

I prefer to mostly talk about these things as “commits”.

(At several times writing those paragraphs above I wondered if I would be able to string together them in a coherent way)

I think part of the confusion is because 'patch' and 'commit' (really snapshot) are duals of each other, but in practice have important technical differences. When speaking abstractly about 'changes' it often doesn't much matter which term is used, but most interactions are with 'commits' so that tends to be the default term to use.

However, sometimes the details matter. For example, a 'patch' (diff + description) tends to be small enough to transfer conveniently, and human friendly. Patches do not describe their relationship to other patches, so it makes sense to talk about a series of patches which must be applied in sequence to accomplish some larger goal. 'patch' doesn't imply any particular storage format, so sometimes saying 'attachment' or 'patch file' or 'email body' is an important distinction. Git documentation assumes you know what you want. It might help to think of patches as "outside time", though of course any particular version of a patch will only apply to a subset of all snapshots.

A 'commit' (snapshot + description), on the other hand, tends to be large in that it describes the entire state of the codebase and contains (implicit or explicit) metadata describing its predecessor(s) -- possibly a large graph of them. People talk about a 'commit series' all the time, but use the terms 'history' and 'branch' instead.

Stacked PRs are built on top of commits and branches to gain the advantages of a patch series while retaining the advantages of snapshots. There's no industry consensus on if they are preferable, let alone how to best implement them (eg. rebase, rebase+squash, merge-squash, merge+first-parent, etc.), so different people have different ideas about what they look like. It isn't correct to say they are just a series of commits, because sometimes they are implemented as a series of (implicit or anonymous) branches. One of the few agreed-upon features of stacked PRs (or its other names) is that it is a sequence of 'changes' which are ordered to both satisfy change dependencies and broken into smaller pieces which "tells a story" to reviewers.

It certainly matters when you e.g. find a patch in the wild and have no idea what commit it was originally based on. Metadata fiddling becomes more important in email workflows.
Conceptually, a branch is a series of tree states. The patches are derived from that.
Conceptually a branch is the last snapshot in a series of snapshots.
So this wouldn’t work very well in workflows that flatten merges to a trunk?
It would mostly be unnecessary. The separate commits won’t matter after the PR if they’re getting squashed.

Debatably, if you’re making changes during a PR review, it could be helpful to make those changes in relevant commits. That way if someone goes through them during the PR, they get one clean “story”, rather than see the pre-PR commits and the conversation after.

Tools where they allow you to squash merge tend to be fairly incompatible with stack diff workflows generally.

It's tricky to review individual commits on Github and Gitlab and they don't even allow you to leave comments on the commit messages.

In areas where people do review individual commits they tend to use tools that support that workflow. Git uses email patches where each commit is sent as a separate email. Tools like Gerrit do code review directly on a commit by commit basis and the usual strategy to get it into trunk is to either "fast forward" (advance the HEAD of the trunk branch to that commit) or to cherry-pick it.

I like for non-trivial stuff to have a branch with a series of logical commits, cleanly rebased atop main, then use -no-ff to force a merge commit anyway. That way you can the whole branch appears as a single commit/diff atop main in primary history but you can dig in to the original components of it if/when that's useful.

The primary obstacle to doing this for me is, if I'm honest, not having automated it sufficiently that I can't forget to do that.