Hacker News
by burntsushi 906 days ago
> Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

Yes? But that isn't what you said. You said "actual" history. Using that phrasing makes this conversation extremely difficult because it doesn't acknowledge that our positions are different by degrees rather than by categories. You said "actual" as-if it was somehow inherently better because it's the "actual" history. But it isn't the "actual" history. So your communication on this point just becomes befuddled. Please be more precise.

> Not when "curation" is revision after the fact.

How many times do we have to go over this? Unless you're recording every single keystroke, then you are also doing "revision after the fact." There are differences between our approaches, for sure, but "revision after the fact" does not capture them.

> When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

What are you talking about, "committed something different"? I don't take a PR, rewrite history and then merge it. I rewrite the history, push it back up to the PR branch and only merge (via rebase if appropriate, or sometimes via squash) when CI passes. The collection of commits still passes. CI doesn't guarantee that each individual commit does, but I already acknowledged and discussed that downside. The curation of commits is specifically all about making intent and understanding the change easier. That's the entire point!

> I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

I can't even conceive of an example of this. Can you give one? Even if it's hypothetical, that's fine.

To be clear, I can imagine the following examples of things going awry:

* Squashing is used which causes many commits to get squashed into one, and thus can make the history of changes less clear depending on the commits. For example, if a PR contains 2 commits where there's a thousand lines as a result of adding a new function parameter in the first commit, and then a second commit with one additional line calling the function using that new parameter in an interesting way, then squashing those two commits into 1 will lead to history that is less clear. But this is why I don't advocate for squash & merge in all cases.

* Since CI doesn't run on every commit, if you need to revert a PR that merged multiple commits via rebasing only, then you might need to revert all of the commits that came in from that PR individually. That can be a pain and it can be difficult to discover which commits you need to revert.

* Since CI doesn't run on every commit, it's possible that `git bisect` can be more annoying than it otherwise would be. Maybe tests don't build on one commit. Then you need to do `git bisect skip`.
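
For illustration, here is a minimal sketch of the `git bisect skip` case from the last bullet, automated with `git bisect run`. A test script that exits with status 125 tells bisect to skip the current commit instead of marking it good or bad. The throwaway repository, `app.sh`, and the commit messages are all hypothetical stand-ins, and "does not build" is approximated by a shell syntax error.

```shell
#!/bin/sh
# Hypothetical throwaway repo: c4 does not "build", c6 introduces the bug.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # usage: commit <contents of app.sh> <commit message>
    printf '%s\n' "$1" > app.sh
    git add app.sh
    git commit -q --allow-empty -m "$2"
}

commit 'echo ok'   'c1: initial version'
commit 'echo ok'   'c2: unrelated change'
commit 'echo ok'   'c3: unrelated change'
commit 'echo "ok'  'c4: work in progress, does not parse'
commit 'echo ok'   'c5: fix the syntax error'
commit 'echo bad'  'c6: introduce the regression'
commit 'echo bad'  'c7: unrelated change'

# The script bisect runs on each candidate commit.
check=$(mktemp)
cat > "$check" <<'EOF'
sh -n app.sh 2>/dev/null || exit 125   # does not parse: ask bisect to skip it
[ "$(sh app.sh)" = ok ]                # exit 0 marks good, exit 1 marks bad
EOF

git bisect start HEAD HEAD~6 >/dev/null   # bad = c7, good = c1
git bisect run sh "$check" >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
```

Bisect can still name a single culprit here because the skipped commit is not adjacent to the regression. If the unbuildable commit sat directly before the first bad one, bisect could only narrow the answer to a range.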

But none of those are about browsing the source history when using rebase & merge. I can't even begin to imagine a single example of browsing the source history where I would specifically want an "actual" accounting of the history without any intentional curation. In literally every instance of me browsing source history in over 20 years of programming, I cannot imagine a single instance where I found curation to be unhelpful and wished that the source history was somehow more faithful to how the programmer arrived at the change instead of focusing on communicating the change to other programmers.

> What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.

If you open a PR on a GitHub project with 5 commits, GitHub Actions will not run on each commit by default. I'm not aware of an easy way of changing that behavior. If you "rebase & merge" that PR, CI still won't run on every commit merged. Here's an example from one of my projects, where you can clearly see that not every commit has a green checkmark: https://github.com/BurntSushi/ripgrep/commits/master/

I run dozens of projects this way. I've never had a major issue because it just isn't a big deal if one commit now and then doesn't pass tests. If it were a bigger deal, then I'd absolutely either reconsider my curation or invest more in improving CI tooling.
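
As a rough local sketch of what "run tests on every commit" would mean, the loop below checks out each commit of a branch in turn and runs a test command against it. Everything here is an assumption for illustration: the repository is a throwaway, `test.sh` is a stand-in for a real test suite, and the three "PR" commits are invented. It is not how any particular CI service works.

```shell
#!/bin/sh
# Hypothetical repo: the PR as a whole passes, but its first commit does not.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # usage: commit <contents of test.sh> <commit message>
    printf '%s\n' "$1" > test.sh
    git add test.sh
    git commit -q --allow-empty -m "$2"
}

commit 'exit 0' 'base: tests pass'
base=$(git rev-parse HEAD)
commit 'exit 1' 'pr 1/3: work in progress (tests fail)'
commit 'exit 0' 'pr 2/3: make the tests pass'
commit 'exit 0' 'pr 3/3: polish'

branch=$(git rev-parse --abbrev-ref HEAD)
failures=0
head_status=unknown
# Walk the PR's commits oldest-first and test each snapshot.
for sha in $(git rev-list --reverse "$base..HEAD"); do
    git checkout -q "$sha"
    if sh test.sh; then
        head_status=pass
    else
        head_status=FAIL
        failures=$((failures + 1))
    fi
    echo "$head_status  $(git log -1 --format=%s)"
done
git checkout -q "$branch"

echo "failing commits: $failures, tip of branch: $head_status"
```

`git rebase --exec <cmd>` is a built-in way to run a command after each replayed commit, and it stops at the first failure. The explicit loop is used here only because it leaves the branch untouched.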

> Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred?

We're speaking past each other. I don't know how else I can explain that there is no such thing as "capturing what actually occurred." You keep saying that, but even in that case, you aren't capturing what actually occurred. You're capturing an ad hoc curation of what actually occurred.

> Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".

You don't have a faithful recording of history though. Your source history is also a lie. And the thing you call a "faithful recording of history" is more like a meandering series of "fix typo" or "fix lint" or whatever commits. The only benefits it has that I'm aware of are the following:

* It's easier, in the sense that you don't pay any attention to how a patch series is structured. You just code and commit and don't worry about anything. To me, this is like writing code without caring about whether someone else (including you) can read & understand it. Which is a thing. Lots of people do that. Let's just be open and transparent about it.

* In some cases, there is less friction with the tooling.

I still don't think you've actually tried the type of curation I'm talking about. On the other hand, I arrived at my position on curation after years of doing your approach of capturing a "faithful recording of history" and realized it was just about useless.

2 comments

> Unless you're recording every single keystroke, then you are also doing "revision after the fact." There are differences between our approaches, for sure, but "revision after the fact" does not capture them.

He corrects mistakes forward (new commit fixes old commit), we correct them backwards when possible (just fix the old commit directly), otherwise forwards. I know which I prefer, but nobody's going to convince anybody.
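
One common way to do the "backwards" correction is `git commit --fixup` followed by `git rebase --autosquash`; a minimal sketch follows. The repository, file names, and commit messages are hypothetical, and `GIT_SEQUENCE_EDITOR=true` is used only so the generated todo list is accepted without opening an editor.

```shell
#!/bin/sh
# Hypothetical repo: fix a typo in an older commit instead of adding "fix typo".
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base > README
git add README
git commit -q -m 'base'
base=$(git rev-parse HEAD)

echo 'teh parser' > parser.txt          # the mistake
git add parser.txt
git commit -q -m 'Add parser'
target=$(git rev-parse HEAD)

echo 'docs' > docs.txt
git add docs.txt
git commit -q -m 'Add docs'

# Correcting forward would stop after this edit with a plain "fix typo" commit.
echo 'the parser' > parser.txt
git commit -q -a --fixup "$target"      # creates "fixup! Add parser"

# Correcting backward: fold the fixup into the commit it repairs.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

count=$(git rev-list --count "$base..HEAD")
fixed=$(git show HEAD~1:parser.txt)
git log --format=%s "$base..HEAD"
```

After the rebase the branch holds two commits, "Add parser" and "Add docs", and the typo never appears in the resulting history.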

I just wish I didn't have to wade through dozens of pointless "Fold in John's suggestions from the PR" commits when trying to get to the meat. Or have git bisect land on a merge commit with two parents, throw up its hands and say "over to you, pal".

Funny thing is, I'm normally a proponent of "worse is better". I wonder why I'm not in this case. Probably because a rebase repo is a single train track, and so much easier to reason about.

(I bet the answer to your "when did rebase screw up so badly it took days to unpick" is something to do with "push --force". With great power...)

Yeah I actually don't necessarily mean to convince anyone of my way of doing things. I'm more or less just trying to convince others to both convey their ideas more clearly, and more importantly, recognize the trade-offs in each of the approaches accurately. I do not feel either one of those things is really being done here.

I feel like this sort of confusion comes up every time there's a discussion about rebase versus merge. My favorite explanation of this is the combination of overloaded terminology (jargon versus layspeak, e.g., "history" and "merge") and inexperience. I don't jump in every time, but when I do, it feels like I'm banging my head against the wall. Sigh.

> He corrects mistakes forward (new commit fixes old commit), we correct them backwards when possible (just fix the old commit directly), otherwise forwards. I know which I prefer, but nobody's going to convince anybody.

I understand this. But I don't like the phrasing because it doesn't tell you anything about the differences in the approaches, when you might want to use one over the other and the trade-offs.

> I just wish I didn't have to wade through dozens of pointless "Fold in John's suggestions from the PR" commits when trying to get to the meat. Or have git bisect land on a merge commit with two parents, throw up its hands and say "over to you, pal".

Yeah those pointless commits are why I curate. And the overhead of submitting one-PR-per-commit is why I use both rebase & merge and squash & merge on GitHub.

> Funny thing is, I'm normally a proponent of "worse is better". I wonder why I'm not in this case. Probably because a rebase repo is a single train track, and so much easier to reason about.

I definitely don't chase perfection here. I don't mind having a commit that doesn't build or pass tests now and then. What I'm after is communicating clearly. Both to folks reviewing my code and to folks looking at the source history 6 months from now. I quite literally try to treat source history like I treat the code itself. Both things benefit from thinking about how other humans are going to interpret it in the future.

> (I bet the answer to your "when did rebase screw up so badly it took days to unpick" is something to do with "push --force". With great power...)

I'm sure I bungled things up pretty badly when I was first learning `git rebase`, but that was so long ago I can't remember. In working memory, the worst fuckups with rebase have been with `push --force` (well, `--force-with-lease`) and dependent PRs. But I just recently learned about `git rebase --update-refs`, and that's already made things a lot nicer.

Oh, sweet. That's going to make the branch I'm working on right this minute easier to deal with.

> You said "actual" as-if it was somehow inherently better because it's the "actual" history. But it isn't the "actual" history. So your communication on this point just becomes befuddled. Please be more precise.

actual: "Existing in reality and not potential, possible, simulated, or false: synonym: real." history: "A chronological record of events, as of the life or development of a people or institution, often including an explanation of or commentary on those events."

One of these things is a representation of the state of a codebase at a point in time. The other is a representation of a state of the codebase that never existed at any point in time.

> our positions are different by degrees rather than by categories

Absolutely not. I'm not sure how to interpret your comments as anything other than that you may not understand what rebase is actually doing.

A commit is a snapshot of a codebase at a point in time. If you commit when you've run your program, recording a point along the path of modifying the code where you've observed the codebase to be consistent, rebasing retroactively changes the snapshot of the codebase to something that you have never examined.

If foo.c defines a function foo that calls a function bar in bar.c, and you've updated the way that foo calls bar in foo.c and someone else updated the behavior of bar in bar.c, the act of rebasing in itself can change the output of your program without recording the step of making that change, and without you ever observing the program's behavior after that change (and before any other commits you've presumably made to get your code to its current state).
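
The foo/bar scenario can be reproduced in a few lines; the sketch below is one hypothetical rendering of it. Shell scripts `foo.sh` and `bar.sh` stand in for foo.c and bar.c so that no compiler is needed, and the branch names and the arithmetic are invented for the example.

```shell
#!/bin/sh
# Hypothetical repo: a clean rebase changes what the rebased commit prints.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git checkout -q -b trunk

write_bar() {  # bar multiplies its argument by $1
    printf 'bar() { echo $(( $1 * %s )); }\n' "$1" > bar.sh
}
write_foo() {  # foo calls bar with the argument $1
    printf '. ./bar.sh\nfoo() { bar %s; }\nfoo\n' "$1" > foo.sh
}

write_bar 2
write_foo 2
git add bar.sh foo.sh
git commit -q -m 'base: foo calls bar 2'

# Your change: foo now calls bar 3. You run it and observe the output.
git checkout -q -b feature
write_foo 3
git commit -q -a -m 'feature: foo calls bar 3'
observed=$(sh foo.sh)
old_commit=$(git rev-parse HEAD)

# Someone else's change on trunk: bar now multiplies by 10.
git checkout -q trunk
write_bar 10
git commit -q -a -m 'trunk: bar multiplies by 10'

# The rebase applies cleanly because the two changes touch different files.
git checkout -q feature
git rebase -q trunk
rebased=$(sh foo.sh)
new_commit=$(git rev-parse HEAD)

echo "observed before rebase: $observed, after rebase: $rebased"
```

The commit message is unchanged, but the commit ID differs, and the snapshot it now records was never run by its author before the rebase.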

Are we at least on the same page that rebasing in itself makes changes to the atomic bits of recorded history, irrespective of what the size of those atoms is? You seem to be fixated on the size of steps being recorded, which is completely irrelevant to the point that rebase is retroactively changing the composition/snapshot of each step. The difference is between an immutable log of immutable events and a mutable log of mutable events. One of those is easier to reason about.