Hacker News
by wandernotlost 907 days ago
> rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits

"Cleaner", for some definition of "clean". In this case, pretty, not accurate.

I just can't understand the draw of rebase-based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.

Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.
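A toy sketch of the "retroactively changes your commits" point, assuming a simplified model in which a commit's ID is a hash over its parent ID, its content, and its message. The helpers `commit_id` and `build_chain` are hypothetical and this is not Git's real object format; it only illustrates that replaying the same changes on a new base gives every replayed commit a new identity.

```python
import hashlib

def commit_id(parent_id, snapshot, message):
    """Toy stand-in for a commit hash: depends on parent, content, and message."""
    payload = f"parent {parent_id}\n{snapshot}\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()

def build_chain(base_id, commits):
    """Return the IDs of `commits` (snapshot, message pairs) stacked on base_id."""
    ids, parent = [], base_id
    for snapshot, message in commits:
        parent = commit_id(parent, snapshot, message)
        ids.append(parent)
    return ids

# Two feature commits, first on the old base, then replayed on a newer base.
feature = [("a=1", "Add a"), ("a=1;b=2", "Add b")]
old_base = commit_id("", "root", "Initial commit")
new_base = commit_id(old_base, "root;c=3", "Add c on main")

before = build_chain(old_base, feature)
after = build_chain(new_base, feature)  # the "rebased" chain

# Same messages and same recorded content, but no ID survives the replay.
print(all(b != a for b, a in zip(before, after)))
```

In a real rebase the content at each replayed commit usually changes too, since it now includes whatever landed on the new base, which is why test results recorded against the old commits don't automatically carry over.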

It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.

8 comments

> What is the point of source control, other than to reliably capture what actually happened in history?

Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.

I think usually history "rewriting" (e.g., rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.

Exactly. To analogize to history history: OP wants the version control history to look like a collection of primary sources. Here's the president's daily calendar, there's the letter he received on April 24 from a small child in Wisconsin. In this model, it's up to future code historians to piece it all together into a story.

When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.

Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?

This isn't a novella. We're talking about executable code. What you're suggesting is the equivalent of using an encyclopedia as a legal reference.

Merge commits tell the coherent story. Commits reveal the messy history that got you there, which is critical exactly when you need to look at history. If you're not trying to track down the source of a problem and how it was introduced, in a deterministic way, why do you bother keeping source history? Publish pretty changelogs instead.

Can you give a concrete example of when you've used the messy details of how a change was introduced at a sub-PR level?

I'm strongly opposed to squashing, but when have you found that a chronological sequence of commits-as-they-were-committed has been helpful where a sequence of heavily-cleaned-up patches would have obscured useful information?

In my experience spelunking through git history, I've only ever been frustrated at the number of different red herrings I've found in a git blame that turned out to be a failed experiment that never got merged in.

Concretely: API changes are a big one, where in the history it looks like we may have once accepted something different than we do now, but then it turns out that that change was reverted before ever making it to production. This information being in the log clutters the git blame (the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR), without providing an ounce of useful information about the history of the production app.

As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.

> the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR

I can't think of a specific example from my own history, but something like this is what has happened. A function was changed in order to support a different change elsewhere in the code. That other change was later modified, incompletely, to remove the need to modify the first function, and the change to the first function was subsequently reverted. Down the road, it's discovered that the modification was incomplete, and when reviewing the new code, you wonder, "how could this possibly have ever worked?" The answer is that it didn't, and when it was committed, there was another supporting change that made it work. By erasing the history of that other change, you remove the possibility of discovering the reasoning behind the change and the source of the introduction of a problem.

If I had seen that intermediate state before it was erased, remembered it, and later tried to find it, I'd be gaslit by source control: I remember a real change that was there in a commit, but source control now lies to me and tells me that it never existed.

> I'm strongly opposed to squashing

> As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.

Ironically, squashing is probably the best tool you have to deal with developers who won't clean up their PRs. It's a pretty blunt tool though.

It's better to have the full detail in the case of an audit. It's almost guaranteed to be to the developer's benefit.

Can you provide more details of what you're referring to? I understand the importance of an auditable trunk/production branch, but I'm having a hard time imagining why the sequence of commits on feature branches would matter in an audit.

The commit history is not an audit log, it's very easy to make it look like whatever you want it to look like, even if rebasing as such is banned. I have a hard time picturing a scenario where the commit history is trusted as an audit trail and it matters that every detail is present.

I'm referring to an outside certified audit of your code. You can make it look worse for yourself with rebases/squash merges but assuming you are working legitimately those would tend to obscure your work in realtime. What you as a developer would want is to be able to mirror your code changes along with the change requests.

Okay, but rebasing is changing each point in time of that history–that you curated by choosing when to commit–to be something different from what it ever was, retroactively. It's literally creating an entirely new history that nobody has ever actually examined, introducing the possibility that points along that history are inconsistent with what was intended at the point of each commit.

> creating an entirely new history that nobody has ever actually examined

I think the confusion here is that you're assuming that OP's commit history looks like yours, with dozens of commits per PR that no one could possibly examine in detail with each rebase. At least for me, since I'm okay with rewriting history on local branches, I have a very small number of commits that do get examined each time I rebase.

I average 3-4 commits per PR. There's usually one that refactors the existing code to lay the foundation for a new feature, maybe one that just moves a few files around (to ensure git recognizes them as moves and not delete/recreate), and 1-2 that introduce the new feature.

When I rebase on main, I examine the diff for each commit before pushing to my branch. If something has meaningfully changed, then I adjust the commits appropriately.

My commits aren't a history of what actually happened, they're a description of the steps that it takes to add a feature to (or fix a bug in) main. If main changes in a way that introduces a conflict, I want to reevaluate each step that I'd previously laid out.

Try it like this, see what you think:

Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.

The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.

Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
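A minimal sketch of why bisecting a linear history is cheap, assuming every commit after the first bad one is also bad. The names `first_bad` and `is_good` are hypothetical. This is a plain binary search, not `git bisect` itself, which also copes with non-linear history and commits that can't be tested.

```python
def first_bad(commits, is_good):
    """Binary search for the first commit where is_good(commit) is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi], tested

# Hypothetical history of 100 commits where c37 introduced the bug.
history = [f"c{i}" for i in range(100)]
culprit, tested = first_bad(history, lambda c: int(c[1:]) < 37)
print(culprit, tested)
```

The number of commits that need testing grows roughly with log2 of the history length, but only if each tested commit actually builds and runs, which is the part the two sides of this thread disagree about.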

I wonder if the difference here is in what your quality threshold for a commit is. I commit when I reach a point of coherence in the code, and ensure that the code passes tests before I commit. Each commit is thus a checkpoint of coherence, where the points in between may be out of order or failing tests.

Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.

So you're already doing curation of what the source history is! Us rebasers just do a little more, and we aren't afraid to rewrite history (before merging to master) to do it.

What happens when you're a few commits deep and realize one of your prior points of coherence could benefit from revision? Perhaps an extra line of documentation. Or a small bug fix. Or a new helper routine. I would go back to the commit where it belongs and put it there. Or, if it deserves its own commit, then create a new one. But the point is that the source history is itself a tool I use to communicate with others (including my future self).

Agreed. This is why I rewrite history and curate commits so that I only ever have 1 commit in main. You’re already doing curation, I just do a little more!

I realize you're trying to be cute, but my argument isn't "more curation is always better." My argument is, "if you're going to do curation anyway, you might as well acknowledge as such and maybe even be intentional about it."

Curation is a means to an end, not an end itself. And rewriting history on main would violate the obvious rule of not rewriting history that you collaborate with others on.

If you're genuinely curious, see my other comments in this thread. That should clarify things.

No, I’m reducing your argument to the absurd extreme. We both acknowledge there’s a line to be drawn. I would personally draw it at “the commit is the finest level of curation”, which reasonable people can disagree on.

I just find it absurd of you to argue that “we’re both curators if you think about it” as if that has anything pertinent to add to the conversation.

So now you've erased the record of your actual process, that might be revealing later to someone who's trying to figure out what the heck you were thinking, for the sake of trying to create a history that looks more linear or tidy than the reality of what happened, and, if you're not running tests and re-evaluating all the intermediate steps along your history, introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

This strikes me as a crazy fastidiousness over making your history look the way that you want it to look, rather than preserving the actual history, which is detrimental to the value of being able to find out what actually happened when something goes wrong.

> So now you've erased the record of your actual process

You have too! Unless you're recording every keystroke, which I assume you are not.

We are both curating source history. The only difference is that I'm intentional about it.

> that might be revealing later to someone who's trying to figure out what the heck you were thinking

More curation makes this easier, not harder.

> for the sake of trying to create a history that looks more linear or tidy than the reality of what happened

No. For the sake of communicating changes. Linear history and curated source history are just means to an end. They aren't an end in themselves.

> if you're not running tests and re-evaluating all the intermediate steps along your history, introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

A risk for sure. Not a big one in practice in my experience. And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.

It's a downside for sure. But I'm very happy to pay it. Usually the worst thing that happens is you have to skip a commit now and then when doing a bisect. Reverts can also be more painful depending. If the pain becomes too great, then absolutely reevaluate. I wouldn't spend so much effort curating history if it just led to me fighting with it all the time. But it doesn't.

> This strikes me as a crazy fastidiousness

To be honest, based on your comments, it doesn't look like you've given that much thought to this. Firstly, you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation. Secondly, you seem to think I'm just doing this for the fun of it, or for the sake of it. But I'm doing it for the same reason I try to write code in a way that can be understood by others. That's it.

> which is detrimental to the value of being able to find out what actually happened when something goes wrong.

This tells me you've likely never worked in an environment where intentional curation was prevalent. Intentional curation makes this easier, not harder. It's one of its benefits and one of the reasons I do it. Intentional curation makes it much easier to understand the sequence of logical changes over time that has brought the code into its current state.

> You have too! Unless you're recording every keystroke, which I assume you are not.

Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

> More curation makes this easier, not harder.

Not when "curation" is revision after the fact. What you're describing as "curation" is changing the recorded/published history from states that were intentionally recorded and examined to new states that never were even run or examined anywhere. When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

> A risk for sure. Not a big one in practice in my experience.

I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

> And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.

What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.
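One way to re-run tests against every commit during a rebase is Git's `--exec` option, which runs a command after each replayed commit and stops the rebase at the first one where the command fails. A small sketch wrapping it from Python follows. The function names are hypothetical, and `make test` and `main` are placeholders for whatever test command and upstream branch a project actually uses.

```python
import shlex
import subprocess

def rebase_with_tests_argv(upstream, test_cmd):
    """Build a `git rebase --exec` command line.

    --exec makes git run test_cmd after each replayed commit and stop
    the rebase at the first commit where the command exits non-zero.
    """
    return ["git", "rebase", "--exec", test_cmd, upstream]

def rebase_with_tests(upstream, test_cmd):
    """Run the rebase in the current repository; True if git exited cleanly."""
    result = subprocess.run(rebase_with_tests_argv(upstream, test_cmd))
    return result.returncode == 0

# Only build and show the command here; nothing is executed.
argv = rebase_with_tests_argv("main", "make test")
print(shlex.join(argv))
```

If the test command fails partway through, the rebase pauses at that commit, so it can be fixed and continued or the whole rebase aborted.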

> you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation

Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred? Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".

Kind of. I've been thinking about it since I wrote that, and what I say and what I do are a bit different. I don't rebase when I'm done, I rebase as I go as a constant background hum.

So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into. Then I create many small WIP commits, but in the commit message I note which task they belong to. So I might have two initial commits in a branch that say "[#123] Refactor foo" and "[#123] Upgrade bar", followed by a bunch of "[WIP] typo fix, merge against foo" and "[WIP] Preparing baz to upgrade bar". Then when I feel I've reached a point of sanity I pull the main branch, rebase my feature branch on top of it, merging my WIP commits as I go. Occasionally I'll even go back and split a WIP commit in half if there's a better logical mapping to the tasks.

If I haven't pushed I don't consider it saved, so I wouldn't like to rely on local-only tools in that way. I'd much rather push to a remote repo daily. It's not like anyone's going to see it until I raise a PR.

What do you do when you spot a typo ten seconds after you committed something? A separate typo fix commit? I prefer to merge it on to the previous commit. Nobody needs to run "git blame" and see "typo fix" as the last time that line was touched. It's noise.
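For the typo case, Git has `git commit --amend` for the most recent commit, and `git commit --fixup=<commit>` followed by `git rebase -i --autosquash` for older ones. The sketch below is a toy model of the folding step only, assuming fixups are matched purely by the `fixup! <subject>` prefix. The function `autosquash` is hypothetical and never touches a repository.

```python
def autosquash(subjects):
    """Toy model: fold 'fixup! X' entries into the earlier commit titled X.

    Returns (subject, folded_count) pairs in the original order of the
    target commits. Fixups with no matching target are kept as-is.
    """
    prefix = "fixup! "
    result = []  # [subject, number of fixups folded in]
    index = {}   # subject -> position in result
    for subject in subjects:
        target = subject[len(prefix):] if subject.startswith(prefix) else None
        if target is not None and target in index:
            result[index[target]][1] += 1
        else:
            index.setdefault(subject, len(result))
            result.append([subject, 0])
    return [tuple(entry) for entry in result]

# Hypothetical branch log, oldest first.
log = [
    "Refactor foo",
    "Upgrade bar",
    "fixup! Refactor foo",
    "fixup! Upgrade bar",
    "fixup! Refactor foo",
]
squashed = autosquash(log)
print(squashed)
```

The published branch then shows two commits, and `git blame` points at "Refactor foo" and "Upgrade bar" instead of a run of typo fixes.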

> So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into.

This is absolutely wild to me, and admittedly not a way I've ever imagined source control being used. I can't say that I have a fully developed opinion of it, but I have a feeling this would drive me nuts as a reviewer. It seems like you're using source control to craft a descriptive history around your changes, designed to tell a story you wanted to tell rather than the messy, authentic history that reveals the struggles you went through and problems you solved along the way. But by doing so, you're creating a fabricated history and losing the aspect that is more like an audit log. So I would just not trust any of it other than the outcome.

I simply don't give much value to human narratives about code, so that's why I prefer a messy history that's a reliable log of the steps you actually went through over a narrative history that might be nicer to read.

I think the whole [WIP] approach is quite a good workaround to saving vs feature complete commits. TBH I don't bother rebasing [WIP]s but I understand why that might be desirable. Each non [WIP] commit should be a complete, fully integrated feature (and ideally only one "feature").

I don’t look at anything other than the merge to a trunk or main as part of the history. It’s not an audit log. I often do check point commits to move local state to a central git as a backup, or commit when I simply want to have a rollback option for something I’m not confident in. I always commit at the end of a day, for instance, and push to a remote, as I don’t trust my laptop or whatever, or worse some cloud dev machine.

None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.

It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted to as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it didn’t even run in production there’s not even a plausible regulatory reason to keep them.

Why not have both? If you can filter by merges, what's the harm in having intermediate positions? There have been various points I actually wanted to have the vim undo log as well. That's what I'd really like - essentially a way of undoing back to time zero, with commits denoting feature complete positions and merges denoting, well, mergeable positions that have passed review.

In branch based development you do have both, in your branch. But if you’re working with other people, do they want your undo log? Is there any value to your undo log in say 10 years? The git repo on the main branch is your shared artifact, and as a matter of good practice it should be treated as a shared resource that’s presentable to all, presents an easily understood and consumed interface, and is free of individual noise. If you want an undo log for your work on X, then when you merge your branch with main, don’t delete your branch. But the shared artifact shouldn’t be filled with everyone’s individual work process artifacts.

This. I really don't mind merge commits; it's nice to see what happened when. Especially if you run into conflicts and issues caused by bad resolution, it's much better to have a clear, true history.

the point of git is to enable linus or al viro or whoever to review your proposed changes as quickly and efficiently as possible, so they can be confident that what they're merging into their kernel tree is relatively sane, and then to actually do the merge in a reliable way that won't introduce other unintentional changes, and to be able to reproduce their own previous state

in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'

but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk

I see no issues here.

If you run tests before commit then you also run them after rebase, same way as after merge. If tests failed - you can force pull your branch from remote and have the same state as before rebase.

You run tests against each commit in the history that you're rebasing? I doubt it, and I guarantee that nearly nobody using rebase does that.

I agree. But that was what had been described.

In my experience people are not running tests locally at all. Push to the remote, open a pull request and wait for pipeline results.

In such a situation the result will be the same: you will never know which commit from the merge/rebase breaks your pipeline tests.

> "Cleaner", for some definition of "clean". In this case, pretty, not accurate.

What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.

Have you ever heard the writing advice: don't write and edit at the same time?

Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.

To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder time debugging it because it could be caused by either branch, or an interaction which no one considered.

The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.

I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs and other points of software craftsmanship, but in my experience there is real value in maintaining a high quality history on a long-lived project.

This. It's a dirty lie, that's not what actually happened!

Why does it matter what actually happened? Can you give a concrete example of when you cared about the exact sequence of experiments, false starts, and refinements that a feature went through before making it into a PR?