Hacker News
by flir 907 days ago
Try it like this, see what you think:

Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.

The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.

Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
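To make the bisect point concrete, here is a minimal sketch of the binary search that a tool like `git bisect` performs over a linear history. It is a toy model, not git's implementation: the `first_bad` function, the `history` list, and the "good before index 73" rule are all made up for illustration.

```python
def first_bad(commits, is_good):
    """Binary-search a linear history for the first commit where is_good() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi], tested


# Hypothetical linear history of 100 commits; the bug lands at index 73.
history = [f"c{i}" for i in range(100)]
culprit, steps = first_bad(history, lambda c: int(c[1:]) < 73)
print(culprit, steps)
```

With a linear history the search space halves on every test, so the number of commits tested grows with the logarithm of the history length rather than with the length itself.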


I wonder if the difference here is in what your quality threshold for a commit is. I commit when I reach a point of coherence in the code, and ensure that the code passes tests before I commit. Each commit is thus a checkpoint of coherence, where the points in between may be out of order or failing tests.

Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.

So you're already doing curation of what the source history is! Us rebasers just do a little more, and we aren't afraid to rewrite history (before merging to master) to do it.

What happens when you're a few commits deep and realize one of your prior points of coherence could benefit from revision? Perhaps an extra line of documentation. Or a small bug fix. Or a new helper routine. I would go back to the commit where it belongs and put it there. Or, if it deserves its own commit, then create a new one. But the point is that the source history is itself a tool I use to communicate with others (including my future self).

Agreed. This is why I rewrite history and curate commits so that I only ever have 1 commit in main. You’re already doing curation, I just do a little more!

I realize you're trying to be cute, but my argument isn't "more curation is always better." My argument is, "if you're going to do curation anyway, you might as well acknowledge it as such and maybe even be intentional about it."

Curation is a means to an end, not an end itself. And rewriting history on main would violate the obvious rule of not rewriting history that you collaborate with others on.

If you're genuinely curious, see my other comments in this thread. That should clarify things.

No, I’m reducing your argument to the absurd extreme. We both acknowledge there’s a line to be drawn. I would personally draw it at “the commit is the finest level of curation”, which reasonable people can disagree on.

I just find it absurd of you to argue that “we’re both curators if you think about it” as if that has anything pertinent to add to the conversation.

I don't see what's so absurd about it. On the one hand, we have people talking about the "actual" history and "coherence points." And on the other, we have people talking about rewriting history so that it is curated. For example, in the follow-up comment, they said, "So now you've erased the record of your actual process." As if there is one "actual" history and one that isn't. But neither is the actual history, and that's what I'm pointing out. Pointing out that both are forms of curation is important because it makes it clear that the difference is a matter of degree, not of something categorical.

But no part of this leads one to conclude that the most possible curation is the best. So your "reducing your argument to the absurd extreme" does not follow. If you're trying to use it as a rhetorical device, then try harder. If you already acknowledge there's a line to be drawn and that both are forms of curation, then I don't see what we're disagreeing about.

> No, I’m reducing your argument to the absurd extreme. We both acknowledge there’s a line to be drawn.

I just explained in my previous comment why my argument doesn't let you draw the extreme conclusion. If you don't want to engage with it directly, then don't bother.

> as if that has anything pertinent to add to the conversation

Pot, meet kettle.

So now you've erased the record of your actual process, that might be revealing later to someone who's trying to figure out what the heck you were thinking, for the sake of trying to create a history that looks more linear or tidy than the reality of what happened, and, if you're not running tests and re-evaluating all the intermediate steps along your history, introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

This strikes me as a crazy fastidiousness over making your history look the way that you want it to look, rather than preserving the actual history, which is detrimental to the value of being able to find out what actually happened when something goes wrong.

> So now you've erased the record of your actual process

You have too! Unless you're recording every keystroke, which I assume you are not.

We are both curating source history. The only difference is that I'm intentional about it.

> that might be revealing later to someone who's trying to figure out what the heck you were thinking

More curation makes this easier, not harder.

> for the sake of trying to create a history that looks more linear or tidy than the reality of what happened

No. For the sake of communicating changes. Linear history and curated source history are just means to an end. They aren't an end in themselves.

> if you're not running tests and re-evaluating all the intermediate steps along your history, introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

A risk for sure. Not a big one in practice in my experience. And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.

It's a downside for sure. But I'm very happy to pay it. Usually the worst thing that happens is you have to skip a commit now and then when doing a bisect. Reverts can also be more painful depending. If the pain becomes too great, then absolutely reevaluate. I wouldn't spend so much effort curating history if it just led to me fighting with it all the time. But it doesn't.

> This strikes me as a crazy fastidiousness

To be honest, based on your comments, it doesn't look like you've given that much thought to this. Firstly, you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation. Secondly, you seem to think I'm just doing this for the fun of it, or for the sake of it. But I'm doing it for the same reason I try to write code in a way that can be understood by others. That's it.

> which is detrimental to the value of being able to find out what actually happened when something goes wrong.

This tells me you've likely never worked in an environment where intentional curation was prevalent. Intentional curation makes this easier, not harder. It's one of its benefits and one of the reasons I do it. Intentional curation makes it much easier to understand the sequence of logical changes over time that has brought the code into its current state.

> You have too! Unless you're recording every keystroke, which I assume you are not.

Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

> More curation makes this easier, not harder.

Not when "curation" is revision after the fact. What you're describing as "curation" is changing the recorded/published history from states that were intentionally recorded and examined to new states that never were even run or examined anywhere. When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

> A risk for sure. Not a big one in practice in my experience.

I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

> And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.

What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.

> you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation

Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred? Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".

> Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

You know, the word "history" might be abused by git as much as the word "friend" is by facebook.

BTW, I never really got an answer to this one: what do you do when you notice a typo ten seconds after you committed? A new commit that says "typo fix", or squash the commit on to the previous commit?

> Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

Yes? But that isn't what you said. You said "actual" history. Using that phrasing makes this conversation extremely difficult because it doesn't acknowledge that our positions differ by degree rather than by category. You said "actual" as if it was somehow inherently better because it's the "actual" history. But it isn't the "actual" history. So your communication on this point just becomes befuddled. Please be more precise.

> Not when "curation" is revision after the fact.

How many times do we have to go over this? Unless you're recording every single keystroke, then you are also doing "revision after the fact." There are differences between our approaches, for sure, but "revision after the fact" does not capture them.

> When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

What are you talking about, "committed something different"? I don't take a PR, rewrite history and then merge it. I rewrite the history, push it back up to the PR branch and only merge (via rebase if appropriate, or sometimes via squash) when CI passes. The collection of commits still passes. CI doesn't guarantee that each individual commit does, but I already acknowledged and discussed that downside. The curation of commits is specifically all about making the intent clear and the change easier to understand. That's the entire point!

> I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

I can't even conceive of an example of this. Can you give one? Even if it's hypothetical, that's fine.

To be clear, I can imagine the following examples of things going awry:

* Squashing is used which causes many commits to get squashed into one, and thus can make the history of changes less clear depending on the commits. For example, if a PR contains 2 commits where there's a thousand lines as a result of adding a new function parameter in the first commit, and then a second commit with one additional line calling the function using that new parameter in an interesting way, then squashing those two commits into 1 will lead to history that is less clear. But this is why I don't advocate for squash & merge in all cases.

* Since CI doesn't run on every commit, if you need to revert a PR that merged multiple commits via rebase only, then you might need to revert all of the commits that came in from that PR individually. That can be a pain and it can be difficult to discover which commits you need to revert.

* Since CI doesn't run on every commit, it's possible that `git bisect` can be more annoying than it otherwise would be. Maybe tests don't build on one commit. Then you need to do `git bisect skip`.
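For the bisect case in the last bullet, the skipping can be automated. `git bisect run` treats exit code 125 from the script it runs as "this commit cannot be tested", which has the same effect as `git bisect skip`. Below is a minimal sketch of such a helper. The `make build` and `make test` commands are placeholders, not anything from the projects discussed here.

```python
import subprocess

SKIP = 125  # exit code `git bisect run` interprets as "skip this commit"


def bisect_exit_code(build_ok, tests_ok):
    """Map build/test results to the exit code `git bisect run` expects."""
    if not build_ok:
        return SKIP  # commit doesn't build: skip it instead of calling it bad
    return 0 if tests_ok else 1  # 0 marks the commit good, 1 marks it bad


def succeeded(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main():
    # Hypothetical commands; substitute the project's own build and test steps.
    build_ok = succeeded(["make", "build"])
    tests_ok = build_ok and succeeded(["make", "test"])
    return bisect_exit_code(build_ok, tests_ok)
```

Saved as a script (say, a hypothetical `check.py`) that ends with `sys.exit(main())`, this could be handed to `git bisect run python check.py` after marking one good and one bad commit.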

But none of those are about browsing the source history when using rebase & merge. I can't even begin to imagine a single example of browsing the source history where I would specifically want an "actual" accounting of the history without any intentional curation. In literally every instance of me browsing source history in over 20 years of programming, I cannot imagine a single instance where I found curation to be unhelpful and wished that the source history was somehow more faithful to how the programmer arrived at the change instead of focusing on communicating the change to other programmers.

> What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.

If you open a PR on a GitHub project with 5 commits, GitHub Actions will not run on each commit by default. I'm not aware of an easy way of changing that behavior. If you "rebase & merge" that PR, CI still won't run on every commit merged. Here's an example from one of my projects, where you can clearly see that not every commit has a green checkmark: https://github.com/BurntSushi/ripgrep/commits/master/

I run dozens of projects this way. I've never had a major issue because it just isn't a big deal if one commit now and then doesn't pass tests. If it were a bigger deal, then I'd absolutely either reconsider my curation or invest more in improving CI tooling.
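For anyone who does want every commit in a branch checked before merging, one local option is `git rebase --exec <cmd>`, which runs a command after each commit it replays. The same idea can be sketched as a script. This is only an illustration under assumptions: the branch names and test command in the usage note are hypothetical, and the checkout loop leaves the repository on a detached HEAD.

```python
import subprocess


def commits_between(base, head):
    """List commit hashes in base..head, oldest first, via `git rev-list`."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{base}..{head}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def passes_tests(commit, test_cmd):
    """Check out one commit (detached HEAD) and run the test command there."""
    subprocess.run(["git", "checkout", "--quiet", commit], check=True)
    return subprocess.run(test_cmd).returncode == 0


def failing_commits(commits, passes):
    """Return the commits for which passes(commit) is False, in order."""
    return [c for c in commits if not passes(c)]
```

Usage might look like `failing_commits(commits_between("origin/master", "HEAD"), lambda c: passes_tests(c, ["cargo", "test"]))`, run from a clean working tree.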

> Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred?

We're speaking past each other. I don't know how else I can explain that there is no such thing as "capturing what actually occurred." You keep saying that, but even in that case, you aren't capturing what actually occurred. You're capturing an ad hoc curation of what actually occurred.

> Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".

You don't have a faithful recording of history though. Your source history is also a lie. And the thing you call a "faithful recording of history" is more like a meandering series of "fix typo" or "fix lint" or whatever commits. The only benefits it has that I'm aware of are the following:

* It's easier, in the sense that you don't pay any attention to how a patch series is structured. You just code and commit and don't worry about anything. To me, this is like writing code without caring about whether someone else (including you) can read & understand it. Which is a thing. Lots of people do that. Let's just be open and transparent about it.

* In some cases, there is less friction with the tooling.

I still don't think you've actually tried the type of curation I'm talking about. On the other hand, I arrived at my position on curation after years of doing your approach of capturing a "faithful recording of history" and realized it was just about useless.

Kind of. I've been thinking about it since I wrote that, and what I say and what I do are a bit different. I don't rebase when I'm done, I rebase as I go as a constant background hum.

So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into. Then I create many small WIP commits, but in the commit message I note which task they belong to. So I might have two initial commits in a branch that say "[#123] Refactor foo" and "[#123] Upgrade bar", followed by a bunch of "[WIP] typo fix, merge against foo" and "[WIP] Preparing baz to upgrade bar". Then when I feel I've reached a point of sanity I pull the main branch, rebase my feature branch on top of it, merging my WIP commits as I go. Occasionally I'll even go back and split a WIP commit in half if there's a better logical mapping to the tasks.
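The folding step in that workflow maps onto the todo list that `git rebase -i` opens, where each line is an action (`pick`, `fixup`, ...) followed by a commit. Here is a minimal sketch that builds such a list from tagged commits. The tuple layout, the shas, and the explicit task field are assumptions for illustration (the comment above only says the task is noted in the message), and it assumes each task commit comes before its WIP commits.

```python
def plan_todo(commits):
    """Build rebase todo lines that fold each WIP commit into its task commit.

    commits: (sha, subject, task) tuples in commit order. Task commits are
    kept with `pick`; commits whose subject starts with "[WIP]" become
    `fixup` lines placed right after the task commit they belong to.
    """
    lines_by_task = {}  # task -> todo lines; dicts keep insertion order
    for sha, subject, task in commits:
        action = "fixup" if subject.startswith("[WIP]") else "pick"
        lines_by_task.setdefault(task, []).append(f"{action} {sha} {subject}")
    return [line for lines in lines_by_task.values() for line in lines]


# Hypothetical branch matching the example commit messages above.
branch = [
    ("a1", "[#123] Refactor foo", "foo"),
    ("b2", "[#123] Upgrade bar", "bar"),
    ("c3", "[WIP] typo fix, merge against foo", "foo"),
    ("d4", "[WIP] Preparing baz to upgrade bar", "bar"),
]
for line in plan_todo(branch):
    print(line)
```

`fixup` discards the WIP commit's message and keeps the task commit's; `squash` would be the choice if the WIP messages were worth keeping.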

If I haven't pushed I don't consider it saved, so I wouldn't like to rely on local-only tools in that way. I'd much rather push to a remote repo daily. It's not like anyone's going to see it until I raise a PR.

What do you do when you spot a typo ten seconds after you committed something? A separate typo fix commit? I prefer to merge it on to the previous commit. Nobody needs to run "git blame" and see "typo fix" as the last time that line was touched. It's noise.

> So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into.

This is absolutely wild to me, and admittedly not a way I've ever imagined source control being used. I can't say that I have a fully developed opinion of it, but I have a feeling this would drive me nuts as a reviewer. It seems like you're using source control to craft a descriptive history around your changes, designed to tell a story you wanted to tell rather than the messy, authentic history that reveals the struggles you went through and problems you solved along the way. But by doing so, you're creating a fabricated history and losing the aspect that is more like an audit log. So I would just not trust any of it other than the outcome.

I simply don't give much value to human narratives about code, so that's why I prefer a messy history that's a reliable log of the steps you actually went through over a narrative history that might be nicer to read.

I think the whole [WIP] approach is quite a good workaround for the tension between "saving" commits and feature-complete commits. TBH I don't bother rebasing [WIP]s but I understand why that might be desirable. Each non-[WIP] commit should be a complete, fully integrated feature (and ideally only one "feature").