by austinshea 225 days ago
The value of the commit message space is the ability to glean the author's thoughts at the time they made the commit.

What this tool emits is extremely dissimilar to the thoughts of the author.

The value of what this emits is already handled by evaluating the diffs in the per-file history.

It's not good to throw this sort of thing over the fence, and justifying it by calling a proper message a waste of your precious time doesn't change that.

It's better to leave it blank. A tool like this looks perfect for helping someone avoid scrutiny while still not providing even a tiny depiction of what they were thinking when they committed the change, all at the cost of injecting vast amounts of noise.


I would be more interested in a hybrid approach: an AI that, when it has low confidence in a generated commit message, asks the user for input ("Was this change meant to fix a bug? y/n"), and that asks about splitting commits ("You changed 6 HTML files and 1 SQL file; they seem unrelated. Should I split that into 2 separate commits? y/n").
I love it. A bit of guided assistance can really make all the difference.
That actually sounds like a good match for LLMs' ability to do fuzzy intuition and pattern-matching.
This is a really good idea.
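The split-suggestion half of the hybrid idea above could start from something as simple as the following Python sketch. It assumes file extension is the only signal for "unrelated" changes; `suggest_split` and `split_prompt` are hypothetical names, not part of the tool under discussion, and a real tool would presumably let the LLM make the call and only ask when its confidence is low.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath


def suggest_split(changed_files: list[str]) -> dict[str, list[str]] | None:
    """Group changed files by extension; return the groups only if there is more than one.

    Hypothetical heuristic: files with different extensions (e.g. .html vs .sql)
    are treated as possibly unrelated changes.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in changed_files:
        ext = PurePosixPath(path).suffix.lower() or "(no extension)"
        groups[ext].append(path)
    return dict(groups) if len(groups) > 1 else None


def split_prompt(groups: dict[str, list[str]]) -> str:
    """Build the y/n question the tool would put to the user."""
    counts = " and ".join(
        f"{len(files)} {ext.lstrip('.').upper()} file{'' if len(files) == 1 else 's'}"
        for ext, files in groups.items()
    )
    return (
        f"You changed {counts}; they seem unrelated. "
        f"Should I split that into {len(groups)} separate commits? y/n"
    )


if __name__ == "__main__":
    changed = [f"templates/page{i}.html" for i in range(6)] + ["db/schema.sql"]
    groups = suggest_split(changed)
    if groups is not None:
        # prints: You changed 6 HTML files and 1 SQL file; they seem unrelated.
        #         Should I split that into 2 separate commits? y/n
        print(split_prompt(groups))
```

A real tool would read the answer with `input()` and, on "y", stage each group separately before committing.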
The value of the space is whatever the user/team finds valuable.

You almost had the right idea there: the value of what this emits is really in the summary of diffs. I'm certainly not going to go through each commit and read the diff each time I look at the log, but I still want to understand what happened and be able to find individual commits. If extra information about the author's thoughts is just not available, I'd much rather have summaries than a blank log of "WIP" comments.

It's absurd to gatekeep commit messages to only "the thoughts of the author", even if that's what usually goes in there. A good diff summary might even be more useful than a ramble that doesn't mention important changes.

It's not absurd, and it's also not gatekeeping.

Why do you want a summary of the changes? The bulk of the information in a commit is the diff itself.

The information that is not contained in the diff is the author's intent, and that's what the space is meant to contain.

I'm not convinced you actually believe that a summary is valuable, because this whole comment is coming from a very defensive place.

But you see, now my LLM can use fewer tokens parsing his commit messages instead of the entire code base, and then boil it all down to "85-- minor-- inconsequential-- commits emdash emdash emoji"
> The value of what this emits is already handled by evaluating the diffs in the per-file history.

I mean, for the initial development/contribution/PR workflow, I agree with you: any code reviewer should be reading the diffs anyway, and if you're reading the diffs, these messages (being purely summaries derived from the code itself without the LLM having any info about developer intent) don't add anything.

But that's not the only time commit messages matter. A tool like this, "fixing up" bad commit messages before they're pushed to a PR branch, might still help with code maintenance after the code is merged:

• When you or someone else is scanning the commit log after the fact, e.g. in `git log` to find commits to cherry-pick, such summaries would substitute for going commit-by-commit through the diffs to find the one you're looking for. Or, during a `git bisect`, they'd let the likely offender "jump out" at you from the list of remaining commits after just the first few bisect steps, without having to do 10 more iterations to narrow it down with actual rebuilds and test-suite runs.

• When someone else is looking at `git blame` while bug-hunting, or at the latest commit that touched each file while browsing a GitHub repo tree, these summaries would be the difference between an opaque timeline of "fix" -> "fix 2" -> "fix again" -> "update" -> "fix" commits that one has to try to keep distinct in one's head (at that point you may as well try to recognize commits by their abbreviated hashes), and a history of commits with descriptive, mnemonic "names".

Note that this tool is meant to be retroactive, not incremental. It rewrites the messages of existing commits that already had some other message when they were created; it has no mode you could use to do `EDITOR=this-program git commit` and have it generate a commit's original message just-in-time.

As the author says in the README, this tool was created to fix up one's private git commit history before making it public. By the time this tool would be run (i.e. when a developer is trying to "clean up" their private git history for publication), it has often been long enough since the commits were created that you likely don't remember what you were thinking at the time. Any information about "what [you] were thinking when [you] committed" has already been lost. "The rice has been cooked", so to speak.

At that point, there's no value you could add by going back over the commits manually, beyond that which this program could add. In both cases, whether you or the LLM is doing it, the result would just be a reconstruction (i.e. a guess) at what the original developer was thinking/trying to accomplish at the time.

---

Which is not to say that this tool is a substitute for making good commits (instead of random-junk-drawer, snapshot-your-work "WIP commits") in the first place. In fact, I have a feeling that this tool is not nearly as widely applicable (at least in its current form) as its author thinks it is, because "WIP commits" generally can't be summarized well: they are often not coherent, single-purpose edits of the code.

A better version of this tool, I think, would be one that rewrites a private work branch / PR branch by first squashing it into a patch, and then breaks that patch back apart into a series of commits that each "do one thing", essentially introducing the change in a literate programming style where you're meant to read the commit-series top to bottom.

In other words, this is exactly what an experienced software engineer who knows they'll need to maintain their own code already does at commit time with `git add -p` (and before commit time, by having the discipline not to get distracted solving a second problem in the middle of solving the first!). It's also exactly the format that patch-based workflows like LKML already expect contributors to construct [from whatever they were doing internally] when submitting a patch series, so that the series can be read and considered on the mailing list as essentially a linear, literate-programming explanation of the changes.

The tool still wouldn't be able to recover the intent behind each patch if that intent was never recorded, so the commit messages would still just be the kind of descriptions a later "software archaeologist" or reverse-engineer would give the code. But it would at least be generating descriptions for coherent chunks of code, rather than attaching descriptions to commits that were essentially "whatever probably-partial progress on whatever incoherent set of things the programmer was trying to do at the time they needed to sync their work to switch from their laptop to their workstation."
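The first step of the squash-then-split tool described above, breaking the squashed patch back into pieces that can be regrouped, might look like this minimal Python 3.9+ sketch. It assumes git-style unified diff text (for example from `git diff <base> HEAD`) with simple, unquoted paths; `split_hunks`, `group_hunks`, and the `classify` callback are hypothetical names, and `classify` is where the hypothetical LLM would decide which hunks "do one thing" together.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@"; counts default to 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


@dataclass
class Hunk:
    path: str                                   # file the hunk belongs to
    header: str                                 # the "@@ ... @@" line
    lines: list[str] = field(default_factory=list)


def split_hunks(diff_text: str) -> list[Hunk]:
    """Split git-style unified diff text into hunks tagged with their file path."""
    hunks: list[Hunk] = []
    old_path = new_path = "?"
    current: Hunk | None = None
    old_left = new_left = 0
    for line in diff_text.splitlines():
        if current is not None and (old_left > 0 or new_left > 0):
            # Inside a hunk body: consume lines until the header's counts are used up.
            current.lines.append(line)
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif not line.startswith("\\"):  # context line (" ", or "" if whitespace was stripped)
                old_left -= 1
                new_left -= 1
            continue
        if current is not None and line.startswith("\\"):
            current.lines.append(line)  # "\ No newline at end of file" after the last line
        elif line.startswith("--- "):
            old_path = line[4:].removeprefix("a/")
        elif line.startswith("+++ "):
            new_path = line[4:].removeprefix("b/")
        elif (m := HUNK_RE.match(line)):
            # Deleted files have "+++ /dev/null", so fall back to the old path.
            path = old_path if new_path == "/dev/null" else new_path
            current = Hunk(path=path, header=line)
            hunks.append(current)
            old_left = int(m.group(1) or 1)
            new_left = int(m.group(2) or 1)
    return hunks


def group_hunks(hunks: list[Hunk], classify: Callable[[Hunk], str]) -> dict[str, list[Hunk]]:
    """Group hunks into proposed commits; `classify` stands in for the LLM's judgment."""
    groups: dict[str, list[Hunk]] = {}
    for hunk in hunks:
        groups.setdefault(classify(hunk), []).append(hunk)
    return groups


SAMPLE_DIFF = """\
diff --git a/app.py b/app.py
index 1111111..2222222 100644
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import os
-DEBUG = True
+DEBUG = False
 PORT = 8080
@@ -10,2 +10,3 @@ def main():
     run()
+    log("started")
     return 0
diff --git a/schema.sql b/schema.sql
--- a/schema.sql
+++ b/schema.sql
@@ -1 +1 @@
-CREATE TABLE t (id INT);
+CREATE TABLE t (id BIGINT);
"""

if __name__ == "__main__":
    by_ext = group_hunks(split_hunks(SAMPLE_DIFF), lambda h: h.path.rsplit(".", 1)[-1])
    for topic, hs in by_ext.items():
        print(topic, [h.header for h in hs])
```

The toy extension-based classifier keeps both `app.py` hunks together even though one flips a debug flag and the other adds logging, which is exactly the kind of judgment such a tool would need an LLM (or a human with `git add -p`) for. Turning each group back into a commit, e.g. by re-emitting its file headers and hunks as a patch for `git apply --cached`, is left out of this sketch.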

I agree with everything you're saying, and I think other archaeologists feel the same.

My opinionated take: I wouldn't want to use this space for the information this tool could provide; I'd rather leave it as the truth.

The truth is that it was committed without a meaningful message, and now I might recognize a chain of message-less commits, representing a moment in time where the authors were trying to figure out where they wanted to end up.

If the tool is producing this info simply by reading the diffs in the code, why not just use it when you need it, to help explain what you're digging through, instead of changing the commit history?

Either way, the critical point is: people should get that detail out before the rice has been cooked. That's what I do for myself in my own private repos, and when others do the same for future archaeologists, we all benefit.