Hacker News new | ask | show | jobs
by nemetroid 1940 days ago
> Who do we really help by pretending that we're more organized, coherent, and linear than we actually were?

We're helping the future reader who's reading the history because they want to understand why a change was made - and "because the author of the branch initially had the wrong idea" is almost never the answer they're looking for.

I sometimes enjoy reading stream-of-consciousness writing, but most of the time (especially when reading code) I'm more interested in the point itself. The same applies to version history. It can be used to tell the raw story, but there's usually a more useful and interesting story to be told.

3 comments

Exactly. I want to "tell a story" with my commits, and that story is really more of an idealized retelling of what I actually did.

Five years from now, no one needs to know that I forgot to add that one line to a prior commit and had to add it separately, or that my first attempt didn't quite pan out as expected.

What that future person _will_ care about is:

- What final changes actually got made?

- What task was I working on?

- What was the reason for any of these changes in the first place?

- Why did I make some of these changes specifically to implement that task?

- What additional side info is important context for understanding the diffs?

Exactly. It's also great to compartmentalize different aspects of your change.

Often my changes are

1. Refactor the existing code to support the new feature

2. Add the new feature

It's great to keep these separate, because someone can look at number 1 and see that the two versions of the code ought to be functionally the same (same tests pass, app looks the same, refactor is easy-to-understand), and look at number 2 and see the new feature.

There are countless other times where you want to tell the "story" in a logical fashion.

(Honestly, I expect that there is a significant correlation between being a good git committed and being a clear story-teller.)

I understand that you want to tell a story. But as someone examining your code, I also want to know how you got there. While you're throwing out your junk, you're also throwing away valuable information. If I'm taking the actual time to review the code history, then let me play it out in real-time, mistakes and all. I know how to step back and summarize, I don't need you do do that for me.

This is especially true if your code is clever. I'm much more likely to understand your polished gem if I can see all the things that you bumped into while you were discovering it.

That is what comments and commit messages are for. I trawl history all the time. Running into an unbisectable mess of a branch (because a bug that was introduced in commit X~15 is fixed in X on the same branch) is a complete nightmare. I have to discect the branch history and understand what is because of the branch and what is debugging/review/CI cycle cleanups. Commit messages for fixups also tend to be 100% terrible and utter trash. "Fix review comments". Thanks. If we're doing that, let's copy what the comment was in too and why it fixes it.

The problem with your request is that 90+% of the time (with the way I develop), the dead ends are on MRs that got closed or code that never got pushed in the first place. So again, comments as to why this approach is used is way better than hiding it in the history because someone coming to "clean up" code sees the thought process instead of having to remember to search for it.

I don't do much work like that -- I suspect you're part of a much larger developer team -- but I think I understand the problem you're describing.

Couldn't you simply review/bisect at the fork/join points? i.e., take the commits at which forks began or ended, ignore any intermediary commits, and run the bisect (or, read diffs) across that subset? That way you're only comparing at the chapter-markers of the story, so to speak, and not getting mired in the gory details.

Yes, `git bisect --first-parent` was a feature I wanted for a long time. It finally exists now, so yes that helps, but is not a complete solution.

Even with `bisect --first-parent`, I still want useful commit messages which "fixup" commits, again, are uniquely terrible at being on the whole.

I do software process and other things, so some of my branches tend to be gigantic (e.g., revamping the build system) and can be 200+ commits because one cannot meaningfully land a build system rewrite incrementally. That one in particular was meant to be bisectable because when rebasing on top of new development, I wanted fixes to be in the "port this library over" commit instead of after some random merge commit based on when I decided to sync up that week (it took a year to do it). So once I get it down to a particular MR, being able to inspect that topic is still a useful property.

Note that this only works with a `merge --no-ff` workflow too. The `rebase && merge --ff-only` pattern and `merge --squashed` are both terrible, IME, at making useful history. The force-rebase workflow is just as confounding to me as the no-rebase workflow (the former de-parallelizes your MR merge process and the latter tends to make a terrible commit history).

Note that even for single-developer projects I run, I tend to make PRs even for my own changes (once it's gotten off the ground).

While I understand and somewhat empathize with this desire (I'd use it all the time for personal repos, for example)... current VCS systems are terrible at supporting it.

What you probably want in this case is something like "automatically commit on every change (possibly recording every keystroke)" + "automatically tag based on tests/builds passing or failing" + "allow manual comments at any time, whether based on files changing or not". All of that is technically possible with git/hg/fossil/etc, but it's so much work for both the recorder and the viewer that it's infeasible.

This is great, except that we’re often bad at recounting this idealized history without lying in ways that make later maintenance more difficult
> or that my first attempt didn't quite pan out as expected.

Actually that's still important, it's just important from an architecture perspective.

As a much newer developer, the biggest problem I have with git is that I rarely end up actually making one change at a time. I'll be working on some larger thing, and in the process I'll notice and quickly fix a smaller thing before returning to the original task. This might be a typo in a code comment, a poorly named variable, or a block of code I realize is dead.

I suspect this is the type of tendency which goes away with experience, but it makes git a lot less useful. My commits won't really tell you what changed; the most they can tell you is the primary change I was working on.

Many of us do that, and it's not just a new developer thing. Git actually enables this, because you get to pick and choose what to add to the index (`git add`) before committing. So that little tweak you made in the unrelated function? -- no problem, just `git add` that later, and commit it under a different message. Not all SCM tools give you that kind of flexibility.

On the other hand, there's a diminishing return to placing every tiny change into a separate commit. Commit messages like "Fixed multiple small things" might make some people clutch their pearls, but sometimes you just need to get shit done and move on to solving bigger problems.

My suggestion is to consider breaking your commit into two: one for "fixed this big issue that everyone cares about", and one for "a bunch of tiny cleanup stuff that I happened to notice." (Maybe call that second one "refactoring" -- it will go over better with your audience.)

> Git actually enables this, because you get to pick and choose what to add to the index (`git add`) before committing.

That assumes the changes are in separate files though, right? I know you can do use the "-i" flag, but it's fairly labor intensive.

That kind of depends on your tooling. e.g., I use Magit (an Emacs front-end for git) which makes interactive mode really, really easy.

(But easy or not, other version control systems such as Subversion don't offer the feature at all. We kind of take Git for granted these days, but it wasn't always like that.)

A lighter weight option is the --patch flag to 'git add' and 'git commit'.
Personally I have gotten used to using `git commit --patch` for everything (even if I only have one change) just as a convenient way of reviewing the changes I am about to commit. With that, only committing part of the changes is no additional effort.
Look at `git add -i`. You can commit just part of a change to a file. So if you notice a small problem and already have a bunch of changes made, you can still make those changes, and commit them separately.

Up to you if you wanted to rebase those changes back onto main.

I don't use it often and find it's kind of painful to use, but if you're in the position where you've already saved two different things in your IDE and need to pull them apart for commit, it's a useful tool.

Have you tried using 'git commit --patch'? It makes it easy to separate out unrelated changes when committing. You can precede it with an invocation of 'git reset $HASH' to restructure your last few commits.

In general, more experienced git users aren't actually working on one commit at a time. They're just comfortable enough with editing history to make it look that way.

That is what OP was talking about: after you've done the change, make a commit with just that tiny refactoring. Once you're done and ready to review your work, you can cherry pick just that fix and move it to main / master / it's own PR. Since it is self-contained, it can be processed by itself only.
`git add -p` will take you through all the changes in your files, and let you add them selectively. I find this makes for much cleaner commits.
With the add command's interactive mode, it is often possible to selectively stage and commit individual patches in a file.

https://git-scm.com/book/en/v2/Git-Tools-Interactive-Staging

> We're helping the future reader who's reading the history because they want to understand why a change was made...

"Change" is a subject to interpretation. Most of the time it's the scope that the change belongs to is what has the meaningful value.

Say, changes made in connection to fixing an issue are logically tied for inclusion as well as for potential unwinding.

Some tangent changes technically should not be casually folded in, just in case this changeset will need to be propagated or rolled back.

Thus this elaborate muli-staged commit management in Git.

Many projects don't have such need to manange the change flow, so Version control is used as a kind of undo buffer. Which is fine, in such cases the meaning is tied to release states.

If anything, it makes more practical sense to preserve only commits with a buildable state, not just some transitional changes.