by chriswarbo 3589 days ago
There seems to be a split among git users between those who think history should show what actually happened (i.e. it's left alone), and those who think history should tell a story about the changes (i.e. once you've finished something, turn it into a coherent set of commits like "stub out X", "add tests for X", "make first X test pass", etc.).

I agree with you that history should be left alone; mostly I think of the YAGNI argument that it's futile to think that you have a better idea of what future developers want to see, compared to those future developers themselves.

My repo histories are riddled with stuff like "finished X", "stubbed out Y", "fix typo in X", but at least nothing has been hidden from future devs who might be digging around for their purposes, regardless of whatever elegant story I might come up with.

5 comments

As I understand it, "undo" is a convenient feature for addressing quickly discovered errors.

If you rapidly realize that you committed to the wrong branch, or left a line of code half-finished, or misspelled a word, there's very little value in logging that. If you had seen it two seconds before the commit you would have fixed it without a second thought, so why insist on preserving it seconds after the commit?

Presumably (if only for safety) no one is using 'undo' on anything pushed to a shared repo. I can appreciate the argument that we shouldn't rewrite history into a nice, streamlined narrative, but I don't see much reason to avoid tools like 'amend' for fixing commit messages, or 'revert' when some silly line of test code gets committed (and not pushed).

When I'm dealing with other people's Git histories, I appreciate the middle ground approach most. There's no point in spinning some imaginary, elegant story - if it's not real history then write up an essay instead of storing it in your 'history'. But I also don't need to see every line of "oops, un-stubbed X" - my experience is that at least for immediate fixes it only makes things harder to read.

I think a solid middle ground can be found. By focusing on making clean commits, instead of being tempted to make "whoops fixed" commits, you will become a better developer and your code will be cleaner. And your history, too.

At the very least, review your commits and clean them up before pushing: merge 'fix' commits if you didn't --amend them, and review commit messages. What the commit does should be obvious from the subject.
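One way to merge a 'fix' commit into the commit it repairs before pushing is a fixup commit followed by an autosquash rebase. This is a minimal sketch run in a throwaway repository; the file name, identity and commit messages are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

git commit -q --allow-empty -m "Initial commit"

# A commit that contains a typo...
echo "helo" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting file"

# ...and the "whoops" fix, recorded as a fixup of the commit it repairs.
echo "hello" > greeting.txt
git commit -q -a --fixup=HEAD

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# todo list that --autosquash has already reordered, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

commit_count=$(git rev-list --count HEAD)
last_subject=$(git log -1 --format=%s)
git log --oneline
```

Only do this on commits that have not been pushed to a shared repo, since the rebase rewrites them.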

I've also seen people write git commits as if they were a work log - things like "fixed a bug", or "implement feature X". That's the wrong way (IMO) to do git history; what the commit message should say is what the /commit/ does, not what you did.

There is no I in programmng.

One way I enforce the review of my own commits in my own workflow is not using the CLI for committing. My git client shows me the difference in the status window and also when writing the commit message. It also makes it easy to stage specific hunks without having to go down the add -p route. I personally use magit in emacs but I'm sure gitk and some of the more graphical ones have this ability. Not to say any of these don't exist in the CLI; the ergonomics are just not the same as getting the output of multiple commands in a single well-designed interface.
But what good does it do future devs if the history is

-- Added this thing
-- Fixed typo
-- Capitalized the letter

Etc.

One important reason is to avoid wasting time on gilding lilies.

Another reason is that the git information (e.g. from git blame) tells us when the code was written and in what order, rather than some post-hoc rearrangement.

For example, we might notice that code X is doing some tricky work which elsewhere is done by a helper function Y. We look at the git info and see that X was added after Y, so we try to figure out what special edge-cases X is trying to deal with that Y wasn't suitable for. Little do we realise that X was actually written before Y existed, but the commits got rearranged.

That kind of archeology is difficult to predict in advance (mostly because, if we realised all of the issues with our code beforehand, we'd fix them immediately!).

Future devs are just as capable at traversing repos and collapsing diffs as you or I, so there's no need to lie to them. In fact, they might have access to much smarter tools and IDEs than we do.

Personally, I only care about when the code hit master. Because that's when it could potentially have broken shit for everyone.

That I committed it locally is pretty irrelevant: I could just as well NOT have committed it, made a backup of the files on the side, copied them back in...from the perspective of the rest of my team, my local history is an implementation detail.

If the only thing I do is manipulate my local history, then open a PR and merge, master's history will actually show something much closer to the truth: That on X date I added something to master.

That I spent 6 weeks and 300 commits locally to do it (kids, don't do this at home!), literally doesn't matter to anyone.

> Personally, I only care about when the code hit master.

So just look for the merge commit on the master branch that brought it in.
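As a rough illustration of "look for the merge commit", `git log --first-parent` walks only the mainline, so the individual feature commits stay in the repo without cluttering that view. This is a sketch in a throwaway repository; the branch name `feature` and all commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository (identity, branch names and messages are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"
git config user.email "dev@example.com"

git commit -q --allow-empty -m "Initial commit"

# Local work on a feature branch, kept as separate small commits.
git checkout -q -b feature
for step in "stub out X" "add tests for X" "make first X test pass"; do
  git commit -q --allow-empty -m "$step"
done

# Bring it into master with an explicit merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge feature X" feature

# Mainline view: only what landed on master, and when.
git log --first-parent --oneline master

mainline_count=$(git rev-list --count --first-parent master)
total_count=$(git rev-list --count master)
merge_subject=$(git log --first-parent --merges -1 --format=%s master)
```

The merge commit's date answers "when did this hit master", and the full history is still reachable through its second parent.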

By having 300 separate commits (which you were doing anyway) it helps us know what your thought process was on the day that a given line changed. Maybe you were refactoring function X to do Y. If you don't mention that you were accounting for changes happening in someone else's branch, then we know we have to look closer at that code. Without the individual commit, all we know is that giant-project-x was accomplished with this commit, and the change to that line may or may not have the necessary update.

By having a history of every single key you typed to create this comment, it would help me know what your thought process was when you typed it up. Maybe you got pissed off and wrote a swear word or two and then backspaced. Maybe you worded something awkwardly and then refactored your sentence. Without all of your keystroke history, all we know is that a comment was made by you, and your opinion may or may not have taken into account certain arguments made by others in the same thread around the time that the comment was published.
> it helps us know what your thought process was on the day that a given line changed

This is not something someone who actually spends his days reading code would say.

Code is hard to read as it is. Presenting it in well-packaged, readable commits is the very least one can do.

>Future devs are just as capable at traversing repos and collapsing diffs as you or I, so there's no need to lie to them.

The ability of devs to collapse a bunch of commits into a useful summary is near zero right now. You can only achieve it by rewriting history. Unless you think that feature is going to be commonplace very very soon, there is a compelling reason to lie.

> The ability of devs to collapse a bunch of commits into a useful summary is near zero right now

You can get quite far with 'git diff START END'. Something more task-specific can probably be done with Emacs, Magit, Ediff mode, bash, elisp, etc.
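For instance, a net diff between two points hides the intermediate "typo, then fix" churn without rewriting anything. This is a small sketch in a throwaway repository; the tag `start` stands in for START, `HEAD` for END, and the file contents are made up.

```shell
#!/bin/sh
set -eu

# Throwaway repository (identity, file and tag names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'alpha\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git tag start                      # plays the role of START

printf 'alpha\nbta\n' > notes.txt  # commit with a typo
git commit -q -a -m "Add beta"

printf 'alpha\nbeta\n' > notes.txt # follow-up fix
git commit -q -a -m "Fix typo"

# END is simply HEAD here; the diff shows only the net change.
net_diff=$(git diff start HEAD)
printf '%s\n' "$net_diff"

# Count added content lines (skipping the "+++" file header).
added_lines=$(printf '%s\n' "$net_diff" | grep -c '^+[^+]')
```

The output contains a single added line, `+beta`; the misspelled intermediate state never appears, yet both commits remain in the history.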

Even if you think collapsing commits by rewriting history is useful for making summaries, etc., what makes you think you can produce a more useful summary right now than that future dev can, considering the fact that you don't know what they might want?

The nice thing about git is that anyone can make a new branch from any point in the repo's history, merge, cherry pick, rebase, etc. to their heart's content, then garbage collect it once they've learned what they needed.

What makes me think I can write better code than the thousand people downloading my repo later? I don't, but somebody should do the summarizing, and it might as well be me.

Smashing the diffs together gets you the least useful parts of a purposeful squash commit.

> somebody should do the summarizing, and it might as well be me.

Should they? See my earlier point about gilding lilies and YAGNI ;)

Of course, there are always exceptions! The most obvious ones are processes which work per-commit, e.g. bisecting, conflict resolution, per-commit code review, etc. where having a bunch of interleaved "stories" can be tedious.

The order in which code was written is pretty meaningless. What matters is how the function of the code changed over time. If I run `git blame` I should not be presented with a whole series of "fixed typo", "changed whitespace", etc commits. That makes it really hard to work with. When I use `git blame` I want to be presented with the commit that actually made a meaningful change to the code.
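For the "changed whitespace" case specifically, `git blame -w` can already skip past such commits without any history rewriting. It is only a partial mitigation and does nothing for "fixed typo" commits. This is a sketch in a throwaway repository; the file name and messages are made up.

```shell
#!/bin/sh
set -eu

# Throwaway repository (identity, file name and messages are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# The commit that made the meaningful change.
printf 'return 1\n' > x.txt
git add x.txt
git commit -q -m "Add return value"
meaningful=$(git rev-parse HEAD)

# A later commit that only re-indents the line.
printf '    return 1\n' > x.txt
git commit -q -a -m "changed whitespace"
noise=$(git rev-parse HEAD)

# Plain blame points at the whitespace commit...
plain=$(git blame --porcelain x.txt | head -n 1 | cut -d ' ' -f 1)
# ...while -w ignores whitespace and attributes the line to the earlier commit.
ignoring_ws=$(git blame -w --porcelain x.txt | head -n 1 | cut -d ' ' -f 1)

echo "plain blame: $plain"
echo "blame -w:    $ignoring_ws"
```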
That code X should be clearly commented to explain the state you describe. That's the proper, most ergonomic, solution to the problem. Sure, it might not be documented and archaeology might be needed, but it shouldn't be considered as an excuse to not write comments and/or documentation.
I agree that if X avoids Y for some subtle edge-case or whatever, then there should be a comment explaining why.

However, in my example X is written first, but just so happens to have become redundant once Y gets written. We've just spotted this redundancy, and it's up to us to figure out whether X should be refactored to use Y or not.

If we look at an unaltered history, we would see that X was written first, so we can hypothesise that it's just a special case of Y which can be refactored away.

If we look at an altered history, the commits containing X may have been squashed/rebased/etc. into a coherent "story", which just-so-happens to appear on top of the story containing Y.

If there were a comment telling us that X was added due to some edge-case, etc. that makes Y unsuitable, we could leave it alone and get on with something else. Yet in this situation there is no such comment, but that doesn't imply that it's not there to handle some subtle edge-case; we'd need to do more investigation to convince ourselves that it is indeed redundant before we could refactor it in confidence, to counteract the contrary evidence which git is telling us.

Actually, as a reviewer I don't mind those "fixed typo" style commits. It keeps them separate from the real work. I can just cleanly glide past them while scanning the history.

"Added this thing" sounds like a substantial commit -- at least in terms of meaning, even if (for some reason), the actual diff is a one-liner. In that case I would want a more explanatory commit message, but the change itself is fine.

> My repo histories are riddled with stuff like "finished X", "stubbed out Y", "fix typo in X"

Where I work, our commits from years ago are like that, and since practically all the people from that era have moved on, the history is practically useless when trying to determine what they were working on and why they were working on it.

In fact, I found what appeared to be a logical error in one of the many tools we have deployed. I tried to track down when it was added and the commit just said something like "fixing integration tests".

So, not only did they change some tests, but they also added some code as well.

In other words, the reason that line of code was added is very well hidden from this "future" (now present) dev.

The three messages you give above are perfectly good members of a cleaned-up history. Each has a clear meaning, and small is good when it comes time to bisect.

Messages like "stuff", "it works", "xxx" and "everything I did last month" are not so good, but very common. Moreover, if you are in the habit of avoiding them -- that is, having each commit do one thing with a clear intention -- then you will keep finding times when you wish you had done something differently an hour ago. And then `rebase -i` is your friend.