Hacker News
by dtech 1942 days ago
The advantage is that you get a more usable and understandable list of historical changes. "You wouldn't publish the first draft of a book" [1]

A squashed merge or a rebased and cleaned-up set of commits gives a very clean overview of which changes were made, at what point, why they were made, and which changes belong together. That picture tends to get utterly lost in the "set up X", "make test Y", "fix typo", "wip" and "change error handling" commits a feature branch typically has.

Additionally, I'm not really interested in the fact that my colleague started change X yesterday before lunch. I'm interested in when it went live and became visible to all developers, which is when it was merged into the main branch.

[1] https://git-scm.com/book/en/v2/Git-Branching-Rebasing#_rebas...


You wouldn't publish a first draft, but neither would you burn it once the final draft was off to the printer. Personally, I'd prefer it if "squashing" commits was purely a UI thing; the underlying commits were all still there, but grouped together and displayed as a single big "virtual" commit. That way you could still drill down to the real history if you needed to.

Why would you want to see every typo that was corrected? Every little test that was changed erroneously and then backed out again?

That may be an accurate representation of the order savepoints were made, but it's not an accurate representation of how the software evolved. It is noise that needs to be discarded if a reader would like to know what change was really made. It also makes it difficult or impossible to use tools like git bisect.

Is the argument really that a more detailed history is always better? In the trivial case every keypress could be a savepoint, and every savepoint a commit.

One does not always know in advance that a commit needs to be split in two. The only way to produce readable commits without rebasing them in that case is to work with local _backup files. A version control system does this much better.

In fairness, you're only seeing 5% of the typos. We caught the other 95% before committing. :)

I love your question, "why not a commit per keypress?", because it raises an interesting follow-up: why not squash and rebase entire months or years of project work into single commits? If squashing is so useful, why do we only apply it at low-grain scales? Could we read and understand massive projects quickly and easily, if they only had a few commits to them?

I'm sure that we don't experiment with larger-scale rebases because of the limitations in the technology -- we all know that we're not supposed to 'git rebase' in public, and why that is. But suppose those obstacles were lifted. Now that we can rebase and rewrite at any time scale, which scale(s) is the right one(s) to choose?

> why not squash and rebase entire months or years of project work into single commits?

The argument here is that one should rebase and carefully craft commits that isolate each functional change into a separate commit, where each change is motivated and builds on the previous one, before pushing anything. Every commit should build cleanly, preferably even pass tests. That makes changes easier to reason about, and enables the use of tools such as bisect. Look at git itself for an example of this type of history.
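
To illustrate why "every commit should build cleanly" matters for bisect, here is a minimal Python sketch of the binary search that bisecting performs. It is not git's implementation; the commit names and the `is_good` check are made up for the example, and it assumes every commit can actually be built and tested. A "wip" commit that doesn't build would leave `is_good` with no meaningful answer.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical linear history, oldest first.
history = [
    "add parser",
    "add config loader",
    "change error handling",
    "add CLI flag",
    "update docs",
]

# Hypothetical test: pretend the bug was introduced by "change error handling".
broken_since = history.index("change error handling")


def is_good(commit: str) -> bool:
    return history.index(commit) < broken_since


culprit = first_bad_commit(history, is_good)
print(culprit)
```

The search only needs about log2(n) test runs, but each of those runs lands on an arbitrary commit in the middle of the history, which is why untestable intermediate commits hurt.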

The counterargument was that this presents a false view of history. Maybe there were false starts and mistakes made along the way, and if these are not preserved in the history, the reader is left without any understanding of them. This is not an uncommon argument. Some people argue rebase should never be used.

This view suggests that a more detailed history is preferable. Taken to its logical extreme, that would mean every keypress and editor command.

But "why not delete all of history" is not an example of "carefully crafted commits" taken to an extreme. Quite the opposite.

Basically, you want to keep the history of individual logical patches to the codebase, but not the meta-history of how those patches were made.

It helps to think about how git grew out of an email based workflow.

A commit is essentially an email. It has a sender, date, a subject line and a message body. The commit message format is subject, empty line, body. Think of a git repository as an archived mailing list's worth of patches.
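
As a rough illustration of the "subject, empty line, body" layout, here is a small Python sketch that splits a raw commit message into those two parts. The `CommitMessage` type, the `parse_commit_message` helper and the sample message are invented for this example; this is not how git itself parses messages.

```python
from typing import NamedTuple


class CommitMessage(NamedTuple):
    subject: str
    body: str


def parse_commit_message(raw: str) -> CommitMessage:
    """Split a message at the first empty line into subject and body."""
    subject, _, body = raw.partition("\n\n")
    return CommitMessage(subject.strip(), body.strip())


# Hypothetical commit message, laid out like a short email.
raw = (
    "Handle missing config file\n"
    "\n"
    "Fall back to defaults when the file is absent.\n"
    "Previously the loader raised an exception."
)

msg = parse_commit_message(raw)
print(msg.subject)
print(msg.body)
```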

Just as you wouldn't send an email describing several days of work without proofreading it, you shouldn't push your commits without cleaning them up first. The git design grew out of this usage, which was much harder in something like Subversion.

No one would send an email to the kernel mailing list suggesting a patch set that included errors, false starts, and reverts. That would waste reviewers' time. Code history is a craft to aid understanding the code; it is not an undo log.

> it raises an interesting follow-up: why not squash and rebase entire months or years of project work into single commits?

That's effectively what happened before version control/before the small-scale rebases we enjoy now were possible. And the reason is that it's hugely valuable in certain circumstances to be able to see some granularity of the history. (Though clearly people disagree about what the grain size should be.)

> Could we read and understand massive projects quickly and easily, if they only had a few commits to them?

I don't think so. The current state is visible at the top of the git tree regardless. History comes in when you are trying to understand why the state is what it is. Usually this is for troubleshooting in my experience, but sometimes also when doing a refactor. Meaningful commit messages attached to meaningfully-clumped patches are, in my opinion, absolute gold in those cases.

There's little benefit to squashing down a year's worth of work into 5 commits because you can just as easily tag each of those 5 commits with a version number, give it a little write up, and call it a release.

I think the reason to squash commits is to cut out the noisy bits that were only useful to the original developer that day and create a timeline that's helpful for future readers. It doesn't really make sense to get more granular than the level of a single commit with a good comment and a small set of cohesive changes. So you store your history at that granular level and you can take care of the rest with tags, minor and major versions, etc.

The Fossil designer agrees with you:

"So, another way of thinking about rebase is that it is a kind of merge that intentionally forgets some details in order to not overwhelm the weak history display mechanisms available in Git. Wouldn't it be better, less error-prone, and easier on users to enhance the history display mechanisms in Git so that rebasing for a clean, linear history became unnecessary?"

I'm not a user of it myself, but I believe this is the philosophy behind how Fossil approaches it:

https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md

Pull requests can serve the same purpose; messy feature branches and a clean main trunk.

The only way you get that in Git is if you squash-and-rebase before merge, though. Which is fine if that's the process and end result that you want, but does (if you keep feature branches "messy") disconnect feature branches from their related merges into trunk from Git's point of view.

Yeah, you're reliant on GitHub metadata to make those links for you; there's nothing natively in git itself doing it. It's also an all-or-nothing affair, where the whole PR becomes a single squashed commit. To get anything in between ("here's my single large PR which I've rebased into N incremental commits, but you can also dig in and see the work that actually led here"), you really do need first class support in the tool.

I suppose the GitHub answer to all this would be "just make separate PRs", but going that way asks a lot more of the developer in terms of how polished those incremental states need to be.

Mercurial does this with the Evolve extension.

https://www.mercurial-scm.org/doc/evolution/user-guide.html#...

It still has the individual commits, but the interface will make it appear as if it's just one commit.
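
The "keep the commits, display them as one" idea that comes up in this thread could be modelled roughly as below. This is a hypothetical Python sketch, not how Evolve, git, or any other tool implements it; `Commit`, `CommitGroup`, `render_log` and the sample hashes are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    sha: str
    message: str


@dataclass
class CommitGroup:
    """A 'virtual' commit: one summary shown by default, real commits kept underneath."""
    summary: str
    commits: List[Commit] = field(default_factory=list)


def render_log(groups: List[CommitGroup], expand: bool = False) -> List[str]:
    """Return log lines; with expand=True, also list the underlying commits."""
    lines = []
    for group in groups:
        lines.append(f"* {group.summary} ({len(group.commits)} commits)")
        if expand:
            for commit in group.commits:
                lines.append(f"    {commit.sha} {commit.message}")
    return lines


# Hypothetical feature branch with the kind of noisy commits described above.
feature = CommitGroup(
    summary="Add feature X",
    commits=[
        Commit("a1", "set up X"),
        Commit("b2", "fix typo"),
        Commit("c3", "wip"),
    ],
)

collapsed = render_log([feature])
expanded = render_log([feature], expand=True)
print("\n".join(collapsed))
print("\n".join(expanded))
```

The collapsed view gives the clean overview, while the expanded view still lets a reader drill down to the real history when needed.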

The real history is useless. Especially if we have tests. In that case it doesn’t matter how often we make changes.

I do think this is because I prefer to think of code as a black box. No one should need to figure out how my functions work. Someone should just need the name of the function, what inputs it receives, and what output it returns. If someone actually has to read my code, that’s a failure.

> If someone actually has to read my code, that’s a failure.

I can't tell if you're being serious, or are a brilliant troll. :)

Assuming you're serious, Hyrum's Law is one reason I might need to see your code (https://www.hyrumslaw.com/). The signature of your function is not the whole signature, it's just a sketch of the high points.

You should only need to read the code when something goes wrong, but otherwise, no. You need to be more careful with your time.