by m12k 3589 days ago
There can definitely be value in e.g. summarizing the final result and understanding after a number of 'failed' iterations, and the many small commits can be considered "noise" if you only care about what happened at a high level. However, rewriting history with operations like squash or rebase IMO seems like a bad solution to a real problem; it really shows that we don't yet have the right abstractions or tools for doing this. If all the small commits are considered "noise", that shows we don't have good enough tooling for filtering and grouping when perusing our VC history. The information that people are currently putting into a squash commit is an aggregate or derivative of the original, and as such it shouldn't replace the original, it should supplement it. There are legitimate use cases where the details of those small commits are indeed valuable, even if the more high-level "just tell me the final result" use case is more common, and you shouldn't be forced into making that tradeoff.

I know some people are using merge commits or pull requests as a place to put this information, but maybe we need an explicit mechanism for grouping together commits and summarizing them? I'm imagining something along the lines of code folding. Such a grouping might have other uses too (e.g. signaling that there's a grouping of commits where the tests will fail, so skip to the last commit if bisecting).
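To make the folding idea a bit more concrete, here's a rough sketch of what such a grouping could look like as a data structure. None of this is an existing VCS feature; `Commit`, `CommitGroup`, `folded_log` and `bisect_candidates` are all names I made up for illustration, and the history is toy data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    message: str
    tests_pass: bool = True


@dataclass
class CommitGroup:
    """Hypothetical 'fold': a summary that wraps the original commits
    instead of replacing them."""
    summary: str
    commits: list[Commit] = field(default_factory=list)


def folded_log(history):
    """High-level view: one line per group, one line per ungrouped commit."""
    lines = []
    for entry in history:
        if isinstance(entry, CommitGroup):
            lines.append(f"[+{len(entry.commits)}] {entry.summary}")
        else:
            lines.append(f"{entry.sha} {entry.message}")
    return lines


def bisect_candidates(history):
    """Commits a bisect would visit: only the last commit of each group,
    since intermediate commits inside a group may have failing tests."""
    candidates = []
    for entry in history:
        if isinstance(entry, CommitGroup):
            if entry.commits:
                candidates.append(entry.commits[-1])
        else:
            candidates.append(entry)
    return candidates


# Toy history (made-up shas and messages).
history = [
    Commit("a1", "Add widget model"),
    CommitGroup(
        "Add PUT support for widgets",
        [
            Commit("b1", "WIP: PUT handler", tests_pass=False),
            Commit("b2", "Fix validation bug", tests_pass=False),
            Commit("b3", "PUT handler done"),
        ],
    ),
    Commit("c1", "Update docs"),
]

for line in folded_log(history):
    print(line)
```

The point of the sketch is that the group holds the small commits rather than overwriting them, so the folded view and the detailed view come from the same history.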

2 comments

So, here's a question. I make a commit to implement a feature, then I realize there's a bug in my implementation, so I fix the bug, and then squash it into a single commit. What is the scenario where anyone is going to be perusing the history and he'll actually want to know about that bugfix?

What's the actual value in that intermediate commit? Other than seriously contrived scenarios I can't think of any of the "legitimate use cases" you mention. If it's someone else's code I never want to see that intermediate commit.

Where's the tradeoff?

Let's imagine for a second that the bug-fix you make isn't perfect. Maybe you ought to have refactored something a bit more instead. Maybe you made the bugfix a couple days after making the main feature commit, and you'd forgotten some detail. At any rate, half a year later someone has to sit down and figure out why the code is behaving weirdly sometimes. If you've got an accurate history of how the code was written, they'll have an indication that the code from the bugfix was added post-hoc, and might be inclined to investigate here. They'll understand that all these lines of code were not written at once, so the ones in the bugfix are more likely not to be fully cohesive with the rest of them.

Sure, in the happy case where your code is perfect, all those extra commits are just 'noise'. But when debugging, there can be value in the forensic information about the evolution of the code, which also documents the evolution of the understanding of the person that wrote it. It can help answer questions like "why is _this_ here?" or "what were they thinking?!?" I've fixed bugs that would have taken much longer to narrow down if I hadn't had clues like that.

As I understand it (and this is what I now do when on a team), commits should be atomic, fully fleshed out parts. Commit 324rte may specifically "add support for PUT operations on widgets" or asde21 may "Refactor Business Rule Unit Tests into individual files", and looking at the log, you can cherry-pick that one commit (and deal with potential merge issues) onto your branch.

But if you're working at home on your own project and just want to sync between a few machines, what do you do? Commit, "Hashing on the Liststore kinda works, some bugs." Push. Go to your home machine, work some more, and finally squash all those kinda working commits into one commit, potentially even needing a "git push --force" (which you can safely do since you're the only developer?).

I agree with you totally though. It shouldn't just be a branch. There should be a way to group x edits into one big commit. That's the atomic unit that has a specific feature, and all the mini-commits inside of it should be totally abstracted except for specific deep searching commands.
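Something like the following is roughly what I have in mind for the "deep searching" part. It's only a sketch under my own assumptions: `FeatureCommit` and the `deep` flag are hypothetical, not anything git offers today, and the log entries are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureCommit:
    """Hypothetical atomic unit: one summary plus the mini-commits behind it."""
    summary: str
    mini_commits: List[str] = field(default_factory=list)


def search(log, term, deep=False):
    """Match summaries by default; look inside mini-commits only when deep=True."""
    term = term.lower()
    hits = []
    for unit in log:
        if term in unit.summary.lower():
            hits.append(unit.summary)
        if deep:
            for message in unit.mini_commits:
                if term in message.lower():
                    hits.append(f"{unit.summary} > {message}")
    return hits


# Made-up log for illustration.
log = [
    FeatureCommit(
        "Add hashing on the Liststore",
        ["Hashing kinda works, some bugs", "Fix collision bug"],
    ),
    FeatureCommit("Refactor unit tests", ["Split rule tests into files"]),
]

print(search(log, "bug"))             # normal search sees only summaries
print(search(log, "bug", deep=True))  # deep search also sees the mini-commits
```

A normal search stays at the feature level, and the messy intermediate history only shows up when you explicitly ask for it.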