Squashing for example allows me to have a history where each commit builds. This has been very useful for bisecting for me. I wouldn't call it "next to nothing".
The "greater detail" part can cost me a lot of time.
Having commits that do not build represents the history more accurately. It could very well be a “fix” to some build error that silently introduces an issue, that context is lost when you squash.
> Having commits that do not build represents the history more accurately.
Sure it does, but sometimes that level of detail in history is not helpful. Individual keystrokes are an even finer/"more accurate" representation of history; but who wants that? At some point, having more granular detail becomes noise - the root of the disconnect is that people have a difference in opinion on which level that is: for some (like you), it's at individual commit-level. For others (like me), it's at merge-level: inspecting individual commits is like trying to parse someone's stream-of-consciousness garbage from 2 years ago. I really don't care to know you were "fixing a typo" in a0d353 on 2019-07-15 17:43:32, but your commit is just tripping-up my git-bisect for no good reason.
Sure, I can - but should I? That's the fundamental difference in opinion (which I don't think can be reconciled). I don't need to know what the developer was thinking or follow the individual steps when they developing a feature or fixing a bug, for me, the merge is the fundamental unit of work, and not individual commits. Caveat: I'm the commit-as-you-go type of developer, as most developers are (branching really is cheap in Git). If everyone was disciplined enough not to make commits out of WIP code, and every commit was self-contained and complete, I'd be all for taking commits as the fundamental unit of code change
If the author did something edgy or hard-to-understand with the change-set, I expect to see an explanation why it was done that way as a comment near the code in question, rather than as a sequence of commit-messages, that is the last place I will look - but that's just me
But the problem with this approach is that you're making it impossible to extract the actual changes when you do want them, whereas simply skipping non-merge commits is a minor inconvenience (`--first-parent` tends to cover it).
I mean, granted, it's not ideal. I think this is a bit of a problem with the low-level nature of git - Ideally it'd be easier to semantically bundle such sequences of commits such that it's be more reliably dealt with in the broader ecosystem (not every tool supports --first-parent), and in any case, there's nothing forcing you to maintain the first-parent-is-linear-mainline-history; that's just a tradition which, again, many common tools follow. Then of course there's the poor integration with git hosting (such as github) and git - I can blame a file, but I can't easily correlate that with the discussions in the PRs, and whatever correlation there is is purely online, with all the limitations of a single-vendor non-distributed system like that entails.
Ideally this wouldn't even be a tradeoff at all; it would be obvious how to track history both at the small scale and the larger scale (and perhaps even more?), but alas, it's what we have.
Out of curiosity - when you merge via squash, what kind of commit messages do you retain? Do you mostly concatenate the commit messages, or rewrite the whole thing?
> when you merge via squash, what kind of commit messages do you retain? Do you mostly concatenate the commit messages, or rewrite the whole thing?
Context-specific. For a bigger PR that deals with an extensive refactor I'll prefer to have a descriptive title and hand-curated task list below (so definitely not 1:1 to commit messages). For smaller PRs -- or more focused ones, like those dealing with a single feature or bug -- I'll only leave a descriptive title.
But I usually never leave a list of commit messages. Not because I have no discipline; sometimes some refactoring requires 4-5 steps and all commits have 99% identical messages which is not useful when you aggregate those in a single list of bullet points in the end.
---
> But the problem with this approach is that you're making it impossible to extract the actual changes when you do want them, whereas simply skipping non-merge commits is a minor inconvenience (`--first-parent` tends to cover it).
Again, that's not the issue here. The issue is that when you work on a big project (like many of us do) you get something like 4-7 merged PRs a day; don't pull/fetch for 3 days and you'll get 60+ lines in your terminal when you get to it.
There are people who manage releases and people who chase subtle regressions. Having git bisect narrow it down to a big PR squashed commit is actually a win; it gives them a localized area inside which they can work with other tools (not bisect).
In the end I suppose we can say it's a subjective taste. But I always appreciated the main branch's history to only consist of squashed commits. Again, it gives you a good bird-eye's view.
Storing every single version of the file which ever hit disk locally on my machine in the history would be the most accurate, yet no one seems to advocate for that. Even with immutable history, which versions go into the history is a choice the developer makes.
>Storing every single version of the file which ever hit disk locally on my machine in the history would be the most accurate, yet no one seems to advocate for that.
You'd be surprised. It's only because we understand (and are used to) tool limitations (regarding storage, load, etc) that we don't advocate for that, not because some other way is philosophically better.
I'd absolutely like to have "every single version of the file which ever hit disk locally on my machine in the history".
I'd be very surprised if you used such a feature on a daily basis indeed.
I understand the rationale but the balance tilts too far into the "too much details" territory for me and that can slow me down while digging.
What I found most productive for myself is that searching for a problematic piece should happen on a two-tiered tree, not a flat list. What I mean is: first find the big squashed PR commit that introduces the problem, then dig in more details inside of it.
Not claiming my way is better but for almost 20 years of career I observed it was good for many others as well, so I am not exactly an aberration either.
To me a very detailed history is mostly a distraction. Sure `git-bisect` works best on such a detailed micro-history but that's a sacrifice I am willing to make. I first use bisect to find the problematic squashed commit and then work on its details until I narrow down the issue.
I mean, this isn't even really all that far-fetched, other systems do work like that, such as e.g. word's track changes or gdoc's history - or even a database's transaction log.
And while those histories are typically unreadable, it is possible to label (even retroactively) relevant moments in "history"; and in any case just because a consumer-level wordprocessor doesn't export the history in a practical way doesn't mean a technical VCS couldn't do better - it just means we can't take git-as-is or google-docs-as-is and add millions of tiny meaningless "commits" and hope it does anything useful.
All of that is completely valid and I appreciate it. But you are addressing a group of people most of which are already overbooked and have too much on their plate every day. How viable is it to preach this approach to them?
Why not? I do not see any reason to have a history at all for anything except to be able to go back to a specific version to track down a problem. Inaccurate history makes that less useful.
Typo commits and the usual iteration during development isn't "accurate history". Noise in commit logs provides negative value.
Ideally, each commit should be something that you could submit as a stand-alone patch to a mailing list; whether it's a single commit that was perfect from the get-go or fifty that you had to re-order and rewrite twenty times does not matter at all; the final commit message should contain any necessary background information.
It would be needlessly restrictive to prevent users from making intermediate commits if that helps their workflow: I want to be able to use my source-code management tool locally in whichever way I please and what you see publicly does not need to have to have anything to do with my local workflow. Thus, being able to "rewrite history" is a necessary feature.
Patch-perfect commits are an idealistic goal. The truth is that, as already mentioned, many of those typos and “dirty” commits can be the source of bugs that you’re looking for. Hiding them hides the history.
Indeed they are, hence the need to rewrite the history. Managers, tech leads, or users of your OSS project don't care about the "fix typo" comments. They are interested in a meaningful history that tells a bigger story.
And to be frank, I am interested in the same, mid-term and long-term. While I am grappling with a very interesting problem for a month then yes, I'd love my messy history! But after I nail the problem and introduce the feature I'll absolutely rewrite history so the squashed PR commit simply says "add feature X" or "fix bug Y".
Problem is, when you're debugging later on you need to understand what a breaking commit does (or is intended to do) before knowing what it did wrong.
Let alone the code review issue. In any multi-person project, it is just as important that your commit history be readable by a third party for information as that it be useful for your own personal debugging.
Because when debugging with said history a week or a month down the line, or when someone else is debugging with said history, their human comprehension is essential for understanding why a certain event in the history is causing problems and what that event was intended to do.
Leaving aside the readability and code-review concerns, bisection as a process is supremely painful when you have to separate out your targeted bug from the usual intermittent compilation and runtime issues that show up during local development.
You're not saying anything new as far as I can tell. Your grandparent already said what you said. I only disputed the "this costs next to nothing" part, which you don't seem to comment on.
ideally your branch commits would usually build too; but admittedly, I tend to bisect by hand, following the --first-parent. I suppose if the set of commits were truly huge that might be more of an issue. And of course, there are people that have hacked their way to success here: https://stackoverflow.com/a/5652323, but that sounds a little fragile.