by azornathogron 907 days ago
> What is the point of source control, other than to reliably capture what actually happened in history?

Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.

I think history "rewriting" (e.g., rebasing) is usually much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.


Exactly. To analogize to history as a field of study: OP wants the version control history to look like a collection of primary sources. Here's the president's daily calendar, there's the letter he received on April 24 from a small child in Wisconsin. In this model, it's up to future code historians to piece it all together into a story.

When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.

Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?

This isn't a novella. We're talking about executable code. What you're suggesting is the equivalent of using an encyclopedia as a legal reference.

Merge commits tell the coherent story. Commits reveal the messy history that got you there, which is critical exactly when you need to look at history. If you're not trying to track down the source of a problem and how it was introduced, in a deterministic way, why do you bother keeping source history? Publish pretty changelogs instead.

Can you give a concrete example of when you've used the messy details of how a change was introduced at a sub-PR level?

I'm strongly opposed to squashing, but when have you found that a chronological sequence of commits-as-they-were-committed has been helpful where a sequence of heavily-cleaned-up patches would have obscured useful information?

In my experience spelunking through git history, I've only ever been frustrated at the number of different red herrings I've found in a git blame that turned out to be a failed experiment that never got merged in.

Concretely: API changes are a big one, where in the history it looks like we may have once accepted something different than we do now, but then it turns out that that change was reverted before ever making it to production. This information being in the log clutters the git blame (the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR), without providing an ounce of useful information about the history of the production app.

As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.
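
To make the red-herring case concrete, here is a toy sketch in Python of how a blame walk attributes lines. It is not how git blame is actually implemented; `toy_blame`, the commit labels, and the `parse` function are all made up for illustration. It only shows that an experiment-plus-revert pair left in the log pulls the attribution away from the 2016 commit, even though the final code is identical.

```python
from difflib import SequenceMatcher


def toy_blame(history):
    """Attribute each line of the final snapshot to a commit.

    history: list of (commit_label, lines) tuples, oldest first.
    Returns a list of (commit_label, line) for the last snapshot.
    """
    attributed = []  # (commit_label, line) pairs for the current snapshot
    for commit_label, lines in history:
        previous = [line for _, line in attributed]
        matcher = SequenceMatcher(a=previous, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whatever commit they already had.
                updated.extend(attributed[i1:i2])
            else:
                # Inserted or replaced lines belong to this commit.
                updated.extend((commit_label, line) for line in lines[j1:j2])
        attributed = updated
    return attributed


# Hypothetical function body as it has existed since 2016.
original = ["def parse(value):", "    return int(value)"]
# Hypothetical experiment on a private branch, reverted before the PR.
experiment = ["def parse(value):", "    return float(value)"]

full_history = [
    ("2016-original", original),
    ("last-month-experiment", experiment),
    ("last-month-revert", original),
]
curated_history = [("2016-original", original)]

for label, history in [("full", full_history), ("curated", curated_history)]:
    commit, line = toy_blame(history)[1]
    print(f"{label}: {line.strip()!r} last touched by {commit}")
```

With the full history, the second line is blamed on the revert from last month; with the curated history, it is blamed on the 2016 commit, which is the answer that matters for production.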

> the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR

I can't think of a specific example from my own history, but something like this is what has happened. A function was changed in order to support a different change elsewhere in the code. That other change was later modified, incompletely, to remove the need to modify the first function, and the change to the first function was subsequently reverted. Down the road, it's discovered that the modification was incomplete, and when reviewing the new code, you wonder, "how could this possibly have ever worked?" The answer is that it didn't, and when it was committed, there was another supporting change that made it work. By erasing the history of that other change, you remove the possibility of discovering the reasoning behind the change and the source of the introduction of a problem.

If I saw that intermediate state before it was erased, remember it, and later try to find it, I'm being gaslit by source control: I remember a real change that was there in a commit, but source control now lies to me and tells me that it never existed.

> I'm strongly opposed to squashing

> As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.

Ironically, squashing is probably the best tool you have to deal with developers who won't clean up their PRs. It's a pretty blunt tool though.

It's better to have the full detail in the case of an audit. It's almost guaranteed to be to the developer's benefit.

Can you provide more details of what you're referring to? I understand the importance of an auditable trunk/production branch, but I'm having a hard time imagining why the sequence of commits on feature branches would matter in an audit.

The commit history is not an audit log, it's very easy to make it look like whatever you want it to look like, even if rebasing as such is banned. I have a hard time picturing a scenario where the commit history is trusted as an audit trail and it matters that every detail is present.

I'm referring to an outside certified audit of your code. You can make it look worse for yourself with rebases and squash merges, but assuming you are working legitimately, those tend to obscure the work you did in real time. What you as a developer would want is to be able to line up your code changes with the change requests.

Okay, but rebasing changes each point in that history, the one you curated by choosing when to commit, into something different from what it ever was, retroactively. It's literally creating an entirely new history that nobody has ever actually examined, introducing the possibility that points along that history are inconsistent with what was intended at the point of each commit.

> creating an entirely new history that nobody has ever actually examined

I think the confusion here is that you're assuming that OP's commit history looks like yours, with dozens of commits per PR that no one could possibly examine in detail with each rebase. At least for me, since I'm okay with rewriting history on local branches, I have a very small number of commits that do get examined each time I rebase.

I average 3-4 commits per PR. There's usually one that refactors the existing code to lay the foundation for a new feature, maybe one that just moves a few files around (to ensure git recognizes them as moves and not delete/recreate), and 1-2 that introduce the new feature.
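
The reason for the move-only commit: git does not record renames. It infers them by comparing the content of deleted and added files, with a 50% similarity threshold by default. Here is a rough sketch of that idea in Python, using difflib's ratio as a stand-in for git's actual similarity score. The function name, the constant, and the file contents are all hypothetical.

```python
from difflib import SequenceMatcher

# Mirrors git's default 50% threshold; the scoring itself is only a stand-in.
RENAME_THRESHOLD = 0.5


def looks_like_rename(old_lines, new_lines, threshold=RENAME_THRESHOLD):
    """Return True if the new file is similar enough to count as a move."""
    score = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False).ratio()
    return score >= threshold


# Hypothetical file content before the move.
original = [
    "def load(path):",
    "    with open(path) as f:",
    "        return f.read()",
]

# Commit that only moves the file: content is untouched.
moved_only = list(original)

# Commit that moves the file and rewrites most of it in one go.
moved_and_rewritten = [
    "def load(path):",
    "    data = fetch(path)",
    "    validate(data)",
    "    return data",
]

print(looks_like_rename(original, moved_only))
print(looks_like_rename(original, moved_and_rewritten))
```

The pure move scores as identical, so it is seen as a rename. The move-plus-rewrite shares one line out of seven and falls under the threshold, so it would show up as a delete and an unrelated new file.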

When I rebase on main, I examine the diff for each commit before pushing to my branch. If something has meaningfully changed, then I adjust the commits appropriately.

My commits aren't a history of what actually happened, they're a description of the steps that it takes to add a feature to (or fix a bug in) main. If main changes in a way that introduces a conflict, I want to reevaluate each step that I'd previously laid out.