by benjamincburns 4794 days ago
Like I said on the other thread, tread carefully friends; there's dogma at work here.

Also, take a step back and look at the history of git. Git was created by Linus Torvalds specifically for Linux kernel development. I'd argue that a key reason the kernel is so successful is that people are able to maintain history as a first-class entity in their project. The idea that you can 'rebase -i' to build up small, neat commits that will almost always apply cleanly to a sane codebase is wonderful. The fact that I don't need extreme foresight to capture my meaningful units of work into individual commits means that years from now I can look back and see what I was actually doing instead of "wait, was that line deleted as part of the feature, or was he just cleaning up warnings?"
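For anyone who hasn't seen this in action, here is a minimal sketch of that kind of cleanup. The throwaway repository, file names, and commit messages are all hypothetical, and it uses `--fixup` with `--autosquash` so the rebase runs unattended; a hand-edited `rebase -i` todo list would achieve the same result.

```shell
#!/bin/sh
# Hypothetical throwaway repository; every name below is made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > feature.txt
git add feature.txt
git commit -q -m "Initial commit"

echo "feature" >> feature.txt
git commit -q -a -m "Add feature"

echo "cleanup" > warnings.txt
git add warnings.txt
git commit -q -m "Clean up warnings"

# A late fix that really belongs to "Add feature" (now HEAD~1),
# recorded as a fixup commit instead of a standalone one.
echo "feature fix" >> feature.txt
git commit -q -a --fixup=HEAD~1

# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged, so the
# interactive rebase runs without prompting; --autosquash reorders the fixup
# next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format=%s
commit_count=$(git rev-list --count HEAD)
```

After the rebase the fix lives inside "Add feature" and the warning cleanup stays a separate commit, which is exactly the distinction the question in quotes is asking about.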

Remember that these features aren't for developers, they're for maintainers. If you want your code in the kernel, you follow the kernel development process or GTFO. Linus doesn't sit around saying "shucks darn, it didn't merge cleanly, I guess I'll go fix it for them." He just doesn't have the time, and neither do his "deputies."

That's not to say that these features don't benefit developers; they do. It's just that you need to have seen them in action to understand why.

And finally, I'm genuinely curious... Why are some people so obsessed with perfect preservation of history? Is this some sense of fear/paranoia? In practice I've never found project history to be useful without modification, so what am I missing? What are people trying to preserve?


> And finally, I'm genuinely curious... Why are some people so obsessed with perfect preservation of history? Is this some sense of fear/paranoia? In practice I've never found project history to be useful without modification, so what am I missing? What are people trying to preserve?

I think it's a conflation of having something like incremental backups versus having (as you so eloquently put it) a cleaned up log of development. Sure, you can use a VCS to record the minutiae of every little thing that changes so you have a "snapshot" of the code at any point in time. And git will do that if you want it.

But I'd also have to second your thoughts that git is VCS done right, that is, by maintainers. All code will have to be maintained sooner or later, and as someone who has had to maintain plenty of code, I can tell you I don't care at all about every little change that's made. Even when I'm bisecting a bug, I don't want to have to skip over every stupid bit that was twiddled, or see commits that are immediately reverted by the next commit. That's garbage. I want to see conceptual chunks, things that hang together because a human thought of them in terms of "this is a feature" or "this fixes a bug". Should commits make the Minimum Necessary Change? Yes. Should a new feature or bug fix be split across several commits, possibly separated by other, unrelated commits, because that's the way some sleep-deprived programmer thought of them? Do you like to read authors' notes about their novels instead of the edited novels?

I can't speak for everyone, but the main reason I'm interested in a reasonably perfect preservation of history is to account for every line of code in the repository and why it's there. I think there is a difference between being the consumer of a library who doesn't care about the internals and being actively involved in the development of a library. Being able to look back in time and see what state a file was in when it was changed, what was changed, who changed it, and the reason for the change (with possibly more metadata such as links to tickets/bugs/stories) is very valuable before I start mucking around and changing code.

To me, it's the same as testing code. You don't need tests when things work perfectly. You only need tests/history when things aren't... And then you are seriously happy you have them.

On the topic of `git pull --rebase`, I think if you have a hard-and-fast rule that you employ without thinking about what you are doing to your commits and the state of the repository then you are doing it wrong (whether that is blindly merging or rebasing)... But that's just me.

> to account for every line of code in the repository and why it's there.

I've found that on projects which disallow the modification of history answering this question is more difficult than if each committer was responsible for recomposing their commits before merging their features (preferably a FF-merge, of course). Meaningful/useful code isn't lost as you're not modifying the long-term history of the project, just your own recent commits relative to the task at hand. Authorship isn't lost, as even if the recomposition is handled by another person, you can always set the author for a commit arbitrarily, and indicate your presence as the maintainer by signing.
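A minimal sketch of what I mean, with a hypothetical repository and made-up names throughout. It assumes "signing" means a `Signed-off-by` trailer via `-s`; a GPG signature with `-S` would be the other reading, but that needs a key set up.

```shell
#!/bin/sh
# Hypothetical repository; the maintainer commits recomposed work on behalf
# of the original author, then fast-forwards the main branch onto it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"

echo "base" > lib.txt
git add lib.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "change" >> lib.txt
# --author credits the original author; -s appends a Signed-off-by trailer
# naming the committer (here, the maintainer).
git commit -q -a -s --author="Original Author <author@example.com>" -m "Add feature"

git checkout -q "$main_branch"
# --ff-only refuses to create a merge commit, so history stays linear.
git merge -q --ff-only feature

author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
merge_commits=$(git rev-list --count --merges HEAD)
git log -1 --format='author: %an, committer: %cn%n%B'
```

The author and committer fields are recorded separately, so both the person who wrote the change and the person who recomposed it stay visible in the log.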

Put differently, responsible devs never modify other people's history (and unless you're sharing the same machine, git makes this difficult with push vs push -f). They modify their own history in an effort to limit the noise that other devs are exposed to and to make the maintainer's job easier. The goal is to treat the repository as a full-fledged mechanism for communication and coordination with the rest of the team.
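To make the push vs push -f point concrete, here is a rough sketch using a local bare repository as a stand-in for the shared remote. The directory names and commit messages are hypothetical.

```shell
#!/bin/sh
# Hypothetical setup: shared.git plays the central remote, dev is one clone.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git
git clone -q shared.git dev 2>/dev/null
cd dev
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Rewrite a commit that has already been published.
git commit -q --amend -m "Add notes (reworded)"

# A plain push is refused because it is no longer a fast-forward.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# Overwriting the published branch takes an explicit, deliberate -f.
git push -q -f origin "$branch"
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/shared.git" rev-parse "$branch")
```

Rewriting shared history never happens by accident here: the default refuses, and the override has to be typed out.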

I agree. It doesn't matter what order a line of code was added to the system in, it matters why it was added. When I can take the 15 commits in which I played with solutions (adding code, nuking code, etc.) and slim them down to the one set of code that just works, I've saved everybody who looks at it significant effort in figuring out what I was thinking.

There is some information lost in the process, since you can never see what I did that failed, but if you were to add up the amount of time spent redoing failed experiments and subtract it from the amount of time spent wading through experimental, dead commits, my experience says you wind up with a large balance of time wading through junk. Or those experimental changes never get committed, so you, the developer, waste time copying files around to make backups, and you still don't know the failed experiments.

I think there are plenty of workflows that make sure everything is accounted for, without cluttering things up with unimportant information.

For example, have a central repo that is the source of immutable history, and have every developer clean up their history into a small linear set of commits before they merge into that. You still have just as much accountability -- nothing can get into master without a developer looking at it and tagging it with a commit message. It's just that the commit message comes from a developer looking at and curating the work he just did on a feature or bugfix, instead of the vague assumptions and notions he was working with during development.

If you think people looking back on their recent work will be better at summarizing their motives and achievements than they were while working and experimenting, as I do, then rewriting local history makes a lot of sense. If you don't trust people, and think they are likely to lose relevant information by haphazardly rebasing with messages like "squash for pushing to master, bug #1933" then you might not.

All in all, I think that, for example, 2 clean messages from 2 developers (even relatively uninformative ones) are better to look at than 1 commit from one developer and 13 from the other with messages like "first stab at xyz" and "Oops, forgot to also change the name here".