by 32bitkid 4798 days ago
I can't speak for everyone, but the main reason I'm interested in a reasonably perfect preservation of history is to account for every line of code in the repository and why it's there. I think there is a difference between being a consumer of a library, who doesn't care about the internals, and being actively involved in its development. Being able to look back in time and see what state a file was in when it was changed, what was changed, who changed it, and the reason for the change (possibly with more metadata, such as links to tickets/bugs/stories) is very valuable before I start mucking around and changing code.

To me, it's the same as testing code. You don't need tests when things work perfectly. You only need tests/history when things aren't... And then you are seriously happy you have them.

On the topic of `git pull --rebase`, I think if you have a hard-and-fast rule that you employ without thinking about what you are doing to your commits and the state of the repository then you are doing it wrong (whether that is blindly merging or rebasing)... But that's just me.

2 comments

> to account for every line of code in the repository and why it's there.

I've found that on projects which disallow the modification of history, answering this question is more difficult than if each committer were responsible for recomposing their commits before merging their features (preferably a FF-merge, of course). Meaningful/useful code isn't lost, as you're not modifying the long-term history of the project, just your own recent commits relative to the task at hand. Authorship isn't lost either: even if the recomposition is handled by another person, you can always set the author for a commit arbitrarily, and indicate your presence as the maintainer by signing.
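To make the authorship point concrete, here is a minimal sketch in a throwaway repo. The names, email addresses, and commit message are made up for illustration, and I'm using `-s` (a `Signed-off-by` trailer) as one reading of "signing"; `-S` would GPG-sign instead, but that needs a configured key.

```shell
set -eu

# Throwaway repository; all identities below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Maintainer Mo"
git config user.email "mo@example.com"

echo "fix" > fix.txt
git add fix.txt

# --author credits the person who wrote the change;
# -s appends a Signed-off-by trailer for the committer (the maintainer).
git commit -q -s --author="Original Dev <dev@example.com>" \
    -m "Fix overflow in parser"

# Author and committer are recorded separately on the commit.
git log -1 --format='author:    %an%ncommitter: %cn%n%b'
```

So the person who recomposed the commit shows up as committer and in the trailer, while the original author is still the author.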

Put differently, responsible devs never modify other people's history (and unless you're sharing the same machine, git makes this difficult with push vs push -f). They modify their own history in an effort to limit the noise that other devs are exposed to and to make the maintainer's job easier. The goal is to treat the repository as a full-fledged mechanism for communication and coordination with the rest of the team.

I agree. It doesn't matter what order a line of code was added to the system in; it matters why it was added. When I can take the 15 commits in which I played with solutions (adding code, nuking code, etc.) and slim them down to the one set of code that just works, I've saved everybody who looks at it significant effort in figuring out what I was thinking.

There is some information lost in the process, since you can never see what I did that failed. But if you add up the time others would spend redoing failed experiments and subtract it from the time spent wading through experimental, dead commits, my experience says you wind up with a large balance of time wading through junk. Or those experimental changes never get committed, so you, the developer, waste time copying files around to make backups, and the failed experiments still aren't recorded anywhere.
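For anyone who hasn't done this, a rough sketch of collapsing a run of experimental commits into one, again in a throwaway repo with made-up file names and messages. It uses `git reset --soft` because that runs non-interactively; `git rebase -i` is the usual interactive route to a similar result.

```shell
set -eu

# Throwaway repository; identity, file name and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Simulate 15 experimental commits on top of the base.
i=1
while [ "$i" -le 15 ]; do
  echo "attempt $i" > app.txt
  git commit -q -am "wip: attempt $i"
  i=$((i + 1))
done

# Move the branch back to the base but keep the final state staged,
# then record that state as a single commit with a considered message.
git reset -q --soft "$base"
git commit -q -m "Add working solution for the feature"

git log --oneline
```

The 15 "wip" commits stay reachable through the local reflog for a while, so the failed experiments aren't gone the instant you do this; they just never reach anyone else.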

I think there are plenty of workflows that make sure everything is accounted for, without cluttering things up with unimportant information.

For example, have a central repo that is the source of immutable history, and have every developer clean up their history into a small linear set of commits before they merge into that. You still have just as much accountability -- nothing can get into master without a developer looking at it and tagging it with a commit message. It's just that the commit message comes from a developer looking at and curating the work he just did on a feature or bugfix, instead of the vague assumptions and notions he was working with during development.
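A small sketch of that workflow, assuming a local repo stands in for the central one and that the branch names, files, and messages below are invented for the example. The feature branch is replayed onto the current `master` and then merged fast-forward only, so `master` stays linear.

```shell
set -eu

# Throwaway repository; identity, branch and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Name the first branch "master" regardless of the local default.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > core.txt
git add core.txt
git commit -q -m "Initial commit"

# Feature work happens on its own branch.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature X"

# Meanwhile master moves on.
git checkout -q master
echo "v2" > core.txt
git commit -q -am "Update core"

# Replay the feature on top of current master, then allow only a
# fast-forward, which refuses to create a merge commit.
git checkout -q feature
git rebase -q master
git checkout -q master
git merge -q --ff-only feature

git log --oneline --graph
```

If the feature branch hasn't been rebased onto the latest `master`, the `--ff-only` merge fails instead of producing a merge commit, which is one way to enforce the "small linear set of commits" rule.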

If you think people looking back on their recent work will be better at summarizing their motives and achievements than they were while working and experimenting, as I do, then rewriting local history makes a lot of sense. If you don't trust people, and think they are likely to lose relevant information by haphazardly rebasing with messages like "squash for pushing to master, bug #1933" then you might not.

All in all, I think that, for example, 2 clean messages from 2 developers (even relatively uninformative ones) are better to look at than 1 commit from one developer and 13 from the other with messages like "first stab at xyz" and "Oops, forgot to also change the name here".