by sofal 4575 days ago
> I know a lot of people who routinely edit their local history before pushing changes to a shared repository because they don't want other people to see their true "dirty" history. This is insane.

This is no more insane than editing a source code file before you save it to the file system. Git is used as a development tool as well as version control, and developers are therefore encouraged to commit often, even if the code does not actually compile yet. There is no more need to fill the published history with all of these WIP commits than there is for me to know about every goddamn keystroke you made while you were dicking around with that config file.


Is the history stored as a text file somewhere that you can just edit? I sometimes wish git were a bit more transparent and less of a black box.
I suggest that you pick up any git tutorial out there. It will soon become less of a black box.
I've read a lot about git. The docs generally don't pick apart what's inside the .git directory.
- The history items are stored as commit objects, each identified by the SHA-1 sum of its contents (including meta-data like Authored By, Committed By, etc.).

- One of those meta-data items is "Parent Commit," so if you change one item in history, it changes the SHA-1 sum of all subsequent items (because at the very least they all need to be re-parented).

- All of the commit objects are stored under .git/objects.

- Branches are just files under .git/refs/heads/ that contain the SHA-1 sum of the most recent commit on that branch. This is why they are called 'branch pointers.' That's basically all they are.

- If you have a history of 5 commits, and make a change to the initial commit, you now have 10 commits in your .git/ directory. Your (e.g.) 'master' branch will point to the most recent 'tree' of 5 commits. The other 5 commits will still exist in .git/objects, but there will be no branches pointing to them. You can use 'git reflog' to find them, or access them by their SHA-1 sum.

- Eventually 'git gc' (gc = garbage collect) will clean out the unreferenced commits, but this happens rarely if you don't explicitly run the command.

- When you 'git push,' you are only pushing branches to the remote repo, so commits that are stored locally, which are not referenced by one of those branches you are pushing, will not be pushed out. If you have commits that you don't want to end up in limbo like this, you should 'git tag' them or create a branch (e.g. 'archive/master-2013-12') that points to them.
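
To make the re-parenting point concrete, here is a rough Python sketch that builds a toy chain of 5 commit objects in memory and names each one the way Git does (SHA-1 over a `commit <size>\0` header plus the body). The helper names (`object_id`, `commit_body`, `build_history`), the author line, and the use of the empty tree for every commit are assumptions made for illustration; nothing here touches a real repository.

```python
import hashlib
from typing import List, Optional


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + body, which is how Git names objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# ID of the empty tree; every toy commit below points at it (an assumption
# that keeps the example small, real commits point at real trees).
EMPTY_TREE = object_id("tree", b"")


def commit_body(message: str, parent: Optional[str]) -> bytes:
    """Serialize a commit: tree, optional parent, author, committer, message."""
    # Hypothetical identity with a fixed timestamp so the IDs are reproducible.
    who = "First Last <user@example.com> 1387237920 -0500"
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return "\n".join(lines).encode()


def build_history(messages: List[str]) -> List[str]:
    """Chain commits together; each commit records the ID of the previous one."""
    ids: List[str] = []
    parent: Optional[str] = None
    for message in messages:
        parent = object_id("commit", commit_body(message, parent))
        ids.append(parent)
    return ids


original = build_history(["one", "two", "three", "four", "five"])
# Change only the initial commit's message and rebuild the chain.
rewritten = build_history(["ONE", "two", "three", "four", "five"])

# Every later commit embeds its parent's ID, so all five IDs change.
changed = [a != b for a, b in zip(original, rewritten)]
# Old and new commits coexist: 5 reachable from the branch, 5 left in limbo.
all_objects = set(original) | set(rewritten)
print(changed, len(all_objects))
```

Only the first commit's message differs, yet none of the five IDs survive, because each body contains a `parent` line with the previous ID.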

It looks like .git/logs contains the history. It looks like the file format is a space-separated list, with the format "$parentcommitsha1 $newcommitsha1 ... $commitmessage". That's fairly comprehensible. What are the SHA-1 sums of? Are they of the entire snapshot, or the delta? I went into objects/ and ran `sha1sum $objfile`, and the sum did not match the file name. So that remains obscure. `file $objfile` could not identify the format; it gave nonsense.

Thanks for the help.

>One of those meta-data items is "Parent Commit," so if you change one item in history, it changes the SHA-1 sum of all subsequent items (because at the very least they all need to be re-parented).

What sequence of operations would change a history item in that way?

> It looks like .git/logs contains the history. It looks like the file format is a space-separated list, with the format "$parentcommitsha1 $newcommitsha1 ... $commitmessage". That's fairly comprehensible.

I've never looked at .git/logs, but it looks like that is used by the `git reflog` command. It's basically a history (or log) of every commit that a particular reference has pointed to[1]. For example, I cloned the git source code:

  user@host ~/src/git % cat .git/logs/HEAD
  0000000000000000000000000000000000000000 d7aced95cd681b761468635f8d2a8b82d7ed26fd First Last <user@example.com> 1387237920 -0500	clone: from https://github.com/git/git.git

  user@host ~/src/git % git reflog
  d7aced9 HEAD@{0}: clone: from https://github.com/git/git.git
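
As a sketch of how that line breaks down, assuming the layout is "old ID, new ID, identity, timestamp, timezone, then a tab and the message" (which is what the output above looks like), something like the following would split it. `ReflogEntry` and `parse_reflog_line` are names made up for this example.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str        # ID the reference pointed to before the change
    new: str        # ID the reference points to after the change
    who: str        # "Name <email>" of whoever moved the reference
    timestamp: int  # seconds since the Unix epoch
    tz: str         # UTC offset, e.g. "-0500"
    message: str    # free-form description of what happened


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one line of .git/logs/<ref> into its fields."""
    # The message is separated from the other fields by a tab, not a space.
    head, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the right.
    who, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, who, int(timestamp), tz, message)


line = (
    "0000000000000000000000000000000000000000 "
    "d7aced95cd681b761468635f8d2a8b82d7ed26fd "
    "First Last <user@example.com> 1387237920 -0500"
    "\tclone: from https://github.com/git/git.git"
)
entry = parse_reflog_line(line)
print(entry.new[:7], entry.message)
```

The all-zero old ID on the first entry just means the reference did not exist before the clone.
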
Note: `HEAD` is a reference to the current branch. E.g.:

  ~/src/git $ cat .git/HEAD
  ref: refs/heads/master

  ~/src/git $ cat .git/refs/heads/master
  d7aced95cd681b761468635f8d2a8b82d7ed26fd
It's also of note that branches are referred to as 'references' too, hence storing them under `.git/refs/`.
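
The two `cat` commands above can be sketched as a small Python function that follows the `ref: ...` indirection until it lands on a commit ID. This is only an illustration: `resolve_ref` is a made-up helper, the directory it reads is a temporary stand-in for `.git`, and it assumes every reference is a plain file (a real repository may also keep references in a `packed-refs` file, which this ignores).

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow 'ref: <path>' indirections until a file holds a commit ID."""
    with open(os.path.join(git_dir, name)) as f:
        content = f.read().strip()
    if content.startswith("ref: "):
        # Symbolic reference: the rest of the line names another file to read.
        return resolve_ref(git_dir, content[len("ref: "):])
    return content


# Build a stand-in for a .git directory with the same two files as above.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("d7aced95cd681b761468635f8d2a8b82d7ed26fd\n")

resolved = resolve_ref(git_dir)
print(resolved)
```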

> What are the SHA-1 sums of? Are they of the entire snapshot, or the delta? I went into objects/ and ran `sha1sum $objfile`, and the sum did not match the file name. So that remains obscure.

See: http://stackoverflow.com/questions/5290444/why-does-git-hash...
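
In short, the file under objects/ is zlib-compressed, and the name is the SHA-1 of the uncompressed data with a small `<type> <size>\0` header in front, so running `sha1sum` on the file hashes the wrong bytes. The hash covers the full content of the object, not a delta. A minimal Python sketch of that, where `loose_object` is a name invented for this example and the blob content is arbitrary:

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(kind: str, body: bytes) -> Tuple[str, bytes]:
    """Return (object ID, compressed bytes as they would sit on disk)."""
    store = f"{kind} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


body = b"hello\n"
oid, on_disk = loose_object("blob", body)

# What `sha1sum $objfile` computes: a hash of the compressed bytes.
naive = hashlib.sha1(on_disk).hexdigest()

# Undoing the compression recovers the header and the original content.
store = zlib.decompress(on_disk)
header, _, recovered = store.partition(b"\0")

print("object id:      ", oid)
print("sha1 of file:   ", naive)
print("header:         ", header)
print("would live at:   objects/%s/%s" % (oid[:2], oid[2:]))
```

This also explains why `file $objfile` gave nonsense: without decompressing first, there is no recognizable header to look at.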

[1]: Since the local repository was created. This information does not sync between local and remote.