| > I think it's more or less the DAG represented as an adjacency list. I'd have to think a bit about why there is a separate log file for each branch. It seems that there's some redundancy in doing that, and I'm wondering what the advantages and disadvantages are of splitting the history up in that way.
Think of each branch as a pointer. Then realize that you can make that pointer point anywhere on the DAG, even to parts of the DAG that have no connection to each other. The `reflog` is a (local, non-comprehensive) history of where that pointer has pointed. That's why there is a separate log for each branch.
I guess that technically they could have a single log file with an extra field to specify the branch, but using the same directory tree structure as under .git/refs/ makes the mental model simpler (and it's probably also a performance win not to have to parse the reflog for every branch just to see the reflog for one branch).
> I've developed a loathing of excessive hierarchies/trees, so I'd rather see them flattened in a single directory. But that makes sense.
I'm not sure what branches living under .git/refs has to do with excessive hierarchies/trees. There are enough things stored in the .git directory that if you mashed them all together it wouldn't make any sense.
> What's in an object?
If you really care to dive deeper, you can check objects here: https://github.com/git/git/blob/master/object.h
You can get a shorter version towards the bottom of the git manpage (e.g. `man git`):
IDENTIFIER TERMINOLOGY
    <object>
        Indicates the object name for any type of object.
    <blob>
        Indicates a blob object name.
    <tree>
        Indicates a tree object name.
    <commit>
        Indicates a commit object name.
    <tree-ish>
        Indicates a tree, commit or tag object name. A command that takes a
        <tree-ish> argument ultimately wants to operate on a <tree> object but
        automatically dereferences <commit> and <tag> objects that point at a
        <tree>.
    <commit-ish>
        Indicates a commit or tag object name. A command that takes a
        <commit-ish> argument ultimately wants to operate on a <commit> object
        but automatically dereferences <tag> objects that point at a <commit>.
    <type>
        Indicates that an object type is required. Currently one of: blob,
        tree, commit, or tag.
    <file>
        Indicates a filename - almost always relative to the root of the tree
        structure GIT_INDEX_FILE describes.
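To make "object name" a bit more concrete, here is a minimal sketch of how the name is derived from the type and the content. It assumes a repository using the default SHA-1 object format (SHA-256 repositories hash differently), and `git_object_id` is a name made up for this example, not something Git ships.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object name for the given type and raw content.

    Assumes the SHA-1 object format. The function name is hypothetical.
    """
    # Git hashes a small header, "<type> <size in bytes>" plus a NUL byte,
    # followed by the raw content of the object.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same input as: echo 'hello world' | git hash-object --stdin
    print(git_object_id("blob", b"hello world\n"))
    # -> 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The type is part of the hashed data, so a blob and a tree with identical bytes still get different names.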
|
>Think of each branch as a pointer. Then realize that you can make that pointer point anywhere on the DAG, even to parts of the DAG that have no connection to each other. The `reflog` is a (local, non-comprehensive) history of where that pointer has pointed.
I got that branches were pointers. Now that I'm aware that the DAG is fully represented inside objects, I can see that what's inside logs/ is actually just logs. Each log corresponds to a subgraph of the full DAG. Getting history from a log would be more efficient than getting it from the objects themselves, because with objects you'd have to follow a long chain of object references.
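For anyone who wants to look at what a log line actually holds, here is a rough sketch of a parser. It assumes the usual on-disk layout, where each line is the old value, the new value, the identity, a timestamp and a timezone, then a tab and a message. `ReflogEntry`, `reflog_path` and `parse_reflog_line` are names invented for this example, and the sample line is made up.

```python
from pathlib import Path
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str        # where the pointer was before the update
    new: str        # where the pointer is after the update
    who: str        # "Name <email>" of whoever moved it
    timestamp: int  # seconds since the epoch
    tz: str         # timezone offset such as "+0000"
    message: str    # reason for the update, may be empty


def reflog_path(git_dir: str, ref: str) -> Path:
    """Log file for a ref: the same relative path as under refs/, but under logs/."""
    return Path(git_dir) / "logs" / ref


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one reflog line into its fields (hypothetical helper)."""
    # Everything after the first tab is the free-form message.
    head, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the right.
    who, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, who, int(timestamp), tz, message)


if __name__ == "__main__":
    # Made-up sample line: an all-zero old value marks the creation of the ref.
    sample = ("0" * 40 + " " + "a" * 40 +
              " A U Thor <author@example.com> 1700000000 +0000"
              "\tcommit (initial): first commit")
    print(reflog_path(".git", "refs/heads/main"))
    print(parse_reflog_line(sample))
```

Each entry only records one move of the pointer from `old` to `new`. Nothing requires consecutive entries to be connected in the DAG, which is what you see after a reset or a forced update.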
>I'm not sure what branches living under .git/refs has to do with excessive hierarchies/trees. There are enough things stored in the .git directory, that if you mashed them all together it wouldn't make any sense.
Having to descend through layers of subdirectories makes things harder. I'd reduce the depth of the directory tree to the absolute minimum. It's hard to tell if this is the minimum without knowing exactly what all the implementation constraints might have been.
I can see that the real meat of this system is the object store. It's useful to know about `git cat-file` for inspecting it.
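As a rough idea of what `git cat-file` has to do for the simplest case, here is a sketch that reads a loose object straight from the object store. It assumes the object is stored loose (zlib-compressed under `objects/<first two hex digits>/<rest>`) and does not handle packed objects at all. `read_loose_object` is a name invented for this example.

```python
import zlib
from pathlib import Path
from typing import Tuple


def read_loose_object(git_dir: str, oid: str) -> Tuple[str, bytes]:
    """Return (type, content) for a loose object (hypothetical helper).

    Assumes the object has not been packed. Packed objects live in
    objects/pack/ and need a different reader.
    """
    # Loose objects are fanned out by the first two hex digits of the name.
    path = Path(git_dir) / "objects" / oid[:2] / oid[2:]
    raw = zlib.decompress(path.read_bytes())
    # The decompressed data is "<type> <size>" + NUL + content.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError(f"size mismatch in object {oid}")
    return obj_type, body
```

In a real repository the result should line up with `git cat-file -t <oid>` for the type. For blobs and commits the content should match `git cat-file -p <oid>`, while trees are stored in a binary form that `-p` pretty-prints.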