by xxbondsxx
4869 days ago
Thanks a ton for catching this. I guess there is a distinction to be made -- the compression might use deltas, but a commit specifies the entire state of the repository. It's a tricky line to walk, though, because commands like "git show" and "git patch" clearly show the delta-like nature of a single commit. I also don't want newcomers to think that commits are heavy and should be used sparingly. I'm totally down to discuss this on a GitHub issue with you; we could go over the wording. Maybe something like "a commit specifies the entire state of a repository, but is usually stored on disk as a set of changes"?
EDIT: moving discussion to: https://github.com/pcottle/learnGitBranching/issues/6
EDIT: fixed in: https://github.com/pcottle/learnGitBranching/commit/168852b2...
|
The first step to committing is staging what should be included.
The staging process specifies an index of files that are to be added to the next commit. When the commit is recorded, git checks every file/chunk in the index: a hash is calculated for each of these blobs, each blob is stored under its hash in a key<->object store (the object store), and the hashes are written into the index of the commit.
If a blob already exists in the store then it is not added again.
When changes are made and committed after this point, the resulting blobs are hashed in the same way. Any unchanged blob is already in the store and does not need to be stored again; only the changed blobs are added.
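To make the hash-and-store step concrete, here is a rough Python sketch of a content-addressed store. The `blob_id` function follows the way git hashes a blob (SHA-1 over a `blob <size>\0` header plus the content); everything else (`ObjectStore`, `snapshot`, the file names) is made up for illustration, and a real commit points at tree objects rather than a flat dict.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash a blob like `git hash-object`: SHA-1 over a header plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy key<->object store (hypothetical): key is the hash, value is the blob."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        key = blob_id(content)
        # setdefault leaves an existing entry alone, so identical content is stored once.
        self.objects.setdefault(key, content)
        return key


def snapshot(store: ObjectStore, files: dict) -> dict:
    """Record a simplified 'commit index': a mapping of path -> blob hash."""
    return {path: store.put(data) for path, data in files.items()}


store = ObjectStore()
main_c = b"int main(void) { return 0; }\n"
first = snapshot(store, {"README": b"hello\n", "main.c": main_c})
second = snapshot(store, {"README": b"hello, world\n", "main.c": main_c})

# Two commits of two files each, but only three blobs: main.c is shared.
print(len(store.objects))  # 3
```

Both snapshots list every file, yet the second one only costs one new blob, which is the whole point of the paragraph above.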
When a commit is recreated, its index is evaluated. Each blob is retrieved from the object store and placed into the tree in the appropriate location.
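Rebuilding a tree from such an index is little more than a loop of lookups and file writes. A minimal sketch, assuming the store is a plain dict and the index is a flat path -> hash mapping (`put` and `checkout` are hypothetical helpers, and the hash here skips git's object header for brevity):

```python
import hashlib
import tempfile
from pathlib import Path


def put(store: dict, content: bytes) -> str:
    """Store a blob under its hash (simplified: plain SHA-1, no git header)."""
    key = hashlib.sha1(content).hexdigest()
    store[key] = content
    return key


def checkout(store: dict, index: dict, dest: Path) -> None:
    """Rebuild a working tree: one store lookup and one file write per path."""
    for rel_path, key in index.items():
        target = dest / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(store[key])


store = {}
index = {
    "README": put(store, b"hello\n"),
    "src/main.c": put(store, b"int main(void) { return 0; }\n"),
}

with tempfile.TemporaryDirectory() as tmp:
    checkout(store, index, Path(tmp))
    # Read the tree back to confirm it matches what was recorded.
    rebuilt = {
        p.relative_to(tmp).as_posix(): p.read_bytes()
        for p in Path(tmp).rglob("*")
        if p.is_file()
    }
```

No history is replayed and no patches are applied; the cost depends on the size of the tree being written, not on how many commits came before it.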
The most important thing to take out of this? Rebuilding a tree from blobs is fast. Second thing to take out? Git only stores each version of a blob (be it a file or a chunk) once, so most 'unpacked' repositories are still quite small.
Now, this is obviously not the smallest representation of the repository, so git has a packed format which calculates deltas between blob files. Deltas can be calculated between blobs even if those blobs are completely separated in the history; deltas are not between commits, instead they are between objects. Unpacking deltas recreates the blob objects required to build the tree.
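As a rough illustration of an object-level delta: git's pack format expresses one object as "copy this range from a base object" and "insert these literal bytes" instructions. The sketch below imitates that idea with Python's `difflib`; it is not git's actual delta algorithm or encoding, and `make_delta` / `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as 'copy from base' and 'insert literal' instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))  # (offset in base, length)
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Recreate the target blob from the base blob and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"line one\nline two\nline three\n"
target = b"line one\nline 2\nline three\nline four\n"

delta = make_delta(base, target)
restored = apply_delta(base, delta)
print(restored == target)  # True
```

Note that nothing here knows about commits: the delta only needs two blobs that happen to be similar, which matches the point that deltas are between objects rather than between points in history.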
The packing process happens every now and then, but it is definitely not done every time a commit is made (by default). The most visible place it is used is when transferring over network protocols (I can't recall if it is done for every network transfer, but I suspect it is). It is done when running garbage collection as well.
----
The reason why all this is important is as I laid out before: rebuilding trees is fast, which makes fast branching possible, and the object store allows this without exploding the size of the repository.