| This is still not quite correct. Let me outline the structure, and then I will try to submit a PR addressing it. The first step to committing is staging what should be included. Staging (`git add`) records the files that are to go into the next commit in the index. For each staged file, git calculates a hash of its contents, stores the content as a blob in a key-value store (the object store) keyed by that hash, and writes the hash into the index. If a blob with that hash already exists in the store, it is not added again. When the commit is recorded, git turns the index into tree objects, which map file names to blob hashes, and writes a commit object that points to the root tree. When changes are made and committed after this point, only the changed files produce new blobs. Any unchanged blob is already in the store and does not need to be stored again. When a commit is checked out, its tree is read, and each blob is retrieved from the object store and written to the appropriate location in the working tree. The most important thing to take away from this? Rebuilding a tree from blobs is fast. The second thing to take away? Git stores each version of a file only once, so most 'unpacked' repositories are still quite small. Now, this is obviously not the smallest representation of the repository, so git also has a packed format that stores objects as deltas against other objects. These deltas can be computed even between objects that are far apart in the history; deltas are not between commits, they are between objects. Unpacking reapplies the deltas to recreate the blob objects needed to build the tree. Packing happens every now and then, but by default it is not done every time a commit is made. The most visible place it is used is when transferring over network protocols (I can't recall if it is done for every network transfer, but I suspect it is). It is also done when running garbage collection.
---- The reason all this is important is the one I laid out before: rebuilding trees is fast, which makes fast branching possible, and the object store allows this without exploding the size of the repository. |
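To make the object-store model above a bit more concrete, here is a minimal Python sketch. It uses git's real hashing scheme (SHA-1 over `<type> <size>\0<content>`) for blobs, and git's tree entry layout for the simplified case of a single directory of regular files. The names `ObjectStore`, `write_tree` and `read_tree` are hypothetical, made up for illustration. The index is modeled as a plain dict, and compression, subdirectories, other file modes, commit objects and packfile deltas are all left out.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store mapping object id -> (type, content).

    Real git keeps objects as zlib-compressed files under .git/objects/
    (or inside packfiles); a dict is enough to show the idea.
    """

    def __init__(self):
        self.objects = {}

    def put(self, obj_type, data):
        # git's object id is SHA-1 over "<type> <size>\0<content>"
        header = f"{obj_type} {len(data)}".encode() + b"\0"
        oid = hashlib.sha1(header + data).hexdigest()
        # Same content -> same id, so each version is stored only once
        if oid not in self.objects:
            self.objects[oid] = (obj_type, data)
        return oid

    def get(self, oid):
        return self.objects[oid]


def write_tree(store, index):
    """Store a tree object for a flat index {filename: blob_id}.

    Simplification: only regular files (mode 100644), no subdirectories.
    """
    body = b""
    for name in sorted(index):
        # Each entry: "<mode> <name>\0" followed by the raw 20-byte id
        body += f"100644 {name}".encode() + b"\0" + bytes.fromhex(index[name])
    return store.put("tree", body)


def read_tree(store, tree_oid):
    """Rebuild {filename: content}: one store lookup per blob."""
    _, body = store.get(tree_oid)
    files = {}
    while body:
        nul = body.index(b"\0")
        _mode, name = body[:nul].decode().split(" ", 1)
        blob_oid = body[nul + 1 : nul + 21].hex()
        files[name] = store.get(blob_oid)[1]
        body = body[nul + 21 :]
    return files


if __name__ == "__main__":
    store = ObjectStore()
    # "Staging": hash and store each file, record its id in the index
    index1 = {"a.txt": store.put("blob", b"hello\n"),
              "b.txt": store.put("blob", b"world\n")}
    tree1 = write_tree(store, index1)
    # Next commit changes only b.txt; a.txt's blob is reused
    index2 = {**index1, "b.txt": store.put("blob", b"world!\n")}
    tree2 = write_tree(store, index2)
    blobs = [oid for oid, (t, _) in store.objects.items() if t == "blob"]
    print(len(blobs))                # 3
    print(read_tree(store, tree1))   # {'a.txt': b'hello\n', 'b.txt': b'world\n'}
```

Both trees point at the same `a.txt` blob, so the second commit only adds one new blob plus a new tree. Rebuilding either tree is just a lookup per entry, which is the "rebuilding trees is fast" point.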
Do you think it's important for beginners to understand all these subtleties? I could perhaps introduce them eventually, but for the first level on the first screen, I don't think throwing a bunch of concepts at them will help them learn. Feel free to re-open the task if you disagree.