> See https://stackoverflow.com/a/25028688/8272371 for detailed explanation.

There is a disconnect somewhere. The linked answer says:

> Now, git is different. Git stores references to complete blobs and this means that with git, only one commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.
>
> So if that is the case, then where does the delta compression that git uses come in?
>
> Well, it is nothing but a compression concept - there is no point storing the same information twice, if only a tiny amount has changed. Therefore, represent what has changed, but store a reference to it, so that the commit that it belongs to, which is in effect a tree of references, can still be re-created without looking at past commits.

You can recreate a file that is stored as a root blob plus some series of diffs without looking at information from past commits. But you can't recreate it without doing the diffs! You have to look at the root blob. This is, internally, tracked separately from the commit which created it.

But your conclusion:

> when you checkout stuff, git doesn't do diffs to give you the working directory at that point.

cannot be true. If the working directory at that point corresponds to a blob which has only diff information stored, git must apply that diff to a separate blob in order to give you the working directory.
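To make the point concrete, here is a minimal sketch of rebuilding a file from a root blob plus a chain of deltas. The instruction format (`"copy"` and `"insert"` tuples) and the names `apply_delta` and `resolve` are hypothetical simplifications for illustration, not git's actual packfile encoding. Note that no commit is consulted anywhere, yet every delta in the chain still has to be applied.

```python
def apply_delta(base, delta):
    """Rebuild an object from its base plus a list of delta instructions.

    Hypothetical, simplified instruction format (not git's on-disk encoding):
      ("copy", offset, length)  copies bytes out of the base object
      ("insert", data)          adds literal bytes
    """
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError("unknown delta instruction: %r" % (op[0],))
    return bytes(out)


def resolve(root, chain):
    """Apply each delta in turn, starting from the fully stored root blob."""
    data = root
    for delta in chain:
        data = apply_delta(data, delta)
    return data


root_blob = b"hello world\n"
chain = [
    [("copy", 0, 6), ("insert", b"git\n")],  # yields b"hello git\n"
    [("copy", 0, 9), ("insert", b"!\n")],    # yields b"hello git!\n"
]
latest = resolve(root_blob, chain)
```

The lookup depends only on the object store (root blob and deltas), not on commit history, which is consistent with both the linked answer and the objection above.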
But when you create a new commit, that commit object is stored as a loose object, along with any new blob and tree objects. Together, these represent a complete snapshot of your working tree.
That does not mean git makes a complete copy of the working tree on each commit. Because objects are referred to by the hash of their contents (prefixed with a git-specific header), git only needs to store each version of a file once.
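As a rough illustration of that content addressing, the sketch below computes a blob ID the way `git hash-object` does in a default SHA-1 repository: the header is `blob <size>` followed by a NUL byte, then the file contents. The `git_blob_id` helper and the in-memory `store` dictionary are illustrative names, not part of git.

```python
import hashlib


def git_blob_id(content):
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A toy object store keyed by object ID (illustrative only).
store = {}


def add_blob(content):
    """Store content under its ID; identical content maps to the same key."""
    oid = git_blob_id(content)
    store[oid] = content
    return oid


first = add_blob(b"hello world\n")
second = add_blob(b"hello world\n")   # same content, e.g. unchanged in a later commit
third = add_blob(b"hello git\n")      # changed content gets a new ID
```

Here `first` and `second` are the same ID, so the store ends up holding two objects rather than three. An unchanged file in a later commit simply points at the blob that already exists.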