

by thaumasiotes 2045 days ago

> See https://stackoverflow.com/a/25028688/8272371 for detailed explanation.

There is a disconnect somewhere. The linked answer says:

> Now, git is different. Git stores references to complete blobs and this means that with git, only one commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.

> So if that is the case, then where does the delta compression that git uses come in?

> Well, it is nothing but a compression concept - there is no point storing the same information twice, if only a tiny amount has changed. Therefore, represent what has changed, but store a reference to it, so that the commit that it belongs to, which is in effect a tree of references, can still be re-created without looking at past commits.

You can recreate a file that is stored as a root blob plus some series of diffs without looking at information from past commits. But you can't recreate it without applying the diffs! You have to look at the root blob. That blob is, internally, tracked separately from the commit which created it. But your conclusion:

> when you checkout stuff, git doesn't do diffs to give you the working directory at that point.

cannot be true. If the working directory at that point corresponds to a blob which has only diff information stored, git must apply that diff to a separate blob in order to give you the working directory.
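
To make that concrete, here is a minimal Python sketch of rebuilding a blob that is stored as a chain of deltas. The copy/insert tuples, the `store` layout, and the names `apply_delta` and `resolve` are made up for illustration. Git's real delta encoding is a compact binary format and is not reproduced here.

```python
from typing import Dict, List


def apply_delta(base: bytes, delta: List[tuple]) -> bytes:
    """Rebuild a target blob from a base blob and a list of instructions.

    Hypothetical instruction format (not git's on-disk encoding):
      ("copy", offset, length) -> copy a byte range out of the base
      ("insert", data)         -> append literal bytes
    """
    out = bytearray()
    for instr in delta:
        if instr[0] == "copy":
            _, offset, length = instr
            out += base[offset:offset + length]
        elif instr[0] == "insert":
            out += instr[1]
        else:
            raise ValueError(f"unknown instruction: {instr[0]!r}")
    return bytes(out)


def resolve(name: str, store: Dict[str, tuple]) -> bytes:
    """Return the full content of an object, following its delta chain.

    `store` maps a name to either ("full", content) or
    ("delta", base_name, instructions).
    """
    entry = store[name]
    if entry[0] == "full":
        return entry[1]
    _, base_name, delta = entry
    # The base must be materialized first, however long the chain is.
    return apply_delta(resolve(base_name, store), delta)


store = {
    "v1": ("full", b"hello world\n"),
    "v2": ("delta", "v1", [("copy", 0, 6), ("insert", b"git\n")]),
    "v3": ("delta", "v2", [("copy", 0, 9), ("insert", b"!\n")]),
}
```

Here `resolve("v3", store)` has to touch `v1` and `v2` even though no commit history is consulted: the chain is between stored objects, not between commits.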


_ikke_ 2045 days ago

What you are missing is the difference between git's object model (with loose objects) and packfiles. The delta compression happens when you run `git gc` (git does this automatically as well on occasion), and packfiles are how git fetches and pushes history.

But when you create a new commit, that commit object is stored as a loose object, with any new file blobs and tree objects. This represents a complete snapshot of your working tree.

But git does not need to make a complete copy of the working tree on each commit. Because objects are referred to by the hash of their contents (with a git-specific header), git only needs to store each version of a file once.
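
A small Python sketch of that content addressing, assuming the default SHA-1 object format. The blob header is `blob <size>\0`; the `ObjectStore` class is a hypothetical in-memory stand-in for the object database, not how git lays objects out on disk.

```python
import hashlib
from typing import Dict


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory store keyed by object id."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content hashes to the same id, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
first = store.put(b"hello world\n")
second = store.put(b"hello world\n")  # same file content in a later commit
```

Both calls return the same id and the store holds a single object, which should match what `git hash-object` reports for the same bytes.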


Bjartr 2044 days ago

Everyone here is arguing past each other because one side defines "what git does" as the literal implementation details of git, and the other side defines "what git does" as the model it presents to the end user. I suspect this disconnect stems partly from the emphasis on understanding the "internals" of git, and from the fact that the argument is really about the difference between the internal implementation as it exists in code and the internal model/interface.


fanf2 2045 days ago

When git makes packfiles using delta compression, it ignores the history. It roughly sorts blobs for similarity, completely ignoring their filenames or which commits they appear in. This sorting helps to make the packfile delta compression more efficient. The deltas on disk are completely unrelated to the diffs you see from `git show`.
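
A rough Python sketch of the idea of picking delta bases by similarity rather than by history. This is not git's actual heuristic: the size-based ordering, the `min_ratio` threshold, the use of `difflib` as a similarity measure, and the name `pick_delta_bases` are all assumptions made for illustration. The only input is blob content keyed by id, with no commits involved.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def pick_delta_bases(
    blobs: Dict[str, bytes],
    window: int = 10,
    min_ratio: float = 0.5,
) -> Dict[str, Optional[str]]:
    """For each blob, pick the most similar earlier blob as its delta base.

    Returns a map of blob id -> base id, or None if the blob should be
    stored in full.
    """
    # Order by size (largest first) so that similar-sized blobs end up
    # near each other; ties are broken by id to keep the result stable.
    order = sorted(blobs, key=lambda oid: (-len(blobs[oid]), oid))
    bases: Dict[str, Optional[str]] = {}
    for i, oid in enumerate(order):
        best, best_ratio = None, min_ratio
        # Only compare against a sliding window of preceding candidates.
        for cand in order[max(0, i - window):i]:
            matcher = SequenceMatcher(None, blobs[cand], blobs[oid], autojunk=False)
            ratio = matcher.ratio()
            if ratio > best_ratio:
                best, best_ratio = cand, ratio
        bases[oid] = best
    return bases


blobs = {
    "a": b"the quick brown fox jumps over the lazy dog\n",
    "b": b"the quick brown fox jumps over the lazy cat\n",
    "c": b"0123456789" * 3,
}
bases = pick_delta_bases(blobs)
```

In this toy input, `b` is deltified against `a` purely because the bytes are similar, and `c` is stored in full because nothing in the window resembles it.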
