Hacker News
by thaumasiotes 2044 days ago
Fair enough. The blob stores a diff. The commit stores a reference to... the diff. This is a division between the concept of the object and the implementation. But it's not an example of a diff-storing model failing to model git as it is; git as it is is storing diffs.

If a commit references a "blob", and the "blob" that it references is, in fact, a diff, why would we say that the commit "does not reference a diff"?


Each commit references a tree: a structured collection of hash IDs for every file in the entire repo (subdirectories are nested trees). Each file's hash ID is generated by hashing the entire contents of the file, plus a short header. Not a diff.
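
A minimal Python sketch of that hashing, assuming the default SHA-1 object format (SHA-256 repositories hash the same bytes with a different function). Git prepends a short header ("blob", a space, the byte length, and a NUL byte) and hashes it together with the whole file. The function name blob_id is made up for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with these bytes."""
    # Header: object type, a space, the decimal byte length, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    # The ID covers the header plus the *entire* file contents; no diff involved.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(blob_id(b"hello world\n"))
```

For a file containing "hello world" plus a newline this should print 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same ID git hash-object reports for that content.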

The "diff" you're referring to is an implementation detail of the compression (delta compression in packfiles). It's not even always there; whether an object is stored as a delta depends on which objects are present in your clone and how they were packed. It's also not the same "diff" you work with when you use git to generate or apply patches. Using the same word only leads to confusion.
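
A toy model of why the delta is invisible at the object level. This is not git's packfile format, just the same idea under simplified assumptions: an object is stored either whole or as copy/insert instructions against some base object, reading it back always rebuilds the full bytes, and the ID is computed over those full bytes, so it comes out identical either way. ToyStore, make_delta, and apply_delta are hypothetical names, and difflib stands in for git's own delta search.

```python
import hashlib
from difflib import SequenceMatcher


def object_id(content: bytes) -> str:
    """Hash header + full content, as git does for a blob (SHA-1 assumed)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert instructions relative to base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete": those base bytes are simply never copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the full target bytes from base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ToyStore:
    """Objects stored whole or as deltas; readers only ever see full bytes."""

    def __init__(self):
        self._objects = {}  # id -> ("full", bytes) or ("delta", base_id, ops)

    def put_full(self, content: bytes) -> str:
        oid = object_id(content)
        self._objects[oid] = ("full", content)
        return oid

    def put_delta(self, content: bytes, base_id: str) -> str:
        oid = object_id(content)  # the ID still hashes the FULL content
        base = self.get(base_id)
        self._objects[oid] = ("delta", base_id, make_delta(base, content))
        return oid

    def get(self, oid: str) -> bytes:
        entry = self._objects[oid]
        if entry[0] == "full":
            return entry[1]
        _, base_id, ops = entry
        return apply_delta(self.get(base_id), ops)  # follow the delta chain
```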

And that's part of the problem with git! It requires you to build a mental model of how it works internally, but only part of how it works internally is important, and there are terminology conflicts. So people get incorrect ideas about how it works, then get surprised when something unexpected happens.

There are revision control systems based on diffs, but git's power (and durability) is that every commit references only blobs (through its tree), which are content-addressable whole files, not diffs. All the diffs shown by git log, git diff, or git rebase are computed on the fly from the two stored versions. And yes, if you commit a giant file and then delete it in the next commit, it's there forever, until you delete or rewrite the history of the branch that references this file somewhere in its past.
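
A sketch of "computed on the fly", assuming each commit is modeled as a plain dict from path to full file text (standing in for a tree of blobs). Nothing diff-shaped is stored; Python's difflib produces a unified diff only when asked. git's real diff machinery (its own algorithms, rename detection, binary handling) is different, and snapshot_diff is a hypothetical name.

```python
import difflib


def snapshot_diff(old: dict, new: dict) -> str:
    """Compute a unified diff between two snapshots on demand.

    Each snapshot maps a path to that file's full text, the way a commit's
    tree points at whole blobs; the diff exists only in this return value.
    """
    chunks = []
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, "")
        after = new.get(path, "")
        if before == after:
            continue  # unchanged file: same content, nothing to show
        chunks.extend(difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}" if path in old else "/dev/null",
            tofile=f"b/{path}" if path in new else "/dev/null",
        ))
    return "".join(chunks)
```

Deleting a file in the newer snapshot leaves the older snapshot's full copy untouched, which is the giant-file situation above: the old version stays around as long as some history points at it.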

There are optimizations at the storage level (compression, etc.), but at the logical level they are transparent.
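
One concrete example of a transparent storage optimization, sketched in Python under the assumption of SHA-1 and a plain blob: a loose object is written zlib-compressed under objects/<first two hex digits>/<remaining digits> inside .git, but its ID is the hash of the uncompressed header plus content, so compression never changes what a tree or commit refers to. loose_object and read_loose_object are hypothetical helper names.

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (path relative to .git, compressed bytes) for a loose blob."""
    raw = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(raw).hexdigest()     # hash of the UNcompressed bytes
    path = f"objects/{oid[:2]}/{oid[2:]}"   # fan-out by first two hex digits
    return path, zlib.compress(raw)         # only the stored form is compressed


def read_loose_object(compressed: bytes) -> bytes:
    """Decompress, check the header, and return the file contents."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(content):
        raise ValueError("corrupt loose object")
    return content
```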

I would understand "a commit is/contains a diff" as the commit referencing the difference from its parent commit(s), whereas in a packfile the diff might be against a blob belonging to an entirely different branch of the repository, if that makes for a smaller diff. And the chosen base might be different for each file. The stored blob doesn't have to be a diff at all; it's only stored as one if the packer found a good candidate.
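
A deliberately simplified sketch of that packing choice: for one object, try every other stored object as a delta base, no matter which commit or branch it came from, keep whichever gives the smallest delta, and store the object whole if nothing beats its own size. Real git pack-objects does not try every pair; it sorts objects and compares within a sliding window (tunable with --window and --depth). choose_base, delta_size, and the per-copy cost of 4 bytes are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

COPY_COST = 4  # assumed fixed cost of one copy instruction, in bytes


def delta_size(base: bytes, target: bytes) -> int:
    """Rough size of target encoded as copy/insert ops against base."""
    size = 0
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += COPY_COST      # one copy instruction
        elif tag in ("replace", "insert"):
            size += j2 - j1        # literal bytes carried in the delta
    return size


def choose_base(target_id: str, objects: Dict[str, bytes]) -> Optional[str]:
    """Pick the object (from any branch) giving the smallest delta, or None."""
    target = objects[target_id]
    best_id, best_size = None, len(target)  # storing it whole is the baseline
    for oid, candidate in objects.items():
        if oid == target_id:
            continue
        size = delta_size(candidate, target)
        if size < best_size:
            best_id, best_size = oid, size
    return best_id


if __name__ == "__main__":
    objects = {
        "parent": b"completely different text",
        "other-branch": b"the quick brown fox jumps over the lazy dog\n" * 3,
        "target": b"the quick brown fox jumps over the lazy cat\n" * 3,
    }
    print(choose_base("target", objects))  # prints: other-branch
```

Here the parent loses to an object from another branch simply because it shares more bytes with the target, and an object with no useful base would come back as None and be stored whole.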