by jdmichal 3119 days ago
> But git never stores diffs between old and new content; it just creates a new blob every time the content of a file changes.

Git packfiles compress objects by storing some of them as deltas against other objects, and those deltas tend to go backwards. That is, Git usually stores the most recent version in full, then uses deltas to reconstruct older versions, because you're more likely to need a recent version in full than an older one.

https://git-scm.com/book/en/v2/Git-Internals-Packfiles
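
To make the "newest in full, older as a delta" idea concrete, here is a toy sketch in Python. It is not Git's packfile format: real packfiles use a binary copy/insert delta encoding and pick delta bases heuristically, whereas this works on lists of lines with `difflib`. The names `make_delta` and `apply_delta` are made up for illustration.

```python
import difflib

def make_delta(base: list[str], target: list[str]) -> list[tuple]:
    """Encode `target` as copy/insert instructions against `base` (toy format)."""
    delta = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse a run of lines that already exists in the base.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Lines that only exist in the target are stored literally.
            delta.append(("insert", target[j1:j2]))
        # "delete": nothing to emit, those base lines are simply not copied.
    return delta

def apply_delta(base: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the target from the base plus the delta."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

older = ["line one", "line two", "line three"]
newest = ["line one", "line 2 (edited)", "line three", "line four"]

# Keep the newest version in full; keep the older one only as a delta
# that goes "backwards" from the newest.
older_as_delta = make_delta(base=newest, target=older)

assert apply_delta(newest, older_as_delta) == older
```

Reading `newest` costs nothing extra, while reading `older` costs one delta application, which is the tradeoff the comment describes.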


This is true but packfiles are an implementation detail.

It's still useful, and conceptually more accurate, to consider every commit a complete snapshot of the state of the code at that point.

That can be said of every version control system. Restoration of state to any given version is their defining feature. How they achieve that is always an implementation detail, but those details can still be important and interesting.
Git commits are composed of all of the files in the commit, its parent, and the commit message. This is an important guarantee: each checkout is valid without the rest of the repo. That lets a lot of exotic implementations stay consistent with one another. If you’re GitHub, you can distribute commits across many servers. If you’re Microsoft, you can build partial checkouts for GVFS. It’s what allows Git LFS to keep many of Git’s core guarantees while making tradeoffs to improve areas where Git is traditionally weak.
Sorta true but see what bbatha said.

There are people who distinguish between changeset-oriented and snapshot-oriented systems and will hotly debate which one is better.

But as you say, restoration of state is a necessary and defining feature.