by rakoo 3928 days ago
Not specific to the post, but:

> we're sending few objects, all from the tip of the repository, and these objects will usually be delta'ed against older objects that won't be sent. Therefore, Git tries to find new delta bases for these objects.

Why is this the case? Git can send thin packs if the receiver already has the objects, so why does it still need to find a full base to diff against? (Not counting the case where the initial base objects come from another fork; I don't know how often that happens.)

On top of that, as far as I understood from the discussion about heuristics (https://git.kernel.org/cgit/git/git.git/tree/Documentation/t...), it seems that the latest objects are stored in full and the earlier objects are diffed against them. That has a double benefit: you usually want access to the latest object, which is already full, and the deltas going back to earlier objects tend to only remove stuff, not add it, because "stuff grows over time". So if objects are still stored as packs, things should already be in pretty good shape to be sent as-is... or not?

1 comments

Their initial test case was a full clone, in which case you can't really send a thin pack, since the client has no objects to serve as bases.

The problem, as I understand it, was that when you requested a full clone of fork1/repo.git, they'd find all objects reachable from refs in that repo, but Git would by default generate deltas for those objects that referred to objects from other forks. When it noticed that those base objects were not going to be sent to the client, and that the client did not already have them, Git recovered by doing an expensive search for new delta bases among the objects it was sending. Because it had not traversed the graph beforehand, its heuristics weren't working properly, so this took forever.
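
To make the distinction concrete, here is a rough Python sketch of the decision as I understand it. It is a toy model, not Git's actual code: `classify_objects`, the labels, and the one-letter object names are all made up for illustration. A stored delta can be reused if its base travels in the same pack, or (thin pack) if the receiver already has the base; otherwise a new base has to be found.

```python
from typing import Dict, Optional, Set


def classify_objects(
    to_send: Set[str],
    delta_base: Dict[str, Optional[str]],
    receiver_has: Set[str],
) -> Dict[str, str]:
    """Label each object to send by whether its stored delta can be reused.

    Hypothetical model: delta_base maps an object to the object it is
    stored as a delta against, or None if it is stored in full.
    """
    result = {}
    for obj in sorted(to_send):
        base = delta_base.get(obj)
        if base is None:
            result[obj] = "full"            # stored whole, send as-is
        elif base in to_send:
            result[obj] = "reuse"           # base travels in the same pack
        elif base in receiver_has:
            result[obj] = "reuse-thin"      # thin pack: receiver already has the base
        else:
            result[obj] = "needs-new-base"  # receiver can never resolve this delta
    return result


# Full clone of fork1: the receiver has nothing.
# "x" stands for an object that exists only in another fork.
stored = {"a": None, "b": "a", "c": "x", "d": "c"}
labels = classify_objects({"a", "b", "c", "d"}, stored, receiver_has=set())
```

In this toy run only "c" ends up as "needs-new-base", because its base "x" is neither sent nor known to the client. With many forks sharing one object store, a large share of objects could land in that bucket, which would match the slow path described above.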