|
It's like GVFS, but it also works on pieces of a file at a time: rows, columns, or cells. A snapshot is recreated by putting those pieces together. If you have ten million rows in one file and add only a thousand rows daily, each commit's tree contains just those thousand rows, i.e. really just the bidirectional diff, rather than the sum total that your favorite diff tool would then have to diff.

It is a low-level materialization of Git's diff paradigm. During merges and rebases, Git takes the 3-way difference between object trees and acts upon it; here the diffs themselves (data-level diffs on top of file-level ones) are placed under those trees. This overrides Git semantics in that Git now deduplicates the diffs, not the entire original files, when it recognizes them as new objects and commits the new tree. In the video you can see the same diff file being overwritten in Git, representing a new piece in every commit. While you don't need the ten million rows to commit each new thousand rows, they are needed when merging in order to detect conflicts.

Object content referenced by S3 pointers is fetched if and when needed, whereas the Git objects themselves are always fetched, since they are really small. Strictly speaking it is neither a partial nor a shallow clone, as all the objects and trees are downloaded in the current implementation, but the S3 pointers enable similar deferral and filtering behavior, as with DVC.

Sorry if the repo size was unclear, I hope this is better:
~180 kB: current state at the branch head, including pointers to S3 (exact size depends on packs and indices), plus full history, also as pointers
~890 MB: current state at the branch head after downloading from S3 all files referenced by pointers in the Git history, plus full history, with pointers
~130 GB: commit history; this is what the repo would weigh on DVC or Git LFS. This repo corresponds to a use case with many small updates.
As the repo grows (even when only the second, 890 MB state grows, let alone the fully materialized history), this lets you work on the first, kB-sized state and still commit changes. |
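To make the "each commit only contains its own rows, and a snapshot is recreated by putting the pieces together" idea concrete, here is a minimal Python sketch. It is not the tool's actual storage format or API: the `Commit` class, its `added`/`removed` fields, and `materialize` are hypothetical names made up for illustration, and rows are assumed to be addressable by a key.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One commit that stores only the rows it touched (hypothetical layout)."""
    added: dict = field(default_factory=dict)   # row key -> row contents (new or changed)
    removed: set = field(default_factory=set)   # row keys deleted by this commit


def materialize(history):
    """Rebuild the full snapshot by replaying the per-commit pieces in order."""
    snapshot = {}
    for commit in history:
        for key in commit.removed:
            snapshot.pop(key, None)
        snapshot.update(commit.added)
    return snapshot


# Three small commits; none of them carries the whole table.
history = [
    Commit(added={1: "alice", 2: "bob"}),
    Commit(added={3: "carol"}),
    Commit(added={2: "bobby"}, removed={1}),
]
snapshot = materialize(history)
stored_rows = sum(len(c.added) for c in history)
```

Committing only needs the new piece, which is why the kB-sized state is enough for day-to-day work; the full replay (or at least the pieces touching the same rows) is only needed for things like merge conflict detection.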