by wrfopk
1139 days ago
Looks like you've just reinvented GVFS (https://github.com/microsoft/VFSForGit) for a specific use case? Or is this just a partial clone? Or a shallow clone? Or both? It's unclear from the video if this is 130 GB of current state at the branch head or 130 GB of commit history.
While you don't need the ten million rows to commit each new thousand rows, they are needed when merging in order to detect conflicts. Object content referenced by S3 pointers is fetched if and when it is needed, but the Git objects themselves are always fetched, since they are really small. Strictly speaking, it is neither a partial nor a shallow clone, as all the objects and trees are downloaded in the current implementation, but the S3 pointers enable similar delaying and filtering behavior, as with DVC.
Sorry if the repo size is unclear, hope this is better:
~180 kB: current state at the branch head, includes pointers to S3 (exact size depends upon packs and indices), plus full history, also with pointers
~890 MB: current state at the branch head, after downloading all files referenced by pointers in the Git history from S3, plus full history, with pointers
~130 GB: commit history, this is what the repo would weigh on DVC or Git LFS, this repo corresponds to a use case with many small updates
As the repo grows (even when the second, 890 MB state grows, let alone the fully materialized history), this makes it possible to keep working on the first state (kBs) and still commit changes.
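To illustrate the "fetched if and when needed" behavior described above, here is a minimal Python sketch of pointer-based lazy fetching. It is not the actual implementation: `Pointer`, `LazyStore`, `put` and `resolve` are hypothetical names, and a plain dict stands in for S3. The idea is only that the small pointers are always available, while the content behind them is downloaded on first use.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pointer:
    """Small stand-in that would be committed to Git instead of the content."""
    key: str      # hypothetical object key in the remote store
    sha256: str   # checksum of the real content
    size: int     # size of the real content in bytes


class LazyStore:
    """Hypothetical store: `remote` is a dict standing in for S3."""

    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.remote = remote
        self.cache: Dict[str, bytes] = {}  # content already downloaded
        self.fetch_count = 0               # number of simulated downloads

    def put(self, data: bytes) -> Pointer:
        # Upload the content and return the pointer that gets committed.
        digest = hashlib.sha256(data).hexdigest()
        self.remote[digest] = data
        return Pointer(key=digest, sha256=digest, size=len(data))

    def resolve(self, ptr: Pointer) -> bytes:
        # Download only on first access, then serve from the local cache.
        if ptr.key not in self.cache:
            data = self.remote[ptr.key]
            if hashlib.sha256(data).hexdigest() != ptr.sha256:
                raise ValueError("content does not match pointer checksum")
            self.cache[ptr.key] = data
            self.fetch_count += 1
        return self.cache[ptr.key]


store = LazyStore(remote={})
pointers = [store.put(b"row-%d" % i) for i in range(3)]
# Committing only needs the pointers; nothing has been downloaded yet.
content = store.resolve(pointers[1])  # e.g. a merge needs this one object
```

In this sketch, only the object actually touched is downloaded, which is roughly the difference between the kB-sized and MB-sized states above.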