by korijn
1139 days ago
LFS stores files by content hash, so identical files are deduplicated automatically. But you're right that it's wasteful if you frequently make small changes to a single large file, since every revision is stored in full. In our case we rarely change existing files; we just get lots and lots of new big files coming in all the time. Moving HEAD, as in checking out another branch locally? Somewhat regularly, I guess. I suppose you're wondering about performance in that scenario? It's usually quite good, since git-lfs does some local caching as well. I've never needed to wait longer than a couple of seconds. I'm usually on a wired 1000/1000 Mbit/s optical fibre connection, and transfers go directly to and from an Azure Blob Storage container (the LFS API server only generates download and upload URLs; it intentionally doesn't transfer any data), with parallel connections, chunking, etc., so it doesn't really get any better than that. And all of that is out-of-the-box functionality too. :)
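To make the content-hash point concrete, here is a minimal sketch of the idea, not the actual git-lfs implementation. The pointer layout (version, oid sha256, size) follows the Git LFS pointer file format; the function names `lfs_pointer` and `put`, and the in-memory dict standing in for the blob container, are assumptions made up for illustration.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the small text pointer that Git tracks in place of the large file."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


def put(store: dict, data: bytes) -> str:
    """Hypothetical object store keyed by content hash (stand-in for blob storage)."""
    oid = hashlib.sha256(data).hexdigest()
    # Identical content maps to the same key, so it is stored only once.
    store.setdefault(oid, data)
    return oid


store = {}
first = put(store, b"x" * 1000)
again = put(store, b"x" * 1000)             # same bytes: deduplicated
edited = put(store, b"x" * 999 + b"y")      # one byte changed: a full new object
```

After these three calls the store holds two objects, not three: the duplicate costs nothing, while the one-byte edit costs a whole new copy. That is the trade-off discussed above.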
Yes, I meant either checking out other branches locally or, in the general case, pointing to another branch to tell any downstream services to make the data from that branch available wherever it's consumed. I'm assuming each incoming file is then added to data pipelines, possibly just a few. Sounds like you're in the sweet spot: you have the speed you want and, given the infrequent changes, you're fine with the versions taking up terabytes on Azure, since they are mostly new data.