by rkangel 1870 days ago

Some combination of the following two features:

Partial clones (https://docs.gitlab.com/ee/topics/git/partial_clone.html)

Shallow clones (see the --depth argument: https://linux.die.net/man/1/git-clone)
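
To make the combination concrete, here is a minimal sketch that only assembles `git clone` argument lists and does not run anything. `--depth` and `--filter` are real `git clone` options; the helper name `clone_command`, its parameters, and the example URL are made up for illustration.

```python
from typing import List, Optional


def clone_command(url: str,
                  depth: Optional[int] = None,
                  blob_filter: Optional[str] = None) -> List[str]:
    """Build (but do not run) a `git clone` argument list.

    depth       -> shallow clone: history truncated to `depth` commits
    blob_filter -> partial clone: e.g. "blob:none" or "blob:limit=1m"
    """
    cmd = ["git", "clone"]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        cmd += ["--depth", str(depth)]
    if blob_filter is not None:
        cmd.append(f"--filter={blob_filter}")
    cmd.append(url)
    return cmd


# Hypothetical repository URL, for illustration only.
URL = "https://example.com/big-assets.git"

shallow = clone_command(URL, depth=1)                    # latest commit only
partial = clone_command(URL, blob_filter="blob:none")    # fetch blobs lazily
both = clone_command(URL, depth=1, blob_filter="blob:limit=1m")

print(" ".join(both))
```

Each list could be handed to `subprocess.run(cmd, check=True)` to actually clone. Note that the filter only takes effect if the server has partial clone support enabled.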

The problem with large files is not that putting a 1 GB file in Git is inherently a problem. If you have just one revision of it, you get a 1 GB repo, and things run at a reasonable speed. The problem comes when you have 10 revisions of the 1 GB file and end up dealing with 10 GB of data when you only want one revision, because the default git clone model gives you the full history of everything since the beginning of time. This is fine for (compressible) text files, less fine for large binary blobs.
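
The arithmetic, as a back-of-envelope sketch. It assumes the worst case for binaries: every revision is stored in full, with no savings from compression or deltas. The function name and parameters are made up for illustration.

```python
from typing import Optional


def clone_size_gb(file_size_gb: float,
                  revisions: int,
                  depth: Optional[int] = None) -> float:
    """Rough amount of data a clone has to deal with for one large file.

    Assumption: each revision of the file is stored whole (incompressible
    binary, no useful deltas) and every commit changes the file.
    depth=None models a full clone; depth=N models `git clone --depth N`.
    """
    fetched = revisions if depth is None else min(revisions, depth)
    return file_size_gb * fetched


full = clone_size_gb(1.0, revisions=10)              # whole history
shallow = clone_size_gb(1.0, revisions=10, depth=1)  # latest revision only

print(f"full clone: {full} GB, shallow clone: {shallow} GB")
```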

Git-lfs is a hack and it has caused me pain every time I've used it, despite GitLab having good support for it. Some of this is more implementation detail: the command-line UI has some weirdness to it, and there's no clear error if someone doesn't have git-lfs installed when cloning, so something in your build process breaks down the line with a weird error because you've got a marker file instead of the expected binary blob. Some of it is inherent though: the hardest problem is that we now can't easily mirror the Git repo from our internal GitLab to the client's GitLab, because the config has to hold the address of the HTTP server that holds the blobs. We have workarounds but they're not fun.
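
One way to turn the "weird error down the line" into a clear one is to check for marker files before the build starts. This is a minimal sketch, not something git-lfs provides itself: it relies on Git LFS pointer files starting with a `version https://git-lfs.github.com/spec/v1` line, and the function names and sample data are made up for illustration.

```python
# First line of a Git LFS pointer ("marker") file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(data: bytes) -> bool:
    """True if the bytes start like a Git LFS pointer rather than real content."""
    return data.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer_file(path: str) -> bool:
    """Read just enough of the file to apply the check above."""
    with open(path, "rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_POINTER_PREFIX)))


# Made-up sample: a pointer with a dummy all-zero object id and size.
SAMPLE_POINTER = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:" + b"0" * 64 + b"\n"
    b"size 12345\n"
)
# The first bytes of a real PNG, standing in for an actual binary asset.
SAMPLE_BINARY = b"\x89PNG\r\n\x1a\n"

print(looks_like_lfs_pointer(SAMPLE_POINTER), looks_like_lfs_pointer(SAMPLE_BINARY))
```

A build script could call `is_lfs_pointer_file` on each expected asset and fail early with a message telling the user to install git-lfs.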

The solution is to get over the 'always have the whole repository' thing. This is also useful for massive monorepos because you can clone and check out just the subfolder you need rather than everything.
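
For the monorepo case, a hedged sketch of what "just the subfolder" can look like: a blobless partial clone combined with sparse checkout. As above, this only builds the command lists. `--filter=blob:none`, `--sparse` and `git sparse-checkout set` are real Git features (they need a reasonably recent Git); the helper name, URL and paths are made up for illustration.

```python
from typing import List


def subfolder_clone_commands(url: str, dest: str, subfolder: str) -> List[List[str]]:
    """Commands to clone `url` into `dest` with only `subfolder` checked out.

    Step 1: partial clone without blobs, starting with a sparse working tree.
    Step 2: widen the sparse checkout to the one subfolder we care about;
            Git then fetches only the blobs needed for it.
    """
    return [
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        ["git", "-C", dest, "sparse-checkout", "set", subfolder],
    ]


# Hypothetical monorepo and subfolder, for illustration only.
commands = subfolder_clone_commands(
    "https://example.com/monorepo.git", "monorepo", "services/billing"
)

for cmd in commands:
    print(" ".join(cmd))
```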

I say this, but I haven't yet used partial clones in anger (unlike git-lfs). I have high hopes though, and it's a feature in early days.

1 comment

snovv_crash 1870 days ago

I found using git-lfs only in a subrepo worked well, since subrepos by default are checked out shallow.
