HN Mirror

	by wickedOne 2043 days ago
	what would be a typical large file you want to have in your codebase? usually this shouldn't be a consideration

4 comments

amelius 2043 days ago

In addition to what others posted here, sometimes it is nice to put generated files under version control with the source code that generated them. For example, simulation results, deep learning models, graphs that you need to include in your LaTeX documents (which can be considered partly source code, partly generated content).

Also, deep learning training data often consists of large image files, and can also be considered "source code", and in any case it can be very useful to put these under version control.

And finally it can be useful to put external dependencies as tar-files into your source tree.

watwatinthewat 2043 days ago

I appreciate you laying this out because it's something I have struggled with and thought I just didn't know the right way to handle it.

For writing tests in a deep learning code base, rather than simply including a native data file (image, CSV, whatever), I've taken to writing a fake data creator class. It always feels like overkill when an alternative solution is including a native data file or two that already exists.
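
A minimal sketch of what such a fake data creator could look like, assuming plain Python with only the standard library. The class name `FakeDataCreator` and its methods are hypothetical and not taken from any particular code base; a real one would probably emit whatever array or tensor type the project uses.

```python
import csv
import io
import random


class FakeDataCreator:
    """Hypothetical helper that generates small, deterministic test fixtures."""

    def __init__(self, seed=0):
        # A fixed seed keeps the generated data identical across test runs.
        self._rng = random.Random(seed)

    def image(self, width=4, height=4):
        """Return a fake grayscale image as rows of pixel values in 0..255."""
        return [
            [self._rng.randrange(256) for _ in range(width)]
            for _ in range(height)
        ]

    def csv_text(self, rows=3, columns=("x", "y", "label")):
        """Return CSV text with a header row and `rows` rows of random values."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(columns)
        for _ in range(rows):
            writer.writerow([round(self._rng.random(), 3) for _ in columns])
        return buffer.getvalue()


if __name__ == "__main__":
    creator = FakeDataCreator(seed=42)
    print(creator.image(width=3, height=2))
    print(creator.csv_text(rows=2))
```

The trade-off is the one described above: nothing binary has to be committed, at the cost of maintaining a bit of generator code next to the tests.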

diffeomorphism 2043 days ago

I want to use it for files in general, not just source code: for instance graphics, the PDFs generated by LaTeX, or just anything. Or for storing lots of directories. Currently I use Dropbox for that, but if git could do that...

wickedOne 2043 days ago

though i agree that there isn't a proper vcs out there for large files (adobe bridge was a nice attempt), git wasn't designed for that, and one might wonder whether you want git to be _that_ multi-purpose.

_ph_ 2043 days ago

If you have a significant project, you want to store at least a reasonable amount of media with it: images, documentation. Git doesn't necessarily have to be the best system for handling high-gigabyte-sized binaries, but it should at least deal gracefully with small and medium-sized binary files. I am also not sure why more effort wasn't spent on making Git support large files well.

amelius 2042 days ago

By the way, if you're storing large files under version control in Git it is often useful to use the "--depth=1" flag when cloning or pulling repositories. That way you only download the stuff you really need and leave the rest of the history on the server until you need it.
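
A rough sketch of that workflow, written as Python helpers that build the git command lines. The helper names, the URL and the directory in the usage part are made up for illustration; `--depth`, `--deepen` and `--unshallow` are regular options of `git clone` and `git fetch`.

```python
import subprocess


def shallow_clone_command(url, dest, depth=1):
    """Build the git command line for a shallow clone of `url` into `dest`."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", f"--depth={depth}", url, dest]


def deepen_command(extra_commits):
    """Build the command that fetches `extra_commits` more commits of history."""
    return ["git", "fetch", f"--deepen={extra_commits}"]


def unshallow_command():
    """Build the command that downloads the complete remaining history."""
    return ["git", "fetch", "--unshallow"]


def run(command, cwd=None):
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical repository URL and target directory.
    run(shallow_clone_command("https://example.com/big-repo.git", "big-repo"))
    # Later, once older history is actually needed:
    run(deepen_command(50), cwd="big-repo")
```

Note that a shallow clone only skips old history; the large files in the commit you check out are still downloaded in full.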

sofixa 2043 days ago

Git LFS can do that.

war1025 2043 days ago

Git LFS is basically the CVS of binary file version control.

It is a pile of garbage, but it's better than nothing.

_ph_ 2043 days ago

I recently had to deal with a PowerPoint document that is slightly larger than 50 megabytes, which in today's terms isn't very much. Before, I had kept it in SVN, which has no issues storing larger files. It is a bit shocking that Git has issues with not-so-tiny files.

xiphias2 2043 days ago

It doesn't matter, actually. Tools should be easy to learn and as free of edge cases as possible. An example of this happening is constant propagation in Rust: it's the same feature, but with every release it can cover more of the code base.
