Hacker News
by amelius 2002 days ago
In addition to what others have posted here: sometimes it is useful to keep generated files under version control alongside the source code that generated them. Examples include simulation results, deep learning models, and graphs that you need to include in your LaTeX documents (which can be considered partly source code, partly generated content).

Also, deep learning training data often consists of large image files. These can arguably be considered "source code" as well, and in any case it can be very useful to put them under version control.

Finally, it can be useful to put external dependencies into your source tree as tar files.

I appreciate you laying this out, because it's something I have struggled with, and I assumed I just didn't know the right way to handle it.

For writing tests in a deep learning codebase, rather than simply including a native data file (image, CSV, whatever), I've taken to writing a fake data creator class. It always feels like overkill when the alternative is just including a native data file or two that already exists.
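For anyone wondering what a fake data creator class might look like, here is a minimal Python sketch. It is not the commenter's code: the class name `FakeImageDataset`, its parameters, and the choice to write binary PGM images plus a `labels.csv` file are all hypothetical, picked so the example needs only the standard library (a real project would more likely use NumPy or an image library). A fixed seed keeps the generated data identical on every test run, which is the main thing a checked-in data file would otherwise give you.

```python
import csv
import io
import random
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeImageDataset:
    """Hypothetical test helper: deterministic fake grayscale images and labels,
    used instead of committing real data files to the repository."""
    num_samples: int = 4
    height: int = 8
    width: int = 8
    num_classes: int = 3
    seed: int = 0

    def images(self):
        # Seeded RNG, so every call (and every test run) yields the same pixels.
        rng = random.Random(self.seed)
        return [
            [[rng.randrange(256) for _ in range(self.width)] for _ in range(self.height)]
            for _ in range(self.num_samples)
        ]

    def labels(self):
        # Separate seeded RNG so labels don't depend on the image dimensions.
        rng = random.Random(self.seed + 1)
        return [rng.randrange(self.num_classes) for _ in range(self.num_samples)]

    def labels_csv(self) -> str:
        # Mimics a "filename,label" file that a real dataset might ship with.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["filename", "label"])
        for i, label in enumerate(self.labels()):
            writer.writerow([f"img_{i:03d}.pgm", label])
        return buf.getvalue()

    def write_to(self, directory):
        """Write each image as a binary PGM (P5) file plus labels.csv; return image paths."""
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        header = f"P5\n{self.width} {self.height}\n255\n".encode("ascii")
        paths = []
        for i, img in enumerate(self.images()):
            path = directory / f"img_{i:03d}.pgm"
            pixels = bytes(v for row in img for v in row)  # row-major, one byte per pixel
            path.write_bytes(header + pixels)
            paths.append(path)
        (directory / "labels.csv").write_text(self.labels_csv())
        return paths


if __name__ == "__main__":
    # Example: materialize a tiny dataset in a temp dir, as a test fixture might.
    with tempfile.TemporaryDirectory() as tmp:
        paths = FakeImageDataset(num_samples=2).write_to(tmp)
        print([p.name for p in paths])  # ['img_000.pgm', 'img_001.pgm']
```

The trade-off is roughly the one described above: the class is more code than dropping a file or two into the repo, but tests can ask for exactly the shapes, sizes, and class counts they need, and nothing large ends up under version control.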