by dwpdwpdwpdwpdwp 993 days ago
Source control is all about managing diffs. Large files are fine, binary doesn’t make sense. Most of the time binary file diffs aren’t human readable.

I store binary files outside of git, but keep build logs containing the binary files’ CRCs in git.
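For anyone wanting to try the same approach, here is a minimal sketch of generating such a CRC log. The function names, the two-column line format, and the idea of a single manifest file are my own assumptions, not the poster's actual setup.

```python
import zlib
from pathlib import Path


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file as an 8-digit lowercase hex string."""
    crc = 0
    with open(path, "rb") as fh:
        # Read in chunks so large binaries never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08x}"


def write_manifest(binary_dir, manifest_path):
    """Write one 'crc  relative/path' line per file (hypothetical format)."""
    root = Path(binary_dir)
    lines = [
        f"{crc32_of_file(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))  # sorted so the manifest diffs cleanly
        if p.is_file()
    ]
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines
```

The manifest is plain text, so it diffs and merges like any other source file, while the binaries themselves live wherever you keep them.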


> Source control is all about managing diffs. Large files are fine, binary doesn’t make sense

In git, diffs are literally just a UI thing.

That's not really true, is it? Surely Git does have an internal concept of diffing changes, specifically so it knows whether two commits can be merged automatically or if they conflict (because they changed the same lines in the same file).

> That's not really true, is it?

It is.

> Surely Git does have an internal concept of diffing changes

Not in the data model. Packing has deltas, but they're not textual diffs, and they would work fine with binary data... to the extent that the binary data doesn't change too much and the delta-ification algorithms are tuned for that (both of which are doubtful).
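To make the "not in the data model" point concrete, here is a small sketch of how a blob ID is derived in a default (SHA-1) repository: it is the hash of a short header plus the raw bytes of the file. The function name `git_blob_id` is just illustrative, and repositories created with the SHA-256 object format would use a different hash.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob in a SHA-1 repository."""
    # A blob is stored as: b"blob <size>\0" followed by the file's bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Text and binary content go through exactly the same path.
text_id = git_blob_id(b"hello world\n")
binary_id = git_blob_id(bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF]))
print(text_id)
print(binary_id)
```

The result should match what `git hash-object <file>` prints for the same bytes. Nothing in the ID refers to a previous version of the file: each commit points at whole snapshots, and diffs are computed from them when you ask for one.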

> specifically so it knows whether two commits can be merged automatically or if they conflict (because they changed the same lines in the same file).

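Conflict detection and resolution are performed on the fly at merge time, by comparing the snapshots being merged against their common ancestor; nothing is precomputed or stored with the commits.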
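As a rough illustration of "on the fly", here is a simplified three-way decision for a single file, working only from three snapshots: the common ancestor and the two sides. This is a sketch of the general idea, not Git's actual merge code; the name `merge_file` is made up, and a real merge would go on to attempt a line-level merge of text files in the last case.

```python
def merge_file(base, ours, theirs):
    """Decide the merged content of one file from three whole snapshots.

    Returns (content, conflicted). Inputs are bytes, so text and binary
    files are handled identically at this level.
    """
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed it)
    if ours == base:
        return theirs, False  # only their side changed the file
    if theirs == base:
        return ours, False    # only our side changed the file
    # Both sides changed the file in different ways. For text, a
    # line-level three-way merge could still succeed; for an opaque
    # binary there is usually nothing better than reporting a conflict.
    return None, True
```

Note that no stored diff is consulted anywhere: everything is derived from the three snapshots at the moment of the merge.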

Most binary files that people want to store in a VCS are things like .psd, .xlsx, and .docx: data that's created by hand, but not stored as text.

.xlsx and .docx are just zipped-up XML text. You can store them as text if you like, and I think there are many git modules to handle this. But the XML isn’t really that diffable, so I don’t bother.
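If you want to see the zipped XML for yourself, a minimal sketch using only the standard library follows. The helper names are mine; `word/document.xml` is the usual main part of a .docx, but treat any specific part name as something to check against your own file.

```python
import zipfile


def list_xml_parts(path_or_file):
    """Return the names of the XML parts inside a .docx/.xlsx container."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [name for name in zf.namelist() if name.endswith(".xml")]


def read_part(path_or_file, name):
    """Return one part of the container decoded as UTF-8 text."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.read(name).decode("utf-8")
```

For example, `list_xml_parts("report.docx")` (with `report.docx` standing in for any document you have) lists the parts, and `read_part("report.docx", "word/document.xml")` returns the body markup. The XML often comes out as one very long line with lots of formatting noise, which is why line-based diffs of it tend to be unhelpful.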