by londons_explore 1676 days ago
I'd like to see a version of this built into things like IPFS.

It seems obvious that whenever something is saved into IPFS, there might be a similar object already stored. If there is, go make a diff, and only store the diff.

It should be possible to do this in IPFS already if you use the go-ipfs --chunker option with a content-defined chunking algorithm like rabin or buzhash [1]. With this, there's a good chance that a file with small changes from something already on IPFS will have some chunks that hash identically, so they'll be shared.

[1] https://en.wikipedia.org/wiki/Rolling_hash#Content-based_sli...
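
To make the chunk-sharing idea concrete, here is a minimal sketch of content-defined chunking with a simple polynomial rolling hash. It is not the rabin or buzhash chunker that go-ipfs ships; the window size, mask, and the names chunk and chunk_ids are assumptions chosen for illustration. A cut is made wherever the hash of the last few bytes matches a pattern, so boundaries follow the content rather than fixed offsets.

```python
import hashlib
import random

WINDOW = 16                       # bytes covered by the rolling hash (illustrative value)
MASK = (1 << 8) - 1               # cut when the low 8 bits are zero, roughly 256-byte chunks
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk(data: bytes) -> list[bytes]:
    """Split data where the rolling hash of the last WINDOW bytes hits the mask."""
    pieces = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the contribution of the byte that just left the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        # Enforce a minimum chunk size of WINDOW bytes before allowing a cut.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])
    return pieces


def chunk_ids(data: bytes) -> set[str]:
    """Content addresses (SHA-256 hex digests) of all chunks of data."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


# Hypothetical test data: a random "file" and a copy with a small insertion.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = original[:5000] + b"a small insertion" + original[5000:]

ids_original = chunk_ids(original)
ids_edited = chunk_ids(edited)
shared = ids_original & ids_edited
print(f"{len(shared)} of {len(ids_edited)} chunks in the edited file already exist")
```

Because the hash depends only on the last WINDOW bytes, the cut points realign shortly after the insertion, so only the chunks near the edit get new hashes. With fixed-size chunks, every chunk after the insertion would shift and hash differently.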

But that isn't quite as good as something like this that can 'understand' diffs in files, rather than simply relying on the fact that a bunch of bytes in a row might be the same.
I don't think elfshaker actually does any binary diffing (e.g. xdelta or bsdiff). It works well because it uses pre-link objects, which are built to change as little as possible between versions. Then when it compresses similar files together in a pack, Zstandard can recognize the trivial repeats.
Author here. This is correct. We set out to do binary diffing, but we soon discovered that if you put similar enough object files together in a stream and then compress the stream, Zstandard does a fantastic job: it compresses and decompresses quickly, with a high compression ratio. The existing binary diffing tools can produce small patches, but they are relatively expensive, both to compute the delta and to apply the patches.
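
A rough sketch of that effect, for anyone who wants to see it: compress two nearly identical blobs separately, then compress them as one stream. This is not elfshaker's code. zlib from the Python standard library stands in for Zstandard here; it is a different compressor with a much smaller window (32 KiB), so the numbers only illustrate the idea. The blobs are made-up random data, not real object files.

```python
import random
import zlib

# Hypothetical stand-ins for two versions of an object file:
# random (incompressible) bytes, with a 4-byte change in the second version.
rng = random.Random(1)
version_a = bytes(rng.getrandbits(8) for _ in range(8000))
patched = bytearray(version_a)
patched[4000:4004] = b"\x00\x01\x02\x03"
version_b = bytes(patched)

# Each file compressed on its own: the compressor cannot see the other version.
separate = len(zlib.compress(version_a, 9)) + len(zlib.compress(version_b, 9))

# Both files in one stream: the second version is mostly back-references
# to the first, so no explicit delta has to be computed.
pack = zlib.compress(version_a + version_b, 9)

print(f"separate: {separate} bytes, packed together: {len(pack)} bytes")

# Extraction is a plain decompress plus slicing, with no patch application.
restored = zlib.decompress(pack)
assert restored[:len(version_a)] == version_a
assert restored[len(version_a):] == version_b
```

The second copy only compresses well if the first one is still inside the compressor's window, which is one reason a compressor with a large window suits packs of many similar files.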