Hacker News
by genpfault 2626 days ago
> Julie's Webcast - Episode 3, with Russian subtitles

> peers and seeders from both distributions can share data for the shared content

So does IPFS have "plugins" for different archive/container formats so it can "see" that the underlying video/audio streams are identical between "Julie's Webcast - Episode 3.mp4" and "Julie's Webcast - Episode 3, with Russian subtitles.mkv"?

Otherwise container stream interleaving will play holy hell with any sort of "dumb" block hashing :(

1 comment

Last I checked it was dumb: fixed-size blocks by default, with the option of placing block boundaries using a rolling hash instead.
https://github.com/ipfs/go-ipfs-chunker

> go-ipfs-chunker provides the Splitter interface. IPFS splitters read data from a reader and create "chunks". These chunks are used to build the ipfs DAGs (Merkle Tree) and are the base unit to obtain the sums that ipfs uses to address content.

> The package provides a SizeSplitter which creates chunks of equal size and it is used by default in most cases, and a rabin fingerprint chunker. This chunker will attempt to split data in a way that the resulting blocks are the same when the data has repetitive patterns, thus optimizing the resulting DAGs.
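To make the difference concrete, here is a minimal sketch in Python rather than anything taken from go-ipfs-chunker. It assumes a simple gear-style rolling hash as a stand-in for the Rabin fingerprint, and every name in it (`GEAR`, `fixed_chunks`, `rolling_chunks`, `shared`) is made up for illustration. It chunks a random payload, then the same payload with 18 extra bytes in front, and counts how many chunks the two versions have in common under each scheme.

```python
import hashlib
import random

# Gear-style rolling hash: each step shifts the state left one bit and adds a
# per-byte random value, so bytes older than 32 positions fall out of the
# 32-bit state. This is a stand-in, NOT the Rabin fingerprint IPFS uses.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(data, size=256):
    """Split at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def rolling_chunks(data, bits=8):
    """Cut after any byte where the top `bits` bits of the hash are zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def shared(a, b):
    """Count distinct chunk digests that appear in both lists."""
    def digest(chunk):
        return hashlib.sha256(chunk).digest()
    return len({digest(c) for c in a} & {digest(c) for c in b})


# Hypothetical test data: same payload, second copy offset by 18 bytes.
payload = bytes(_rng.getrandbits(8) for _ in range(20_000))
shifted = b"extra header bytes" + payload

fixed_shared = shared(fixed_chunks(payload), fixed_chunks(shifted))
rolling_shared = shared(rolling_chunks(payload), rolling_chunks(shifted))
print("fixed-size chunks shared:", fixed_shared)
print("rolling-hash chunks shared:", rolling_shared)
```

With fixed-size blocks, the prepended bytes shift every boundary, so the two versions share essentially nothing. With the rolling hash, boundaries depend only on the last few bytes seen, so the chunker resynchronizes shortly after the insertion. Note that this only models a single prefix shift, not the container interleaving from the question; there, whether content-defined chunking helps depends on how long the identical byte runs are compared to the chunk size.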

I think they should use rolling-hash-based chunking by default.

https://github.com/ipfs/go-ipfs-chunker/issues/13