Hacker News new | ask | show | jobs
by Scaevolus 3241 days ago
Right, in BitTorrent v1 the size of the .torrent file is O(number of files) + O(number of bytes), but with this it's just O(number of files) with a higher constant factor.

Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".

Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size, to prevent huge bitsets-- a 16KB piece length on a 64GB torrent would have a 4 million item / 500KB bitset (!), so it could take 500KB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?

1 comments

v1: O(path-depth * number of files + number of bytes)

v2: O(log(path-depth) * number of files)

that is assuming some constantish branching factor in your directory structure

> Merkle tree roots will only be unique for each piece length.

Merkle trees are independent of piece size, which means you can use them to dedup across torrents.

Oh, neat! I missed the part where larger piece sizes correspond to higher layers of the tree.

Presumably clients still reconstruct (and store, somewhere) the full Merkle tree to do incremental validation and support queries.