Thinking about 'meta' torrent file format (gist.github.com)
37 points by mattengi 4573 days ago
7 comments

I've actually been thinking about this a bit as well.

I think you can just avoid the torrent file completely and use a Merkle tree hash, like how new torrent files work; then you end up with just one torrent file per file. Peer acquisition would work through the DHT.
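A minimal sketch of what "one Merkle root per file" could look like. The block size, the use of SHA-256, and the duplicate-last-node padding are all assumptions picked for illustration, not the scheme of any actual BitTorrent extension:

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed leaf size, chosen only for illustration


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Return a SHA-256 Merkle root over fixed-size blocks of `data`."""
    # Leaf layer: one hash per block (an empty file gets a single empty leaf).
    layer = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ] or [hashlib.sha256(b"").digest()]
    # Fold pairs of nodes upward until one root remains.
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # assumed padding: duplicate the last node
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

The root depends only on the file's bytes, so two people sharing the same file would end up with the same identifier regardless of what else they bundle with it.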

Directories would be simple: just a matter of creating a new "file" with the hashes and names of the contents, like how git directories work. (Extending on this, you could have a version control system like git.)
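Roughly what such a directory "file" might look like, as a sketch. The JSON layout and the names `directory_object` and `directory_id` are made up here; git's real tree format is different:

```python
import hashlib
import json


def directory_object(entries: dict[str, bytes]) -> bytes:
    """Serialize name -> content-hash pairs in a deterministic order."""
    listing = [
        {"name": name, "hash": entries[name].hex()}
        for name in sorted(entries)
    ]
    return json.dumps(listing, sort_keys=True, separators=(",", ":")).encode()


def directory_id(entries: dict[str, bytes]) -> bytes:
    """Hash the serialized listing, so a directory is addressed like a file."""
    return hashlib.sha256(directory_object(entries)).digest()
```

Because the listing is sorted before hashing, the same set of names and hashes always gives the same directory ID, and changing any file underneath changes the ID of every directory above it.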

A noticeable change is that each individual file is uniquely shared. I believe this is both a feature (it avoids duplicate torrents for the same file) and a drawback, since anyone can see who's downloading a file. A solution would be an additional key hash: hashing the DHT ID again with a key would allow individual darknets.
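A sketch of the "hash the DHT ID again with a key" idea. Using HMAC-SHA1 is an assumption (it keeps the result at 160 bits, the size of a mainline DHT ID), and `private_dht_id` and `group_key` are hypothetical names:

```python
import hashlib
import hmac


def private_dht_id(file_hash: bytes, group_key: bytes) -> bytes:
    """Derive a per-group lookup key from a file hash and a shared secret."""
    # Only peers that know group_key can compute the same 20-byte ID.
    return hmac.new(group_key, file_hash, hashlib.sha1).digest()
```

This only changes which key peers look up; it does nothing to hide who announces or queries that key.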

I agree that advertising single file Merkle tree roots on the DHT is a good thing, and that one could nicely build git-like directory structures, but why force the leaves of the tree to be singleton torrent files?

Why not instead advertise individual files on the DHT by their Merkle tree roots, and put the Merkle tree roots in each entry of the "files" section of the torrent file? This doesn't force re-packaging of existing torrents into singleton torrents. Seeders can advertise single files from old torrents and clients with new torrents can take advantage of this advertising.

I disagree with the munged-key darknet idea. If you want a darknet, run it on a non-public DHT, with cryptographic handshakes and encrypted traffic. Cryptographically munging the DHT keys on a public DHT only creates a "light grey net" that's trivially circumvented and provides a false sense of privacy.

> Why not instead advertise individual files on the DHT by their Merkle tree roots, and put the Merkle tree roots in each entry of the "files" section of the torrent file?

You can't do that because torrents aren't file delimited, they are block delimited. You can't check that two files are the same across torrents without first downloading both torrents.

You misunderstand. This has nothing to do with comparing two torrents. Comparing two torrents solves the wrong problem.

Clients that have downloaded all of the data for a single file (but may or may not have downloaded all of the data for the full torrent) have the data for the file and can calculate the Merkle tree root for that file, and advertise availability on the DHT.

Clients with new style torrent files that included Merkle tree roots in file descriptions would then be able to download those files. This has nothing to do with comparing torrent files.

https://en.wikipedia.org/wiki/Metalink

It's in there somewhere...

Edit: Here's a more relevant use case:

https://wiki.debian.org/Metalink

To see one method that is used to work around this sort of thing: The folks over at http://www.tlmc.eu/ have been expanding the same 1.2TB collection of files for a while, just by stopping the old torrent, running a Python script to patch the changes, and then rechecking and starting the new torrent from the old directory.
But doesn't this mean that all the other peers would need to manually upgrade their copy of the torrent file?
Yes. It works reasonably well in this specific case because it's such a niche thing (you don't download 1.25 TB of Touhou music if you don't really care about Touhou music), but it doesn't benefit from people who continue to seed things they've long forgotten about.
> 1.25 TB of Touhou music

I am going to have nightmares tonight...

Private trackers will say no. Public trackers may welcome this...
There are many types of private tracker that would love to see this - for instance consider gaming trackers, where you may have a single .torrent for a large collection of ROMs, or DLC for a game. Consider TV trackers, tracking a whole TV season with a single .torrent file, or music trackers with discographies.

More importantly, a key concern on private trackers is swarm size - an extension like this would have the potential to expand the available peers on a given file, if the file exists in other swarms on the same tracker. Not a very common use case, but one to consider nonetheless.

Is this basically an append-only torrent file? This could actually be implemented without having to do many changes to the torrent format. You can just have the client de-dupe based on file length + hash.
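A sketch of the de-dupe step, assuming the new torrent carries a per-file (length, hash) pair. The manifest shape and the names `file_key` and `files_to_fetch` are hypothetical:

```python
import hashlib


def file_key(data: bytes) -> tuple[int, str]:
    """Identify a file by its length plus a content hash."""
    return (len(data), hashlib.sha256(data).hexdigest())


def files_to_fetch(
    local_files: dict[str, bytes],
    new_manifest: dict[str, tuple[int, str]],
) -> list[str]:
    """Return names in the new manifest whose content is not already on disk."""
    have = {file_key(data) for data in local_files.values()}
    return sorted(name for name, key in new_manifest.items() if key not in have)
```

Matching is by content, not by name, so a file that was merely renamed in the updated torrent would not be downloaded again.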
Couldn't you also hash the root of the hash tree for the new appended data with the root of the hash tree for the old torrent? It would be like a hash chain of hash trees, but pointing backward in time.
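A sketch of that backward-pointing chain, assuming each version already has a root hash for its appended data. `chain_root` is a hypothetical helper and SHA-256 is an arbitrary choice:

```python
import hashlib


def chain_root(previous_root: bytes, appended_root: bytes) -> bytes:
    """Combine the old torrent's root with the root of the newly appended data."""
    return hashlib.sha256(previous_root + appended_root).digest()


# Three versions of a collection: each new root commits to all earlier ones.
v1 = hashlib.sha256(b"initial data").digest()
v2 = chain_root(v1, hashlib.sha256(b"first append").digest())
v3 = chain_root(v2, hashlib.sha256(b"second append").digest())
```

Anyone holding `v3` and the two appended roots can recompute the chain back to `v1`, which is what would let a client confirm that a new torrent really extends the old one.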
Another problem with torrents is compression of files. Compressing a torrent's contents into one archive makes it impossible to select only one file from a big collection.
This is true. But afaik it's frowned upon and it's not really a big deal in serious communities.
I would think this is a failure of the client, which should support compression formats well enough to be able to fish around inside of the compressed file once it got the metadata portion (zip directory or whatever).

http://en.wikipedia.org/wiki/Zip_%28file_format%29#Design: A directory is placed at the end of a .ZIP file. This identifies what files are in the .ZIP and identifies where in the .ZIP that file is located. This allows .ZIP readers to load the list of files without reading the entire .ZIP archive.
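To illustrate the point about the ZIP directory, here is a small sketch using Python's standard `zipfile` module. It builds a toy archive in memory and lists where each member lives. A real client would only have the tail of the archive at first, which this sketch does not attempt to handle; `make_sample_zip` and `list_zip_members` are names invented here:

```python
import io
import zipfile


def make_sample_zip() -> bytes:
    """Build a tiny in-memory archive to stand in for a downloaded .zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"alpha")
        zf.writestr("b.txt", b"bravo" * 10)
    return buf.getvalue()


def list_zip_members(archive_bytes: bytes) -> list[tuple[str, int, int]]:
    """Return (name, local header offset, compressed size) for each member."""
    # zipfile locates the central directory at the end of the archive
    # and reads the member list from there.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [
            (info.filename, info.header_offset, info.compress_size)
            for info in zf.infolist()
        ]
```

With the offsets and sizes in hand, a client could in principle work out which pieces of the torrent cover a single member and fetch only those.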

This is a great idea! I wonder why it hasn't already been implemented.
I think it is because most torrent client developers are focused on the protocol rather than the end-user experience.

Here is a tool that makes it possible to preview video/audio quality by getting the first and last .rar file: http://techzil.com/play-rar-files-without-extracting-uisng-d...

Perhaps we could make trackers more intelligent and have them combine peer pools, so they create something like a Venn diagram of torrents. In addition to telling you which peers are available, the tracker would tell you what to request from them. You already have all of the file hashes in the torrent anyway, so any wrongdoing here will get discarded.
Unfortunately it's not as simple as that - when asking each other for data, the individual peers ask for a particular 'piece' of the torrent, where that piece isn't relative to a given file, but the torrent as a whole.

The files are concatenated into one long stream, and the piece number is an index to that, with no guarantees about alignment.

For instance, if you have a torrent (we'll call it 'X') with three files: the 4 MB file 'a', the 3 MB file 'b' and the 1 MB file 'c', and two separate torrents ('Y' and 'Z') describing files 'b' and 'c' separately, then the pieces would map something like this:

'Y' piece 1 -> 'X' piece 17
'Z' piece 1 -> 'X' piece 29

That's an absolute best case scenario though - in most cases, file sizes aren't quite as perfect as that (each being a multiple of the default piece size, 256 KB). If 'b' just happened to be 1373 KB, or anything else that wasn't a multiple of 256 KB, then any files after it aren't addressable from other torrents.
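The piece arithmetic in the example above can be checked with a short sketch. It assumes 256 KB pieces, 1-based piece numbers as in the mapping above, and non-empty files; `file_piece_spans` is a hypothetical helper:

```python
PIECE_SIZE = 256 * 1024  # piece size used in the example above
KB = 1024
MB = 1024 * KB


def file_piece_spans(sizes, piece_size=PIECE_SIZE):
    """Return (first_piece, last_piece, starts_on_boundary) for each file.

    Files are laid end to end in one stream; pieces are numbered from 1.
    Assumes every file is non-empty.
    """
    spans = []
    offset = 0
    for size in sizes:
        first = offset // piece_size + 1
        last = (offset + size - 1) // piece_size + 1
        spans.append((first, last, offset % piece_size == 0))
        offset += size
    return spans


aligned = file_piece_spans([4 * MB, 3 * MB, 1 * MB])       # 'a', 'b', 'c'
misaligned = file_piece_spans([4 * MB, 1373 * KB, 1 * MB])  # 'b' is 1373 KB
```

In the aligned case 'b' starts at piece 17 and 'c' at piece 29, both on a piece boundary. With a 1373 KB 'b', file 'c' starts partway through piece 22, so none of its pieces line up with those of a standalone torrent for 'c'.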

Why not?

You just have at most two blocks of additional overhead.

You would have to have where the file begins and ends within the blocks downloaded, but that's already in the torrent file.

Because the hashes that are stored in the .torrent operate on that unaligned data.

In practice, what this means is that you can't verify that two files of the same name and size but at different alignments within the consolidated data stream are identical; you can't compare hashes, can't do anything without first downloading. This opens the door to mass poisoning of swarms without even having to enter them in the first place.

There are potential solutions (including providing a broader hash per-file, as opposed to per-piece), but my statement was only that it's not that simple, not that it's impossible.
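A toy demonstration of the alignment problem and of the per-file hash workaround. The 4-byte piece size and the sample data are made up to keep it readable; SHA-1 is used because classic torrents hash pieces with it:

```python
import hashlib


def piece_hashes(stream: bytes, piece_size: int) -> list[str]:
    """Hash the concatenated stream piece by piece, as a classic torrent does."""
    return [
        hashlib.sha1(stream[i:i + piece_size]).hexdigest()
        for i in range(0, len(stream), piece_size)
    ]


PIECE = 4                      # tiny piece size, for illustration only
shared = b"SAMEFILE"           # the file both torrents contain
torrent_y = shared             # torrent with just the shared file
torrent_x = b"abc" + shared    # same file preceded by a 3-byte file

# Per-piece hashes: the shared file is cut at different offsets in each torrent.
common_pieces = set(piece_hashes(torrent_y, PIECE)) & set(piece_hashes(torrent_x, PIECE))

# Per-file hash: computed over the file's own bytes, independent of alignment.
file_hash_y = hashlib.sha1(torrent_y).hexdigest()
file_hash_x = hashlib.sha1(torrent_x[3:]).hexdigest()
```

Here `common_pieces` comes out empty even though both torrents contain the identical file, while the two per-file hashes match.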

Why do you want to be completely backwards compatible with classic torrents? Torrent2 can dump some features of classic torrenting, like folder structure, and mandate that each "subtorrent" is basically a single Torrent1 containing only one file and no folder structure.