by mattcopp 3631 days ago
Great work! Quite annoying, actually: I finished my own implementation in Python at about 10pm last night, and this would have been most useful. I'm no C# coder, but it's nicely readable, and this is a much better write-up than I'm sure I could do.

For anyone who hasn't tried doing this before: the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned into a "spec". Somewhat paradoxically, though, this has led to a wealth of articles describing how to do it.

The three guides I used were:

- A two-part blog post with a bit of a Python bent http://www.kristenwidman.com/blog/33/how-to-write-a-bittorre...

- The unofficial specs https://wiki.theory.org/BitTorrentSpecification, and

- An incomplete Python client https://github.com/JosephSalisbury/python-bittorrent

I didn't know of the RFC mentioned in the post; that would also have been really useful.

A lot of BitTorrent stuff for Python is remarkably hard to find amid all the noise of Deluge, the original client, and libtorrent wrappers. None of what I found was sophisticated (or at least well documented) enough for my experiments; those projects have different focuses.

I never went as far as implementing my own BEncoder library. A billion seem to exist in multiple languages, and every BitTorrent Python library you install seems to come with its own copy. (I suspect this is due to the way BEncoder was bundled in the original client, see: https://pypi.python.org/pypi/bencode)
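
For a sense of how small the format is, here is a minimal sketch of a bencode encoder in Python. It is not taken from any of the libraries mentioned here; the function name `bencode` and the choice to encode `str` values as UTF-8 are assumptions made for illustration.

```python
def bencode(value) -> bytes:
    """Encode ints, byte strings, lists and dicts in bencoding."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value                      # i<number>e
    if isinstance(value, str):
        value = value.encode("utf-8")               # assumption: text is UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # <length>:<bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        body = b"".join(bencode(k) + bencode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"spam": ["a", "b"]}))  # b'd4:spaml1:a1:bee'
```

Decoding is the fiddlier half, which is probably why most people reach for an existing library.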

I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.

2 comments

" this would have been most useful. I'm no C# coder, but it's nicely readable, and I'm sure this is a lot better written up than I could do."

I agree. I don't do C# but can mostly follow it. It is also a well-organized presentation of much of a protocol that all kinds of people keep re-implementing. They need the help more often than not. A great write-up.

> I also found a Rust implementation which seems not to compile, but is useful as I'm trying to teach myself Rust https://github.com/kenpratt/rusty_torrent I think the work to get it to compile might be minimal.

There is also another project in Rust, it looks more active: https://github.com/GGist/bip-rs It is a collection of libraries.

> For anyone who hasn't tried doing this before: the "official" BitTorrent spec docs, namely BEP-3 (http://bittorrent.org/beps/bep_0003.html), seem little more than a vague blog post turned into a "spec".

Doesn't look vague at all. What do you think is missing from it?

> There is also another project in Rust, it looks more active: https://github.com/GGist/bip-rs

Thanks, I had seen that one but forgot about it. I think it's a great project, but it's really just a collection of libraries that don't tell you how it all fits together, which wasn't very helpful when I was picking things up. Hopefully, now that I have a better understanding of client design, I can make something from it.

> Doesn't look vague at all. What do you think is missing from it?

For comparison, I would recommend reading a few of what I would consider good protocol docs: docs that you could read, implement from, and probably get working very quickly. For example:

- XMPP's XEPs (one picked for similarity in usage to BitTorrent) https://xmpp.org/extensions/xep-0020.html - Lots of examples in there for what messages should look like, which is always helpful.

- The BitTorrent RFC doc (linked in the original post) http://jonas.nitro.dk/bittorrent/bittorrent-rfc.html - Sums the situation up nicely with the layout of messages and value lengths.

I think the thing that makes the biggest difference is adhering to a requirements-language convention such as RFC 2119, which defines the keywords "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", etc. That makes it really clear what you are and are not meant to do.

As for the vagueness of BEP-3 specifically, how about this example, which made me rage on IRC? It comes from the description of the info_hash field in the Tracker section:

    This value will almost certainly have to be escaped.

ALMOST CERTAINLY?? Will it, or won't it? Then, escaped? Escaped how?

What this turned out to mean was that the 20-byte binary SHA-1 hash MUST be URL encoded (percent-encoded), and not hex encoded.
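
To make that concrete, here is a minimal sketch in Python of what the tracker expects, assuming you already have the bencoded `info` dictionary as bytes. The helper name `info_hash_param` and the sample bytes are hypothetical, purely for illustration.

```python
import hashlib
from urllib.parse import quote


def info_hash_param(bencoded_info: bytes) -> str:
    """Return the percent-encoded SHA-1 digest of the bencoded info dict."""
    digest = hashlib.sha1(bencoded_info).digest()  # 20 raw bytes, not 40 hex chars
    # quote() accepts bytes; safe="" makes sure "/" gets escaped as well.
    return quote(digest, safe="")


# Hypothetical bencoded info dictionary, not a real torrent.
sample_info = b"d6:lengthi1024e4:name8:demo.bin12:piece lengthi16384e6:pieces0:e"
print("info_hash=" + info_hash_param(sample_info))

# The escaping itself, shown on a fixed byte string:
print(quote(b"\x12\x34\x56\x78\x9a", safe=""))  # %124Vx%9A
```

Bytes that happen to be unreserved ASCII characters (letters, digits, `-`, `_`, `.`, `~`) pass through unchanged, which is why the result looks like a mix of `%XX` escapes and plain characters rather than a tidy hex string.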

I would love to see someone try to build a BitTorrent client for the first time based solely on this doc.

---

BEP-3 also seems more interested in implementation detail than in describing the protocol. Take the last paragraph (before Copyright) as an example.

Something else which occurred to me today is that BitTorrent is not a spec: it hasn't been developed, it has evolved. It has also been built in a very modular way, i.e. DHTs can replace trackers and simply be dropped in, and magnet URIs can replace torrent files. This probably contributes to its success and longevity, but it also means that there is a lot of stuff, like metainfo, trackers, and bencoding, that SHOULD belong in their own spec docs, which together form a collective whole.