by ieee8023 2352 days ago
There is metadata. It is stored in bibtex along with every torrent. This format allows it to be a freeform database where the user can add fields as they want. We (Academic Torrents) can then build new ways to display this metadata. Also the "abstract" part of the metadata is rendered as markdown on the details page of a torrent. Here is a good example: https://academictorrents.com/details/d52ccc21455c7a82fd6e589...
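To make the "freeform database" point concrete, here is a minimal sketch of reading such an entry into a dictionary. The entry, the `modality` field, and the `parse_fields` helper are all made up for illustration, and the regex only handles flat `key = {value}` fields without nested braces, so treat it as a toy rather than a real BibTeX parser.

```python
import re

# Hypothetical entry: "modality" stands in for a user-added custom field.
ENTRY = """@article{example_dataset,
  title = {Example Dataset},
  abstract = {A *markdown* description.},
  modality = {MRI}
}"""

# Matches flat `key = {value}` pairs; nested braces are not supported.
FIELD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")

def parse_fields(entry):
    """Return the entry's fields as a dict with lower-cased keys."""
    return {key.lower(): value.strip() for key, value in FIELD.findall(entry)}

fields = parse_fields(ENTRY)
print(fields["modality"])
```

Because unknown keys simply end up in the dictionary, a new field needs no schema change.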

OK, I see that there is code provided there. Better than nothing, but geez, it is not really what metadata should look like:

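  def get_labels(rightside):
    # rightside is expected to be a NumPy array of integer labels.
    met = {}
    # Ratio of non-zero entries to zero entries.
    met['brain'] = (
        1. * (rightside != 0).sum() / (rightside == 0).sum())
    # Share of non-zero entries with a value above 2;
    # the 1e-10 guards against division by zero.
    met['tumor'] = (
        1. * (rightside > 2).sum() / ((rightside != 0).sum() + 1e-10))
    # Threshold both ratios into boolean labels.
    met['has_enough_brain'] = met['brain'] > 0.30
    met['has_tumor'] = met['tumor'] > 0.01
    return met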
I will say that it is very handy to know exactly how the labels were computed.

What I really meant is a way to search and select data based on metadata. For example, selecting only the samples where has_tumor is true.
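Roughly the kind of thing I have in mind, as a sketch only: it assumes the labels were published as a small per-sample index next to the data. The index layout, the file names, and the `select` helper are hypothetical; only `has_tumor` and `has_enough_brain` come from the snippet above.

```python
# Hypothetical per-sample index; the file names are invented for illustration.
INDEX = [
    {"file": "slice_0001.npy", "has_enough_brain": True, "has_tumor": True},
    {"file": "slice_0002.npy", "has_enough_brain": True, "has_tumor": False},
    {"file": "slice_0003.npy", "has_enough_brain": False, "has_tumor": False},
]

def select(index, **wanted):
    """Return the files whose labels equal every keyword given in `wanted`."""
    return [row["file"] for row in index
            if all(row.get(key) == value for key, value in wanted.items())]

tumor_files = select(INDEX, has_tumor=True)
usable_no_tumor = select(INDEX, has_enough_brain=True, has_tumor=False)
print(tumor_files, usable_no_tumor)
```

With an index like that, picking the relevant samples would not require downloading the data first.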

Also note that everything is still one single blob: to get one line of any of the files, one would need to download everything.

BitTorrent does support partial downloads that request only some files or byte ranges out of a torrent. Some of the torrents are just zip archives, but for the others you could look at the code / documentation to see which files are relevant before downloading 10GB of data.
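A rough sketch of why that works: in a (v1) multi-file torrent the files are laid end to end and cut into fixed-size pieces, so the pieces covering one file follow from simple arithmetic. The function name and the toy sizes below are made up for illustration; real clients do this for you when you deselect files.

```python
def pieces_for_file(file_lengths, piece_length, file_index):
    """Return the range of piece indices that overlap one file.

    file_lengths: sizes in bytes, in the order the torrent lists the files.
    piece_length: size of each piece in bytes.
    """
    offset = sum(file_lengths[:file_index])  # where the file starts in the stream
    length = file_lengths[file_index]
    if length == 0:
        return range(0)  # an empty file needs no pieces
    first = offset // piece_length
    last = (offset + length - 1) // piece_length
    return range(first, last + 1)

# Toy torrent: three files of 100, 250 and 50 bytes, 64-byte pieces.
needed = list(pieces_for_file([100, 250, 50], 64, 1))
print(needed)
```

Pieces at a file boundary are shared with the neighbouring file, so a client fetches a little more than the file itself, but nowhere near the whole torrent.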

I think the abstract is sufficient for searching data; expecting some kind of smart database that can handle all the weird formats science uses is a bit much.

There are even torrent clients that export a FUSE VFS so you can use your standard tools.
| one would need to download everything

Just download it then. We got mp3 albums off Napster on modems back in the day, surely getting that torrent is easier and faster today.