	by ieee8023 2352 days ago
	There is metadata. It is stored as BibTeX alongside every torrent. Because the format is freeform, users can add whatever fields they want, and we (Academic Torrents) can then build new ways to display that metadata. The "abstract" field is also rendered as Markdown on the torrent's details page. Here is a good example: https://academictorrents.com/details/d52ccc21455c7a82fd6e589...
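To illustrate the freeform-fields idea, here is a minimal sketch of reading a BibTeX entry into a plain dictionary. The entry text, the `has_tumor` field, and the `parse_bibtex_fields` helper are all hypothetical. The regex only handles simple, non-nested `key = {value}` pairs, and this is not meant to describe how Academic Torrents itself parses metadata.

```python
import re

# Hypothetical entry; the field names and values are made up for illustration.
ENTRY = """@article{example_dataset,
  title = {Example Brain Scans},
  abstract = {Some **markdown** text.},
  has_tumor = {true}
}"""

# Matches simple `key = {value}` pairs; nested braces are not supported.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex_fields(entry):
    """Return a dict mapping each field name to its raw string value."""
    return {key.lower(): value.strip() for key, value in FIELD_RE.findall(entry)}


fields = parse_bibtex_fields(ENTRY)
```

Since every value comes back as a string, any custom field a user adds is picked up without changing the parser.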


glofish 2352 days ago

OK, I see that there is code provided there. Better than nothing, but geez, it is not really what metadata should be like:

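  def get_labels(rightside):
    # rightside: label array (presumably NumPy); 0 looks like background, > 2 like tumor.
    met = {}
    # Ratio of non-background voxels to background voxels.
    met['brain'] = (
        1. * (rightside != 0).sum() / (rightside == 0).sum())
    # Fraction of non-background voxels with label > 2; 1e-10 avoids division by zero.
    met['tumor'] = (
        1. * (rightside > 2).sum() / ((rightside != 0).sum() + 1e-10))
    # Threshold both ratios into boolean labels.
    met['has_enough_brain'] = met['brain'] > 0.30
    met['has_tumor'] = met['tumor'] > 0.01
    return met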

I will say that it is very handy to know exactly how the labels were computed.

What I really meant is a way to search and select data based on metadata, for example by has_tumor.
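A rough sketch of the kind of selection I mean, assuming the labels were published as per-sample records. The `samples` list, the sample names, and the `select` helper are hypothetical; nothing like this is provided with the torrent as far as I can tell.

```python
# Hypothetical per-sample records, shaped like the output of get_labels above.
samples = [
    {"name": "scan_001", "has_enough_brain": True, "has_tumor": True},
    {"name": "scan_002", "has_enough_brain": True, "has_tumor": False},
    {"name": "scan_003", "has_enough_brain": False, "has_tumor": True},
]


def select(records, **criteria):
    """Keep the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


tumor_scans = select(samples, has_tumor=True)
usable_tumor_scans = select(samples, has_tumor=True, has_enough_brain=True)
```

With an index like that available up front, you could pick the samples you want before fetching any of the data.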

Also note how everything is still one single blob: to get one line of any of the files, one would need to download everything.

Mathnerd314 2352 days ago

BitTorrent does support partial downloads that request only some files or byte ranges out of a torrent. Some of the torrents are just compressed zip archives, but for the others you could look at the code / documentation to see which files are relevant before downloading 10GB of data.
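To make the partial-download point concrete: in the original (v1) BitTorrent layout, the files of a multi-file torrent are treated as one concatenated byte stream that is cut into fixed-size pieces, so a client only needs the pieces that overlap the file you want. A minimal sketch of that arithmetic follows; the file sizes, the 256 KiB piece length, and the `pieces_for_file` helper are made up for illustration.

```python
def pieces_for_file(file_lengths, index, piece_length):
    """Return (first_piece, last_piece), inclusive, covering file `index`."""
    if file_lengths[index] == 0:
        raise ValueError("a zero-length file occupies no pieces")
    start = sum(file_lengths[:index])       # byte offset where the file begins
    end = start + file_lengths[index] - 1   # byte offset of the file's last byte
    return start // piece_length, end // piece_length


# Hypothetical torrent: three files, 256 KiB pieces.
lengths = [1000, 5_000_000, 300]
piece_length = 256 * 1024

small_file = pieces_for_file(lengths, 2, piece_length)
big_file = pieces_for_file(lengths, 1, piece_length)
```

Pieces can straddle file boundaries, so fetching one small file may still pull in a piece's worth of its neighbors, but nowhere near the whole torrent.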

I think the abstract is sufficient for searching data; expecting some kind of smart database that can handle all the weird formats science uses is a bit much.

rakoo 2351 days ago

There are even torrent clients that export a FUSE VFS so you can use your standard tools.

mtone 2352 days ago

| one would need to download everything

Just download it then. We got mp3 albums off Napster on modems back in the day, surely getting that torrent is easier and faster today.
