by glofish 2345 days ago
Ok, I see that there is code provided there. Better than nothing, but geez, that is not really what metadata should look like.

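  def get_labels(rightside):
    """Derive per-image labels from the label map `rightside` (a NumPy array)."""
    met = {}
    # Ratio of non-zero (brain) voxels to zero (background) voxels.
    # `1. *` forces float division under Python 2; note there is no
    # epsilon here, so a map with no zero voxels divides by zero.
    met['brain'] = (
        1. * (rightside != 0).sum() / (rightside == 0).sum())
    # Fraction of non-zero voxels whose label is > 2 (tumor);
    # the 1e-10 guards against dividing by zero on empty maps.
    met['tumor'] = (
        1. * (rightside > 2).sum() / ((rightside != 0).sum() + 1e-10))
    met['has_enough_brain'] = met['brain'] > 0.30
    met['has_tumor'] = met['tumor'] > 0.01
    return met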
I will say that it is very handy to know exactly how the labels were computed.

What I really meant was a way to search and select data based on metadata, for example has_tumor.
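A minimal sketch of the kind of metadata search meant here, assuming labels like the ones in get_labels have already been computed once and stored per file. The `Record` class, the `select` helper, the file names, and the numbers are all hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    path: str     # hypothetical file name inside the dataset
    brain: float  # value computed like met['brain']
    tumor: float  # value computed like met['tumor']

    @property
    def has_enough_brain(self) -> bool:
        return self.brain > 0.30  # same threshold as get_labels

    @property
    def has_tumor(self) -> bool:
        return self.tumor > 0.01  # same threshold as get_labels


def select(records, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


# Hypothetical index entries, not real dataset values.
index = [
    Record("scan_000.npy", brain=0.45, tumor=0.020),
    Record("scan_001.npy", brain=0.10, tumor=0.000),
    Record("scan_002.npy", brain=0.60, tumor=0.005),
]

print([r.path for r in select(index, has_tumor=True)])
```

With an index like this published next to the data, a query such as `select(index, has_tumor=True, has_enough_brain=True)` tells you which files to fetch before downloading anything.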

Also note that everything is still one single blob: to get even one line of any of the files, one would need to download everything.


BitTorrent does support partial downloads that request only some files or byte ranges out of a torrent. Some of the torrents are just compressed zips, but for the others you could look at the code / documentation to see which files were relevant before downloading 10GB of data.
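Why partial downloads work: in a BitTorrent v1 torrent the files are laid end to end in the order listed in the metainfo, and that combined byte stream is cut into fixed-size pieces, so you can work out which pieces one file (or one byte range of it) needs. A sketch assuming the v1 layout (v2 aligns each file to piece boundaries instead); `pieces_for_range`, the file sizes, and the 256 KiB piece length are hypothetical.

```python
def pieces_for_range(file_sizes, file_index, piece_length, start=0, length=None):
    """Piece indices covering bytes [start, start + length) of one file (v1 layout)."""
    offset = sum(file_sizes[:file_index])  # where this file begins in the stream
    size = file_sizes[file_index]
    if length is None:
        length = size - start  # default: from `start` to the end of the file
    if start < 0 or length <= 0 or start + length > size:
        raise ValueError("byte range is empty or outside the file")
    first = (offset + start) // piece_length
    last = (offset + start + length - 1) // piece_length
    return range(first, last + 1)


sizes = [5_000_000, 300_000, 8_000_000]  # hypothetical file sizes in bytes
PIECE = 256 * 1024                       # hypothetical piece length
needed = pieces_for_range(sizes, 1, PIECE)
total = -(-sum(sizes) // PIECE)          # ceiling division: pieces in the torrent
print(f"file 1 needs pieces {list(needed)} out of {total}")
```

The pieces at a file's edges are usually shared with its neighbours, so a selective download may pull a little extra data, but nowhere near the whole torrent.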

I think the abstract is sufficient for searching data; expecting some kind of smart database that can handle all the weird formats science uses is a bit much.

There are even torrent clients that export a FUSE VFS so you can use your standard tools.
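With such a mount, "standard tools" includes an ordinary seek-and-read, which only touches the bytes you ask for. A sketch assuming a client that fetches pieces on demand when a region of a file is read (behaviour varies by client); `read_line_at` is hypothetical, and a small local file stands in for the mounted one.

```python
import os
import tempfile


def read_line_at(path, offset, max_bytes=65536):
    """Read one line starting at byte `offset` without reading the rest of the file."""
    with open(path, "rb") as f:
        f.seek(offset)                # jump straight to the byte we care about
        return f.readline(max_bytes)  # stop at the first newline (or max_bytes)


# Demo on a small local file standing in for a large file on the mount.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"header\nrow 1\nrow 2\n")
    line = read_line_at(path, 7)  # "header\n" is 7 bytes

print(line)
```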

> one would need to download everything

Just download it then. We got MP3 albums off Napster over modems back in the day; surely getting that torrent is easier and faster today.
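Back-of-the-envelope numbers for the ~10GB mentioned in the thread. A sketch that assumes decimal gigabytes and ideal throughput (no protocol overhead, peers always available); the two link speeds are just illustrative.

```python
def download_seconds(size_bytes, bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and peer availability."""
    return size_bytes * 8 / bits_per_second


SIZE = 10 * 10**9  # ~10GB, in decimal gigabytes

for label, rate in [("56k modem", 56_000), ("100 Mbit/s link", 100_000_000)]:
    seconds = download_seconds(SIZE, rate)
    print(f"{label}: {seconds / 60:,.1f} minutes")
```

Under those assumptions the modem needs roughly 16.5 days, while the 100 Mbit/s link needs about 13 minutes.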