by marginalia_nu 1489 days ago
If I can solve the logistics of publishing that data, then sure. Even in its most compressed form, it's still on the order of 100 GB.

The intermediate goal is to have some standardized testing dataset of a couple of hundred megabytes to a gigabyte or so.

2 comments

Like another commenter suggested, torrents might be a good solution once it's seeded.
Cool. Looking forward to seeing the intermediate dataset.

I think you should post a to-do list on the git repo. People can then contribute their skills.

Yeah, that's a good idea. I'm looking at a bunch of ideas for reducing the friction of contributing; there's still a bit of work that needs doing in that area.