Hacker News
by zeroclick 1154 days ago
Seems like the easiest thing to do is just start creating large torrent files with data to train on.

Wikipedia already has torrents. A Usenet archive might be a good addition, maybe some public medical journals, and so on.