by nl
4580 days ago
You don't usually download this data - you process it on AWS to your own requirements. Seriously - they give you an easy way to create these subsets yourself[1]. That is a much better solution than them trying to anticipate the exact needs of every potential client.

[1] http://commoncrawl.org/get-started/
There is definitely a benefit in using the community to identify valuable subsets and then putting your own energy into building discovery/search products around a given subset.