by boyter
4586 days ago
I love Common Crawl, but as I commented before, I still want to see a subset available for download, something like the top million sites. Ideally it would come in a few sizes, say 50GB, 100GB, and 200GB. I really think a subset like this would increase its value, as it would allow people writing search engines (for fun or profit) to suck a copy down locally and work away. It's something I would like to do for sure.