by GigabyteCoin
4582 days ago
Can anyone give me a quick rundown on how exactly one gains access to all of this data? I have heard about this project numerous times, and am always dissuaded by the lack of download links/torrents/information on their homepage. Perhaps I just don't know what I'm looking at?
http://commoncrawl.org/get-started/
I haven't tried that one, but I've poked at others in Amazon's public datasets collection:
http://aws.amazon.com/datasets
If you're already familiar with using Amazon's virtual servers, it's pretty straightforward.
I also note that the Common Crawl project publishes code here:
https://github.com/commoncrawl/commoncrawl
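For a rough idea of what "getting at the data" can look like without spinning up a server, here is a minimal Python sketch. It assumes the layout Common Crawl uses for its more recent crawls, where each crawl publishes a gzipped list of file paths under https://data.commoncrawl.org/, which may differ from what was available when this thread was written. The crawl ID in the usage note is only an example, and the helper names are made up for illustration.

```python
import gzip
import urllib.request

# Assumption: public HTTP endpoint used for recent Common Crawl releases.
BASE_URL = "https://data.commoncrawl.org/"


def paths_listing_url(crawl_id, kind="warc"):
    """Build the URL of the gzipped path listing for one crawl.

    crawl_id is something like "CC-MAIN-2024-10" (example value only).
    """
    return f"{BASE_URL}crawl-data/{crawl_id}/{kind}.paths.gz"


def parse_paths(gz_bytes):
    """Turn the gzipped listing (one relative path per line) into full URLs."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [BASE_URL + line.strip() for line in text.splitlines() if line.strip()]


def fetch_file_urls(crawl_id, kind="warc"):
    """Download the listing for a crawl and return the full file URLs."""
    with urllib.request.urlopen(paths_listing_url(crawl_id, kind)) as resp:
        return parse_paths(resp.read())
```

Calling `fetch_file_urls("CC-MAIN-2024-10")` should give a list of download URLs for that crawl's WARC files, each of which can then be fetched like any other file. The individual files are large, so it is probably worth trying one before pulling many.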