Hacker News
by wpietri 4581 days ago
Did you try this?

http://commoncrawl.org/get-started/

I haven't tried that one, but I've poked at others in Amazon's Public Data Sets collection:

http://aws.amazon.com/datasets

If you're already familiar with using Amazon's virtual servers, it's pretty straightforward.

I also note that the Common Crawl project publishes code here:

https://github.com/commoncrawl/commoncrawl