Hacker News
by ccleve 4586 days ago
Is there any way to get incrementals? It would be extremely valuable to get the pages that were added/changed/deleted each day: some kind of daily feed of a more limited size.

  s3cmd ls s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-20/segments/
That should get you about 90% of the way there.
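One way to build on that listing is a minimal Python sketch like the one below. It assumes you save the output of the `s3cmd ls` command periodically and treat each newly appearing segment as an increment. The `DIR   s3://.../<segment>/` line layout of `s3cmd ls` output is an assumption, `parse_segments` and `new_segments` are hypothetical helpers (not part of s3cmd), and the segment names in the demo are placeholders, not real IDs.

```python
"""Sketch: extract segment names from saved `s3cmd ls` output and diff two listings.

Assumption: s3cmd prints one "DIR   s3://.../segments/<name>/" line per
sub-prefix; this column layout is assumed, not a documented API.
"""


def parse_segments(ls_output: str) -> list[str]:
    """Return sorted segment names (last path component) from `s3cmd ls` output."""
    segments = []
    for line in ls_output.splitlines():
        parts = line.split()
        # Prefix ("directory") lines are assumed to be exactly: DIR <s3-url>/
        # File lines (date, time, size, url) have more fields and are skipped.
        if len(parts) == 2 and parts[0] == "DIR":
            name = parts[1].rstrip("/").rsplit("/", 1)[-1]
            segments.append(name)
    return sorted(segments)


def new_segments(previous: str, current: str) -> list[str]:
    """Return segments present in the current listing but not in the previous one."""
    seen = set(parse_segments(previous))
    return [s for s in parse_segments(current) if s not in seen]


if __name__ == "__main__":
    prefix = "s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-20/segments/"
    # Hypothetical listings; "segment-a" etc. are placeholders, not real segment IDs.
    old = f"   DIR   {prefix}segment-a/\n   DIR   {prefix}segment-b/\n"
    new = old + f"   DIR   {prefix}segment-c/\n"
    print(new_segments(old, new))  # ['segment-c']
```

Diffing sets of segment names keeps the sketch independent of how segments are named; it only tells you which segments are new, not which individual pages changed inside them.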