Common Crawl is the dataset to master if you want the fruits of web scraping without doing the scraping yourself.