by secondtimeuse
3813 days ago
This was written in 2012. It's even easier these days using SQS and CloudFormation. 250 million is a small number; you are better off first going through Common Crawl and then using data from those crawls to build a better seed list. Common Crawl now contains repeated crawls conducted every few months, as well as URLs donated by blekko. https://groups.google.com/forum/m/#!msg/common-crawl/zexccXg...
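
One way to turn crawl data into a seed list, as a rough sketch only: count how many crawled URLs each host has and seed from the most heavily represented hosts. The function name `build_seed_list`, the `top_n` cutoff, and the choice to rank by URL count are all assumptions for illustration; in practice the URLs would come from a Common Crawl URL listing, and a different ranking signal may work better.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_seed_list(urls, top_n=1000):
    """Return one seed URL per host, for the top_n hosts by URL count.

    Hypothetical helper: ranking by raw URL count is an assumption,
    not something the crawl data prescribes.
    """
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased; None if no host found
        if host:
            counts[host] += 1
    # Sort by descending count, then by host name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"http://{host}/" for host, _ in ranked[:top_n]]
```

Feeding it an iterable of URL strings pulled from a previous crawl gives a list of host root URLs that can be pushed onto whatever queue the crawler reads from.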