by PeterStuer
95 days ago
So how would you avoid this specific situation as a web crawler that tries to be well behaved? You strictly adhere to robots.txt as specified by each domain. The problem is not with any individual site but with the density (1,000-10,000 sites) at which the hosting provider packed them onto the same server. Even if the crawler enforced a 1-second delay between pages per domain when robots.txt specified no rate, which to be fair is very reasonable, this packing could still lead to high server load.
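A minimal Python sketch of why a per-domain governor does not protect a densely packed server, assuming a hypothetical limiter (`PerDomainGovernor` is illustrative, not any real crawler's API) that allows one fetch per domain per second and assuming every simulated domain resolves to the same machine:

```python
from collections import defaultdict


class PerDomainGovernor:
    """Hypothetical politeness limiter: at most one fetch per domain per `delay` seconds."""

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        # domain -> earliest time its next fetch is allowed (defaults to 0.0)
        self.next_allowed: dict[str, float] = defaultdict(float)

    def try_acquire(self, domain: str, now: float) -> bool:
        """Return True and reserve a slot if `domain` may be fetched at time `now`."""
        if now >= self.next_allowed[domain]:
            self.next_allowed[domain] = now + self.delay
            return True
        return False


def requests_per_second_on_host(num_domains: int, delay: float = 1.0) -> float:
    """Simulate a 1-second window in which every co-hosted domain is polled every 0.1 s."""
    gov = PerDomainGovernor(delay)
    sent = 0
    for step in range(10):  # ticks at t = 0.0, 0.1, ..., 0.9
        now = step * 0.1
        for i in range(num_domains):
            # Hypothetical domain names; all assumed to live on the same server.
            if gov.try_acquire(f"site{i}.example", now):
                sent += 1
    return sent / 1.0  # requests that hit the shared server in the 1-second window


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(n, "domains ->", requests_per_second_on_host(n), "requests/s on one server")
```

Each domain individually sees only one request per second, but the shared server absorbs one request per second per hosted domain, so 1,000-10,000 packed sites translate into 1,000-10,000 requests per second on a single machine. The sketch only counts requests; it does not fetch anything or read robots.txt.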