by troels
3930 days ago
I have a crawler setup that pulls a few million pages per day. The main constraint is not the crawler itself but how much load the target sites can withstand. If I don't throttle the traffic, the sites get DoS'ed very quickly. Of course, this is mainly a problem because I crawl a lot of pages from each site; a crawler that fetches a few pages from a lot of sites would face a different scenario.
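
A minimal sketch of per-site throttling of this kind, assuming a fixed minimum delay between requests to the same host. `HostThrottle`, `min_delay`, and the one-second default are illustrative names and values, not the actual setup described above; a real crawler would likely also honor robots.txt and adapt the delay to each site's response times.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay    # seconds between hits on one host (assumed value)
        self._clock = clock           # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}       # host -> earliest time the next request may go out

    def wait(self, url):
        """Block until a request to url's host is allowed; return the seconds slept."""
        host = urlsplit(url).netloc.lower()
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the next slot relative to when this request actually goes out.
        self._next_allowed[host] = max(now, ready_at) + self.min_delay
        return delay
```

Calling `throttle.wait(url)` right before each fetch keeps the request rate per host bounded, while requests to different hosts are not delayed by each other.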