by outpan
3566 days ago
I'm not sure how he manages to crawl at this speed with so few resources. We benchmarked Nutch and couldn't really get past 10-14 M(B)ps on a $1200/month machine, even though we hired a professional to optimize the setup. The same is roughly true of Heritrix. I'm just wondering whether something is missing from his setup, such as domain/IP rate limiting.
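To make concrete what I mean by domain rate limiting, here is a minimal sketch of a per-host politeness delay. It is not how Nutch or Heritrix implement it; the class name `HostRateLimiter`, the `min_delay_s` parameter, and the example URLs are all hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


class HostRateLimiter:
    """Tracks the earliest time each host may be fetched again (illustrative only)."""

    def __init__(self, min_delay_s=1.0):
        # Assumed politeness gap between two requests to the same host.
        self.min_delay_s = min_delay_s
        self.next_allowed = {}  # host -> earliest allowed fetch time

    def wait_time(self, url, now):
        """Return how long the caller should wait before fetching url."""
        host = urlsplit(url).hostname
        ready_at = self.next_allowed.get(host, now)
        start = max(now, ready_at)
        # Reserve the slot so the next request to this host is pushed back.
        self.next_allowed[host] = start + self.min_delay_s
        return start - now


limiter = HostRateLimiter(min_delay_s=2.0)
first = limiter.wait_time("http://a.example/1", now=0.0)
second = limiter.wait_time("http://a.example/2", now=0.0)
other = limiter.wait_time("http://b.example/", now=0.0)
```

With a limit like this in place, requests to the same host queue up behind each other while other hosts are unaffected, so throughput depends heavily on how many distinct hosts are in the frontier. A crawler that skips this step can look much faster on paper.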