by fabiandesimone
3931 days ago
I'm working on a project that involves lots of web crawling. I'm not technical at all (I'm hiring freelancers). While I do have access to great general technology advice, this post is bound to bring in people well versed in crawling. My question is: in terms of crawling speed (and I know this depends on several factors), how many pages per day should a good crawler be able to handle? The crawler I built does about 120K pages per day, which is fine for our initial needs, but I wonder whether in the crawling world this is peanuts or a decent chunk of pages.
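To put the 120K figure in perspective, it can help to convert it to an average sustained rate. The snippet below is only a rough sketch of that arithmetic: the function names are made up for illustration, and the 10 million page target is a hypothetical number, not something from the project described above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily crawl volume into an average sustained rate."""
    return pages_per_day / SECONDS_PER_DAY


def days_to_crawl(total_pages: float, pages_per_day: float) -> float:
    """Estimate how many days a crawl of total_pages takes at a fixed daily rate."""
    return total_pages / pages_per_day


# The figure quoted in the post: about 120K pages per day.
rate = pages_per_second(120_000)
print(f"{rate:.2f} pages/second")  # 120,000 / 86,400 ≈ 1.39

# Hypothetical target of 10 million pages (assumption, for illustration only).
print(f"{days_to_crawl(10_000_000, 120_000):.1f} days")
```

This is an average over a full day, so it says nothing about burst rate, politeness delays, or how many machines are involved.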
http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-bil...
http://blog.semantics3.com/how-we-built-our-almost-distribut...
http://engineering.bloomreach.com/crawling-billions-of-pages...
http://engineering.bloomreach.com/crawling-billions-of-pages...