Hacker News
by walshemj 3186 days ago
The problem is that for big crawls (and 500k is not large) you probably don't want to use your desktop. For example, my home ADSL is only 3.5 Mb, as we are 6k yards from the exchange.

And I would not want to get my work's 100 Mb line banned by Google. This is where services like DeepCrawl come into play: I can set up my sites to be crawled at night and look at the reports in the morning.

Another problem I found is that desktop crawlers are very resource hungry. At one small agency we had two stripped-down dedicated machines just to run crawls, as the risk of causing a crash was too high.

1 comment

Yeah, for really big crawls you're probably better off sticking it on a server or AWS, as much as anything so you don't need to leave your computer on for ages.

But Sitebulb is not resource hungry in the same way as other desktop crawlers. It saves to disk instead of using RAM, so you don't experience the same limitations.

I'm not sure what you mean about Google. There is no link between Sitebulb and Google - it doesn't visit Google at all, so there is no risk of banning. Using it on your 100 Mb work line would be ideal.
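
To illustrate the save-to-disk point above, here is a minimal Python sketch of a crawl queue kept in SQLite rather than in RAM. It is not how Sitebulb is implemented (its internals are not described in this thread); the DiskFrontier class, the table layout and the crawl.db file name are all hypothetical, chosen only to show the general idea.

```python
import sqlite3


class DiskFrontier:
    """Hypothetical URL queue that keeps crawl state in an SQLite file.

    Only the row currently being read is held in memory, so the queue
    can grow far beyond available RAM.
    """

    def __init__(self, path):
        # Assumption: `path` is a file such as "crawl.db"; SQLite creates it if missing.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def add(self, url):
        """Queue a URL; return True if it was new, False if already seen."""
        # The UNIQUE constraint plus INSERT OR IGNORE gives on-disk de-duplication.
        cur = self.db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        self.db.commit()
        return cur.rowcount == 1

    def next_url(self):
        """Return the oldest unvisited URL and mark it done, or None if empty."""
        row = self.db.execute(
            "SELECT id, url FROM urls WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE urls SET done = 1 WHERE id = ?", (row[0],))
        self.db.commit()
        return row[1]

    def pending(self):
        """Count URLs that have been queued but not yet handed out."""
        return self.db.execute(
            "SELECT COUNT(*) FROM urls WHERE done = 0"
        ).fetchone()[0]
```

Usage would look something like `frontier = DiskFrontier("crawl.db")`, then `frontier.add(url)` for each discovered link and `frontier.next_url()` in the fetch loop. The trade-off is that every operation touches the disk, so it is slower per URL than an in-memory set, but memory use stays roughly flat as the crawl grows.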