Nutch (http://lucene.apache.org/nutch/) is an open-source project for building a search engine, and it includes a substantial crawling component. Wikipedia also keeps a list of open-source crawlers: http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawler...