by yan 6486 days ago
Does it have to be Python? I'm sure you can use any web crawler to do the actual crawling, and then use Python to analyze the results.

Nutch (http://lucene.apache.org/nutch/) is a project to create a search engine, with a big crawling component. You can also find a list of crawlers here: http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawler...
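
As a rough sketch of the "crawl with anything, analyze in Python" split: assuming your crawler can hand you (url, html) pairs in some form, the analysis side needs nothing beyond the standard library. The names LinkExtractor, count_link_hosts, and sample_pages are made up for illustration, and the sample page is not real crawl output. How you export pages from Nutch or any other crawler is left out here.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def count_link_hosts(pages):
    """Count which hosts the crawled pages link to.

    pages: iterable of (url, html) pairs, in whatever way the
    crawler exported them (assumption: already loaded into memory).
    """
    hosts = Counter()
    for url, html in pages:
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        hosts.update(urlparse(link).netloc for link in parser.links)
    return hosts


# Hypothetical stand-in for crawler output.
sample_pages = [
    (
        "http://example.com/",
        '<a href="/about">About</a> <a href="http://example.org/x">X</a>',
    ),
]

if __name__ == "__main__":
    for host, n in count_link_hosts(sample_pages).most_common():
        print(host, n)
```

The same shape works for other analyses (word counts, title extraction, and so on): the crawler only has to dump pages somewhere Python can read them.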

1 comment

+1 for Nutch. We're using it at my startup HubSpot and it has worked well for us.