Ask HN: Open source focused crawler?
6 points by cookerware 4515 days ago
Is there an open source crawler or library that will recursively follow only the links under a given XPath and ignore the rest? I don't want to do an exhaustive crawl of every single link; I want something that will only follow links inside a main content area.
From the Scrapy site:
Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
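
For the specific question asked, Scrapy's LinkExtractor accepts a restrict_xpaths argument that limits link extraction to the regions you name. To show the underlying idea without any dependencies, here is a rough standard-library sketch that collects links only from inside one container element. The names ScopedLinkParser and content_links, and the "content" id, are made up for illustration. It matches the container by id rather than by a real XPath, and it only extracts links from one page; the recursive fetch loop is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScopedLinkParser(HTMLParser):
    """Collect hrefs only from <a> tags nested inside one container element."""

    def __init__(self, base_url, container_id):
        super().__init__()
        self.base_url = base_url
        self.container_id = container_id  # hypothetical id of the main content area
        self.container_tag = None         # tag name of the container, once seen
        self.depth = 0                    # nesting depth of same-named tags inside it
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside the container: only look for the container itself.
            if attrs.get("id") == self.container_id:
                self.container_tag = tag
                self.depth = 1
            return
        if tag == self.container_tag:
            # Same-named tag nested inside the container; track it so we
            # know which closing tag actually ends the container.
            self.depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if self.depth and tag == self.container_tag:
            self.depth -= 1


def content_links(html, base_url, container_id="content"):
    """Return absolute URLs of links found inside the chosen container."""
    parser = ScopedLinkParser(base_url, container_id)
    parser.feed(html)
    return parser.links
```

A crawler built on this would fetch a page, call content_links on it, and queue only the returned URLs, so navigation and footer links never get followed.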