by jdq 5464 days ago
I concur, but technically Scrapy is an entire web scraping/crawling framework for writing crawlers, not just an XML/HTML parser like BeautifulSoup or lxml. You don't even have to use Scrapy's built-in parser; you can use BeautifulSoup (or anything else) if you want. What Scrapy gives you is all the crawling logic: requesting pages, reacting to HTTP errors, and so on. You basically just tell it which URLs to crawl, what to extract from the pages, and what to do with the extracted data, and it handles the rest. I used Scrapy just recently on an online movie site (shameless plug: www.qwink.com).
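To make that division of labor concrete, here is a rough stdlib-only sketch. It is NOT Scrapy's API. In Scrapy you would subclass `scrapy.Spider`, set `start_urls`, and implement `parse()`, with item pipelines handling the output. The names below (`crawl`, `parse_page`, `LinkAndTitleParser`, the injectable `fetch`) are hypothetical and only mimic the split: the "framework" part schedules, fetches, de-duplicates and survives errors, while you supply the URLs, the parsing, and what to do with each item.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects <a href> targets and the <title> text of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def default_fetch(url):
    # Real network fetch; raises HTTPError/URLError on failure.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_urls, parse, handle_item, fetch=default_fetch, max_pages=50):
    """Hypothetical 'framework' part: scheduling, fetching, de-dup, errors."""
    queue = deque(start_urls)
    seen = set(start_urls)
    errors = {}
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (HTTPError, URLError) as exc:
            errors[url] = str(exc)  # record the failure and keep crawling
            continue
        pages += 1
        item, next_urls = parse(url, html)  # user code: what to extract
        if item is not None:
            handle_item(item)  # user code: what to do with the data
        for nxt in next_urls:
            absolute = urljoin(url, nxt)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return errors


# The user-supplied part: extract a title and the links to follow.
def parse_page(url, html):
    parser = LinkAndTitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip()}, parser.links
```

You would call it as `crawl(["http://example.com/"], parse_page, print)`; passing a fake `fetch` lets you try it without touching the network. Swapping `LinkAndTitleParser` for BeautifulSoup or lxml inside `parse_page` would not change `crawl` at all, which is the same point as being able to bring your own parser to Scrapy.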