by cdr 5533 days ago
Even better, use Scrapy, which is a whole framework designed specifically for scraping and, like lxml, is built on top of libxml2.

Scrapy is overkill for nearly everything. You'll probably have under a page of code using lxml and urllib.
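
For a sense of scale, here is a minimal sketch of the lxml-plus-urllib approach, assuming Python 3 (where urllib's fetch functions live in urllib.request) and a made-up task of listing every link on a page. The names extract_links and fetch, the User-Agent string, and the example URL are all illustrative assumptions, not anything from the thread.

```python
from urllib.request import Request, urlopen

from lxml import html


def extract_links(page_source, base_url):
    """Return absolute URLs for every <a href> found in the HTML."""
    tree = html.fromstring(page_source)
    # Resolve relative hrefs against base_url so the results are usable as-is.
    tree.make_links_absolute(base_url)
    return tree.xpath("//a/@href")


def fetch(url, timeout=10):
    """Download a page and return its raw bytes."""
    # The User-Agent value is an arbitrary placeholder.
    request = Request(url, headers={"User-Agent": "one-off-scraper"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    url = "https://example.com/"  # placeholder URL
    for link in extract_links(fetch(url), url):
        print(link)
```

Keeping the parsing in its own function means it can be tried on a saved HTML string without touching the network. Retries, throttling, and crawl scheduling are deliberately left out; those are the things a framework would supply.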
I have under a page of code with Scrapy for simple projects, and more advanced features when I need them.

That's like saying "jQuery is overkill for just about everything, you should use plain JavaScript".

No, it's like saying "The full YUI suite is overkill for just about everything, you should just use the core or jQuery".

'scrapy startproject' creates a couple of nested directories, with maybe seven files. Are you writing a scraper that you're going to run regularly? Does it need to be super robust and maintainable? Or are you writing something that you'll run once, maybe twice?

I seem to be missing why you think using a framework is a bad thing. With, say, Django or YUI there are performance and abstraction issues that can bite you, but I don't see those mattering for so lightweight a framework and so tightly scoped a problem.