Hacker News
by konsl 6651 days ago
If you're looking to build your own crawler in Python from scratch, here's a benchmark of SGML parsers:

http://72.14.205.104/search?q=cache:LYoRD1GTP2UJ:www.oluyede...

We've been playing with sgmlop (http://effbot.org/zone/sgmlop-index.htm) for parsing and urllib2 (http://docs.python.org/lib/module-urllib2.html) for fetching.
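For anyone who wants to see the fetch-then-parse shape, here is a rough sketch. It is not our sgmlop code: sgmlop and urllib2 are Python 2 modules, so this swaps in the Python 3 standard library instead (urllib.request for fetching, html.parser.HTMLParser for parsing). The names LinkExtractor, extract_links, fetch, and the "example-crawler/0.1" User-Agent string are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the links found, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it to text (assumes UTF-8 if undeclared)."""
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A crawl step is then something like extract_links(fetch(url), url), with the returned links pushed onto a queue. A real crawler would also want robots.txt handling, rate limiting, and deduplication of visited URLs, none of which is shown here.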