Hacker News
by groovyone 6649 days ago
Thanks. To be honest I'd still like to use Python (any Python suggestions?), but I'll give this a go. I'm going to do some more research and might post my findings back here if anyone would be interested in critiquing them. I'm building this startup from scratch, so if anyone is interested in the crawler side of things I'd be happy to chat about either collaborating or sharing ideas.
1 comment

If you're looking at building your own crawler in Python from scratch, here's a benchmark of SGML parsers:

http://72.14.205.104/search?q=cache:LYoRD1GTP2UJ:www.oluyede...

We've been playing with sgmlop (http://effbot.org/zone/sgmlop-index.htm) for parsing and urllib2 (http://docs.python.org/lib/module-urllib2.html) for fetching.
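To make the fetch/parse split concrete, here's a rough sketch rather than our actual code. It assumes Python 3, where urllib2 became urllib.request, and it swaps in the standard library's html.parser for sgmlop so it runs without a C extension. The names LinkCollector, extract_links, fetch and the User-Agent string are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (stand-in for sgmlop)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse an HTML string and return the links found, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it to text (the urllib2 half of the job)."""
    # The User-Agent value is a placeholder; use one that identifies your crawler.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A crawl step is then roughly extract_links(fetch(url), url), with the results pushed onto whatever queue you use. Keeping the parser separate from the fetcher means you can swap in a faster parser later without touching the download code.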