by jimmy2times 5232 days ago
I haven't run the crawler so I'm not sure what else it does, but if it only parses the home page and fetches the external links, why not read http://news.ycombinator.com/rss (you can use the feedparser module) and download the pages with urllib? No scraping involved.
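
Roughly what I have in mind, as a sketch rather than a drop-in replacement for the crawler. To keep it dependency-free I've swapped feedparser for the standard library's ElementTree here; with feedparser you would instead loop over feedparser.parse(url).entries and read each entry's .link. It also uses Python 3's urllib.request in place of the old urllib module. The function names (extract_links, fetch, crawl_front_page) and the 10-second timeout are my own placeholders, and I'm assuming the feed is plain RSS 2.0 with one <link> per <item>.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL as given in the comment; it may redirect to a newer address.
FEED_URL = "http://news.ycombinator.com/rss"


def extract_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def fetch(url, timeout=10):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl_front_page(feed_url=FEED_URL):
    """Yield (link, body) for each story in the feed, skipping failed downloads."""
    for link in extract_links(fetch(feed_url)):
        try:
            yield link, fetch(link)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            continue
```

Usage would be something like `for link, body in crawl_front_page(): ...`. The point is that the feed already lists the external links, so there is no HTML from the home page to parse.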