by jimmy2times
5232 days ago
I haven't run the crawler so I'm not sure what else it does, but if it only parses the home page and fetches the external links, why not read http://news.ycombinator.com/rss (you can use the feedparser module) and download the pages with urllib? No scraping involved.
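Something like this is what I have in mind. It's only a rough sketch, assuming Python 3 (so urllib.request rather than the old urllib) and that feedparser is installed. The helper names `entry_links`, `download` and `crawl` are made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request

# Feed URL taken from the comment above.
FEED_URL = "http://news.ycombinator.com/rss"


def entry_links(entries):
    """Return the link of every feed entry that has a non-empty one."""
    return [entry["link"] for entry in entries if entry.get("link")]


def download(url, timeout=10):
    """Fetch one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(feed_url=FEED_URL):
    """Yield (url, body) for each story linked from the feed."""
    # Third-party module (pip install feedparser), imported here so the
    # helpers above stay usable without it.
    import feedparser

    feed = feedparser.parse(feed_url)
    for url in entry_links(feed.entries):
        try:
            yield url, download(url)
        except OSError as exc:
            # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
            print(f"skipping {url}: {exc}")
```

Iterating over `crawl()` should hand back each linked page in turn, so the HTML of the home page itself never has to be parsed.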