

by jeremybmerrill 4722 days ago

AFAICT, YQL can only handle scraping individual pages that way.

Upton can scrape a whole set of pages, as long as you have a page that lists the pages you're interested in. Suppose you're interested in HN commenters on front page posts: you could specify the front page URL and a selector for links to comment pages, and Upton would automatically scrape those pages and return them to you.
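
To make the index-page idea concrete, here's a rough Python sketch of the pattern. This is not Upton's API; the names `LinkCollector`, `fetch_url`, and `scrape_index` are made up for illustration, and it uses a plain predicate on each link's `href` as a stand-in for a CSS selector so it can stay within the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_url(url):
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_index(index_url, is_instance_link, fetch=fetch_url):
    """Fetch the index page, then fetch every linked page that matches.

    is_instance_link is a hypothetical stand-in for a CSS selector: it
    receives each raw href and returns True for links worth following.
    Returns a list of (url, html) pairs in the order the links appear,
    skipping duplicate URLs.
    """
    collector = LinkCollector()
    collector.feed(fetch(index_url))
    pages = []
    seen = set()
    for href in collector.hrefs:
        url = urljoin(index_url, href)
        if is_instance_link(href) and url not in seen:
            seen.add(url)
            pages.append((url, fetch(url)))
    return pages
```

For the HN case you'd call something like `scrape_index("https://news.ycombinator.com/", lambda href: href.startswith("item?id="))`, assuming comment page links look like `item?id=...`. A real scraper would also want rate limiting and error handling, which this leaves out.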

Upton could even write the commenter names to a CSV for you with just a filename and a CSS selector/XPath expression.
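
The CSV step is similarly small in principle. Again a hedged sketch and not how Upton does it: `ClassTextCollector` and `write_names_csv` are hypothetical names, and matching `<a>` elements by class name is a stdlib-only substitute for a real CSS selector or XPath expression. It takes `(url, html)` pairs for pages that have already been fetched.

```python
import csv
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collects the text of every <a> element carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.class_name in classes:
            self._inside = True
            self.texts.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching <a> element.
        if self._inside:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False


def write_names_csv(pages, class_name, filename):
    """Write one (page_url, name) row per matching element to a CSV file.

    pages is an iterable of (url, html) pairs that were already fetched.
    """
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["page_url", "name"])
        for url, html in pages:
            collector = ClassTextCollector(class_name)
            collector.feed(html)
            for name in collector.texts:
                writer.writerow([url, name.strip()])
```

With HN pages you might call `write_names_csv(pages, "hnuser", "commenters.csv")`, assuming commenter links carry an `hnuser` class; check the actual markup before relying on that.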

It's not stuff you couldn't do with YQL or Python/BeautifulSoup. But it's stuff that I didn't want to have to write over and over each time I wrote a new scraper.

1 comment

hobonumber1 4722 days ago

Makes sense! Thanks for clarifying that.
