by tsergiu 4292 days ago
Crawling already works :)

One of the things that has been heavily marketed by other web scrapers is "crawling" as a separate feature.

With ParseHub, all the tools easily combine, so you don't need that distinction. You can use the navigate tool to jump to another page (see our interactive navigation tutorial in the extension for the details).

And you can combine multiple navigations to go as deep into the website structure as you like. For example, say you have a forum that links to subforums that link to posts that link to users. You can easily model the structure of such a site with a few navigation nodes (one from the forum to its subforums, another from each subforum to its posts, etc.). The result would be one large JSON (or CSV) dump of all the data on the forum, in the proper hierarchy.
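To make the idea of chained navigations concrete, here is a minimal sketch in plain Python. It is not ParseHub's API or internals: the `SITE` dictionary, the `NAVIGATION_CHAIN` list, and the `crawl` function are all hypothetical names invented for illustration, and the "site" is an in-memory stand-in rather than real HTTP requests. It only shows how one navigation step per level can yield a nested result.

```python
import json

# Hypothetical in-memory "site": each URL maps to a page with a title
# and named groups of outgoing links. A real scraper would fetch pages.
SITE = {
    "/forum": {"title": "Forum", "links": {"subforums": ["/sub/a"]}},
    "/sub/a": {"title": "Subforum A", "links": {"posts": ["/post/1", "/post/2"]}},
    "/post/1": {"title": "Post 1", "links": {"users": ["/user/x"]}},
    "/post/2": {"title": "Post 2", "links": {"users": ["/user/y"]}},
    "/user/x": {"title": "User X", "links": {}},
    "/user/y": {"title": "User Y", "links": {}},
}

# One entry per navigation step: the link group to follow at each depth
# (forum -> subforums -> posts -> users).
NAVIGATION_CHAIN = ["subforums", "posts", "users"]


def crawl(url, chain=NAVIGATION_CHAIN, depth=0):
    """Extract data from one page, then follow the navigation for this depth."""
    page = SITE[url]
    record = {"url": url, "title": page["title"]}
    if depth < len(chain):
        group = chain[depth]
        # Each followed link becomes a nested record under the group name.
        record[group] = [
            crawl(link, chain, depth + 1)
            for link in page["links"].get(group, [])
        ]
    return record


result = crawl("/forum")
print(json.dumps(result, indent=2))
```

Each level of the output nests under its parent, so the users of a post sit inside that post, which sits inside its subforum. Adding another level in this sketch means appending one more name to the chain, which mirrors adding one more navigation node.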

We've really tried to make our tools as general as possible. A side effect of the navigate tool is that you can use it to get "pagination" for free as well (another feature that's been heavily marketed).
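Pagination falling out of navigation can be sketched the same way: a "next page" link is just a navigation whose target has the same layout as the current page. The snippet below is an illustrative assumption, not ParseHub's implementation; `PAGES`, `scrape_listing`, and `max_pages` are hypothetical names, and the pages are an in-memory stand-in.

```python
# Hypothetical paginated listing: each page has some items and an
# optional link to the next page (None on the last page).
PAGES = {
    "/list?page=1": {"items": ["a", "b"], "next": "/list?page=2"},
    "/list?page=2": {"items": ["c"], "next": "/list?page=3"},
    "/list?page=3": {"items": ["d", "e"], "next": None},
}


def scrape_listing(start_url, max_pages=100):
    """Collect items by repeatedly navigating to the 'next' link."""
    items = []
    visited = set()
    url = start_url
    # Stop when there is no next link, when a link loops back to a page
    # already seen, or when the page limit is reached.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page = PAGES[url]
        items.extend(page["items"])
        url = page["next"]
    return items


all_items = scrape_listing("/list?page=1")
print(all_items)
```

The visited set and the page limit are defensive choices for this sketch, so that a site whose last page links back to the first does not loop forever.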