by jerriep 1220 days ago
I've had a look at a number of these "simple" (i.e., ones where I don't have to write a complex script) scraping tools recently, and none of them seem to support what I consider a fairly common scenario: navigating to sub-pages.

In my case, I have a landing page (with pagination) that lists the records I want to extract. However, to get the full information I need for each record, I have to click on each item and navigate to its detail page to extract further info.

Looking at your app and docs, you don't seem to support this either. Is this something you are considering?


Hi there,

I'm currently working on standard pagination (clicking a "next page" button) and on click-a-button plus infinite-scroll pagination.
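
For readers who end up doing "click next page" pagination in code rather than in a visual tool, here is a minimal Python sketch. It is only an illustration under assumptions, not how the tool discussed here works: it assumes each listing page marks its next-page link with `rel="next"`, and `fetch` is any function you supply that returns a page's HTML. `NextLinkFinder` and `paginate` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional, Tuple
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a rel="next"> link (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="next nofollow"
        rel_values = (attr_map.get("rel") or "").split()
        if tag == "a" and "next" in rel_values and self.next_href is None:
            self.next_href = attr_map.get("href")


def paginate(
    start_url: str, fetch: Callable[[str], str], max_pages: int = 100
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each listing page, following rel="next" links.

    Stops when there is no next link, when a page repeats (loop guard),
    or after max_pages pages.
    """
    url: Optional[str] = start_url
    seen = set()
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        finder.close()
        # Resolve relative links such as "?page=2" against the current page URL
        url = urljoin(url, finder.next_href) if finder.next_href else None
```

In real use, `fetch` could wrap `urllib.request.urlopen(url).read().decode()`; injecting it keeps the pagination logic testable against canned HTML.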

What you describe is not currently possible with a single scraper; you would need to run one scraper to collect the links and then scrape those links. But I'm also working on a "nesting data" feature, and what you describe should be possible within 2-3 weeks at most.
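
The two-step workaround (one pass collects the detail-page links, a second pass scrapes each link) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's implementation: `fetch` is any function returning a page's HTML, and the `class="record"` link marker and the `<h1>` title field are hypothetical stand-ins for whatever selectors a real site needs.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List
from urllib.parse import urljoin


class RecordLinkCollector(HTMLParser):
    """Collects hrefs of <a class="record"> links (hypothetical listing markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "record" in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


class H1Text(HTMLParser):
    """Collects the text of the first <h1> (hypothetical detail-page field)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self._done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._done = True

    def handle_data(self, data):
        if self._in_h1:
            self.text += data


def extract_title(html: str) -> Dict[str, str]:
    parser = H1Text()
    parser.feed(html)
    parser.close()
    return {"title": parser.text.strip()}


def collect_detail_links(
    listing_urls: Iterable[str], fetch: Callable[[str], str]
) -> List[str]:
    """Pass 1: gather absolute, de-duplicated detail-page URLs in page order."""
    links: List[str] = []
    seen = set()
    for url in listing_urls:
        collector = RecordLinkCollector()
        collector.feed(fetch(url))
        collector.close()
        for href in collector.hrefs:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                links.append(absolute)
    return links


def scrape_details(
    links: Iterable[str],
    fetch: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Pass 2: fetch every detail page and merge its extracted fields with its URL."""
    return [{"url": link, **extract(fetch(link))} for link in links]
```

Keeping the two passes separate means the link list can be saved and the detail pass re-run or resumed without re-crawling the listing pages.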

Thanks for commenting!

I had a nice experience with https://simplescraper.io for a similar use-case. Was able to scrape a few thousand URLs without too much fuss.

The biggest complication with visual scrapers is all the edge cases. The selector algorithms usually become a mess on any complex website, especially if the data is uneven.

Then you have CSS selectors that stop working, and so on. Very brittle.

You might want to try https://www.kadoa.com (disclaimer: I'm one of the founders).

For an (unlimited) free local option, https://webscraper.io/ may do what you want. It is simpler than this one (no proxy/scheduling/API...), but the scraping rules are quite elaborate.

I'm the founder of webscraper.io. The paid version includes proxy, scheduling, data export, data parsing, data quality notifications and much more.

Try browserflow.

Ooh yeah, works great, thanks! It's a pity I have to buy a subscription, as my needs are more of a once-off.

I 100% recommend browserflow. It's fucking awesome!