by motoboi 4379 days ago
Some months ago I found https://import.io/ and it just blew my mind.

I remember how painful it was to write a custom scraper every time (I used to do it in Perl, btw).

They have a custom browser with a nice interface, but the biggest feature is the so-called "Connectors": you show the system how to query a site and parse the results, and Import.IO gives you an API endpoint for that query, now automated.

One can, say, create a "connector" that queries Airbnb and parses the results, then create another "connector" that queries booking.com. It is then possible to use the API to run a query for Boa Vista, Roraima (my city) and get the dataset.
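
To make the idea concrete, here is a rough Python sketch of what a connector amounts to: a recipe for building a query plus a recipe for parsing the results. This is not Import.IO's actual API. The `Connector` class, the example.com URLs, the `fetch` callback, and the sample markup are all made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


@dataclass
class Connector:
    """Hypothetical stand-in for a "connector": how to query a site and parse its results."""
    name: str
    base_url: str      # placeholder search URL, not a real endpoint
    query_param: str   # name of the query-string parameter the site expects
    parse: Callable[[str], List[Dict]]

    def build_url(self, query: str) -> str:
        # Encode the query into the site's search URL.
        return f"{self.base_url}?{urlencode({self.query_param: query})}"


def parse_listings(html: str) -> List[Dict]:
    # Toy parser for the made-up markup used below; real pages need a real HTML parser.
    pattern = re.compile(r'<li data-price="(\d+)">([^<]+)</li>')
    return [{"title": title.strip(), "price": int(price)}
            for price, title in pattern.findall(html)]


def run_query(connectors: List[Connector], query: str,
              fetch: Callable[[str], str]) -> List[Dict]:
    # Run the same query through every connector and merge the rows into one dataset.
    rows = []
    for connector in connectors:
        html = fetch(connector.build_url(query))
        for row in connector.parse(html):
            rows.append({"source": connector.name, **row})
    return rows


if __name__ == "__main__":
    SAMPLE = '<ul><li data-price="120">Flat near the river</li></ul>'
    connectors = [
        Connector("site-a", "https://example.com/a/search", "q", parse_listings),
        Connector("site-b", "https://example.com/b/search", "location", parse_listings),
    ]
    # A fake fetch keeps the sketch offline; swap in a real HTTP call to use it for real.
    for row in run_query(connectors, "Boa Vista, Roraima", lambda url: SAMPLE):
        print(row)
```

The point of the split is that the per-site knowledge lives in the connector, while the query itself stays the same across sites.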

I am not affiliated with them in any way, just a very happy old-school scraper.

Nice walkthrough: http://www.youtube.com/watch?v=_16O10Wx2W4

UPDATE:

Unsurprisingly, import.io has been on Hacker News before: https://news.ycombinator.com/item?id=7582858

1 comment

Other browser-based screen scrapers in this space are 80legs, Kimono Labs, Mozenda and OutWit Hub, and I'm sure there are more. Last time I checked, import.io was a fairly lightweight browser wrapper.

I also write web scrapers in Perl and Python, and recently I have been gravitating towards Python because the code is more readable. I don't use browser-based scrapers for three reasons: the sites I scrape are usually complex enough that it is easier to write my own code, the tools lack functionality and control over the data, and there is the overhead of learning each tool's terminology and how it works.