Some months ago I found https://import.io/ and it just blew my mind. I remember how painful it was to write a custom scraper every time (I used to do it with Perl, btw). They have a custom browser with a nice interface, but the biggest thing is the so-called "Connectors": you show the system how to run a query and parse the results, and Import.IO gives you an API endpoint for that query, now automated. One can, say, create a "connector" which queries Airbnb and parses the results, then create another "connector" which queries booking.com. You can then use the API to run a query for Boa Vista, Roraima (my city) and get the dataset. I am not affiliated with them in any way, just a very happy old-school scraper. Nice walkthrough: http://www.youtube.com/watch?v=_16O10Wx2W4 UPDATE: Unsurprisingly, import.io has been on Hacker News before: https://news.ycombinator.com/item?id=7582858
I also write web scrapers using Perl and Python, and recently I have been gravitating towards Python because the code is more readable. I don't use browser-based scrapers: the sites I scrape are usually complex enough that it is easier to write my own code, the tools lack functionality and control over the data, and there is the overhead of learning their terminology and how they work.
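
For illustration, a hand-written scraper of the kind described here can be quite small. The following is only a sketch using Python's standard-library `html.parser`, not anyone's actual code: the sample HTML, the `listing` class name, and the `ListingParser` and `parse_listings` names are all made up for the example, and a real scraper would first download the page and would need to handle much messier markup.

```python
from html.parser import HTMLParser

# Stand-in for a fetched page (hypothetical markup); a real scraper
# would download this over HTTP before parsing it.
SAMPLE_HTML = """
<ul>
  <li class="listing"><a href="/rooms/1">Quiet room near the river</a></li>
  <li class="listing"><a href="/rooms/2">Downtown apartment</a></li>
  <li class="ad"><a href="/promo">Sponsored link</a></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects the link and title of every <li class="listing"> item."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._in_listing = False  # inside an <li class="listing">?
        self._href = None         # href of the <a> being read, if any
        self._text = []           # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._in_listing = attrs.get("class") == "listing"
        elif tag == "a" and self._in_listing:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a listing link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.listings.append({"url": self._href, "title": title})
            self._href = None
        elif tag == "li":
            self._in_listing = False


def parse_listings(html):
    """Return a list of {"url", "title"} dicts found in the given HTML."""
    parser = ListingParser()
    parser.feed(html)
    return parser.listings


if __name__ == "__main__":
    for row in parse_listings(SAMPLE_HTML):
        print(row["url"], row["title"])
```

The control mentioned above shows up in details like skipping the `ad` item: the filtering rule lives in plain code, so it can be changed per site without learning a tool's vocabulary.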