|
Full disclosure, I work for Scrapinghub. Our tools are Scrapy and Portia, both open source and both free as in beer. Scrapy is for those who want fine-tuned manual control and who have a background in Python. Portia is the visual web scraper for those who are non-technical to technical but don't want to bother with code. Web scraping is everywhere, even if it's not necessarily spoken openly about or acknowledged. The publicized perception of web scraping is fairly negative, but doesn't take into account the benefits of data used in machine-learning or democratized data extraction (as in the case of this article or for building public service apps like transportation notifications), or the simple realities of competitive pricing and monitoring the activities of resellers. Researchers, academics, data scientists, marketers, the list goes on for those who use web scraping daily. Glad you enjoyed the article! I'm hoping that more examples of ethical data extraction will start to turn the tide of public perception. |
I completely accept how important scraping is as a data source, but that doesn't make it any more legal. It's in a space right now where only big companies can take unmitigated advantage of the tool, because it'd cost millions of dollars to successfully defend a CFAA suit.