Hacker News
by mxvzr 2928 days ago
Are you able to share some details? How often did you have to get new IP addresses? What about the user agent? Were the scrapers "straight to the point" like amazon2csv (i.e. make a request directly to the search page) or did they have randomized behavior (e.g. re-use sessions from time to time; click a random link on the page; start from the homepage...)? Did the scrapers ever go against Amazon's robots.txt directives (e.g. by interacting with the cart page)? Ever heard from Amazon itself about your employer's activities on their site?
1 comment

There are services dedicated to scraping that take care of proxying your requests, so you don't have to worry about IP bans.

For example, Crawlera from Scrapinghub (the guys behind the Scrapy Python lib).
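
To give a rough idea of what "proxying your requests" looks like on the client side, here is a minimal sketch using only Python's standard library. It is not any particular provider's API: the proxy URL, the credentials and the User-Agent strings are placeholders I made up, and `build_opener` / `fetch` are just illustrative helper names. A real service will give you its own endpoint and auth details.

```python
import random
import urllib.request

# Hypothetical values: substitute the endpoint and credentials from your proxy provider.
PROXY_URL = "http://USER:PASS@proxy.example.com:8010"

# Placeholder User-Agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) ExampleBrowser/2.0",
]


def build_opener(proxy_url=PROXY_URL, user_agents=USER_AGENTS):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy)
    # Pick one User-Agent per opener; this replaces urllib's default header.
    opener.addheaders = [("User-Agent", random.choice(user_agents))]
    return opener


def fetch(url, opener=None, timeout=30):
    """Fetch url through the proxy and return the response body as bytes."""
    opener = opener or build_opener()
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

With a real endpoint filled in, `fetch("https://example.com/")` would go out through the proxy instead of your own IP. Whether the provider rotates IPs per request or per session is up to the service, so check their docs.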