by ChemSpider 1789 days ago
But I would guess that a significant number of your users are doing illegal web scraping (otherwise, why use proxy rotation?). So you advise them on how to get around website protections?

I'm not a ScrapingBee customer (but I've looked into it and was interested). A lot of websites have started blocking all datacenter IPs, even if you only scrape them once. To get the contents of those sites, you need to use residential proxies.
When is it illegal to scrape a website?
There are bad scrapers out there. Plenty of common problems:

- Denial of Service by queries - they hit search pages with complex or slow queries, diving to the 1000th page of results. This kills the DB.

- Denial of Service by parallelization - they hit 1000s of pages at once, causing the server to run out of memory or hit other limits. This kills the web server.

- Denial of Service by bugs - their code is buggy, so a slight change to a page causes their scrapes to repeat ad nauseam.

- Bad URL/cookie scrapes - they hit URLs that perform actions on websites (say, add to cart). This inflates the data sites have to track for abandoned carts, sessions, and item popularity.
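A minimal sketch, in Python, of guardrails a well-behaved scraper could apply against the failure modes listed above: capping pagination depth, spacing out requests instead of firing them in parallel, never refetching a URL it has already seen, and skipping URLs that look like actions. The class name `PoliteCrawlPolicy`, the thresholds, and the path markers are all hypothetical choices for illustration, not the behavior of any particular scraping service.

```python
import time
from urllib.parse import urlparse, parse_qs


class PoliteCrawlPolicy:
    """Hypothetical policy object; all thresholds and markers are assumptions."""

    def __init__(self, max_page=50, min_interval=1.0,
                 action_markers=("add", "cart", "checkout", "logout", "delete"),
                 clock=time.monotonic):
        self.max_page = max_page            # don't dive to the 1000th page of results
        self.min_interval = min_interval    # seconds between requests (no parallel bursts)
        self.action_markers = action_markers  # crude heuristic for "action" URLs
        self.clock = clock                  # injectable so the policy is testable
        self.seen = set()                   # dedupe so a bug can't refetch forever
        self._last_request = None

    @staticmethod
    def _key(url):
        # Ignore the #fragment: it never changes what the server returns.
        return urlparse(url)._replace(fragment="").geturl()

    def allowed(self, url):
        parsed = urlparse(url)
        segments = [s.lower() for s in parsed.path.split("/") if s]
        # Skip URLs whose path looks like it performs an action (e.g. /cart/add).
        if any(marker in segments for marker in self.action_markers):
            return False
        # Refuse to page deep into search results.
        page = parse_qs(parsed.query).get("page", ["1"])[0]
        if page.isdigit() and int(page) > self.max_page:
            return False
        # Never fetch the same URL twice.
        return self._key(url) not in self.seen

    def delay_needed(self):
        # Seconds the caller should sleep before the next request.
        if self._last_request is None:
            return 0.0
        elapsed = self.clock() - self._last_request
        return max(0.0, self.min_interval - elapsed)

    def mark_fetched(self, url):
        self.seen.add(self._key(url))
        self._last_request = self.clock()
```

The intended use is a single sequential loop: for each candidate URL, call `allowed()`, sleep for `delay_needed()`, fetch, then `mark_fetched()`. A real crawler would likely also consult `robots.txt` (Python's standard library has `urllib.robotparser` for that).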

If scraping didn't affect servers negatively, it would feel less illegal.

Let's not forget data mining. People build whole businesses on this, and a lot of them are parasitic. They profit from data that is sometimes costly to obtain.

All the companies that scrape LinkedIn tie it to other social media and build power profiles on people that would have the CIA clapping with a smile.

How is what the users do with the service in their own jurisdiction relevant to the topic of "establishing and maintaining relationships with your users"?
It can be relevant, though I don’t know whether GP meant this.

It’s usually much easier to get users for shadier services, because they have a dearth of good options (since not many legitimate businesses are interested in addressing the problem).

So much so that the lessons learned in this space may not apply at all to the market as a whole.

I'm not sure if all web scraping that's undesired by the website you're scraping is illegal.