Hacker News
by css 2472 days ago
How do you avoid getting banned by the companies you scrape? Most ToSs have a clause like:

> We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means... [etc]

4 comments

This may now be moot after the hiQ Labs v. LinkedIn case a couple of days ago, which appears to have blanket-legalised web scraping.
hiQ v. LinkedIn means you probably aren't going to jail for scraping LinkedIn's website. It doesn't mean LinkedIn can't IP ban you.
Agreed, some websites are really resistant to scraping. But think about Google: they scrape the whole web regardless of the websites' ToS, so it all boils down to one question: do you create value for the website owner? That's why we want to focus on use cases where we create value for both our users and the website owner. Consider Yodlee/Plaid in the banking sector: they built partnerships with the banks but continued scraping them because most of them didn't provide an API.
Google respects robots.txt.
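For anyone wondering what respecting robots.txt looks like in a scraper, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration; a real crawler would fetch the site's actual robots.txt instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed inline so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-bot" and the URLs are placeholders, not a real crawler or site.
    for url in ("https://example.com/public", "https://example.com/private/page"):
        print(url, is_allowed(ROBOTS_TXT, "example-bot", url))
```

This only covers the politeness side: a site can still IP ban a crawler that follows robots.txt.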
BYOP (bring your own proxies)