by ttsda 2020 days ago
If your data is reasonably valuable, it will be impossible to deter a minimally invested developer armed with Puppeteer (with some modifications), residential proxy services, and captcha solving (way less than a cent per captcha). Most sites that attempt to block scraping hinder their users more than they do the scrapers.
1 comment

Agreed 100%.

Basically anything that would stop a scraper from downloading or interpreting a single page will also get in the way of search engines and screen readers. Blocking search engines is generally not desirable, and blocking screen readers may actually be illegal for some organizations. Just give up if this is your goal.

Rate limiting and other approaches to restricting bulk downloads of your entire dataset are more practical, but they are still generally easy to work around for anyone sufficiently determined, unless you require authentication for all access.
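
For what it's worth, here is a rough sketch of the kind of rate limiting I mean: a per-client token bucket in Python. The class name, the parameters, and the choice of token bucket are all my own assumptions for illustration; this isn't any particular framework's API, and a real deployment would more likely use whatever your proxy or web framework already provides.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests, then is held
    to roughly `rate` requests per second.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self._buckets = {}        # client key -> (tokens, last_seen)

    def allow(self, client_key):
        """Return True if this request fits the client's budget."""
        now = self.clock()
        # A client we have not seen before starts with a full bucket.
        tokens, last = self._buckets.get(client_key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client_key] = (tokens - 1, now)
            return True
        self._buckets[client_key] = (tokens, now)
        return False
```

What you pass as `client_key` is the whole game. Key on IP address and residential proxies hand the scraper a fresh bucket per request; key on an authenticated account ID and the budget is much harder to dodge, which is why requiring authentication matters.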