by twblalock 3156 days ago
CDNs like Distil Networks and Cloudflare make scraping more difficult than it used to be. If you get caught by them, you can end up blocked from all of the sites they protect, not just the one you were scraping.
1 comment

While writing some scrapers this week, I noticed it's also common for the origin server to simply check whether the request is coming from a VPN/VPS IP address range.

For example, the exact same request will work from your home connection but fail from EC2.
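
To make that concrete, here's a rough sketch of what such a server-side check might look like. I'm guessing at the logic; I haven't seen any site's actual implementation. The function name `is_datacenter_ip` and the range list are made up for illustration, and the CIDR blocks are RFC 5737 documentation ranges standing in for real hosting-provider ranges (a real check would presumably load published lists, such as the one AWS provides for its own address space).

```python
import ipaddress

# Hypothetical blocklist. These are RFC 5737 documentation ranges used as
# placeholders, NOT real VPS/VPN provider ranges.
DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]


def is_datacenter_ip(addr, ranges=DATACENTER_RANGES):
    """Return True if addr falls inside any of the listed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ranges)


# Same request, different source address, different outcome.
for client in ("203.0.113.7", "192.0.2.10"):
    verdict = "block" if is_datacenter_ip(client) else "allow"
    print(client, verdict)
```

Since the decision depends only on the source address, nothing you change in the headers or cookies would make a difference, which matches what I saw.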