by guluarte 244 days ago
The hardest part of scraping is bypassing Cloudflare, captchas, fingerprinting, etc.
2 comments

The hardest part is not telling anyone how you're bypassing it!
I can talk about this bypass because they've fixed it: a site I was scraping rolled their own custom captcha that was just multiple choice. But they didn't have a nonce, so I would just attempt all the choices, and one of them would let me in.
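For anyone wondering what the missing nonce would have changed, here is a rough sketch of the server side, not the site's actual fix (I never saw their code). It assumes each challenge gets a random single-use token that is consumed on the first answer, right or wrong, so trying every choice against the same challenge stops working. `CaptchaStore`, `issue`, and `verify` are names I made up for illustration, and the in-memory dict stands in for whatever session store a real site would use.

```python
import secrets


class CaptchaStore:
    """Hypothetical in-memory store of single-use multiple-choice challenges."""

    def __init__(self):
        # Maps nonce -> correct choice for challenges not yet answered.
        self._pending = {}

    def issue(self, correct_choice):
        """Register a new challenge and return the nonce to embed in the form."""
        nonce = secrets.token_urlsafe(16)
        self._pending[nonce] = correct_choice
        return nonce

    def verify(self, nonce, choice):
        """Check one answer. The nonce is consumed whether or not it is correct."""
        # pop() removes the entry, so a second attempt with the same nonce
        # finds nothing and fails, even if it carries the right choice.
        expected = self._pending.pop(nonce, None)
        return expected is not None and secrets.compare_digest(expected, choice)


store = CaptchaStore()
nonce = store.issue("b")
first = store.verify(nonce, "a")   # wrong guess, nonce is now spent
second = store.verify(nonce, "b")  # right answer, but too late
```

With this in place a wrong guess costs a fresh challenge, so a four-choice captcha gives a blind guesser a 1-in-4 chance per challenge instead of a guaranteed pass. It still needs rate limiting on top to be worth much.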
The captcha put you on notice that your scraping wasn't authorized. Depending on the details and circumstances, bypassing it and scraping anyways may have been a crime.
Definitely. What are your thoughts on the Cloudflare agent identity