by jsheard
640 days ago
The problem is that soft technical measures like HTTP 402 and robots.txt aren't legally binding, so there's nothing stopping scrapers from just ignoring them. Cloudflare's value proposition here is that they will play the cat-and-mouse game of detecting things like spoofed user agents and residential proxies on your behalf, and actively block what appears to be scraper traffic unless the scrapers pay up. Unfortunately, this probably means even more CAPTCHAs for people using VPNs and other privacy measures as they ramp up the bot detection heuristics.
> there's nothing stopping scrapers from just ignoring them
Feel free to ignore HTTP errors, but those pages don't contain the content you're looking for.
(For the record, I don't use HTTP 402, but I noncommercially host stuff and know what bots people are complaining about.)
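
To make that concrete, here's a rough sketch of what I mean, not anything Cloudflare or anyone else actually runs. It assumes a naive substring check on the User-Agent header; the marker strings, the `respond` helper and the page bodies are all made up for illustration. The point is only that the 402 response body never contains the article, so a scraper that ignores the status code still gets nothing useful.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical markers; real bot detection is far more involved than this.
BLOCKED_AGENT_MARKERS = ("examplebot", "scrapy")

ARTICLE = b"<html><body>The actual article text.</body></html>"
PAYWALL = b"Payment required to crawl this page.\n"


def respond(user_agent: str) -> tuple[int, bytes]:
    """Return (status, body) for a request with the given User-Agent."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BLOCKED_AGENT_MARKERS):
        # The error page is sent *instead of* the article, not alongside it.
        return 402, PAYWALL
    return 200, ARTICLE


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.headers.get("User-Agent", ""))
        self.send_response(status)
        content_type = "text/html" if status == 200 else "text/plain"
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Assumed local address and port, for trying it out by hand.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Call `main()` and hit it with `curl -A ExampleBot/1.0 http://127.0.0.1:8000/` versus a browser to see the difference. Of course, a scraper that spoofs a browser User-Agent walks right past this check, which is the cat-and-mouse part.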