HN Mirror



	by jsheard 640 days ago
	The problem is that soft technical measures like HTTP 402 and robots.txt aren't legally binding, so there's nothing stopping scrapers from just ignoring them. Cloudflare's value proposition here is that they will play the cat-and-mouse game of detecting things like spoofed user agents and residential proxies on your behalf, and actively block what appears to be scraper traffic unless the scrapers pay up. Unfortunately this probably means even more CAPTCHAs for people using VPNs and other privacy measures as Cloudflare ramps up the bot detection heuristics.

2 comments

Aachen 640 days ago

Sure, it's not legally binding, but if I see >100000 requests coming from 1 IP address within a week, I'm also not legally bound to make that 402 error go away. With an automated payment mechanism, the two parties could come to an agreement they're both happy with.

> there's nothing stopping scrapers from just ignoring them

Feel free to ignore HTTP errors, but those pages don't contain the content you're looking for.

(For the record, I don't use HTTP 402, but I noncommercially host stuff and know what bots people are complaining about.)
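
To make the threshold idea concrete, here is a minimal sketch of a per-IP sliding-window counter that starts answering 402 once one address exceeds a request budget. The class name `PerIpLimiter`, its method, and the in-memory storage are all assumptions for illustration, not anything a particular server or Cloudflare actually ships; the numbers just mirror the 100000-requests-per-week figure above.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 7 * 24 * 3600  # one week, as in the comment above
MAX_REQUESTS = 100_000          # illustrative budget per IP per window


class PerIpLimiter:
    """Hypothetical in-memory limiter: counts requests per IP in a sliding window."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def status_for(self, ip, now=None):
        """Record one request and return the HTTP status to answer with."""
        now = time.time() if now is None else now
        timestamps = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        timestamps.append(now)
        # Over budget: answer 402 Payment Required instead of the content.
        return 402 if len(timestamps) > self.max_requests else 200
```

Keeping one timestamp per request is the simplest thing that works for a sketch; a real deployment would more likely keep bucketed counters so memory doesn't grow with traffic.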


jsheard 640 days ago

I mean it's not legally binding in the sense that if you start sending 402s or 403s to a scraper, it can just take that as a signal to try again from a different IP address until it works. Your server's clearly stated intent that the bot should pay up or go away isn't legally actionable. With enough effort you can chase the bots until they run out of resources, but few people have time to win that battle by themselves, hence delegating it to Cloudflare or similar.
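
A rough way to see why per-IP counting alone loses this battle: the toy simulation below spreads the same number of requests over a pool of addresses and counts how many a per-IP limit would actually refuse. The function name, the round-robin rotation, and the `ip-N` labels are made up for illustration; real scrapers and real detection are messier than this.

```python
from collections import Counter


def blocked_requests(total_requests, pool_size, per_ip_limit):
    """Count requests a naive per-IP limit would refuse (toy model)."""
    counts = Counter()
    blocked = 0
    for i in range(total_requests):
        ip = f"ip-{i % pool_size}"  # hypothetical round-robin over the pool
        counts[ip] += 1
        if counts[ip] > per_ip_limit:
            blocked += 1
    return blocked


# Same 1000 requests, same limit of 100 per IP:
single_ip = blocked_requests(1000, pool_size=1, per_ip_limit=100)
rotating = blocked_requests(1000, pool_size=10, per_ip_limit=100)
```

With one address most of the traffic gets refused; spread over ten addresses, none of it does, which is why the detection has to look at more than the source IP.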


TZubiri 640 days ago

"Unfortunately this probably means even more CAPTCHAs for people using VPNs and other privacy measures as they ramp up the bot detection heuristics"

Yeah. You can't have it both ways. Similar dilemma for requiring identification vs disallowing immigrants.
