by manigandham 3902 days ago
Since there's no easy way to always reliably identify the requester, this gets complicated.

Most scrapers - including this one - advertise that they use multiple servers/locations/IPs/etc. to get around this.


I fail to see the problem you are trying to present.

Even if identification were hard, which is not true because of how HTTP works, it would be irrelevant because HTTP doesn't discriminate. If someone does discriminate, that is their problem and should be solved by them, not by a committee or a law.

> If you don't want someone to access your page, then don't respond to their request

> there's no easy way to always reliably identify the requester

That's the problem: you can't identify the person to block them in the first place.
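To make this concrete: a server can only go by what arrives with the request, and all of it is under the client's control. The sketch below is a hypothetical, deliberately naive blocker that checks the source IP and the `User-Agent` header; the block lists, addresses, and agent strings are made up for illustration. A scraper that switches to another server and sends a browser-like `User-Agent` simply stops matching.

```python
# Hypothetical server-side block lists; the entries are illustrative only.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_AGENTS = {"BadScraper/1.0"}


def should_block(client_ip: str, headers: dict[str, str]) -> bool:
    """Naive check using only what the request itself reveals.

    Both inputs are under the client's control: the source IP changes
    whenever the request is sent from another server or proxy, and
    User-Agent is an arbitrary header the client chooses to send.
    """
    user_agent = headers.get("User-Agent", "")
    return client_ip in BLOCKED_IPS or user_agent in BLOCKED_AGENTS


# The scraper as first seen: blocked.
print(should_block("203.0.113.7", {"User-Agent": "BadScraper/1.0"}))  # True

# Same scraper from another address with a browser-like User-Agent: not blocked.
print(should_block("198.51.100.23", {"User-Agent": "Mozilla/5.0"}))  # False
```

Rotating IPs, as the scrapers above advertise, defeats the first check, and changing one header defeats the second.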

Robots.txt is an explicit signal of intent that reputable search engines honor, but that's all we have today: it is easily ignored and does nothing against these scrapers or anyone else who chooses to disregard it.
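The reason robots.txt is "easily ignored" is that the check happens entirely on the client side: a crawler has to fetch the file and decide to obey it. A minimal sketch using Python's standard `urllib.robotparser`; the rules, bot name, and URLs are hypothetical. Nothing on the server enforces the answer, so a scraper that never calls `can_fetch` is not stopped at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a *cooperating* crawler would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page"))  # False
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page"))   # True
```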

Not sure what your last sentence means.