| HN Mirror


	by yxdfasdjkljasdf 3902 days ago
	That is not how HTTP works; your analogy is not correct. Nobody is taking anything. If you don't want someone to access your page, then don't respond to their request.

2 comments

manigandham 3902 days ago

Since there's no easy way to always reliably identify the requester, this gets complicated.

Most scrapers, including this one, advertise that they use multiple servers, locations, IPs, etc. to get around this.

yxdfasdjkljasdf 3901 days ago

I fail to see the problem you are trying to present.

Even if identification were hard, which it is not because of how HTTP works, it would be irrelevant, because HTTP doesn't discriminate. If someone does, that is their problem, and it should be solved by them, not by a committee or a law.

manigandham 3901 days ago

> If you don't want someone to access your page, then don't respond to their request

> there's no easy way to always reliably identify the requester

That's the problem: you can't identify the person to block them in the first place.

Robots.txt is an explicit signal of intent that reputable search engines honor, but it's all we have today; it is easily ignored and does nothing against these scrapers or anyone else who chooses not to respect it.
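To make the "signal of intention" point concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, the `example.com` URLs, and the `allowed` helper are all made up for illustration. Note that the check runs entirely on the crawler's side, which is exactly why it only binds crawlers that choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch this from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "https://example.com/index.html"))
    print(allowed("ExampleBot", "https://example.com/private/data"))
```

Nothing on the server enforces this; a scraper that never calls anything like `allowed` simply fetches the page.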

Not sure what your last sentence means.

dsjoerg 3902 days ago

At a high enough frequency, scraping is indistinguishable from a DDoS attack. Do you believe DDoS attacks are OK? How do you draw the line?

cookiecaper 3901 days ago

DDoS attacks are malicious events that disrupt service. In almost 100% of cases, scrapers don't want to disrupt service, because they need the data they're scraping. They want to be able to continue to get it, so they won't do things that may harm their ability to do that (including presenting honest IPs and user agents).

Services like this one actually make scraper-related unavailability, which IMO is already greatly exaggerated, less likely, since there will be fewer amateurs trying to write their own bots and accidentally breaking things.

To the extent that a scraper harms the other business, the scraping company can be held civilly liable on several accounts without specifically bringing scraping as a practice into the picture. All that matters is that they damaged the target site's ability to operate, not that they were saving [portions of] the pages (that'd be a separate copyright claim, unrelated to the disruption of service).

yxdfasdjkljasdf 3901 days ago

There is a clear distinction between the two. You are presenting a straw-man argument.

dsjoerg 3901 days ago

You haven't quite laid out your argument so I have to guess what it is.

When you say "That is not how HTTP works" it suggests that your claim is that anything that HTTP allows is ethically OK to do. However that is clearly a ridiculous stance, since a DDoS attack is a stream of valid HTTP requests and that's clearly not OK.

So I'm left wondering what your argument actually is for why unwelcome scraping is ethically OK.

I find this an interesting question, because while I would love for protocols to also define ethics, I feel that would be scope creep for the poor protocol designers. There's a wide variety of conduct and ethics questions that a protocol cannot address.

Where I myself draw the line is at protocol behavior intentionally designed to obscure my intentions. For example, sending my requests from a wide variety of IP addresses is behavior that is specifically designed to obscure where I'm coming from; my only intent in doing so would be to circumvent the serving machine's intent not to provide lots of content to a single requester. At that point I'm engaging in deceptive behavior; I've crossed an ethical line.
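As a rough illustration of what "the intent of the serving machine" can look like in practice, here is a sketch of a per-IP sliding-window rate limiter. The `PerIPRateLimiter` class, its parameters, and the addresses are hypothetical, not taken from any particular server. It shows why spreading requests across many IPs defeats this kind of limit: each address gets its own fresh budget.

```python
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Allow at most max_requests per window_seconds for each client IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # this IP has used up its budget
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIPRateLimiter(max_requests=2, window_seconds=60.0)
    # One address sending three requests in quick succession.
    print([limiter.allow("203.0.113.1", t) for t in (0.0, 1.0, 2.0)])
    # The same third request from a different address is accepted.
    print(limiter.allow("203.0.113.2", 2.0))
```

The limiter only ever sees the source address, so it cannot tell one requester using many IPs from many independent requesters.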

yxdfasdjkljasdf 3901 days ago

That wasn't a response to your comment, and you are mixing two different arguments there. Your guess is not correct.

> So I'm left wondering what your argument actually is for why unwelcome scraping is ethically OK.

I never even suggested such an argument.

The behavior you described in the last paragraph is only deceptive in the eyes of a state actor that surveils information and privacy. Anonymity is not unethical; it is a human right.
