by yxdfasdjkljasdf 3902 days ago
That's just wrong.

You haven't explained how it is wrong or why. None of those things is "wrong" by itself. It is the malicious use of any tool that is unethical.

Is there a website where we can blacklist the IP addresses of such violators?

What exactly do you think is being violated here?


CAPTCHAs are used to separate real people from automated scripts, like here, so that all of your registered users are real human beings.

If this automates solving CAPTCHAs, work will go into making CAPTCHAs much more difficult. I don't know if this is sustainable. I remember two different sites where I haven't been able to solve their CAPTCHAs.

No offense but ...

First thing, it's just a suggestion: don't use a duplicate account to reply to the comments. Do it from your original account.

I think what the user "chdir" means by wrong is that you're not honoring "robots.txt". And how do you account for ethical scraping (e.g. running 50 concurrent connections against a single website and overloading it is technically a DoS attack)?
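To make "honoring robots.txt" concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user agent name `examplebot`, the example.com URLs, and the helper names are all made up for illustration; a real crawler would fetch the file from the target site and would also cap its concurrency.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# a real crawler would download it from the site being scraped.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser object without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def delay_for(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay, else a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default
```

A scraper following this sketch would call `may_fetch` before every request and sleep for `delay_for(...)` seconds between requests to the same host, rather than opening dozens of connections at once.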

Websites use captchas and IP-based limits to prevent abuse of their resources and to make it harder for copycats to mirror their data. There are often cases where copycats outrank the original content in search rankings (see this example: https://news.ycombinator.com/item?id=10103545).
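For readers unfamiliar with IP-based limits, a rough sketch of one common approach is a sliding-window counter per IP address. The class name, the limits, and the addresses below are assumptions for illustration only, not how any particular site implements it.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now`; return False if the IP is over its limit."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical usage: 2 requests per 10 seconds for each address.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
```

Note that a limiter like this only sees addresses, which is why rotating IPs defeats it and why a site owner may end up treating a whole pool of addresses as one client.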

If I were a content owner/producer and saw automated scraping from IP addresses owned by Cloudscrape that violated the ToS, I would sadly treat the entire pool of IPs as violators (even though some might be genuine users who are respecting the limits).

I'd like to know what a legitimate use case for auto-solving captchas and IP rotation is, other than circumventing limits imposed by the webmaster.

P.S. Why the throwaway?

There's already a tool to stop "copycats". It's called copyright (and for inventions, patents). You can and should use that to enforce your rights to your IP. It's not too hard to start issuing DMCA requests, and it's not even that expensive to have a lawyer do it if you're making money. It doesn't matter whether the illegal copy is obtained by a bot or a human.

While I agree that captchas and IP blocks can be employed by target sites, I don't agree that it should be illegal to circumvent them. I also don't agree that it's necessarily unethical (though in some cases, it may be). If you have public information posted on the public web, I don't think you have the right to mandate that it only be accessed by certain tools. You should plan and expect that it will be accessed by every tool capable of doing so.

If something is disrupting your business by "clogging the tubes" or whatever, that's another thing, and they can be held liable for that. But it doesn't matter that they clogged the tubes with one type of program or another; what matters is that the tubes were clogged by their actions, and that's the part that should be focused on in the subsequent legal proceedings. The specific tool or tools used to clog the tubes is at most a tangential curiosity. We don't want to make certain programs illegal.

Maybe we need a new amendment with "the right to bear code". We do not want to go down a rabbit hole where certain programs are legal and certain programs are not (at least not any more than we already are with the DMCA et al). Down with code control!

(Each line responds to a paragraph in order)

Appeal to fear.

No comment.

Not being able to determine a right cause doesn't prove a wrong.

Loaded question.