by ClumsyPilot 1777 days ago
"most websites prohibit someone from scraping"

This weird mindset where corporations make law shows up again.

The websites have no power to prohibit anything. If they make bytes available, we may do with them as we please so long as it's legal.


They can't throw you in jail over it, but it's within their rights to stop sending you these bytes or kick you off their platform altogether.

If they tried to prevent you from scraping third-party sites, that would be making laws; setting up ground rules with their ToS and enforcing them is absolutely fine.

> but it's within their rights to stop sending you these bytes or kick you off their platform altogether.

Actually no.

If a platform provides a generally available service, it is (in many countries, idk about the US) not allowed to arbitrarily exclude people it doesn't like without a legally valid reason.

And breaking terms in a ToS that are not legally valid or binding is not a legally valid reason. Just because you write something in your ToS doesn't mean it has any legal relevance; there are limits to what you can put in a ToS. And limiting (properly done, privacy-respecting) research is often not valid. (Though it depends a lot on the country.)

The FTC is a US government entity.

Imagine I am scraping Twitter - maybe I never accepted their ToS and don't even have an account.

Interesting that you mentioned Twitter. Twitter requires a user account to access content. Try accessing Twitter while logged out.

It’s within any website maintainer’s rights to block IP addresses that scrape maliciously, or to block others at their discretion.

Nobody is legally forcing websites to allow access to everyone, and accordingly, nobody is altering the law by blocking access to people (crawlers, hackers, spammers, malcontents, or anybody really) that they feel are not welcome. So exercising one’s existing rights isn’t an act of making or altering laws.

I suggest reading up on what robots.txt is to further understand this.
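For anyone who hasn't looked at it, here is a rough sketch of how a crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `MyBot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration; this only shows the mechanism, not what any particular site publishes or enforces.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders,
# not the rules of any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, is_allowed("MyBot", url))
```

Note that the check is something the crawler runs voluntarily on its own side; the file itself does nothing to stop a client that ignores it.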

Hacking and other malicious behaviours are actually illegal.

Either crawling does not belong on that list, or Google execs should be in jail.

Given that crawling is not malicious, what we are discussing now is 'someone is crawling my website in a way I don't like', which is a different gripe.

It might have some merit, but robots.txt is not legally binding.

As I stated before, illegal or not, it is within the website owners’ rights to restrict access.